Creating an MLP in C++

The neural net presented here is implemented so that its structure is easy to understand. This comes at the cost of runtime performance.

The goal of this model is to accurately classify the samples of the Iris dataset.

Most of the C++ code shown here does not follow the rule of three (or five). The code is simplified and should not be used in production. Find the full code here: GitHub.

The MLP we want to build should accurately classify the Iris dataset. Because this task is simple, the MLP does not have to be complex. It consists of a single hidden layer of size 30 with a ReLU activation function. The output layer encodes the three possible classes of the dataset as a one-hot vector, so it has 3 outputs with SoftMax as its activation function. Finally, the loss is calculated using CrossEntropy loss.
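
To make the shapes concrete, the following is a minimal NumPy sketch of one forward pass through the architecture described above. It is not the article's implementation: the batch of 5 samples, the random initialisation, and all variable names are assumptions chosen for illustration. The only external fact used is that Iris samples have 4 features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 5 samples with 4 features each (Iris has 4 features).
x = rng.normal(size=(5, 4))

# Randomly initialised parameters; the initialisation scheme is an assumption.
w1, b1 = rng.normal(scale=0.1, size=(4, 30)), np.zeros(30)
w2, b2 = rng.normal(scale=0.1, size=(30, 3)), np.zeros(3)

# Hidden dense layer of size 30 followed by ReLU -> shape (5, 30)
hidden = np.maximum(x @ w1 + b1, 0)

# Output dense layer with 3 outputs -> shape (5, 3)
logits = hidden @ w2 + b2

# SoftMax turns each row into class probabilities that sum to 1
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# One-hot labels for the 5 samples and the resulting CrossEntropy loss
labels = np.eye(3)[[0, 1, 2, 0, 1]]
loss = -np.sum(np.log(probs[labels == 1] + np.finfo(float).eps))

print(hidden.shape, probs.shape, loss)
```

Each of these steps corresponds to one of the classes introduced in the following sections.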

Structure of a Basic MLP

Classes we need to implement

We need to implement the following classes:

  1. Dense Layer
  2. CrossEntropy Loss
  3. SGD Optimizer
  4. ReLU Activation
  5. SoftMax Activation

We will use the templated Tensor class as the data structure that holds the allocated memory.

Fully connected / Dense layer

Given:

  1. Input_size N
  2. Output_size M

Learnable parameters:

  1. Weights $w \in \mathbb{R}^{N \times M}$
  2. Bias $b \in \mathbb{R}^{M}$

Data to save:

  1. previous input $v_i \in \mathbb{R}^{N}$

Forward

Given input $v_i \in \mathbb{R}^{N}$ we get output $v_o \in \mathbb{R}^{M}$ using $f(v_i)$.

$f(v_i) = b + v_i w = v_o$
python
def forward(self, input_tensor):
    self.previous_input=input_tensor.copy()
    output = self.bias + np.matmul(input_tensor,self.weights)
    return output

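This can be rewritten by appending $b$ to $w$, creating $w_b \in \mathbb{R}^{(N+1) \times M}$.

$f(v_i) = \begin{bmatrix} v_i & 1 \end{bmatrix} \begin{bmatrix} w \\ b \end{bmatrix} = \hat{v}_i w_b = v_o$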
python
def forward(self, input_tensor):
    #add ones to the input in order to add the bias 
    input_tensor=np.c_[input_tensor,np.ones(input_tensor.shape[0])]
    self.previous_input=input_tensor.copy()
    #calculate the output. Here weights already include the bias
    output = np.matmul(input_tensor,self.weights)
    return output
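
As a quick sanity check, the two formulations can be compared numerically. The sketch below uses made-up shapes and random values (all names are illustrative, not taken from the article's code) and stacks the bias as the last row of the combined matrix, matching the column of ones appended to the input.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n, m = 4, 3, 2

v_i = rng.normal(size=(batch, n))   # input batch
w = rng.normal(size=(n, m))         # weights
b = rng.normal(size=m)              # bias

# Variant 1: separate bias
out_separate = b + v_i @ w

# Variant 2: bias appended to the weights as an extra row,
# and a column of ones appended to the input
w_b = np.vstack([w, b])                  # shape (N+1, M)
v_hat = np.c_[v_i, np.ones(batch)]       # shape (batch, N+1)
out_combined = v_hat @ w_b

print(np.allclose(out_separate, out_combined))
```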

Backward

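Given error $r \in \mathbb{R}^{M}$:

$b(r) = r w^T$

Then given an optimizer $O$:

$\Delta w = v_i^T r$

$\Delta b = \sum_{i} r_i$ (summed over the samples $i$ of the batch)

$w, b = O(w, b, \Delta w, \Delta b)$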

python
def backward(self, error_tensor):
	output = np.matmul(error_tensor, self.weights.T)
	if self.optimizer is not None:
		self._gradient_weights = np.matmul(self.previous_input.T, error_tensor)
		self._gradient_bias = error_tensor.sum(axis=0)
		self.weights, self.bias = self.optimizer(self.weights, self.bias, self._gradient_weights, self._gradient_bias)
	return output

Or using wb:

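python
def backward(self, error_tensor):
    # weights holds w_b with shape (N+1, M), so this product has N+1 columns
    output = np.matmul(error_tensor, self.weights.T)
    if self.optimizer is not None:
        # previous_input already contains the appended column of ones
        self._gradient_weights = np.matmul(self.previous_input.T, error_tensor)
        self.weights = self.optimizer.update(self.weights, self._gradient_weights)
    # drop the last column: it belongs to the appended ones, not to the layer input
    return output[:, :-1]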
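
One way to gain confidence in these gradient formulas is a finite-difference check. The sketch below is an illustration under assumptions: it uses the separate-bias form, random data, and a hypothetical scalar loss `sum(output * r)` chosen so that the error arriving at the layer is exactly `r`. The helper `numeric_grad` is not part of the article's code.

```python
import numpy as np

rng = np.random.default_rng(1)
v_i = rng.normal(size=(4, 3))   # batch of inputs
w = rng.normal(size=(3, 2))     # weights
b = rng.normal(size=2)          # bias
r = rng.normal(size=(4, 2))     # error arriving from the next layer

def scalar_loss(v, w_, b_):
    # Hypothetical loss whose derivative w.r.t. the layer output is r
    return np.sum((b_ + v @ w_) * r)

# Analytic gradients, as in the backward pass
grad_input = r @ w.T
grad_w = v_i.T @ r
grad_b = r.sum(axis=0)

def numeric_grad(f, x, h=1e-6):
    # Central finite differences, one entry at a time
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = h
        grad[idx] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

num_input = numeric_grad(lambda v: scalar_loss(v, w, b), v_i)
num_w = numeric_grad(lambda w_: scalar_loss(v_i, w_, b), w)
num_b = numeric_grad(lambda b_: scalar_loss(v_i, w, b_), b)

print(np.allclose(grad_input, num_input, atol=1e-6),
      np.allclose(grad_w, num_w, atol=1e-6),
      np.allclose(grad_b, num_b, atol=1e-6))
```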

CrossEntropy Loss

Data to save:

  • previous input x

Forward

l is the label tensor consisting of one-hot encoded labels

$f(x, l) = -\sum \log(x_{l=1} + \epsilon)$
python
def forward(self, prediction_tensor, label_tensor):
	self.previous_input=prediction_tensor.copy()
	# sum of each vector in prediction_tensor == 1
	# take the log of every prediction with label == 1 and sum them. eps for log(0) prevention
	loss = np.sum( - np.log( prediction_tensor[label_tensor==1] + np.finfo(float).eps) )
	return loss

Backward

l is the label tensor consisting of one-hot encoded labels

$b(l) = -\frac{l}{x + \epsilon}$
python
def backward(self, label_tensor):
	return -label_tensor / (self.previous_input + np.finfo(float).eps)
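
A small worked example may help here. The prediction values below are made up for illustration, and the code applies the same two formulas as standalone expressions rather than as class methods.

```python
import numpy as np

eps = np.finfo(float).eps

# Two hypothetical predictions (each row sums to 1) and their one-hot labels
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Forward: only the predictions at the true class contribute to the loss
loss = np.sum(-np.log(predictions[labels == 1] + eps))

# Backward: non-zero only at the true class, larger when the prediction is small
grad = -labels / (predictions + eps)

print(loss)
print(grad)
```

The loss equals -(log 0.7 + log 0.8), and the gradient is non-zero only at the position of the true class of each sample.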

SGD Optimizer

Given:

  1. learning_rate μ

Learnable parameters: None

Data to save: None

Update

$w_{i+1} = w_i - \mu \Delta w_i$
python
def update(self, weight_tensor, gradient_tensor):
	updated_weights = weight_tensor - self.learning_rate * gradient_tensor
	return updated_weights

The same function can be used for the bias

$b_{i+1} = b_i - \mu \Delta b_i$

or both combined

$w_{b,i+1} = w_{b,i} - \mu \Delta w_{b,i}$
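
To see the update rule in action, the sketch below wraps it in a small class and minimises a toy function. The class name `Sgd`, the toy objective $\|w\|^2$ (with gradient $2w$), and the learning rate of 0.1 are assumptions for illustration only.

```python
import numpy as np

class Sgd:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def update(self, weight_tensor, gradient_tensor):
        # w_{i+1} = w_i - mu * delta_w_i
        return weight_tensor - self.learning_rate * gradient_tensor

optimizer = Sgd(learning_rate=0.1)
w = np.array([1.0, -2.0])

# First step on the toy objective ||w||^2, whose gradient is 2w
first_step = optimizer.update(w, 2 * w)

# Repeating the step shrinks w towards the minimum at 0
for _ in range(50):
    w = optimizer.update(w, 2 * w)

print(first_step, w)
```

With this learning rate every step scales the weights by 0.8, so they decay geometrically towards zero.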

ReLU Activation

Given:

  1. Input tensor x

Learnable parameters: None

Data to save:

  1. Previous input x

Forward

$f(x) = \begin{cases} x & x \geq 0 \\ 0 & \text{otherwise} \end{cases}$
python
def forward(self, input_tensor):
    self.previous_input = input_tensor.copy()
    # set every negative value to 0 without modifying the caller's array
    return np.maximum(input_tensor, 0)

Backward

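Given error y and previous input x:

$b(y) = \begin{cases} y & x \geq 0 \\ 0 & \text{otherwise} \end{cases}$
python
def backward(self, error_tensor):
    # set every value where the input was negative to 0,
    # without modifying the caller's array
    return np.where(self.previous_input < 0, 0, error_tensor)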
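
The following standalone sketch runs both passes on a tiny made-up input. The class name `Relu` and the example values are assumptions. Note that an input of exactly 0 lets the error pass through, matching the $x \geq 0$ case above.

```python
import numpy as np

class Relu:
    def forward(self, input_tensor):
        self.previous_input = input_tensor.copy()
        return np.maximum(input_tensor, 0)

    def backward(self, error_tensor):
        # the error is blocked wherever the forward input was negative
        return np.where(self.previous_input < 0, 0, error_tensor)

relu = Relu()
x = np.array([[-1.0, 0.0, 2.0]])
y = np.array([[5.0, 5.0, 5.0]])   # hypothetical error from the next layer

out = relu.forward(x)
grad = relu.backward(y)

print(out)    # negative entry clipped to 0
print(grad)   # error zeroed only where x was negative
```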

SoftMax Activation

Given:

  1. Input tensor x

Learnable parameters: None

Data to save:

  1. Previous output f(x)

Forward

$f(x) = \frac{\exp(x)}{\sum \exp(x)}$
python
def forward(self, input_tensor):
    # shift each row by its maximum to increase numerical stability: x_new = x - max(x)
    shifted = input_tensor - np.max(input_tensor, axis=1, keepdims=True)
    # calculate exp(x) once and reuse it for numerator and denominator
    exp_values = np.exp(shifted)
    # axis=1 sums per sample; keepdims=True lets broadcasting divide each row by its sum
    output = exp_values / np.sum(exp_values, axis=1, keepdims=True)
    self.previous_output = output.copy()
    return output
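
The effect of the shift can be demonstrated with deliberately large inputs. In this sketch the input values are made up, and `np.errstate` is used only to silence the overflow warnings that the naive version triggers.

```python
import numpy as np

# Hypothetical logits that are too large for a naive exp()
x = np.array([[1000.0, 1001.0, 1002.0]])

# Naive softmax: exp(1000) overflows to inf, and inf / inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)

# Shifted softmax: the largest exponent is exp(0) = 1, so nothing overflows
shifted = x - np.max(x, axis=1, keepdims=True)
stable = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

print(naive)
print(stable)
```

Subtracting a constant from every entry of a row does not change the SoftMax result, because the factor cancels between numerator and denominator.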

Backward

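Given error $y$ and previous output $f(x)$:

$b(y) = f(x) \odot \left( y - \sum_j y_j f(x)_j \right)$
python
def backward(self, error_tensor):
    # inner sum: for each sample, sum the error weighted by the softmax output
    row_sum = np.sum(error_tensor * self.previous_output, axis=1, keepdims=True)
    # computed out of place so the caller's error_tensor is left untouched
    return self.previous_output * (error_tensor - row_sum)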
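
The vectorised backward pass can be cross-checked against the explicit SoftMax Jacobian, which for a single sample with output $s$ is $\mathrm{diag}(s) - s s^T$. The sketch below uses random data and illustrative variable names; it is a verification aid, not part of the article's code.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(3, 4))       # batch of 3 samples, 4 classes
error = rng.normal(size=(3, 4))   # hypothetical error from the loss

# Forward pass (row-wise softmax)
shifted = x - x.max(axis=1, keepdims=True)
s = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Vectorised backward pass
fast = s * (error - np.sum(error * s, axis=1, keepdims=True))

# Reference: build the full Jacobian for every sample and apply it to the error
slow = np.stack([
    (np.diag(row) - np.outer(row, row)) @ e
    for row, e in zip(s, error)
])

print(np.allclose(fast, slow))
```

The vectorised form avoids building one Jacobian matrix per sample, which is why the row-wise sum is used instead.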

Creating the Neural Net

We want a simple neural net consisting of 2 dense layers.

If we now define a Dataloader, it will be easier to handle our training data.

And finally the main function:

Training this model results in a training loss of 0.0077 and a validation loss of 0.0063 after 1000 iterations. All 45 validation samples were classified correctly.