Creating an MLP in C++
The neural net presented here is implemented in a way that makes its structure easy to understand. This comes at the cost of runtime performance.
The goal of this model is to accurately classify the samples of the Iris dataset.
Most of the C++ code shown here does not abide by the rule of 3 (or 5). The code shown is simplified and should not be used in production. Find the full code here: Github.
Since classifying the Iris dataset is not a very complicated task, the MLP does not have to be complex. It will consist of only one hidden layer of size 30 with a ReLU activation function. The output layer represents the three possible classes of the dataset as a one-hot encoding. It therefore has 3 outputs, with SoftMax as the activation function. Finally, the loss is calculated using CrossEntropy loss.
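To make the data flow concrete, the following sketch pushes a few made-up samples through an untrained network of this shape. It assumes 4 input features (the Iris dataset has four measurements per flower); the random weights and the helper names `predict` and `one_hot` are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes: 4 Iris features, 30 hidden units and 3 classes as described in the text.
n_features, n_hidden, n_classes = 4, 30, 3

# Hypothetical, randomly initialised parameters (nothing is trained here).
w1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def predict(x):
    hidden = np.maximum(x @ w1 + b1, 0)                   # dense layer + ReLU
    logits = hidden @ w2 + b2                             # output dense layer
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise exp
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # SoftMax

def one_hot(labels, n_classes):
    # Row i has a single 1 in column labels[i].
    return np.eye(n_classes)[labels]

x = rng.normal(size=(5, n_features))   # 5 made-up samples, not real Iris data
labels = np.array([0, 2, 1, 1, 0])
probs = predict(x)                     # shape (5, 3), each row sums to 1
targets = one_hot(labels, n_classes)   # shape (5, 3)
loss = -np.sum(np.log(probs[targets == 1] + np.finfo(float).eps))
```

Each row of `probs` is a probability distribution over the three classes, and the loss only looks at the probability assigned to the correct class of each sample.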
Classes we need to implement
We need to implement the following classes:
- Dense Layer
- CrossEntropy Loss
- SGD Optimizer
- ReLU Activation
- SoftMax Activation
As the data structure that holds the allocated memory, we will use the templated Tensor class.
Fully connected / Dense layer
Given:
- Input size $N$
- Output size $M$
Learnable parameters:
- Weights $W \in \mathbb{R}^{N \times M}$
- Bias $b \in \mathbb{R}^{M}$
Data to save:
- previous input
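The lists above map onto the state of the layer roughly as follows. This is a minimal NumPy sketch in the style of the other snippets; the uniform initialisation range and the `rng` argument are assumptions and are not taken from the original code.

```python
import numpy as np

class Dense:
    """Sketch of the layer state only; forward and backward are omitted."""

    def __init__(self, input_size, output_size, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # Learnable parameters: weights of shape (N, M) and bias of shape (M,).
        # The initialisation range is an assumption made for this sketch.
        self.weights = rng.uniform(-0.1, 0.1, size=(input_size, output_size))
        self.bias = np.zeros(output_size)
        # Data to save: filled in by forward, needed again by backward.
        self.previous_input = None
        # Set later; without an optimizer the layer is not updated.
        self.optimizer = None

layer = Dense(4, 30)
```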
Forward
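Given an input $X$ with one sample per row, we get the output $Y$ using $Y = XW + b$. The Python snippets in this article assume that NumPy is imported as `np`.

    def forward(self, input_tensor):
        # keep the input; backward needs it to compute the weight gradient
        self.previous_input = input_tensor.copy()
        output = self.bias + np.matmul(input_tensor, self.weights)
        return output

This can be rewritten by appending a column of ones to $X$ and appending $b$ as an extra row to $W$, creating $X'$ and $W'$ with $Y = X'W'$.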
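    def forward(self, input_tensor):
        # add a column of ones to the input in order to add the bias
        input_tensor = np.c_[input_tensor, np.ones(input_tensor.shape[0])]
        self.previous_input = input_tensor.copy()
        # calculate the output. Here weights already include the bias
        output = np.matmul(input_tensor, self.weights)
        return output

Backward

Given the error $E_n$ coming from the next layer, the error passed on to the previous layer is $E_{n-1} = E_n W^T$.

Then, given an optimizer with learning rate $\eta$, the parameters are updated using the gradients $\nabla_W L = X^T E_n$ and $\nabla_b L = \sum_i E_{n,i}$ (summed over the rows of $E_n$): $W \leftarrow W - \eta X^T E_n$ and $b \leftarrow b - \eta \sum_i E_{n,i}$.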
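    def backward(self, error_tensor):
        # error for the previous layer, computed with the weights before the update
        output = np.matmul(error_tensor, self.weights.T)
        if self.optimizer is not None:
            self._gradient_weights = np.matmul(self.previous_input.T, error_tensor)
            self._gradient_bias = error_tensor.sum(axis=0)
            # the same update rule is applied to weights and bias
            self.weights = self.optimizer.update(self.weights, self._gradient_weights)
            self.bias = self.optimizer.update(self.bias, self._gradient_bias)
        return output

Or using the combined weights $W'$, which already include the bias: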
    def backward(self, error_tensor):
        output = np.matmul(error_tensor, self.weights.T)
        if self.optimizer is not None:
            self._gradient_weights = np.matmul(self.previous_input.T, error_tensor)
            self.weights = self.optimizer.update(self.weights, self._gradient_weights)
        # drop the last column: it belongs to the appended ones, not to a real input
        return output[:, :-1]

CrossEntropy Loss
Data to save:
- previous input
Forward
$T$ is the label tensor consisting of one-hot encoded labels and $P$ is the prediction tensor. The loss is $L = -\sum_i \sum_k T_{ik} \log(P_{ik} + \epsilon)$.

    def forward(self, prediction_tensor, label_tensor):
        self.previous_input = prediction_tensor.copy()
        # each row of prediction_tensor sums to 1 (SoftMax output)
        # take the log of every prediction with label == 1 and sum them. eps for log(0) prevention
        loss = np.sum(-np.log(prediction_tensor[label_tensor == 1] + np.finfo(float).eps))
        return loss

Backward
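$T$ is the label tensor consisting of one-hot encoded labels and $P$ is the prediction saved during the forward pass. The error is computed element-wise as $E = -T / (P + \epsilon)$.

    def backward(self, label_tensor):
        return -label_tensor / (self.previous_input + np.finfo(float).eps)

SGD Optimizer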
Given:
- learning_rate
Learnable parameters: None
Data to save: None
Update

    def update(self, weight_tensor, gradient_tensor):
        updated_weights = weight_tensor - self.learning_rate * gradient_tensor
        return updated_weights

The same function can be used for the bias, or for weights and bias combined into a single tensor.
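A minimal sketch of the update rule in use: plain gradient descent on a small quadratic whose minimum is known. The class name `Sgd`, the target values and the learning rate are chosen for illustration only.

```python
import numpy as np

class Sgd:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def update(self, weight_tensor, gradient_tensor):
        # one step against the gradient, scaled by the learning rate
        return weight_tensor - self.learning_rate * gradient_tensor

# Minimise f(w) = sum((w - target)^2); its gradient is 2 * (w - target).
target = np.array([1.0, -2.0, 3.0])
w = np.zeros(3)
sgd = Sgd(learning_rate=0.1)
for _ in range(100):
    w = sgd.update(w, 2 * (w - target))
```

With this learning rate the distance to `target` shrinks by a constant factor in every step, so `w` ends up very close to `target`. The optimizer holds no state of its own, which is why the same object can serve both weights and bias.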
ReLU Activation
Given:
- Input tensor
Learnable parameters: None
Data to save:
- Previous input
Forward
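    def forward(self, input_tensor):
        self.previous_input = input_tensor.copy()
        # set every negative value to 0 without modifying the caller's array
        return np.maximum(input_tensor, 0)

Backward

Given the error $E_n$ and the previous input $X$, the error passed on is $E_{n-1,ij} = E_{n,ij}$ where $X_{ij} \ge 0$ and $0$ elsewhere: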
    def backward(self, error_tensor):
        # work on a copy so the caller's array is not modified
        error_tensor = error_tensor.copy()
        # set every value where the input was negative to 0
        error_tensor[self.previous_input < 0] = 0
        return error_tensor

SoftMax Activation
Given:
- Input tensor
Learnable parameters: None
Data to save:
- Previous output
Forward
    def forward(self, input_tensor):
        # shift each row to increase numerical stability: x_new = x - max(x)
        input_tensor = input_tensor - np.max(input_tensor, axis=1, keepdims=True)
        # calc. exp(x) here. Otherwise we would have to do it twice
        input_tensor = np.exp(input_tensor)
        # np.sum: axis=1 -> sum per sample; keepdims=True lets broadcasting
        # divide every row by its own sum
        input_tensor = input_tensor / np.sum(input_tensor, axis=1, keepdims=True)
        self.previous_output = input_tensor.copy()
        return input_tensor

Backward
    def backward(self, error_tensor):
        # calc inner sum over the classes of each sample
        row_sum = np.sum(error_tensor * self.previous_output, axis=1, keepdims=True)
        # build a new array instead of modifying the caller's error_tensor in place
        return self.previous_output * (error_tensor - row_sum)

Creating the Neural Net
We want a simple neural net consisting of 2 dense layers.
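As a rough illustration of how the pieces fit together, here is a compact NumPy sketch of such a net. It is not the C++ implementation: the class bodies are condensed versions of the snippets above, the weight initialisation is an assumption, and three synthetic clusters stand in for the Iris data so that the block has no external dependency.

```python
import numpy as np

class Sgd:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
    def update(self, weight_tensor, gradient_tensor):
        return weight_tensor - self.learning_rate * gradient_tensor

class Dense:
    def __init__(self, n_in, n_out, optimizer, rng):
        self.weights = rng.normal(scale=0.1, size=(n_in, n_out))  # assumed init
        self.bias = np.zeros(n_out)
        self.optimizer = optimizer
    def forward(self, x):
        self.previous_input = x
        return x @ self.weights + self.bias
    def backward(self, error):
        output = error @ self.weights.T  # uses the weights before the update
        self.weights = self.optimizer.update(self.weights, self.previous_input.T @ error)
        self.bias = self.optimizer.update(self.bias, error.sum(axis=0))
        return output

class ReLU:
    def forward(self, x):
        self.previous_input = x
        return np.maximum(x, 0)
    def backward(self, error):
        return np.where(self.previous_input > 0, error, 0.0)

class SoftMax:
    def forward(self, x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        self.previous_output = e / e.sum(axis=1, keepdims=True)
        return self.previous_output
    def backward(self, error):
        row_sum = np.sum(error * self.previous_output, axis=1, keepdims=True)
        return self.previous_output * (error - row_sum)

class CrossEntropy:
    def forward(self, prediction, labels):
        self.previous_input = prediction
        return -np.sum(np.log(prediction[labels == 1] + np.finfo(float).eps))
    def backward(self, labels):
        return -labels / (self.previous_input + np.finfo(float).eps)

rng = np.random.default_rng(0)
sgd = Sgd(learning_rate=0.002)
# 4 inputs -> 30 hidden units (ReLU) -> 3 outputs (SoftMax)
layers = [Dense(4, 30, sgd, rng), ReLU(), Dense(30, 3, sgd, rng), SoftMax()]
loss_fn = CrossEntropy()

# Synthetic stand-in data: three noisy clusters, 20 samples each.
centers = np.array([[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0]], dtype=float)
class_ids = np.repeat(np.arange(3), 20)
x = centers[class_ids] + 0.1 * rng.normal(size=(60, 4))
y = np.eye(3)[class_ids]

losses = []
for _ in range(500):
    out = x
    for layer in layers:              # forward pass through all layers
        out = layer.forward(out)
    losses.append(loss_fn.forward(out, y))
    error = loss_fn.backward(y)
    for layer in reversed(layers):    # backward pass in reverse order
        error = layer.backward(error)
```

The list `losses` records the summed cross-entropy of every iteration; on this toy data it should fall over the course of training. The numbers it produces say nothing about the Iris results reported in this article.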
If we now define a Dataloader, it will be easier to handle our training data.
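One possible shape for such a loader, sketched in NumPy: it reshuffles the samples on every pass and yields mini-batches. The class name `DataLoader`, its constructor arguments and the batching behaviour are assumptions for illustration and may differ from the loader in the full code.

```python
import numpy as np

class DataLoader:
    """Hypothetical loader: shuffles once per pass and yields mini-batches."""

    def __init__(self, data, labels, batch_size, rng=None):
        assert len(data) == len(labels)
        self.data = data
        self.labels = labels
        self.batch_size = batch_size
        self.rng = rng if rng is not None else np.random.default_rng()

    def __iter__(self):
        # new random order for every pass over the data
        order = self.rng.permutation(len(self.data))
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            yield self.data[idx], self.labels[idx]

# Made-up data: 10 samples with 2 features and one-hot labels for 3 classes.
data = np.arange(20, dtype=float).reshape(10, 2)
labels = np.eye(3)[np.arange(10) % 3]
loader = DataLoader(data, labels, batch_size=4, rng=np.random.default_rng(0))
batch_shapes = [x.shape for x, _ in loader]
```

With 10 samples and a batch size of 4, one pass yields two full batches and a final batch of 2 samples; every sample appears exactly once per pass.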
And finally the main function:
Training this model results in a training loss of 0.0077 and a validation loss of 0.0063 after 1000 iterations. All 45 validation samples were classified correctly.
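Whether a sample counts as predicted correctly can be checked by comparing the most probable class of the SoftMax output with the one-hot label. The sketch below uses made-up predictions, not the results reported above, and the helper name `accuracy` is hypothetical.

```python
import numpy as np

def accuracy(prediction_tensor, label_tensor):
    """Fraction of rows whose most probable class matches the one-hot label."""
    predicted = np.argmax(prediction_tensor, axis=1)
    expected = np.argmax(label_tensor, axis=1)
    return float(np.mean(predicted == expected))

# Made-up SoftMax outputs and labels for three samples (not real results).
predictions = np.array([[0.9, 0.05, 0.05],
                        [0.1, 0.7, 0.2],
                        [0.3, 0.4, 0.3]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
acc = accuracy(predictions, labels)  # two of the three samples match
```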