A simple-to-use neural network framework written in C++ that utilizes CUDA cores on NVIDIA cards. In this example, the MNIST classification problem is solved with a multi-layer neural network using hyperbolic tangent as the activation function, mean squared error as the loss function, and stochastic gradient descent as the optimizer.
Benchmark plots (images not shown): CPU vs GPU training time, and TILED vs NAIVE matrix multiplication.
Deep neural networks consist of:
- input layer,
- $n$ hidden layers,
- output layer.

They can be used in classification or regression problems.
The algorithm can be summarized as follows:
1. Initialize weights and biases (using a pseudo-random generator with a uniform distribution between $-0.5$ and $0.5$),
2. Prepare the input data and divide it into batches,
3. Perform the forward step,
4. Calculate the error,
5. Perform the backward step,
6. Update the weights and biases,
7. Return to step $3$.
To minimize the loss function, the stochastic gradient descent method is implemented.
The loss function is Mean Squared Error:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

Its derivative with respect to the prediction is:

$$\frac{\partial MSE}{\partial \hat{y}_i} = \frac{2}{n}\left(\hat{y}_i - y_i\right)$$

Formula for the updated weights:

$$\theta = \theta - \eta \nabla_\theta J(\theta; x, y)$$

where:
- $J(\theta)$ is the cost function, $MSE$, that the algorithm is minimizing,
- $\eta$ is the learning rate,
- $x$ and $y$ are the input data.
The linear layer implements the calculation:

$$y = xW + b$$
Hyperbolic tangent is used as the activation function:

$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
CUDA cores are utilized to speed up the learning process. Training a neural network can be easily parallelized with the help of a GPU with thousands of simple cores, as it primarily involves a series of matrix multiplications.
Utilizing CUDA cores requires preparing kernels for the GPU to execute. A kernel implementing the forward pass of the hyperbolic tangent activation:
```cpp
__global__ void tanhForward(float* input_data, float* output_data, int size) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < size)
        output_data[index] = tanh(input_data[index]);
}
```
Using the GPU can reduce training time by up to several times compared to the CPU.
Moreover, matrix multiplication was implemented using the tiled matrix multiplication algorithm. It greatly improves multiplication speed, especially for larger matrices:
```cpp
...
Network net;
net.addLayer(new Linear(28 * 28, 30, BATCH_SIZE));
net.addLayer(new Tanh(30, BATCH_SIZE));
net.addLayer(new Linear(30, 10, BATCH_SIZE));
net.addLayer(new Tanh(10, BATCH_SIZE));
net.fit(mnist.images, mnist.labels, LEARNING_RATE, EPOCHS);
...
```
```
...
Epoch: 9, Loss: 0.027987
Time = 132[ms]
Epoch: 10, Loss: 0.027835
Time = 129[ms]
Average time: 138.100006
Start reading images, #10000 (28,28)
Done reading images
Evaluation Results:
Total Samples: 9984
Correct predictions: 8864
Accuracy: 88.7821%
```