A minimalistic CUDA-based convolutional neural network library.
- Convolutional neural networks (CNNs) are at the core of modern computer vision applications.
- Mobile/embedded platforms, e.g. quadrotors, demand fast and lightweight CNN libraries. Modern deep learning frameworks depend heavily on third-party libraries and are therefore hard to configure on mobile/embedded platforms (like the Nvidia TX1). This effort aims at developing, from scratch, a full-fledged yet minimalistic CNN library that depends only on C++0x and CUDA 8.0.
Library | Dependencies
---|---
Teaism | C/C++, CUDA
Caffe | C/C++, CUDA, cuDNN, BLAS, Boost, OpenCV, etc.
TensorFlow | C/C++, CUDA, cuDNN, Python, Bazel, NumPy, etc.
Torch | C/C++, CUDA, BLAS, LuaJIT, LuaRocks, OpenBLAS, etc.
- For educational purposes :)
- Nine layer types implemented, enough to reproduce LeNet, AlexNet, VGG, etc.
- data, conv, fc, pooling, ReLU, LRN, dropout, softmax, cross-entropy loss (see the kernel sketch after this list)
- Model importer for trained Caffe models
- Forward inference / backpropagation
- Switching between CPU and GPU
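To give a taste of what a layer's forward/backward GPU code looks like, here is a minimal ReLU-style kernel sketch. This is illustrative CUDA only, not Teaism's actual implementation; the kernel names and memory layout are assumptions.

```cpp
// Illustrative sketch -- NOT Teaism's actual kernels.
// Forward: out = max(in, 0), one thread per element.
__global__ void relu_forward(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

// Backward: pass the gradient through only where the input was positive.
__global__ void relu_backward(const float* in, const float* top_grad,
                              float* bottom_grad, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) bottom_grad[i] = in[i] > 0.0f ? top_grad[i] : 0.0f;
}
```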
- basics/: Major header files / base classes, e.g., session.hpp, layer.hpp, tensor.cu, etc. (an interface sketch follows this list)
- layers/: All the layer implementations.
- tests/: All test cases. It is recommended to browse demo_cifar10.cu, demo_mlp.cu, tests_alexnet.cu and tests_cifar10.cu to learn how to use this library.
- initializers/: Parameter initialization for convolutional and fully connected layers.
- utils/: Some utility functions.
- models/: Scripts for training models in Caffe and importing trained models.
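For orientation, the split between basics/ and layers/ suggests a small base-class interface that every layer implements. The sketch below is hypothetical; the class and method names are assumptions, not the actual declarations in layer.hpp.

```cpp
#include <vector>

template <typename Dtype> class Tensor;  // tensor type lives under basics/

// Hypothetical layer interface; names here are assumptions, not Teaism's API.
template <typename Dtype>
class Layer {
 public:
  virtual ~Layer() {}
  // Compute top (output) tensors from bottom (input) tensors.
  virtual void Forward(const std::vector<Tensor<Dtype>*>& bottoms,
                       std::vector<Tensor<Dtype>*>& tops) = 0;
  // Propagate gradients from tops back to bottoms and layer parameters.
  virtual void Backward(const std::vector<Tensor<Dtype>*>& tops,
                        std::vector<Tensor<Dtype>*>& bottoms) = 0;
};
```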
- Training on CIFAR-10
Batch size = 100; test accuracy reaches ~45% after 2400+ iterations with learning rate = 0.0002. Each iteration prints the batch accuracy, the wall-clock time, and the loss (an SGD-style update sketch follows the log below).
$ make demo_cifar10_training && ./demo_cifar10_training.o
iteration 2440 accuracy: 46/100 0.460000
iteration time: 3801.9 ms
1.620593e+00
iteration 2441 accuracy: 42/100 0.420000
iteration time: 3798.6 ms
1.648575e+00
iteration 2442 accuracy: 40/100 0.400000
iteration time: 3813.1 ms
1.725998e+00
iteration 2443 accuracy: 38/100 0.380000
iteration time: 3801.5 ms
1.663968e+00
iteration 2444 accuracy: 47/100 0.470000
iteration time: 3794.4 ms
1.611726e+00
iteration 2445 accuracy: 44/100 0.440000
iteration time: 3824.2 ms
1.578671e+00
iteration 2446 accuracy: 47/100 0.470000
iteration time: 3808.8 ms
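The parameter update behind these iterations is presumably plain SGD; the README only fixes the learning rate, so the kernel below is an assumed sketch, not the library's code.

```cpp
// Assumed vanilla SGD step: w <- w - lr * grad, one thread per weight.
__global__ void sgd_update(float* w, const float* grad, float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) w[i] -= lr * grad[i];
}

// Typical launch, once per parameter tensor per iteration:
//   sgd_update<<<(n + 255) / 256, 256>>>(weights, grads, 2e-4f, n);
```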
- Import a trained model and run inference on CIFAR-10
$ make demo_cifar10 && ./demo_cifar10.o
Start demo cifar10 on GPU
datasets/cifar10/bmp_imgs/00006.bmp
network finished setup: 617.3 ms
GPU memory usage: used = 346.250000, free = 7765.375000 MB, total = 8111.625000 MB
Loading weights ...
Loading conv: (5, 5, 3, 32):
Loading bias: (1, 1, 1, 32):
Loading conv: (5, 5, 32, 32):
Loading bias: (1, 1, 1, 32):
Loading conv: (5, 5, 32, 64):
Loading bias: (1, 1, 1, 64):
Loading fc: (1, 1, 64, 1024):
Loading bias: (1, 1, 1, 64):
Loading fc: (1, 1, 10, 64):
Loading bias: (1, 1, 1, 10):
data forward: 0.3 ms
conv1 forward: 0.3 ms
pool1 forward: 0.3 ms
relu1 forward: 0.0 ms
conv2 forward: 1.3 ms
pool2 forward: 0.2 ms
relu2 forward: 0.0 ms
conv3 forward: 2.3 ms
pool3 forward: 0.4 ms
relu3 forward: 0.0 ms
fc4 forward: 1.7 ms
fc5 forward: 0.0 ms
softmax forward: 0.1 ms
Total forward time: 6.8 ms
Prediction:
Airplane probability: 0.0000
Automobile probability: 0.9993
Bird probability: 0.0000
Cat probability: 0.0000
Deer probability: 0.0000
Dog probability: 0.0000
Frog probability: 0.0000
Horse probability: 0.0005
Ship probability: 0.0000
Truck probability: 0.0001
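The class probabilities above are the softmax of the final fc layer's 10 logits. A numerically stable host-side version looks like the generic sketch below (not Teaism's softmax layer).

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Stable softmax: subtract the max logit before exponentiating so exp()
// cannot overflow; the result is mathematically unchanged.
std::vector<float> softmax(const std::vector<float>& logits) {
  float m = *std::max_element(logits.begin(), logits.end());
  std::vector<float> p(logits.size());
  float sum = 0.0f;
  for (size_t i = 0; i < logits.size(); ++i) {
    p[i] = std::exp(logits[i] - m);
    sum += p[i];
  }
  for (float& v : p) v /= sum;
  return p;
}
```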
- Multilayer perceptron
$ make demo_mlp && ./demo_mlp.o
The example trains the network to count the number of ones in the input, with the target encoded one-hot by that count:
{0,0} -> {1,0,0}
{0,1} -> {0,1,0}
{1,0} -> {0,1,0}
{1,1} -> {0,0,1}
Network: input(2) - fc(3) - fc(3) - softmax - cross_entropy_loss
input:
0,1
0,0
1,0
1,1
ground truth:
0 1 0
1 0 0
0 1 0
0 0 1
Training (learning rate = 0.1) ..
-----iteration 5000-------
test input:
0,0
1,0
1,1
0,1
out activations:
0.978394 0.021566 0.000040
0.009701 0.878047 0.112252
0.000000 0.101604 0.898396
0.009701 0.878047 0.112252
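For reference, the cross_entropy_loss at the end of this network reduces, for a one-hot target, to minus the log of the probability assigned to the true class. E.g. the first test row above ({0,0}, true class 0, p = 0.978394) gives a loss of about 0.022.

```cpp
#include <cmath>

// Cross-entropy for one sample with a one-hot target: -log(p[label]).
float cross_entropy(const float* probs, int label) {
  return -std::log(probs[label]);  // e.g. -log(0.978394) ~= 0.0218
}
```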