autograph v0.1.1
Profiling
Profiling currently requires a nightly toolchain and the "profile" feature. Set the AUTOGRAPH_PROFILE environment variable to 1 or True to produce a table of statistics for the compute passes that are executed.
AUTOGRAPH_PROFILE=1 cargo +nightly run --features profile
Rust GEMM
Improved performance on Neural Network MNIST example (Lenet5) by 5x.
- Implemented in Rust for u32, i32, f32
- bf16 not yet implemented
- Unrolled loops with crunchy
- Work per thread (1x1, 2x2, 4x4) micro tiles
- SplitK variant (256) for small m or n and large k
- Atomically accumulates with multiple work groups
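The SplitK idea in the list above can be illustrated on the CPU. This is a minimal sketch, not autograph's GPU implementation: the function name `gemm_split_k` and its signature are hypothetical, threads stand in for work groups, and `u32` is used so that the standard `AtomicU32::fetch_add` can do the accumulation. Each chunk of the k dimension computes partial dot products and adds them atomically into the shared output.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Hypothetical split-K GEMM sketch: returns C = A * B for row-major
/// A (m x k) and B (k x n). The k dimension is cut into chunks of `split_k`;
/// each chunk runs on its own thread (standing in for a GPU work group)
/// and adds its partial sums into C atomically.
fn gemm_split_k(a: &[u32], b: &[u32], m: usize, k: usize, n: usize, split_k: usize) -> Vec<u32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert!(split_k > 0);

    // Output starts at zero; every chunk accumulates into it.
    let c: Vec<AtomicU32> = (0..m * n).map(|_| AtomicU32::new(0)).collect();

    thread::scope(|s| {
        for k_start in (0..k).step_by(split_k) {
            let k_end = (k_start + split_k).min(k);
            let c = &c;
            s.spawn(move || {
                for row in 0..m {
                    for col in 0..n {
                        // Partial dot product over this chunk of k only.
                        let mut partial = 0u32;
                        for kk in k_start..k_end {
                            partial = partial
                                .wrapping_add(a[row * k + kk].wrapping_mul(b[kk * n + col]));
                        }
                        // Several chunks write the same element, so the add must be atomic.
                        c[row * n + col].fetch_add(partial, Ordering::Relaxed);
                    }
                }
            });
        }
    });

    c.into_iter().map(AtomicU32::into_inner).collect()
}
```

Splitting along k pays off when m or n is small and k is large: without it, only a few output elements exist to parallelize over, and each one would require a long sequential reduction.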
Tensor
- Added Tensor::ones method.
Neural Networks
- Allowed SGD learning_rate = 1.0
- MeanPool
- Fixed correctness issues
- Cross Entropy Loss
- Sum
- Test accuracy improved to ~99% on Neural Network MNIST example (Lenet5)
Examples
- Added shuffling of training batches
Benchmark
Added a Neural Network benchmark to compare performance with other libraries. Per epoch, training is now ~2.7x slower than tch (NVIDIA GeForce GTX 1060 with Max-Q Design), with similar test accuracy.
+-----------+------------+---------------+-----------------------+----------------------------------+
| Library   | Best Epoch | Best Accuracy | Time To Best Accuracy | Mean Epoch Time to Best Accuracy |
+===========+============+===============+=======================+==================================+
| autograph | 69         | 99.04%        | 127.38s               | 1.85s                            |
+-----------+------------+---------------+-----------------------+----------------------------------+
| tch       | 32         | 99.12%        | 22.03s                | 688.31ms                         |
+-----------+------------+---------------+-----------------------+----------------------------------+
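The ~2.7x figure corresponds to the mean epoch times in the table. As a small worked check (the helper names `slowdown` and `report` are illustrative only), the same division applied to the total time to best accuracy gives a larger ratio of roughly 5.8x, since autograph reached its best accuracy at epoch 69 versus epoch 32 for tch.

```rust
/// How many times slower `ours` is than `theirs` (both in seconds).
fn slowdown(ours_secs: f64, theirs_secs: f64) -> f64 {
    ours_secs / theirs_secs
}

/// Returns (per-epoch slowdown, time-to-best-accuracy slowdown),
/// using the values copied from the benchmark table.
fn report() -> (f64, f64) {
    let per_epoch = slowdown(1.85, 0.68831); // mean epoch time: 1.85s vs 688.31ms
    let to_best = slowdown(127.38, 22.03); // time to best accuracy
    (per_epoch, to_best)
}
```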