Paper title:
TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training
Publication:
MICRO 2020
Problem to solve:
- The sparsity pattern during training is dynamic and cannot be known ahead of time.
- During training, each tensor (activations, weights, gradients) participates in two of the per-layer convolutions/operations (see the sketch after this list).
- During inference, activations can be discarded after each layer; during training they must be saved for use by the backward pass.
- Inference accelerators use narrow fixed-point arithmetic, whereas training today is done predominantly in floating point.
- Training starts with randomly initialized values that keep evolving throughout the training process.
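To make the second and third points concrete, here is a minimal NumPy sketch (illustrative, not from the paper; the fully connected layer and the names A, W, dZ are assumptions) of the per-layer operations during training. Each tensor appears in two of them, and the activations A must be kept around for the backward pass.

```python
# Minimal sketch (assumed fully connected layer, not from the paper) of the
# per-layer training operations around Z = A @ W. Each tensor is used twice:
#   forward:              Z  = A  @ W     (uses A, W)
#   activation gradient:  dA = dZ @ W.T   (uses dZ, W)
#   weight gradient:      dW = A.T @ dZ   (uses A, dZ)
import numpy as np

rng = np.random.default_rng(0)
A  = rng.standard_normal((4, 8))    # input activations; kept for the backward pass
W  = rng.standard_normal((8, 16))   # weights
dZ = rng.standard_normal((4, 16))   # gradient arriving from the next layer

Z  = A @ W       # forward pass
dA = dZ @ W.T    # gradient w.r.t. activations, propagated to the previous layer
dW = A.T @ dZ    # gradient w.r.t. weights, consumed by the optimizer

print(Z.shape, dA.shape, dW.shape)  # (4, 16) (4, 8) (8, 16)
```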
Major contribution:
- TensorDash exploits the naturally occurring sparsity during training, which appears predominantly in the activations and the gradients. Sparsity is exploited dynamically and entirely in hardware by a low-overhead hardware scheduler that advances MAC operations in time (an earlier cycle) and in space (another MAC unit) so that the overall computation finishes earlier (a minimal sketch of the idea follows this list).
- When incorporated into an accelerator based on Tensorcore processing units, TensorDash improves performance by 1.95× and energy efficiency by 1.5× (1.8× for the compute units) on average over a set of deep learning models covering a wide range of applications.
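The sketch below is an idealized Python model of the scheduling idea, not the TensorDash hardware (the function processing_cycles and the lanes parameter are hypothetical): multiplications whose sparse operand is zero are skipped and the surviving MACs are packed into the available lanes, which approximates promoting work in time and space. The real scheduler is constrained by a limited lookahead/lookaside window, so this fully-packed model gives an upper bound on the speedup.

```python
# Idealized sketch of sparsity-aware MAC scheduling (an assumption, not the
# paper's scheduler): zero operands are dropped and the remaining work is
# packed into the MAC lanes, so the same dot products finish in fewer cycles.
import numpy as np

def processing_cycles(operand_stream, lanes=4):
    """Cycles needed to stream one operand through a group of MAC lanes.

    dense:  every value, zero or not, occupies a lane slot.
    sparse: zero values are skipped and the remaining values are packed.
    """
    flat = np.asarray(operand_stream).ravel()
    dense = int(np.ceil(flat.size / lanes))
    packed = int(np.ceil(np.count_nonzero(flat) / lanes))
    return dense, max(packed, 1)

rng = np.random.default_rng(0)
acts = rng.standard_normal(1024)
acts[rng.random(1024) < 0.6] = 0.0   # ~60% zeros, e.g. post-ReLU activations

dense, sparse = processing_cycles(acts)
print(f"dense: {dense} cycles, sparse-aware: {sparse} cycles, "
      f"speedup ~ {dense / sparse:.2f}x")
```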