Most artificial networks today rely on dense representations, whereas biological networks rely on sparse representations. In this paper we show how sparse representations can be more robust to noise and interference, as long as the underlying dimensionality is sufficiently high. A key intuition that we develop is that the ratio of the operable volume around a sparse vector divided by the volume of the representational space decreases exponentially with dimensionality. We then analyze computationally efficient sparse networks containing both sparse weights and activations. Simulations on MNIST and the Google Speech Command Dataset show that such networks demonstrate significantly improved robustness and stability compared to dense networks, while maintaining competitive accuracy. We discuss the potential benefits of sparsity on accuracy, noise robustness, hyperparameter tuning, learning speed, computational efficiency, and power requirements.
This repository contains the versioned code used to replicate the experiments in the paper. For the latest PyTorch code to use in your own projects, see the nupic.torch repository.
Below are instructions for reproducing all the charts and tables presented in the paper. There might be small differences due to randomness.
All the scripts in this directory were implemented using Python 2.7.
Once Python 2.7 is installed and configured on your system, use the following command to install all required Python libraries and dependencies:
python setup.py develop --user
Then use the following script to download the Google Speech Commands Dataset:
cd projects/speech_commands/data
./download_speech_commands.sh
Alternatively, you may use docker to run the experiments in a container:
docker build -t htmpaper .
docker run -it htmpaper /bin/bash
All experiments share a common script called run_experiment.py, located in the root of each project, which is used to train and test the models. Once the models are trained, you may use the other scripts in the specific project folder to generate the figures and data used in the paper. We recommend using GPU acceleration. Otherwise some of the training, particularly for Google Speech Commands, will take significantly longer.
Use the following command to train all the models from the project folder:
python run_experiment.py -c experiments_paper.cfg
Use python run_experiment.py -h for more options.
The probability of matches to random binary vectors (with a active bits) as a function of dimensionality, for various levels of sparsity. The probability decreases exponentially with n. Black circles denote the observed frequency of a match (based on a large number of trials).
cd projects
python plot_numerical_results.py
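The trend in this figure can be checked by hand with a small counting argument. The sketch below is not the code in plot_numerical_results.py; it assumes a "match" means that a random binary vector with a active bits shares at least theta active bits with a fixed vector of the same sparsity. The names n_choose_k, match_probability and the threshold theta are illustrative assumptions.

```python
from __future__ import division, print_function


def n_choose_k(n, k):
    """Exact binomial coefficient using integer arithmetic."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)
    result = 1
    for i in range(1, k + 1):
        # After step i the running product equals C(n - k + i, i).
        result = result * (n - k + i) // i
    return result


def match_probability(n, a, theta):
    """Probability that a random binary vector with `a` active bits out of
    `n` overlaps a fixed vector (also `a` active bits) in at least `theta`
    positions. `theta` is an assumed match threshold."""
    total = n_choose_k(n, a)
    matching = sum(
        n_choose_k(a, b) * n_choose_k(n - a, a - b)
        for b in range(theta, a + 1)
    )
    return matching / total


if __name__ == "__main__":
    # Fixed number of active bits, growing dimensionality.
    for n in (50, 100, 200, 400):
        print(n, match_probability(n, a=10, theta=5))
```

Running the script prints one probability per dimensionality; with a and theta held fixed, the values shrink as n grows.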
Left: The probability of matches to random scalar vectors (with a non-zero components) as a function of dimensionality, for various levels of sparsity. The probability of false matches decreases exponentially with n. Note that the probability for a dense vector, a = n/2, stays relatively high and does not decrease with dimensionality. Right: The impact of scale on vector matches with a fixed n = 1000. The larger the scaling discrepancy, the higher the probability of a false match.
cd projects
python scalar_sdr.py
Network | Test score | Noise score |
---|---|---|
Dense CNN-1 | 99.14 ± 0.03 | 74,569 ± 3,200 |
Dense CNN-2 | 99.31 ± 0.06 | 97,040 ± 2,853 |
Sparse CNN-1 | 98.41 ± 0.08 | 100,306 ± 1,735 |
Sparse CNN-2 | 99.09 ± 0.05 | 103,764 ± 1,125 |
Dense CNN-2 SP3 | 99.13 ± 0.07 | 100,318 ± 2,762 |
Sparse CNN-2 D3 | 98.89 ± 0.13 | 102,328 ± 1,720 |
Sparse CNN-2 W1 | 98.20 ± 0.19 | 100,322 ± 2,082 |
Sparse CNN-2 DSW | 98.92 ± 0.09 | 70,566 ± 2,857 |
We show classification accuracies and total noise scores (the total number of correct classifications across all noise levels). Results are averaged over 10 random seeds, ± one standard deviation. CNN-1 and CNN-2 indicate one or two convolutional layers, respectively.
cd projects
python test_score_table.py -c mnist/experiments_paper.cfg
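For readers who want to see how the two columns relate, here is a minimal sketch of the noise score as described in the caption: the sum of correct classifications over all noise levels, then mean and standard deviation across seeds. It is not taken from test_score_table.py. The function names, the dictionary layout and the toy counts are assumptions for illustration, and the population standard deviation is assumed (the paper's scripts may use the sample estimate instead).

```python
from __future__ import division, print_function

import math


def noise_score(correct_by_noise_level):
    """Total number of correct classifications summed over all noise levels.

    `correct_by_noise_level` maps a noise level to the number of test
    samples classified correctly at that level (hypothetical layout).
    """
    return sum(correct_by_noise_level.values())


def mean_and_std(values):
    """Mean and population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Toy counts for two seeds, made up purely to exercise the functions.
    seed_results = [
        {0.0: 10, 0.1: 8, 0.2: 5},
        {0.0: 10, 0.1: 9, 0.2: 6},
    ]
    scores = [noise_score(r) for r in seed_results]
    mean, std = mean_and_std(scores)
    print("noise score: %.1f +/- %.1f" % (mean, std))
```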
Network | Test score | Noise score |
---|---|---|
Dense CNN-2 (DR=0.0) | 96.37 ± 0.37 | 8,730 ± 471 |
Dense CNN-2 (DR=0.5) | 95.69 ± 0.48 | 7,681 ± 368 |
Sparse CNN-2 | 96.65 ± 0.21 | 11,233 ± 1,013 |
Super-Sparse CNN-2 | 96.57 ± 0.16 | 10,752 ± 942 |
We show test and noise scores, averaged over 10 random seeds, ± one standard deviation. DR denotes the dropout rate.
cd projects
python test_score_table.py -c speech_commands/experiments_paper.cfg
A. Example MNIST images with varying levels of noise. B. Classification accuracy as a function of noise level.
cd projects/mnist
python analyze_noise.py -c experiments_paper.cfg
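As a rough illustration of what "varying levels of noise" can mean for an image, the sketch below overwrites a given fraction of randomly chosen pixels with a constant value. This is an assumption about the noise model, not a copy of the transform in src/pytorch/image_transforms.py; the function name add_pixel_noise and the parameters noise_level and noise_value are hypothetical.

```python
from __future__ import print_function

import random


def add_pixel_noise(pixels, noise_level, noise_value=1.0, rng=None):
    """Return a copy of `pixels` (a flat list of floats) in which a fraction
    `noise_level` of randomly chosen positions is set to `noise_value`.

    Both the constant `noise_value` and the "replace random pixels" scheme
    are assumptions made for this sketch.
    """
    rng = rng or random.Random()
    noisy = list(pixels)
    num_noisy = int(round(noise_level * len(noisy)))
    # Sample distinct pixel positions so exactly num_noisy pixels change.
    for idx in rng.sample(range(len(noisy)), num_noisy):
        noisy[idx] = noise_value
    return noisy


if __name__ == "__main__":
    image = [0.0] * 16  # a blank 4x4 "image", flattened
    noisy = add_pixel_noise(image, noise_level=0.25, rng=random.Random(42))
    print(sum(1 for p in noisy if p == 1.0), "of", len(noisy), "pixels replaced")
```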
Network | L1 F | L1 Sparsity | L2 F | L2 Sparsity | L3 N | L3 Sparsity | Wt Sparsity |
---|---|---|---|---|---|---|---|
MNIST | | | | | | | |
denseCNN1 | 30 | 100.0% | | | 1000 | 100.0% | 100.0% |
denseCNN2 | 30 | 100.0% | 30 | 100.0% | 1000 | 100.0% | 100.0% |
sparseCNN1 | 30 | 9.3% | | | 150 | 33.3% | 30.0% |
sparseCNN2 | 32 | 8.7% | 64 | 29.3% | 700 | 14.3% | 30.0% |
denseCNN2SP3 | 30 | 100.0% | 64 | 100.0% | 700 | 14.3% | 30.0% |
sparseCNN2D3 | 32 | 8.7% | 64 | 29.3% | 1000 | 100.0% | 100.0% |
sparseCNN2W1 | 32 | 8.7% | 64 | 29.3% | 700 | 14.3% | 100.0% |
sparseCNN2DSW | 32 | 8.7% | 64 | 29.3% | 1000 | 100.0% | 30.0% |
GSC | | | | | | | |
denseCNN2 | 64 | 100.0% | 64 | 100.0% | 1000 | 100.0% | 100.0% |
sparseCNN2 | 64 | 9.5% | 64 | 12.5% | 1000 | 10.0% | 40.0% |
SuperSparseCNN2 | 64 | 9.5% | 64 | 12.5% | 1500 | 6.7% | 10.0% |
L1F and L2F denote the number of filters at the corresponding CNN layer. L1,2,3 sparsity indicates k/n, the percentage of outputs that were enforced to be non-zero. 100% indicates a special case where we defaulted to traditional ReLU activations. Wt sparsity indicates the percentage of weights that were non-zero. All parameters are available in the source code.
cd projects
python parameters_table.py -c mnist/experiments_paper.cfg
python parameters_table.py -c speech_commands/experiments_paper.cfg
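The k/n sparsity in the table refers to a k-winners style activation: only the k largest outputs of a layer stay non-zero. The following is a simplified, framework-free sketch of that idea, not the implementation in src/pytorch/functions/k_winners.py; it omits any boosting or duty-cycle tracking the repository code may apply, and the helper names k_winners and sparsity are assumptions.

```python
from __future__ import division, print_function


def k_winners(activations, k):
    """Keep the `k` largest activations and set all others to zero."""
    if k >= len(activations):
        return list(activations)
    # Indices ordered by activation, largest first; ties keep input order.
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [x if i in winners else 0.0 for i, x in enumerate(activations)]


def sparsity(values):
    """Fraction of entries that are non-zero (the k/n figure in the table)."""
    return sum(1 for v in values if v != 0) / len(values)


if __name__ == "__main__":
    layer_output = [0.1, 0.9, 0.3, 0.7]
    sparse_output = k_winners(layer_output, k=2)
    print(sparse_output, sparsity(sparse_output))
```

With k equal to the layer size the function returns the input unchanged, which corresponds to the 100% rows where the layer is left dense.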
src
: Contains all custom Python libraries created for this paper
projects
: Contains all experiments and supporting scripts to plot figures and tables used in the paper
Dockerfile
: Creates a docker container suitable for running all experiments
requirements.txt
: Python libraries required to run the experiments
setup.py
: Python setup script. Call this file to install the library code
Library sources (src)
src/
├── expsuite.py # Multiprocess Experiments class
└── pytorch
├── audio_transforms.py # Collection of audio transformations
├── benchmark_utils.py # pytorch model benchmark utilities
├── dataset_utils.py # pytorch dataset utils
├── duty_cycle_metrics.py # Compute entropy and other dutycycle metrics
├── functions
│ └── k_winners.py # k-winner activation function
├── image_transforms.py # An image transform that adds noise to random pixels in the image
├── model_utils.py # pytorch utils used to train and test models
├── modules
│ ├── flatten.py # module used to flatten the input retaining batch dimension
│ ├── k_winners.py # k-winner activation modules
│ └── sparse_weights.py # Modules used to enforce weight sparsity
├── resnet_models.py # Modified resnet model from torchvision
├── sparse_net.py # A network with one or more hidden layers, which can be a sequence of k-sparse CNN followed by a sequence of k-sparse linear layer with optional dropout or batch-norm layers in between the layers
├── mnist_sparse_experiment.py # Sparse MNIST experiments
├── sparse_speech_experiment.py # Sparse Google Speech Commands experiments
└── speech_commands_dataset.py # "Google Speech Commands Dataset" as pytorch dataset
Experiments sources (projects)
- MNIST (mnist)
run_experiment.py
: Main script used to run all experiments
experiments_paper.cfg
: Complete list of network parameters used in the MNIST experiments
analyze_noise.py
: Plot noise curves from experiment results
- Google Speech Commands Dataset (speech_commands)
run_experiment.py
: Main script used to run all experiments
experiments_paper.cfg
: Complete list of network parameters used in the "Google Speech Commands Dataset" experiments
data
: Scripts to download and process "Google Speech Commands Dataset"
- Other
plot_numerical_results.py
: Plot SDR numeric properties
scalar_sdr.py
: Code used to compute the probability of matching scalar sparse vectors
test_score_table.py
: Prints test and noise scores table
parameters_table.py
: Prints key parameters table
"How Can We Be So Dense? The Benefits of Using Highly Sparse Representations"; arXiv:1903.11257 [cs.LG].