How Can We Be So Dense? The Benefits of Using Highly Sparse Representations

Abstract

Most artificial networks today rely on dense representations, whereas biological networks rely on sparse representations. In this paper we show how sparse representations can be more robust to noise and interference, as long as the underlying dimensionality is sufficiently high. A key intuition that we develop is that the ratio of the operable volume around a sparse vector divided by the volume of the representational space decreases exponentially with dimensionality. We then analyze computationally efficient sparse networks containing both sparse weights and activations. Simulations on MNIST and the Google Speech Command Dataset show that such networks demonstrate significantly improved robustness and stability compared to dense networks, while maintaining competitive accuracy. We discuss the potential benefits of sparsity on accuracy, noise robustness, hyperparameter tuning, learning speed, computational efficiency, and power requirements.

Running experiments

This repository contains versioned code used to replicate the experiments in the paper. For the latest PyTorch code to use in your own projects, see the nupic.torch repository.

Below are instructions for reproducing all the charts and tables presented in the paper. Results may differ slightly from those in the paper due to randomness.

Prerequisites

All the scripts in this directory were implemented using Python 2.7.

Once Python 2.7 is installed and configured on your system, use the following command to install all required Python libraries and dependencies:

python setup.py develop --user

Then use the following script to download the Google Speech Commands Dataset:

cd projects/speech_commands/data
./download_speech_commands.sh

Alternatively, you may use docker to run the experiments in a container:

docker build -t htmpaper .
docker run -it htmpaper /bin/bash

Training the models

Each project has a common script, run_experiment.py, located in its root folder and used to train and test the models. Once the models are trained, you may use the other scripts in the project folder to generate the figures and data used in the paper. We recommend using GPU acceleration; otherwise, some of the training, particularly for Google Speech Commands, will take significantly longer.

Use the following command to train all the models from the project folder:

python run_experiment.py -c experiments_paper.cfg

Use python run_experiment.py -h for more options.

Figures

Figure 2: Match probability for sparse vectors

The probability of matches to random binary vectors (with a active bits) as a function of dimensionality, for various levels of sparsity. The probability decreases exponentially with n. Black circles denote the observed frequency of a match (based on a large number of trials).

cd projects
python plot_numerical_results.py
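For intuition about Figure 2, the match probability can also be computed in closed form. The sketch below is not plot_numerical_results.py. It assumes that a stored binary vector and a random binary vector both have a active bits out of n, and that a "match" means their overlap is at least theta bits (theta is a threshold parameter introduced here for illustration). The number of random vectors with exactly b overlapping bits is C(a, b) * C(n - a, a - b), so the probability is a sum of these counts divided by C(n, a). The function names and the Monte Carlo check are hypothetical, and the code needs Python 3.8+ for math.comb (unlike the repository scripts, which target Python 2.7).

```python
import math
import random


def match_probability(n, a, theta):
    """Exact probability that a random vector with `a` of `n` bits active
    overlaps a fixed vector (also `a` active bits) in at least `theta` bits."""
    # math.comb(n - a, a - b) is 0 when a - b > n - a, so impossible overlaps drop out.
    matching = sum(math.comb(a, b) * math.comb(n - a, a - b)
                   for b in range(theta, a + 1))
    return matching / math.comb(n, a)


def simulated_match_frequency(n, a, theta, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity (cf. the observed frequencies in Figure 2)."""
    rng = random.Random(seed)
    stored = set(range(a))  # by symmetry, any fixed set of a active bits will do
    hits = 0
    for _ in range(trials):
        probe = set(rng.sample(range(n), a))
        if len(stored & probe) >= theta:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # For fixed a and theta, the match probability shrinks as n grows.
    for n in (100, 200, 500, 1000):
        print(n, match_probability(n, a=16, theta=8))
```

The exact formula is a hypergeometric tail, so the simulation mainly serves as a sanity check on the counting argument.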

Figure 3: Matching sparse scalar vectors: effect of scale

Left: The probability of matches to random scalar vectors (with a non-zero components) as a function of dimensionality, for various levels of sparsity. The probability of false matches decreases exponentially with n. Note that the probability for a dense vector (a = n/2) stays relatively high and does not decrease with dimensionality. Right: The impact of scale on vector matches with a fixed n = 1000. The larger the scaling discrepancy, the higher the probability of a false match.

cd projects
python scalar_sdr.py
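The scalar case in Figure 3 can be explored with a small simulation. This is a hypothetical sketch, not scalar_sdr.py. It assumes the a non-zero components are drawn uniformly from [-1, 1], that a "match" means the dot product with a stored vector is at least theta, and it models a scaling discrepancy by multiplying the random probe vectors by a constant factor. Because the same random draws are reused for every scale, a probe that matches at scale 1 also matches at any larger scale when theta is positive, which is consistent with the trend described for the right panel.

```python
import numpy as np


def random_sparse_scalar(n, a, size, rng):
    """Return `size` vectors of length n, each with `a` non-zero components.
    Assumption: non-zero values are uniform in [-1, 1]."""
    vecs = np.zeros((size, n))
    for row in vecs:  # each row is a view, so assignment edits `vecs`
        idx = rng.choice(n, size=a, replace=False)
        row[idx] = rng.uniform(-1.0, 1.0, size=a)
    return vecs


def false_match_rate(n, a, theta, trials=2000, scale=1.0, seed=0):
    """Fraction of random probes whose dot product with one stored vector is >= theta."""
    rng = np.random.default_rng(seed)
    stored = random_sparse_scalar(n, a, 1, rng)[0]
    probes = scale * random_sparse_scalar(n, a, trials, rng)
    return float(np.mean(probes @ stored >= theta))


if __name__ == "__main__":
    # Scaling the probes up makes false matches more likely at a fixed threshold.
    for scale in (1.0, 2.0, 4.0):
        print(scale, false_match_rate(n=1000, a=64, theta=2.0, scale=scale))
```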

Table 1: MNIST results for dense and sparse architectures

Network	Test score (%)	Noise score
Dense CNN-1	99.14 ± 0.03	74,569 ± 3,200
Dense CNN-2	99.31 ± 0.06	97,040 ± 2,853

Sparse CNN-1	98.41 ± 0.08	100,306 ± 1,735
Sparse CNN-2	99.09 ± 0.05	103,764 ± 1,125

Dense CNN-2 SP3	99.13 ± 0.07	100,318 ± 2,762
Sparse CNN-2 D3	98.89 ± 0.13	102,328 ± 1,720
Sparse CNN-2 W1	98.20 ± 0.19	100,322 ± 2,082
Sparse CNN-2 DSW	98.92 ± 0.09	70,566 ± 2,857

We show classification accuracies and total noise scores (the total number of correct classifications across all noise levels). Results are averaged over 10 random seeds, ± one standard deviation. CNN-1 and CNN-2 indicate one or two convolutional layers, respectively.

cd projects
python test_score_table.py -c mnist/experiments_paper.cfg
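As a worked illustration of how the entries in Table 1 are formed, the sketch below sums correct classifications over noise levels to get one noise score per seed, then reports mean ± standard deviation across seeds. It is not test_score_table.py: the function names and the toy counts are hypothetical, and it assumes the population standard deviation (NumPy's default ddof=0), which may differ from what the paper used.

```python
import numpy as np


def noise_score(correct_per_noise_level):
    """Total number of correct classifications summed over all noise levels."""
    return int(np.sum(correct_per_noise_level))


def mean_pm_std(values, decimals=2):
    """Format values as 'mean ± std' with thousands separators (population std, ddof=0)."""
    values = np.asarray(values, dtype=float)
    return f"{values.mean():,.{decimals}f} ± {values.std():,.{decimals}f}"


# Toy counts for illustration only (not real results): rows are seeds, columns are noise levels.
correct = np.array([
    [9900, 9500, 8000],
    [9880, 9450, 8100],
])
scores = [noise_score(row) for row in correct]
print(scores, mean_pm_std(scores, decimals=0))
```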

Table 2: Classification on Google Speech Commands for a number of architectures

Network	Test score (%)	Noise score
Dense CNN-2 (DR=0.0)	96.37 ± 0.37	8,730 ± 471
Dense CNN-2 (DR=0.5)	95.69 ± 0.48	7,681 ± 368
Sparse CNN-2	96.65 ± 0.21	11,233 ± 1,013
Super-Sparse CNN-2	96.57 ± 0.16	10,752 ± 942

We show test and noise scores averaged over 10 random seeds, ± one standard deviation. DR denotes the dropout rate.

cd projects
python test_score_table.py -c speech_commands/experiments_paper.cfg

Figure 5: MNIST Results With Noise

A. Example MNIST images with varying levels of noise. B. Classification accuracy as a function of noise level.

cd projects/mnist
python analyze_noise.py -c experiments_paper.cfg
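Panel A of Figure 5 shows images with an increasing fraction of corrupted pixels. A minimal NumPy sketch of that kind of corruption follows. It is a hypothetical stand-in for the transform in src/pytorch/image_transforms.py, assuming that noise replaces a given fraction of pixels with uniform random values in [0, 1); the repository's transform may select pixels or values differently.

```python
import numpy as np


def add_pixel_noise(image, noise_level, rng=None, low=0.0, high=1.0):
    """Return a copy of `image` with round(noise_level * num_pixels) randomly
    chosen pixels replaced by uniform values in [low, high)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.array(image, dtype=float, copy=True)
    flat = noisy.reshape(-1)  # view into the contiguous copy
    k = int(round(noise_level * flat.size))
    idx = rng.choice(flat.size, size=k, replace=False)  # distinct pixels
    flat[idx] = rng.uniform(low, high, size=k)
    return noisy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    digit = np.zeros((28, 28))  # placeholder for an MNIST image scaled to [0, 1]
    for level in (0.0, 0.1, 0.5):
        print(level, np.count_nonzero(add_pixel_noise(digit, level, rng)))
```

Sweeping noise_level and counting correct predictions at each level gives per-level accuracies like those in panel B.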

Table 3: Key parameters for each network.

Network	L1 F	L1 Sparsity	L2 F	L2 Sparsity	L3 N	L3 Sparsity	Wt Sparsity
MNIST
denseCNN1	30	100.0%			1000	100.0%	100.0%
denseCNN2	30	100.0%	30	100.0%	1000	100.0%	100.0%

sparseCNN1	30	9.3%			150	33.3%	30.0%
sparseCNN2	32	8.7%	64	29.3%	700	14.3%	30.0%

denseCNN2SP3	30	100.0%	64	100.0%	700	14.3%	30.0%
sparseCNN2D3	32	8.7%	64	29.3%	1000	100.0%	100.0%
sparseCNN2W1	32	8.7%	64	29.3%	700	14.3%	100.0%
sparseCNN2DSW	32	8.7%	64	29.3%	1000	100.0%	30.0%

Google Speech Commands (GSC)
denseCNN2	64	100.0%	64	100.0%	1000	100.0%	100.0%
sparseCNN2	64	9.5%	64	12.5%	1000	10.0%	40.0%
SuperSparseCNN2	64	9.5%	64	12.5%	1500	6.7%	10.0%

L1 F and L2 F denote the number of filters at the corresponding CNN layer, and L3 N the number of units in the linear layer. L1, L2 and L3 sparsity indicate k/n, the percentage of outputs that were enforced to be non-zero. 100% indicates a special case where we defaulted to traditional ReLU activations. Wt sparsity indicates the percentage of weights that were non-zero. All parameters are available in the source code.

cd projects
python parameters_table.py -c mnist/experiments_paper.cfg 
python parameters_table.py -c speech_commands/experiments_paper.cfg
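The L1, L2 and L3 sparsity columns correspond to a k-winners activation that keeps only the k largest outputs of a layer; for example, sparseCNN2's 700-unit L3 layer at 14.3% keeps about 100 active units. The NumPy sketch below is a simplified illustration, not the code in src/pytorch/functions/k_winners.py. It ranks units by activation multiplied by a boost factor exp(boost_strength * (target_density - duty_cycle)), keeps the original (unboosted) values of the winners, and tracks a running duty cycle per unit. The boosting formula, the parameter names, and the moving-average update are assumptions made for illustration.

```python
import numpy as np


def boost_factors(duty_cycles, target_density, boost_strength):
    """Assumed boosting: units active less often than target_density get a
    factor > 1, over-active units a factor < 1."""
    return np.exp(boost_strength * (target_density - np.asarray(duty_cycles)))


def k_winners(x, k, boost=None):
    """Keep the k highest-scoring units in each row of x and zero the rest.

    Scores are x * boost when a boost is given; the returned values are the
    unboosted activations. Also returns the boolean winner mask."""
    x = np.asarray(x, dtype=float)
    scores = x if boost is None else x * boost  # boost broadcasts over the batch
    winners = np.argpartition(scores, -k, axis=1)[:, -k:]
    rows = np.arange(x.shape[0])[:, None]
    mask = np.zeros(x.shape, dtype=bool)
    mask[rows, winners] = True
    return np.where(mask, x, 0.0), mask


def update_duty_cycles(duty_cycles, mask, alpha):
    """Exponential moving average of how often each unit wins."""
    return (1.0 - alpha) * np.asarray(duty_cycles) + alpha * mask.mean(axis=0)


if __name__ == "__main__":
    x = np.array([[0.1, 0.9, 0.5, -0.2]])
    out, mask = k_winners(x, k=2)
    print(out)  # only the two largest activations are non-zero
```

Boosting lets rarely used units win occasionally, which spreads activity across the layer instead of letting a few units dominate.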

Directory structure:

src : Contains all custom python libraries created for this paper
projects : Contains all experiments and supporting scripts to plot figures and tables used in the paper

File descriptions:

Dockerfile : Creates a Docker container suitable for running all experiments
requirements.txt : Python libraries required to run experiments
setup.py : Python setup script. Run this file to install the library code

Library sources (src)

src/
├── expsuite.py                         # Multiprocess Experiments class
└── pytorch                                
    ├── audio_transforms.py             # Collection of audio transformations
    ├── benchmark_utils.py              # PyTorch model benchmark utilities
    ├── dataset_utils.py                # PyTorch dataset utilities
    ├── duty_cycle_metrics.py           # Compute entropy and other duty-cycle metrics
    ├── functions
    │   └── k_winners.py                # k-winner activation function
    ├── image_transforms.py             # An image transform that adds noise to random pixels in the image
    ├── model_utils.py                  # PyTorch utilities used to train and test models
    ├── modules
    │   ├── flatten.py                  # module used to flatten the input retaining batch dimension
    │   ├── k_winners.py                # k-winner activation modules
    │   └── sparse_weights.py           # Modules used to enforce weight sparsity
    ├── resnet_models.py                # Modified resnet model from torchvision
    ├── sparse_net.py                   # A network with one or more hidden layers: a sequence of k-sparse CNN layers followed by a sequence of k-sparse linear layers, with optional dropout or batch-norm layers in between
    ├── mnist_sparse_experiment.py      # Sparse MNIST experiments
    ├── sparse_speech_experiment.py     # Sparse Google Speech Commands experiments
    └── speech_commands_dataset.py      # "Google Speech Commands Dataset" as a PyTorch dataset
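The Wt sparsity column in Table 3 reports the percentage of weights that are non-zero, and src/pytorch/modules/sparse_weights.py contains the modules that enforce it. One common way to do this, sketched below in NumPy, is to draw a fixed random mask once per layer and multiply the weights by it after every update so pruned connections stay at zero. This is an assumption about the general technique, not a transcription of sparse_weights.py; make_weight_mask and apply_mask are hypothetical names.

```python
import numpy as np


def make_weight_mask(shape, weight_density, rng):
    """Fixed random 0/1 mask for an (n_out, n_in) weight matrix.

    Each output unit keeps round(weight_density * n_in) of its inputs."""
    n_out, n_in = shape
    n_zero = n_in - int(round(weight_density * n_in))
    mask = np.ones(shape)
    for row in mask:  # rows are views into `mask`
        row[rng.choice(n_in, size=n_zero, replace=False)] = 0.0
    return mask


def apply_mask(weights, mask):
    """Zero the pruned connections in place; call after initialization and
    after each optimizer step."""
    weights *= mask
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(100, 200))
    mask = make_weight_mask(w.shape, weight_density=0.3, rng=rng)
    apply_mask(w, mask)
    print(np.count_nonzero(w) / w.size)  # fraction of non-zero weights
```

Keeping the same number of inputs per unit, rather than pruning the matrix globally, ensures no unit ends up with far fewer connections than the others.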

Experiments sources (projects)

MNIST (mnist)
- run_experiment.py : Main script used to run all experiments
- experiments_paper.cfg : Complete list of network parameters used in the MNIST experiments
- analyze_noise.py : Plot noise curves from experiment results (Figure 5)
Google Speech Commands Dataset (speech_commands)
- run_experiment.py : Main script used to run all experiments
- experiments_paper.cfg : Complete list of network parameters used in the "Google Speech Commands Dataset" experiments
- data : Scripts to download and process the "Google Speech Commands Dataset"
Other
- plot_numerical_results.py : Plot numerical properties of sparse distributed representations (SDRs) (Figure 2)
- scalar_sdr.py : Code used to compute the probability of matching sparse scalar vectors (Figure 3)
- test_score_table.py : Prints test and noise scores table (Tables 1 and 2)
- parameters_table.py : Prints key parameters table (Table 3)

"How Can We Be So Dense? The Benefits of Using Highly Sparse Representations"; arXiv:1903.11257 [cs.LG].
