Implementation of "Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks" by R. Pogodin and P. E. Latham (https://arxiv.org/abs/2006.07123)
Scripts with the reported experiments are stored in `./experiments_scripts/`. The `_grid.sh` files loop over all setups, calling the corresponding `_single.sh` scripts, which set the hyperparameters and execute `experiments.py`. `experiments.py` runs a single experiment with the given command-line arguments; run `python3 experiments.py --help` for the list of arguments (or see `utils.py`, function `parse_arguments`).
`./experiments_scripts/run_mlp_experiments_grid.sh`
Mean test accuracy over 5 trials (first row: method; cossim: cosine similarity kernel; Gaussian: Gaussian kernel; last layer: training of the last layer only; second row: additional modification of the method, where grp+div is grouping with divisive normalization and grp is grouping without divisive normalization); a sketch of the two kernels is given below the table:
|         | backprop |         | last layer |         | cossim |      |         | Gaussian |      |         |
|---------|----------|---------|------------|---------|--------|------|---------|----------|------|---------|
|         |          | grp+div |            | grp+div |        | grp  | grp+div |          | grp  | grp+div |
| MNIST   | 98.6     | 98.4    | 92.0       | 95.4    | 94.9   | 95.8 | 96.3    | 94.6     | 98.4 | 98.1    |
| fMNIST  | 90.2     | 90.8    | 83.3       | 85.7    | 86.3   | 88.7 | 88.1    | 86.5     | 88.6 | 88.8    |
| kMNIST  | 93.4     | 93.5    | 71.2       | 78.2    | 80.4   | 86.2 | 87.2    | 80.2     | 92.7 | 91.1    |
| CIFAR10 | 60.0     | 60.3    | 39.2       | 38.0    | 51.1   | 52.5 | 47.6    | 41.4     | 48.4 | 46.4    |
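
For reference, the two kernels compared above act on batches of layer activations. The snippet below is a minimal sketch of the cosine similarity and Gaussian kernel matrices in PyTorch; the function names, the bandwidth `sigma`, and the toy shapes are illustrative and do not mirror the repository's actual implementation.

```python
import torch
import torch.nn.functional as F


def cossim_kernel(z):
    """Cosine-similarity kernel matrix: k(z_i, z_j) = <z_i, z_j> / (||z_i|| ||z_j||)."""
    z_unit = F.normalize(z, dim=1)           # unit-norm rows
    return z_unit @ z_unit.t()               # (batch, batch), entries in [-1, 1]


def gaussian_kernel(z, sigma=1.0):
    """Gaussian kernel matrix: k(z_i, z_j) = exp(-||z_i - z_j||^2 / (2 sigma^2))."""
    sq_dists = torch.cdist(z, z) ** 2        # pairwise squared Euclidean distances
    return torch.exp(-sq_dists / (2.0 * sigma ** 2))


if __name__ == "__main__":
    z = torch.randn(8, 32)                   # toy batch of activations
    print(cossim_kernel(z).shape, gaussian_kernel(z).shape)  # two (8, 8) matrices
```

In the paper, kernel matrices over hidden-layer activations enter an HSIC-based layer-wise objective (pHSIC), which is what produces the 3-factor Hebbian updates.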
`./experiments_scripts/run_vgg_sgd_experiments_grid.sh`
`./experiments_scripts/run_vgg_adam_experiments_grid.sh`
Mean test accuracy on CIFAR10 over 5 runs for 7-layer conv nets (1x and 2x wide). Cossim: cosine similarity; div/divnorm: divisive normalization; bn: batchnorm. Empty entries: experiments for which we didn't find a satisfactory set of hyperparameters due to instabilities in the methods. A sketch of grouped divisive normalization is given below the table.
|                                 | backprop |      | pHSIC: cossim |         | pHSIC: Gaussian |         |
|---------------------------------|----------|------|---------------|---------|-----------------|---------|
|                                 |          | div  | grp           | grp+div | grp             | grp+div |
| 1x wide net + SGD               | 91.0     | 91.0 | 88.8          | 89.8    |                 | 86.2    |
| 2x wide net + SGD               | 91.9     | 90.9 | 89.4          | 91.3    |                 | 90.4    |
| 1x wide net + AdamW + batchnorm | 94.1     | 94.3 | 91.3          | 90.1    | 89.9            | 89.4    |
| 2x wide net + AdamW + batchnorm | 94.3     | 94.5 | 91.9          | 91.0    | 91.0            | 91.2    |
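
The grp and grp+div columns refer to grouping hidden units and applying divisive normalization within each group. The sketch below shows one common form of grouped divisive normalization, dividing each unit by its group's root-mean-square activity; the group size, the stabilizing constant, and the function name are assumptions for illustration and may differ from the exact normalization used in the paper and in this code.

```python
import torch


def grouped_divisive_norm(u, group_size=16, eps=1e-5):
    """Divide each unit's activation by the RMS activity of its group.

    u: (batch, features) activations; features must be divisible by group_size.
    """
    batch, features = u.shape
    groups = u.view(batch, features // group_size, group_size)
    rms = groups.pow(2).mean(dim=2, keepdim=True).sqrt()   # per-group RMS activity
    return (groups / (rms + eps)).view(batch, features)


if __name__ == "__main__":
    u = torch.randn(4, 64)                                 # toy activations
    print(grouped_divisive_norm(u).shape)                  # torch.Size([4, 64])
```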
Mean test accuracy on CIFAR10 over 5 runs for 7-layer conv nets (1x and 2x wide). FA: feedback alignment; sign sym.: sign symmetry; layer class.: layer-wise classification; divnorm: divisive normalization; bn: batchnorm. Empty entries: experiments for which we didn't find a satisfactory set of hyperparameters due to instabilities in the methods. A sketch of the FA and sign-symmetry feedback rules is given below the table.
|                              | FA   | sign sym. | layer class. | layer class. + FA |
|------------------------------|------|-----------|--------------|-------------------|
| 1x wide net + SGD            |      | 90.0      |              |                   |
| 2x wide net + SGD            |      | 90.3      |              |                   |
| 1x wide net + SGD + divnorm  | 80.4 | 89.5      | 90.5         | 81.0              |
| 2x wide net + SGD + divnorm  | 80.6 | 91.3      | 91.3         | 81.2              |
| 1x wide net + AdamW + bn     | 82.4 | 93.6      | 92.1         | 90.3              |
| 2x wide net + AdamW + bn     | 81.6 | 93.9      | 92.1         | 91.1              |
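
FA and sign symmetry differ from backprop only in the feedback weights used to propagate errors: feedback alignment uses fixed random matrices, while sign symmetry uses the sign of the forward weights. The sketch below illustrates this for a single linear layer; it is a schematic of the idea, not the training code used in this repository, and the function name and mode strings are made up for illustration.

```python
import torch


def feedback(delta, weight, mode="backprop"):
    """Propagate an error signal delta (batch, out_dim) through a linear layer.

    weight has shape (out_dim, in_dim); mode picks the feedback matrix:
    the forward weights (backprop), a fixed random matrix (feedback alignment),
    or the sign of the forward weights (sign symmetry).
    """
    if mode == "backprop":
        b = weight
    elif mode == "feedback_alignment":
        b = torch.randn_like(weight)   # in practice sampled once and kept fixed
    elif mode == "sign_symmetry":
        b = torch.sign(weight)
    else:
        raise ValueError(mode)
    return delta @ b                   # (batch, in_dim) error for the layer below


if __name__ == "__main__":
    w = torch.randn(10, 20)            # toy layer: 20 inputs -> 10 outputs
    delta = torch.randn(5, 10)         # toy error at the layer's output
    print(feedback(delta, w, mode="sign_symmetry").shape)  # torch.Size([5, 20])
```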
The code was tested with the following setup:
- OS: Debian GNU/Linux 9.12 (stretch)
- CUDA: driver version 418.87.01, CUDA version 10.1
- Python 3.7.6
- numpy 1.18.1
- torch 1.6.0+cu101
- torchvision 0.7.0+cu101