The realization of this repository has been supported by the Italian MUR PRIN project “Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond” (Prot. 2017WR7SHH).
We have improved our HAM and sHAM compressed matrix formats by using a Canonical Huffman code, which can be decoded without accessing the Huffman tree.
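To illustrate why a canonical code needs no tree at decoding time, here is a minimal Python sketch. It is not the implementation used by HAM and sHAM (that code lives in utils.py and c_dot and may differ substantially); all function names below are hypothetical. The idea is that codewords of the same length are consecutive integers, so a few small per-length tables are enough to recognise a codeword.

```python
import heapq
from collections import Counter


def code_lengths(freqs):
    """Huffman code length of every symbol, given {symbol: frequency}."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap items: (weight, unique tie-breaker, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def canonical_tables(lengths):
    """Per-length tables that replace the tree in a canonical code."""
    symbols = sorted(lengths, key=lambda s: (lengths[s], s))
    max_len = max(lengths.values())
    count = [0] * (max_len + 1)        # number of codewords of each length
    for s in symbols:
        count[lengths[s]] += 1
    first_code = [0] * (max_len + 1)   # smallest codeword of each length
    first_index = [0] * (max_len + 1)  # position of its symbol in `symbols`
    code = index = 0
    for length in range(1, max_len + 1):
        first_code[length] = code
        first_index[length] = index
        code = (code + count[length]) << 1
        index += count[length]
    return symbols, count, first_code, first_index


def encode(data, lengths, tables):
    """Encode `data` as a list of bits (0/1 integers)."""
    symbols, _, first_code, first_index = tables
    rank = {s: i for i, s in enumerate(symbols)}
    bits = []
    for s in data:
        n = lengths[s]
        code = first_code[n] + rank[s] - first_index[n]
        bits.extend((code >> k) & 1 for k in range(n - 1, -1, -1))
    return bits


def decode(bits, tables):
    """Decode using only the tables: no tree is built or traversed."""
    symbols, count, first_code, first_index = tables
    out, code, length = [], 0, 0
    for b in bits:
        code = (code << 1) | b
        length += 1
        offset = code - first_code[length]
        if 0 <= offset < count[length]:  # a complete codeword was read
            out.append(symbols[first_index[length] + offset])
            code, length = 0, 0
    return out


# Toy example: a short list of weights with repeated values.
weights = [0.5, 0.5, -1.0, 0.5, 0.25, 0.5, -1.0]
lengths = code_lengths(Counter(weights))
tables = canonical_tables(lengths)
bits = encode(weights, lengths, tables)
assert decode(bits, tables) == weights
```

The decoder keeps only the sorted symbol list and three integer arrays indexed by codeword length, which is what makes the representation compact compared with storing an explicit tree.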
We have tested the space occupancy, dot time and energy requirements of HAM and sHAM and compared them with other state-of-the-art techniques:
- Compressed Sparse Column (CSC)
- Index Map (IM)
- Compressed Shared Elements Row (CSER)
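As a point of reference for the formats listed above, the following is a minimal pure-Python sketch of Compressed Sparse Column storage and of a vector-matrix product computed directly on it. The function names are hypothetical, and the choice of computing x·W (a row vector times the matrix) is an assumption made for illustration; it is not taken from the code used in the experiments.

```python
def to_csc(dense):
    """Convert a dense matrix (list of rows) to CSC arrays."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    values, row_idx, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            v = dense[i][j]
            if v != 0:
                values.append(v)   # nonzero entries, column by column
                row_idx.append(i)  # row of each stored entry
        col_ptr.append(len(values))  # where the next column starts
    return values, row_idx, col_ptr


def vec_mat_csc(x, csc):
    """Compute x . W, touching only the stored nonzero entries."""
    values, row_idx, col_ptr = csc
    out = []
    for j in range(len(col_ptr) - 1):
        acc = 0.0
        for k in range(col_ptr[j], col_ptr[j + 1]):
            acc += x[row_idx[k]] * values[k]
        out.append(acc)
    return out


W = [[0, 2, 0],
     [1, 0, 0],
     [0, 0, 3],
     [0, 4, 0]]
csc = to_csc(W)
result = vec_mat_csc([1, 2, 3, 4], csc)
```

CSC stores one value and one row index per nonzero entry plus one pointer per column, so its size depends only on the number of nonzeros, not on how often values repeat.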
The tests have been performed on the dense layers of two deep neural networks and on five benchmark matrices (see the paper for the references).
- VGG19, trained on the MNIST dataset.
- DeepDTA, trained on the DAVIS dataset.
Matrix | rows | columns | 1-sparsity | distinct values |
---|---|---|---|---|
orsreg_1 | 2205 | 2205 | 2.907e−3 | 111 |
SiNa | 5743 | 5743 | 6.027e−3 | 24317 |
Covtype | 581012 | 54 | 2.200e−1 | 6682 |
Census | 2458285 | 68 | 5.697e−1 | 45 |
ImageNet | 1262102 | 900 | 3.099e−1 | 824 |
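The columns of the table can be computed for any matrix with a few lines of Python. The sketch below assumes that "1-sparsity" denotes the fraction of nonzero entries and that "distinct values" counts distinct nonzero values; both readings are assumptions, and the function name is hypothetical.

```python
def matrix_stats(dense):
    """Shape, fraction of nonzero entries and number of distinct nonzero values."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    nonzeros = [v for row in dense for v in row if v != 0]
    total = n_rows * n_cols
    return {
        "rows": n_rows,
        "columns": n_cols,
        # Assumed meaning of "1-sparsity": nonzero entries / all entries.
        "nonzero_fraction": len(nonzeros) / total if total else 0.0,
        # Assumed meaning of "distinct values": zero is not counted.
        "distinct_values": len(set(nonzeros)),
    }


stats = matrix_stats([[0, 2, 0],
                      [1, 0, 0],
                      [0, 0, 2],
                      [0, 4, 0]])
```

A matrix with few distinct values relative to its number of nonzero entries is the case where entropy coding of the values is expected to pay off most.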
- Install python3, python3-pip and python3-venv.
- Make sure that python --version reports a 3.x version, or execute alias python='python3' in the shell.
- Create a virtual environment and activate it:

      python3 -m venv /path/to/new/virtual/environment
      source /path/to/new/virtual/environment/bin/activate

- Install the required dependencies:

      pip install -r ./requirements.txt

The Python script utils.py contains all the primitives used to build the compressed structures and to compute the associated space occupancy, matrix-vector multiplication time and energy requirements.
The main folder contains three directories:
- c_dot, containing the C source code and the executable files that perform the matrix-vector multiplication for our sparse formats.
- nns, containing a Python script (experiments.py) to reproduce the experiments shown in the paper, with all the required data; the results are stored in .csv format in the result folder.
- benchmark, organized like the nns directory.
In the main folder, there are also two .sh scripts to run the experiments (detailed below, in the Usage section).
We provide two simple .sh scripts to run the experiments:
- small_experiments.sh computes space occupancy, dot time and energy requirements for DeepDTA, orsreg_1 and SiNa
- full_experiments.sh computes space occupancy, dot time and energy requirements for all the matrices
Warning: Census and ImageNet might take some time and memory to build the compressed structures.
To compile the .c files for the dot product, run the following commands from the main directory:
gcc -w -fPIC -shared -o c_dot/ham_dot_partial.so c_dot/ham_dot_partial.c -pedantic -Wall -pthread
gcc -w -fPIC -shared -o c_dot/sham_dot_partial.so c_dot/sham_dot_partial.c -pedantic -Wall -pthread
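The commands above produce shared libraries (.so files). One standard-library way to load such a file from Python is ctypes; the sketch below only shows the loading step and fails early with a clear message if the library has not been compiled yet. Whether the experiment scripts load the libraries this way is an assumption, the helper name is hypothetical, and no exported C function is called because their names and signatures are not documented here.

```python
import ctypes
from pathlib import Path


def load_shared_library(path):
    """Load a compiled .so file, raising a clear error if it is missing."""
    lib_path = Path(path)
    if not lib_path.is_file():
        raise FileNotFoundError(
            f"{lib_path} not found: compile it with gcc before running the experiments"
        )
    # ctypes.CDLL loads the shared object and exposes its exported symbols.
    return ctypes.CDLL(str(lib_path.resolve()))
```

For example, after compiling, load_shared_library("c_dot/ham_dot_partial.so") would return a handle to the HAM library when called from the main directory.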
In order to reproduce the experiments, you should download this .tar.gz file, containing all the matrices, and merge the repository with the downloaded data.