Paul Haase edited this page Nov 18, 2022 · 6 revisions

Important: For all of the following examples, start a Python console in an environment with all required packages installed (see Installation Guide). The standard only specifies compression of the model weight parameters.

Overview

This software provides a Python package called nnc, which works as a stand-alone model compression solution but can also be seamlessly integrated into existing Python-based machine learning frameworks. Easy-to-use compression and decompression interfaces make it possible to achieve high compression without prior knowledge of the compression technologies. Interested users can achieve even higher compression using the advanced features.

Quickstart

A first example

An example model is provided at 'example/squeezenet1_1_pytorch_zoo.pt', which is SqueezeNet1_1 originally downloaded from the PyTorch (torchvision) model zoo (https://github.com/pytorch/vision). For compressing and decompressing the model with default settings do:

import nnc

nnc.compress_model('./example/squeezenet1_1_pytorch_zoo.pt', bitstream_path='./example/bitstream_squeezenet1_1.nnc')
nnc.decompress_model('./example/bitstream_squeezenet1_1.nnc', model_path='./example/reconstructed_squeezenet1_1.pt' )

This will create two files:

  • The compressed bitstream file at './example/bitstream_squeezenet1_1.nnc'
  • The reconstructed model file at './example/reconstructed_squeezenet1_1.pt'
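
To get a first impression of the result, the size of the original model file can be compared with the size of the bitstream. The following is a minimal sketch that uses only the Python standard library; the helper name compression_ratio is hypothetical and not part of nnc.

```python
import os

def compression_ratio(original_path, bitstream_path):
    """Return original file size divided by bitstream size.

    Hypothetical helper for illustration, not part of the nnc package.
    """
    original_size = os.path.getsize(original_path)
    bitstream_size = os.path.getsize(bitstream_path)
    if bitstream_size == 0:
        raise ValueError("bitstream file is empty")
    return original_size / bitstream_size
```

After running the example above, it could be called with './example/squeezenet1_1_pytorch_zoo.pt' and './example/bitstream_squeezenet1_1.nnc'.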

PyTorch and TensorFlow Models

The software has built-in support for PyTorch and TensorFlow models, which means that it can process arbitrary PyTorch/TensorFlow models (ending with .pt, .pth for PyTorch or .h5, .hdf5, .tf for TensorFlow) in the same fashion as in the example above. Provided that a model is stored at '/path/arbitrary_model.[pt, pth, h5, hdf5, tf]', it can be compressed and decompressed (default settings) as follows:

import nnc

nnc.compress_model('/path/arbitrary_model.[pt, pth, h5, hdf5, tf]', bitstream_path='/path/bitstream.nnc')
nnc.decompress_model('/path/bitstream.nnc', model_path='/path/reconstructed_arbitrary_model.[pt, pth, h5, hdf5, tf]' )

Key-Parameter: Quantization parameter qp

Due to quantization, the reconstructed model differs slightly from the original one, but the default settings of the NNC software usually do not degrade the model performance. However, there is a key parameter in the encoder function that controls the rate-performance trade-off, called the quantization parameter: qp (default: qp=-38). Decreasing the qp value results in a higher bitrate but less model performance degradation. Accordingly, increasing the qp value results in a lower bitrate but more model performance degradation. For details refer to the Functions and Parameters section.

For a higher bitrate but lower model degradation, e.g. do:

nnc.compress_model('/path/example_model.[pt, pth, h5, hdf5, tf]', bitstream_path='/path/bitstream.nnc', qp=-42)

For a lower bitrate but a higher model degradation, e.g. do:

nnc.compress_model('/path/example_model.[pt, pth, h5, hdf5, tf]', bitstream_path='/path/bitstream.nnc', qp=-34)

Functions and Parameters

There are two main functions for the encoder called compress_model and compress (see Encoder). Please note, that compress_model internally calls compress. Accordingly, there are two main functions for the decoder called decompress_model and decompress (see Decoder) and here also decompress_model internally calls decompress.

Encoder

def compress_model( model_path_or_object, 
                    bitstream_path="./bitstream.nnc", 
                    qp=-38, 
                    qp_density=2, 
                    nonweight_qp=-75,
                    qp_per_tensor=None,
                    use_dq=True, 
                    codebook_mode=0, 
                    scan_order=0, 
                    lambda_scale=0, 
                    param_opt=True,
                    cabac_unary_length_minus1=10, 
                    opt_qp=False, 
                    ioq=False,
                    bnf=False,
                    lsa=False,
                    fine_tune=False,
                    block_id_and_param_type=None, 
                    model_name=None, 
                    model_executer=None,
                    model_struct=None, 
                    dataset_path=None, 
                    learning_rate=1e-5, 
                    batch_size=64,
                    epochs=30,
                    max_batches=600, 
                    num_workers=8,
                    return_model_data=False,
                    verbose=True,
                    return_bitstream=False,
                   ):

Required input to this function is either a path specifying the location of a stored model or a PyTorch- or TensorFlow-Model of type torch.nn.Module or tensorflow.Module, respectively.

def compress( parameter_dict, 
              bitstream_path="./bitstream.nnc", 
              qp=-38, 
              qp_density=2, 
              nonweight_qp=-75,
              qp_per_tensor=None,
              use_dq=True, 
              codebook_mode=0, 
              scan_order=0, 
              lambda_scale=0 , 
              param_opt=True, 
              cabac_unary_length_minus1=10, 
              opt_qp=False, 
              ioq=False, 
              bnf=False,
              lsa=False,
              fine_tune=False,
              block_id_and_param_type=None, 
              model=None, 
              model_executer=None,
              verbose=True,
              return_bitstream=False,
            ):

Required input to this function is only a dict, with the keys specifying the tensor names as strings and the values representing the tensors as numpy arrays of type numpy.float32 or numpy.int32, regardless of their shape. This dict represents the state dictionary of the NN model.

Information: The function will compress any dict that fulfills these requirements (keys of type string and values that are numpy arrays of type numpy.int32/float32), regardless of whether it contains values related to neural network tensors or not.
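
The requirements on parameter_dict can be checked before calling compress. The following is a minimal sketch using only numpy; the helper check_parameter_dict and the tensor names are hypothetical and not part of nnc.

```python
import numpy as np

ALLOWED_DTYPES = (np.float32, np.int32)

def check_parameter_dict(parameter_dict):
    """Raise TypeError if the dict violates the documented requirements.

    Hypothetical helper for illustration, not part of the nnc package.
    """
    for name, value in parameter_dict.items():
        if not isinstance(name, str):
            raise TypeError(f"key {name!r} is not a string")
        if not isinstance(value, np.ndarray):
            raise TypeError(f"value of {name!r} is not a numpy array")
        if value.dtype not in ALLOWED_DTYPES:
            raise TypeError(f"{name!r} has dtype {value.dtype}, expected float32 or int32")

# Tensor names below are made up for this example.
example_dict = {
    "conv1.weight": np.random.randn(8, 3, 3, 3).astype(np.float32),
    "conv1.bias": np.zeros(8, dtype=np.float32),
}
check_parameter_dict(example_dict)  # passes silently
```

A dict that passes this check has the form expected as the first argument of compress.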

Parameters

Details on the parameters are given in the following table. For a better understanding also check out the Examples-Section.

Parameter Function Description
model_path_or_object compress_model Required, Type: [String, torch.nn.Module, tensorflow.Module], Default: -. Can be either a string specifying the path to the source model file or a model object of type torch.nn.Module or tensorflow.Module to be compressed. If it is a path, it can be any PyTorch model (ending with .pt or .pth), any TensorFlow model (ending with .h5, .hdf5, .tf) or any file that can be loaded with Python's pickle module and that contains a parameter dict which fulfills the requirements of parameter_dict.
parameter_dict compress Required, Type: Dict, Default: -. Specifies a Python dict which represents the state dictionary to be compressed. The keys are strings which denote the names of the parameter tensors and the values represent the tensors as numpy arrays (ndarrays). The numpy arrays must be of type numpy.float32 or numpy.int32.
bitstream_path compress_model
compress
Optional, Type: String, Default: "./bitstream.nnc". Specifies the path where the bitstream file is stored after the compression process. In principle, any file extension can be used, since it is not strictly specified, but it is recommended to use ".nnc".
qp compress_model
compress
Optional, Type: int32, Default: -38. Quantization parameter (qp) that controls the quantization stepsize and thus the rate-performance trade-off for all weight parameters. A lower qp is related to a lower quantization stepsize, which yields a higher bitrate but also a lower model performance degradation. Accordingly, increasing the qp value results in a lower bitrate but also in a higher model performance degradation. The quantization stepsize delta is derived from qp and qp_density as follows:

mul = (1 << qp_density) + (qp & ((1 << qp_density) - 1))
shift = qp >> qp_density
delta = mul * 2.0 ** (shift - qp_density)

A qp value of 0 corresponds to a quantization stepsize equal to 1. Assuming qp_density is equal to 2 (3, 4 ...), decreasing the qp value by 4 (8, 16, ...) means halving the quantization stepsize. Accordingly, increasing the qp value by 4 (8, 16, ...) means doubling the quantization stepsize. (See also: qp_density)

Important: This qp value only applies to weight parameters (e.g. convolutional layers or fully connected layers). Usually, all tensors which have more than one dimension are interpreted as weight parameters.
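
The mapping from qp and qp_density to the stepsize can be written as a small Python function. This is a sketch, not the codec's own implementation; the function name quantization_stepsize is hypothetical. It assumes that the low qp_density bits of qp are extracted with a bitwise AND, which is the reading that reproduces the stepsizes quoted in this table (e.g. 1 for qp=0, and 0.5, 0.375 and 0.25 for qp=-4, -6 and -8 with qp_density=2).

```python
def quantization_stepsize(qp, qp_density=2):
    """Derive the quantization stepsize delta from qp and qp_density.

    Python's >> and & on negative integers behave like arithmetic shift
    and two's-complement masking, which is what this mapping relies on.
    """
    mul = (1 << qp_density) + (qp & ((1 << qp_density) - 1))
    shift = qp >> qp_density
    return mul * 2.0 ** (shift - qp_density)

for qp in (0, -4, -6, -8):
    print(qp, quantization_stepsize(qp, qp_density=2))
```

With qp_density=2, lowering qp by 4 halves the returned value, as described above.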
qp_density compress_model
compress
Optional, Type: int32, Default: 2. Controls the mapping between quantization parameters qp and quantization stepsizes. The higher the value of qp_density, the closer together the adjacent quantization stepsizes achievable by the qp are. E.g. for a qp_density equal to 2, qp values of -4, -6 and -8 correspond to quantization stepsizes of 0.5, 0.375 and 0.25. For a qp_density equal to 3, qp values of -4, -6 and -8 correspond to quantization stepsizes of 0.75, 0.625 and 0.5. (See also: qp)
nonweight_qp compress_model
compress
Optional, Type: int32, Default: -75. Non-weight quantization parameter (nonweight_qp) that controls the quantization stepsize and thus the rate-performance trade-off for all non-weight parameters. Works exactly like the (regular) qp parameter. Generally, tensors related to non-weight parameters are more sensitive to quantization, so a much finer quantization needs to be applied.

Important: This nonweight_qp value only applies to non-weight parameters (e.g. batch-norm layers or biases). Usually, all tensors which have only one dimension are interpreted as non-weight parameters.
qp_per_tensor compress_model
compress
Optional, Type: dict, Default: None. A dict that can be used to specify a qp (or nonweight_qp) per tensor. The keys are strings that must exactly match the tensor names in the parameter_dict. The values are integers specifying the qp value to be applied, which works exactly like qp and nonweight_qp, respectively.

Important: For each tensor that is not specified in the dict, the value of qp or nonweight_qp (depending on the tensor type) is applied, respectively.
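
The interplay of qp, nonweight_qp and qp_per_tensor can be summarised in a few lines. The sketch below only illustrates the selection rule as described in this table and is not the nnc implementation; the function resolve_qp and the tensor names are hypothetical, and the "more than one dimension means weight" rule is the usual case stated above, not a guarantee.

```python
import numpy as np

def resolve_qp(name, tensor, qp=-38, nonweight_qp=-75, qp_per_tensor=None):
    """Pick the quantization parameter for a single tensor (illustrative only)."""
    # An explicit per-tensor entry takes precedence.
    if qp_per_tensor and name in qp_per_tensor:
        return qp_per_tensor[name]
    # Assumption from the text: tensors with more than one dimension are
    # treated as weights, one-dimensional tensors as non-weight parameters.
    return qp if tensor.ndim > 1 else nonweight_qp

params = {
    "conv1.weight": np.zeros((8, 3, 3, 3), dtype=np.float32),
    "conv1.bias": np.zeros(8, dtype=np.float32),
    "fc.weight": np.zeros((10, 8), dtype=np.float32),
}
overrides = {"fc.weight": -42}
chosen = {n: resolve_qp(n, t, qp_per_tensor=overrides) for n, t in params.items()}
print(chosen)
```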
use_dq compress_model
compress
Optional, Type: boolean, Default: True. Enable dependent scalar quantization (DQ), also known as Trellis-coded Quantization (TCQ). DQ is a vector quantization method, which usually achieves lower model performance degradation at lower or equal bitrates. It employs a procedure to switch between two scalar quantizers each having distinct reconstruction values, depending on the quantization stepsize controlled by the qp. If use_dq==False a single scalar, uniform quantizer is applied.

Important: To achieve a quantization performance (in terms of a similar quantization error) roughly comparable to uniform quantization with qp_uni and qp_density_0, the qp for dependent quantization, qp_dq, shall be set to qp_dq = qp_uni - (1 << qp_density_0).
E.g. for the default settings with qp_density=2 that means decreasing the qp for DQ by 4, or in other words uniform quantization with qp_uni=-34 (use_dq==False) is roughly equivalent to dependent scalar quantization with qp_dq=-38 (use_dq==True). Please note that this is only a general guideline. To achieve the same quantization error, further adjustments of the qp and qp_density may be required!
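
The guideline above amounts to a single subtraction. As a sketch (the helper name equivalent_dq_qp is hypothetical, and the result is only the rule of thumb stated above, not an exact equivalence):

```python
def equivalent_dq_qp(qp_uni, qp_density=2):
    """Rule-of-thumb qp for dependent quantization matching a uniform-quantization qp."""
    return qp_uni - (1 << qp_density)

print(equivalent_dq_qp(-34, qp_density=2))  # -38
```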
codebook_mode compress_model
compress
Optional, Type: int32, Default: 0. The codebook mode denotes whether or not an integer codebook is derived for transmission of the quantized values of a tensor. Using a codebook does not change the quantization result, only the way the quantized values are transmitted. Whenever a codebook is used, the quantized values are substituted by indices, each denoting an entry in the codebook, which holds all unique values in the quantized tensor. There are three modes specified, denoted by the respective value for codebook_mode:

0: No codebook is used. The values are encoded as output by the uniform/DQ quantization stage.
1: Force codebook. All tensors to be transmitted use a codebook for the encoding process. DQ is disabled in this case.
2: Choose best. The encoder selects the mode which produces the lowest bitrate for each tensor. Note: This method produces the lowest bitrate, but may be time consuming, since it tests both variants for all tensors.

Information: Tensors containing many unique values may result in big codebooks and slow encoding.
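
The idea of substituting quantized values by codebook indices can be illustrated with numpy. This is a conceptual sketch only; it does not reproduce how the encoder builds, orders or transmits its codebooks, and the example values are made up.

```python
import numpy as np

# Made-up quantized integer values of a small tensor.
quantized = np.array([[-3, 0, 0, 5], [5, 0, -3, 0]], dtype=np.int32)

# The codebook holds all unique values; indices point into the codebook.
codebook, indices = np.unique(quantized, return_inverse=True)
indices = indices.reshape(quantized.shape)

# Looking up the indices in the codebook restores the quantized tensor exactly.
reconstructed = codebook[indices]
print(codebook.tolist())  # [-3, 0, 5]
```

Since only the representation changes, the lookup recovers the quantized values without loss, which matches the statement that the codebook does not change the quantization result.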
scan_order compress_model
compress
Optional, Type: int32, Default: 0. Specifies the scan order of the tensors for the quantization and encoding process. Internally, all tensors to be encoded are interpreted either as 1D vectors or 2D matrices, which means that e.g. a 4D tensor is transformed to a 2D matrix (dim0 x (dim1 * dim2 * dim3)). Five different scan orders are specified for 2D matrices:

0: Row-first scan (scanning matrix row-by-row)
1: 8x8 block scan (scanning the 8x8 blocks block row by block row)
2: 16x16 block scan (scanning the 16x16 blocks block row by block row)
3: 32x32 block scan (scanning the 32x32 blocks block row by block row)
4: 64x64 block scan (scanning the 64x64 blocks block row by block row)

Note: For all block scan orders (scan_order > 0) a suitable decoder can decode each block row independently, which also enables parallel decoding of block rows. Currently, the provided decoder does not offer this feature; it may be added at a later stage.
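
The 2D interpretation and the block scans can be sketched with numpy as follows. This is an illustration, not the codec's implementation: the helper names are hypothetical, and the order of values inside a block (row by row here) as well as the handling of incomplete blocks at the border are assumptions, since they are not specified in this table.

```python
import numpy as np

def to_2d(tensor):
    """Keep the first dimension and merge all remaining ones."""
    if tensor.ndim <= 2:
        return tensor
    return tensor.reshape(tensor.shape[0], -1)

def block_scan(matrix, block_size):
    """Visit block_size x block_size blocks, block row by block row.

    ASSUMPTION: inside a block the values are visited row by row, and
    blocks at the right or bottom border are simply smaller.
    """
    rows, cols = matrix.shape
    order = []
    for r0 in range(0, rows, block_size):
        for c0 in range(0, cols, block_size):
            block = matrix[r0:r0 + block_size, c0:c0 + block_size]
            order.extend(block.ravel().tolist())
    return order

conv_weight = np.zeros((16, 3, 3, 3), dtype=np.float32)
print(to_2d(conv_weight).shape)  # (16, 27)
```

A row-first scan (scan_order=0) corresponds to matrix.ravel() on the 2D matrix.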
lambda_scale compress_model
compress
Optional, Type: float32, Default: 0.0. A scaling factor which is applied to the Lagrangian multiplier lambda in the rate-distortion (RD) cost function D + lambda * R, which is used for the quantization and encoding decisions. More specifically, lambda_scale denotes whether and to what degree the bitrate is considered for computing the costs during the quantization of weight parameters. Hence, setting lambda_scale to zero means that the bitrate R, measured in bits, is not taken into account and only the distortion D, measured as mean-squared-error (MSE), is considered in the cost function.

Note: It is recommended to set lambda_scale to 0.0. The results for values larger than 0.0 are not stable and might cause significant drops in the model performance!
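
The effect of lambda_scale on the cost function can be seen in a toy example. The sketch below is not taken from the encoder: the function rd_cost, the value of lam and the candidate numbers are made up, and how the encoder derives lambda itself is not described here.

```python
def rd_cost(distortion, rate_bits, lam, lambda_scale=0.0):
    """Rate-distortion cost D + (lambda_scale * lambda) * R."""
    return distortion + lambda_scale * lam * rate_bits

# Two made-up quantization candidates as (distortion, rate in bits).
candidates = [(0.01, 12), (0.04, 3)]
lam = 0.01  # placeholder value for the Lagrangian multiplier

best_distortion_only = min(candidates, key=lambda c: rd_cost(c[0], c[1], lam, 0.0))
best_rate_aware = min(candidates, key=lambda c: rd_cost(c[0], c[1], lam, 1.0))
print(best_distortion_only, best_rate_aware)
```

With lambda_scale=0.0 the candidate with the lowest distortion wins regardless of its rate; with a positive lambda_scale a cheaper but less accurate candidate can be preferred.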
param_opt compress_model
compress
Optional, Type: boolean, Default: True. Enables parameter optimization for DeepCABAC entropy coding. If enabled, the encoder optimizes parameters for the DeepCABAC probability estimation scheme, which control the adaptation rate of the probability estimators to the source statistics. These parameters are then written to the bitstream to make them available at the decoder. This procedure usually yields lower bitrates with a small overhead in encoding time.
cabac_unary_length_minus1 compress_model
compress
Optional, Type: int32, Default: 10. A parameter that controls the length of the unary part in the binarization scheme of quantized neural network parameter values for the (DeepCABAC) entropy encoding process. Changing the values only affects the bitrate, however the effect is expected to be minor.
opt_qp compress_model
compress
Optional, Type: boolean, Default: False. Enables a QP optimization scheme, that is based on the tensor statistics.

Note: May require adjustments to the parameter qp, because it could cause significant model performance degradation, otherwise. Furthermore, the results of this scheme may not be stable for all models.
ioq compress_model
compress
Optional, Type: boolean, Default: False. Enables inference-optimized quantization (IOQ), an optimization scheme that tests different qp values for each tensor, also considering the change in model performance. This method runs a whole quantization, encoding and evaluation step on a validation dataset for each tensor and QP that is tested. In most cases this procedure yields a significant improvement of the rate-performance trade-off.

Important: Depending on the use case this may significantly increase the encoding runtime (in some cases by several orders of magnitude). Furthermore, it requires a model_executer (of type ModelExecuter) which can run the inference on a dataset. More specifically, it requires the function eval_model to be implemented. For details refer to the sections Class Definitions and Advanced Features.
bnf compress_model
compress
Optional, Type: boolean, Default: False. Enables Batch-norm Folding (BNF), which reduces the number of batch-norm parameter vectors from 4 to 2, if batch-norm parameters are present. For this, the software needs to identify the batch-norm parameters and which layer they belong to. This can be specified using block_id_and_param_type (see parameter description below). For further details refer to the Advanced Features-Section.

Important: Batch-norm folding requires the tensors to be shaped such that the first dimension corresponds to the number of output channels, which is usually the case for PyTorch models but not for TensorFlow models. To change the order of the dimensions, use e.g. tensorflow.transpose.
lsa compress_model
compress
Optional, Type: boolean, Default: False. Enables local scaling adaptation (LSA), which adds a scaling vector to each weight tensor. The length of the vector is equal to the number of output channels of the weight tensor. When enabled, the encoder tunes the values of the scaling vector such that it partly compensates for the quantization error introduced by quantizing the weight tensor. Requires a model_executer (of type ModelExecuter) which implements the function tune_model with the functionality to tune LSA parameters.
fine_tune compress_model
compress
Optional, Type: boolean, Default: False. Enables fine tuning (FT), which fine tunes all non-weight tensors. When enabled, the encoder fine tunes the values of the non-weight tensors such that they partly compensate for the quantization error introduced by quantizing the weight tensors. Requires a model_executer (of type ModelExecuter) which implements the function tune_model with the functionality to fine tune non-weight parameters.
block_id_and_param_type compress_model
compress
Optional, Type: dict, Default: None. A dict specifying the block id and parameter type for each tensor. The dict shall contain two keys of type string, 'block_identifier' and 'parameter_type'. The values shall also be dicts, with the keys of type string specifying the tensor names (exactly as in the parameter_dict) and the values of type string specifying the related 'block_identifier' and 'parameter_type', respectively. The parameter type strings can be any of:

'weight'
'weight.ls'
'bias'
'bn.beta'
'bn.gamma'
'bn.mean'
'bn.var'
'unspecified' (special type, see notes below)

Important: A single block can contain parameters of each type only once (except for 'unspecified')!

All tensors that belong to the same block shall have the same 'block_identifier'. These identifiers can be arbitrary strings which shall be unique for different blocks. Tensors with the same 'block_identifier' are encoded as a block structure in a single unit. Whenever a parameter is denoted as 'unspecified', it is ignored for the block structure and transmitted separately in a single unit. This specifier can be used whenever the parameter type of a tensor is unknown or does not fit any of the other parameter types.

For further details on block_id_and_param_type and the meaning of the parameter types refer to the Advanced Features-Section.

Note: If compress_model is called with batch-norm folding enabled (bnf=True) and block_id_and_param_type=None, the function tries to guess the block identifiers and parameter types, which works at least for some models from the PyTorch and TensorFlow model zoo and probably for most PyTorch/TensorFlow models which fulfill certain conditions (see Advanced Features-Section). So far, this feature is only available for PyTorch and TensorFlow!
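
The layout of block_id_and_param_type described above could look as follows for a single convolution followed by batch-norm. This is a sketch under assumptions: the tensor names and the block identifier "block0" are made up, the mapping of the batch-norm scale and shift to 'bn.gamma' and 'bn.beta' follows the usual batch-norm convention, and the helper check_block_structure is hypothetical and not part of nnc.

```python
from collections import Counter

ALLOWED_TYPES = {
    "weight", "weight.ls", "bias",
    "bn.beta", "bn.gamma", "bn.mean", "bn.var",
    "unspecified",
}

# Made-up tensor names for one conv + batch-norm block.
block_id_and_param_type = {
    "block_identifier": {
        "conv1.weight": "block0",
        "conv1.bias": "block0",
        "bn1.weight": "block0",
        "bn1.bias": "block0",
        "bn1.running_mean": "block0",
        "bn1.running_var": "block0",
    },
    "parameter_type": {
        "conv1.weight": "weight",
        "conv1.bias": "bias",
        "bn1.weight": "bn.gamma",
        "bn1.bias": "bn.beta",
        "bn1.running_mean": "bn.mean",
        "bn1.running_var": "bn.var",
    },
}

def check_block_structure(info):
    """Check that each block contains every parameter type at most once."""
    block_ids = info["block_identifier"]
    seen = Counter()
    for name, ptype in info["parameter_type"].items():
        if ptype not in ALLOWED_TYPES:
            raise ValueError(f"unknown parameter type {ptype!r} for {name!r}")
        if ptype == "unspecified":
            continue  # ignored for the block structure
        seen[(block_ids.get(name), ptype)] += 1
    duplicates = [key for key, count in seen.items() if count > 1]
    if duplicates:
        raise ValueError(f"parameter types used more than once per block: {duplicates}")

check_block_structure(block_id_and_param_type)  # passes silently
```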
model compress Optional, Type: NNRModel, Default: None. Instance of Class NNRModel, which provides model related information (e.g. parameter types, block identifiers, tensor dimensions, etc) required for the compression process, and functions for handling of the model. There are three types specified, a (generic) base class and two classes inherited from the base class for PyTorch and TensorFlow.

NNRModel: Base Model-Class
PytorchModel( NNRModel ): Model class for PyTorch Models
TensorflowModel( NNRModel ): Model class for TensorFlow Models

If not specified, an instance of the base model class NNRModel will be created internally. Whenever PytorchModel or TensorflowModel is used, an identifier is written to the bitstream such that the decoder can derive the related model framework.

Note: Whenever the function compress_model detects a PyTorch or TensorFlow model, it internally creates the respective model type and provides it to the (internal) compress function call!

Also see Class Definitions.
model_name compress_model Optional, Type: string, Default: None. Name of the model to be processed. Only required for TensorFlow models if a model_executer shall be created internally. To use data-based methods on ImageNet, TensorFlow models need some preprocessing. Right now the following model names are supported:

[ 'DenseNet121', 'DenseNet121', 'DenseNet201', 'EfficientNetB0', 'EfficientNetB1', 'EfficientNetB2', 'EfficientNetB3', 'EfficientNetB4', 'EfficientNetB5', 'EfficientNetB6', 'EfficientNetB7', 'InceptionResNetV2', 'InceptionV3', 'MobileNet', 'MobileNetV2', 'NASNetLarge', 'NASNetMobile', 'ResNet50', 'ResNet101', 'ResNet152', 'ResNet50V2', 'ResNet101V2', 'ResNet152V2', 'VGG16', 'VGG19', 'Xception']
model_executer compress_model
compress
Optional, Type: ModelExecuter, Default: None. A model_executer that can run the model, e.g. inference or training on a dataset. Must be an instance of ModelExecuter. If dataset_path and model_struct are provided, an instance of ModelExecuter for ImageNet-based models will be created within compress_model. However, the NNCodec software also allows the use of user-customised model_executers (e.g. for different datasets) as long as they are inherited from the ModelExecuter-Class and implement its interface. For details refer to Class Definitions and Advanced Features.
model_struct compress_model Optional, Type: [torch.nn.Module, tensorflow.Module], Default: None. The model file that contains the computational graph, which is required to run the model. For PyTorch this shall be an instance of torch.nn.Module. For TensorFlow this shall be an instance of tensorflow.Module. The model_struct must fit the structure of the model parameters stored at model_path or in parameter_dict, respectively. In other words, model_struct must be able to load the parameters as specified in the file at model_path or in parameter_dict. For further details on the usage, check out the Examples-Section.
dataset_path compress_model Optional, Type: String, Default: None. Specifies the path to the ImageNet-dataset for training and evaluation. In order to perform full training of the model the folder shall contain a subfolder "train", which contains the training set. For validation of the model there shall be a subfolder "val", which contains the test or validation set. A third set, which shall be in the folder "tuning", is a fine-tuning set which is required for data-driven compression methods like inference-optimized quantization (IOQ). Usually the fine-tuning set is a subset of the training set.
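
Before starting a long data-driven compression run, the expected folder layout below dataset_path can be verified. The following is a minimal sketch using only the standard library; the helper missing_dataset_subfolders is hypothetical and not part of nnc, and which subfolders are actually needed depends on the enabled methods.

```python
from pathlib import Path

def missing_dataset_subfolders(dataset_path, required=("train", "val", "tuning")):
    """Return the names of required subfolders that do not exist."""
    root = Path(dataset_path)
    return [name for name in required if not (root / name).is_dir()]
```

An empty list means that all three subfolders named above are present.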
learning_rate compress_model Optional, Type: float32, Default: 1e-5. Learning rate that is applied for fine tuning and local scaling adaptation (LSA) on ImageNet.
batch_size compress_model Optional, Type: int32, Default: 64. Batch size that is applied for fine tuning and local scaling adaptation (LSA) on ImageNet.
epochs compress_model Optional, Type: int32, Default: 30. Number of epochs that the model is trained during fine tuning and local scaling adaptation (LSA) on ImageNet.
max_batches compress_model Optional, Type: int32, Default: 600. Maximum number of batches the model is trained on during fine tuning and local scaling adaptation (LSA) on ImageNet.
num_workers compress_model Optional, Type: int32, Default: 8. Number of (parallel) workers that are used for the dataloaders for training and inference in order to speed up the process.
return_model_data compress_model Optional, Type: boolean, Default: False. The flag determines whether the return value block_id_and_param_type is present (return_model_data==True) or not (return_model_data==False).
verbose compress_model
compress
Optional, Type: boolean, Default: True. Activates verbose output.
return_bitstream compress_model
compress
Optional, Type: boolean, Default: False. The flag determines whether the return value bitstream is present (return_bitstream==True) or not (return_bitstream==False).

Return Values

Important: The return values depend on the configuration of the flags return_bitstream and return_model_data as follows:

for compress_model:

  • return_bitstream==False, return_model_data == False: No return values
  • return_bitstream==True , return_model_data == False: a single value bitstream
  • return_bitstream==False, return_model_data == True : a single value block_id_and_param_type
  • return_bitstream==True , return_model_data == True : a 2-tuple (bitstream, block_id_and_param_type)

for compress:

  • return_bitstream==False: No return values
  • return_bitstream==True : a single value bitstream

Return Value Function Description
bitstream compress_model
compress
Condition: return_bitstream==True, Type: bytearray. The compressed bitstream as a bytearray. Only present if return_bitstream is equal to True.
block_id_and_param_type compress_model Condition: return_model_data==True , Type: dict. A dict specifying the block id and parameter type for each tensor (also see description of compress parameter block_id_and_param_type). Only present if return_model_data is equal to True.

Note: This dict can be used to provide it to the decoder, e.g. in order to reconstruct folded batch-norm parameters. The returned value is either equal to block_id_and_param_type guessed in compress_model or specified as input parameter of compress_model.

Decoder

def decompress_model( bitstream_or_path, 
                      model_path=None, 
                      block_id_and_param_type=None, 
                      model_struct=None,
                      model_executer=None,
                      model_name=None, 
                      dataset_path=None,  
                      batch_size=64,  
                      num_workers=8,
                      reconstruct_bnf=True,
                      reconstruct_lsa=True,
                      test_model=False,
                      return_model_information=False,
                      return_decompressed_model=False,
                      return_model_with_decoded_parameters=False,
                      verbose=True
                    ):

def decompress( bitstream_or_path, 
                block_id_and_param_type=None, 
                return_model_information=False,
                verbose=True,
                reconstruct_lsa=True, 
                reconstruct_bnf=True
              ):

Required input to both functions is only the bitstream to be decompressed, given either as a path to the bitstream file or as a bytearray. decompress returns the model parameter state dict and the topology storage format, if return_tpl_storage_format is True. The topology storage format denotes the related model framework, if specified. If the framework can be detected, the parameter state dict is in the respective format, such that it is compatible with the framework (e.g. PyTorch, TensorFlow).

Parameters

Parameter Function Description
bitstream_or_path decompress_model
decompress
Required, Type: [string, bytearray], Default: None. Specifies either the path to the bitstream file to be decompressed (usually ends with ".nnc") as a string or the bitstream as bytearray.
model_path decompress_model Optional, Type: string, Default: None. Specifies the path where the reconstructed model file is stored after the decompression process. If a model related to a known framework (e.g. PyTorch, TensorFlow) is detected, the state dict will be stored in the respective format, such that it is compatible with the given framework. If no model path is specified, it will be set to "./rec.[pt, tf, mdl]" by default depending on the detected model format.
block_id_and_param_type decompress_model
decompress
Optional, Type: dict, Default: None. A dict specifying the block id and parameter type for each tensor. The dict shall contain two keys of type string, 'block_identifier' and 'parameter_type'. The values shall also be dicts, with the keys of type string specifying the tensor names (exactly as in the parameter_dict) and the values of type string specifying the related 'block_identifier' and 'parameter_type', respectively. The parameter type strings can be any of:

'weight'
'weight.ls'
'bias'
'bn.beta'
'bn.gamma'
'bn.mean'
'bn.var'
'unspecified' (special type, see notes below)

Important: A single block can contain parameters of each type only once (except for 'unspecified')!

All tensors that belong to the same block shall have the same 'block_identifier'. These identifiers can be arbitrary strings which shall be unique for different blocks. The parameter type 'unspecified' can be used, whenever the parameter type of a tensor is unknown or does not fit any of the other parameter types.

Note: This structure is not required for the decoding process, but provides information on the original structure of the model, e.g. in order to reconstruct folded batch-norm parameters.

For further details on block_id_and_param_type and the meaning of the parameter types refer to the Advanced Features-Section.
model_struct decompress_model Optional, Type: [torch.nn.Module, tensorflow.Module], Default: None. The model file that contains the computational graph, which is required to run the model. For PyTorch this must be an instance of torch.nn.Module. For TensorFlow this must be an instance of tensorflow.Module. The model_struct must fit the structure of the model parameters stored at model_path or in parameter_dict, respectively. In other words, model_struct must be able to load the parameters as specified in the file at model_path or in parameter_dict.

Information: If model_struct is provided, a copy of model_struct equipped with the decompressed parameters can be returned by decompress_model for further use. Currently, this feature is only available for PyTorch and TensorFlow.

For further details on the usage, check out the Examples-Section.
model_executer decompress_model Optional, Type: ModelExecuter, Default: None. A model_executer that can run the model, e.g. inference or training on a dataset. Must be an instance of ModelExecuter. If dataset_path and model_struct are provided, an instance of ModelExecuter for ImageNet-based models will be created within decompress_model. However, the NNCodec software also allows the use of user-customised model_executers (e.g. for different datasets) as long as they are inherited from the ModelExecuter-Class and implement its interface. For details refer to Class Definitions and Advanced Features.
model_name decompress_model Optional, Type: string, Default: None. Name of the model to be processed. Only required for TensorFlow models if a model_executer shall be created internally. To use data-based methods on ImageNet, TensorFlow models need some preprocessing. Right now the following model names are supported:

[ 'DenseNet121', 'DenseNet121', 'DenseNet201', 'EfficientNetB0', 'EfficientNetB1', 'EfficientNetB2', 'EfficientNetB3', 'EfficientNetB4', 'EfficientNetB5', 'EfficientNetB6', 'EfficientNetB7', 'InceptionResNetV2', 'InceptionV3', 'MobileNet', 'MobileNetV2', 'NASNetLarge', 'NASNetMobile', 'ResNet50', 'ResNet101', 'ResNet152', 'ResNet50V2', 'ResNet101V2', 'ResNet152V2', 'VGG16', 'VGG19', 'Xception']
dataset_path decompress_model Optional, Type: string, Default: None. Specifies the path to the ImageNet dataset for training and evaluation. To perform full training of the model, the folder shall contain a subfolder "train", which contains the training set. For validation of the model there shall be a subfolder "val", which contains the test or validation set. A third subfolder, "tuning", shall contain a fine-tuning set, which is required for data-driven compression methods like inference-optimized quantization (IOQ). Usually the fine-tuning set is a subset of the training set.
batch_size decompress_model Optional, Type: int32, Default: 64. Batch size that is applied during inference on the test set for 'test_model'.
num_workers decompress_model Optional, Type: int32, Default: 8. Number of (parallel) workers that are used for the dataloaders for inference in order to speed up the process.
reconstruct_bnf decompress_model, decompress Optional, Type: boolean, Default: True. Reconstruct (unfold) batch-norm parameters if possible. Requires block_id_and_param_type to be present.
reconstruct_lsa decompress_model, decompress Optional, Type: boolean, Default: True. Apply (multiply) the LSA parameters if possible.
test_model decompress_model Optional, Type: boolean, Default: False. Run inference on a dataset. Requires a model_executer, which implements the function 'test_model'.
return_model_information decompress_model, decompress Optional, Type: boolean, Default: False. The flag determines whether the return value model_information is present (return_model_information==True) or not (return_model_information==False).
return_decompressed_model decompress_model Optional, Type: boolean, Default: False. The flag determines whether the return value decompressed_model is present (return_decompressed_model==True) or not (return_decompressed_model==False).
verbose decompress_model, decompress Optional, Type: boolean, Default: True. Activate verbose output.
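
The folder layout described for dataset_path ("train", "val" and "tuning" subfolders) can be checked before starting a long run. The following is only a minimal sketch using the Python standard library; the helper name missing_dataset_subfolders is hypothetical and not part of nnc.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the dataset_path description.
EXPECTED_SUBFOLDERS = ("train", "val", "tuning")


def missing_dataset_subfolders(dataset_path):
    """Return the expected subfolders that do not exist under dataset_path.

    Hypothetical helper for illustration; nnc does not provide it.
    """
    root = Path(dataset_path)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]


# Demo on a throwaway directory that only contains "train" and "val".
with tempfile.TemporaryDirectory() as tmp:
    for name in ("train", "val"):
        (Path(tmp) / name).mkdir()
    demo_missing = missing_dataset_subfolders(tmp)

print(demo_missing)
```

An empty list means all three subfolders are present; a missing "tuning" folder only matters when data-driven methods are used.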

Return Values

Important: The return values depend on the configuration of the flags return_model_information and return_decompressed_model as follows:

for decompress_model:

  • return_decompressed_model==False, return_model_information==False: no return value
  • return_decompressed_model==True, return_model_information==False: a single value decompressed_model
  • return_decompressed_model==False, return_model_information==True: a single value model_information
  • return_decompressed_model==True, return_model_information==True: a 2-tuple (decompressed_model, model_information)

for decompress:

  • return_model_information==False: a single value rec_parameters
  • return_model_information==True: a 2-tuple (rec_parameters, model_information)
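
Because the shape of the return value of decompress_model changes with the two flags, calling code may want to normalise it to a fixed pair. The sketch below illustrates the four cases listed above; the helper unpack_decompress_model_result is hypothetical, and the demo uses stand-in values instead of a real call to nnc.decompress_model.

```python
def unpack_decompress_model_result(result,
                                   return_decompressed_model,
                                   return_model_information):
    """Map the flag-dependent return value onto (decompressed_model, model_information).

    Hypothetical helper; entries that were not requested are returned as None.
    """
    if return_decompressed_model and return_model_information:
        decompressed_model, model_information = result  # 2-tuple case
        return decompressed_model, model_information
    if return_decompressed_model:
        return result, None   # single value: decompressed_model
    if return_model_information:
        return None, result   # single value: model_information
    return None, None         # no return value


# Stand-in values; in practice `result` would come from nnc.decompress_model(...).
info = {"topology_storage_format": 3}
both = unpack_decompress_model_result(("model", info), True, True)
only_info = unpack_decompress_model_result(info, False, True)
print(both, only_info)
```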
Return Value Function Description
rec_parameters decompress Condition: None, Type: dict. The reconstructed parameter state dict, containing the tensor names as keys and the parameter values as numpy arrays of type numpy.float32 or numpy.int32.
model_information decompress_model, decompress Condition: return_model_information==True, Type: dict. A dict that contains model-related information, e.g. the topology storage format or the pruning, sparsification, decomposition and unification performance maps (see [1]). Only present if return_model_information is equal to True.


model_information["topology_storage_format"]:
Denotes the storage format of the parameter dict (topology storage format). The following values are specified:

0: NNR_TPL_UNREC - unrecognized format
3: NNR_TPL_PT - PyTorch format
4: NNR_TPL_TEF - TensorFlow format

Note: The standard specifies further topology storage formats, which are not yet implemented.


model_information["performance_map_flags"][performance_flag]:
A dict specifying the value of the performance flag denoted by performance_flag per tensor. The keys are the tensor names as strings and the values are integers which denote the value of the flag (either 0 or 1). The following performance_flags are available:

'mps_sparsification_flag'
'mps_pruning_flag'
'mps_unification_flag'
'mps_decomposition_performance_map_flag'
'lps_sparsification_flag'
'lps_pruning_flag'
'lps_unification_flag'
'lps_decomposition_performance_map_flag'


model_information["performance_maps"]["mps"][performance_map]:
A dict specifying the values of the model parameter set (MPS) related performance map denoted by performance_map. The keys are the names of the performance map syntax elements as strings and the values are the values of the respective syntax elements as decoded from the bitstream. The specifier performance_map can be any of:

'sparsification_performance_map' (only present if mps_sparsification_flag==1)
'pruning_performance_map' (only present if mps_pruning_flag==1)
'unification_performance_map' (only present if mps_unification_flag==1)
'decomposition_performance_map' (only present if mps_decomposition_performance_map_flag==1)


model_information["performance_maps"]["lps"][performance_map]:
A dict specifying the values of the layer parameter set (LPS) related performance map denoted by performance_map. The keys are the names of the performance map syntax elements as strings and the values are the values of the respective syntax elements as decoded from the bitstream. The specifier performance_map can be any of:

'sparsification_performance_map' (only present if lps_sparsification_flag==1)
'pruning_performance_map' (only present if lps_pruning_flag==1)
'unification_performance_map' (only present if lps_unification_flag==1)
'decomposition_performance_map' (only present if lps_decomposition_performance_map_flag==1)
decompressed_model decompress_model Condition: model_struct, return_decompressed_model==True, Type: [torch.nn.Module, tensorflow.Module]. A model_struct equipped with the decompressed parameters. Requires a model_struct which can load the parameters. For PyTorch it shall be of type torch.nn.Module and for TensorFlow it shall be of type tensorflow.Module. If no suitable model_struct is provided the returned value is None. Currently, this feature is only available for PyTorch and TensorFlow. Only present if return_decompressed_model is equal to True. For further details on the usage check out the Examples-Section.
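
As an illustration of the model_information layout described above, the following sketch summarises a hand-built dict. The function summarize_model_information and the example dict are hypothetical; the performance map contents are left empty because the syntax element names are not listed here, and the tensor name is made up.

```python
# Topology storage formats as listed above.
TOPOLOGY_STORAGE_FORMATS = {
    0: "NNR_TPL_UNREC",  # unrecognized format
    3: "NNR_TPL_PT",     # PyTorch format
    4: "NNR_TPL_TEF",    # TensorFlow format
}


def summarize_model_information(model_information):
    """Return the format name and the performance maps present per parameter set."""
    fmt = model_information.get("topology_storage_format")
    fmt_name = TOPOLOGY_STORAGE_FORMATS.get(fmt, "not implemented")
    maps = model_information.get("performance_maps", {})
    present = {level: sorted(maps.get(level, {})) for level in ("mps", "lps")}
    return fmt_name, present


# Hand-built stand-in for the dict returned with return_model_information=True.
example_information = {
    "topology_storage_format": 3,
    "performance_map_flags": {
        "mps_sparsification_flag": {"features.0.weight": 1},
    },
    "performance_maps": {
        "mps": {"sparsification_performance_map": {}},
        "lps": {},
    },
}

summary = summarize_model_information(example_information)
print(summary)
```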

PyTorch and TensorFlow Support

The NNCodec software has built-in support for PyTorch and TensorFlow models. Usually, such models are detected and handled automatically. An identifier is written to the bitstream that enables the decoder to detect the related framework and output the model in the respective format.

Furthermore, the software provides data-based methods for models based on ImageNet. These methods include inference-optimised quantization, local scaling adaptation, fine tuning and testing the model by inference on the validation set at the decoder. For this, a structure of type torch.nn.Module or tensorflow.Module and the path to the ImageNet-Dataset must be provided. Check out the Examples-Section for further details on how to use this software with PyTorch and TensorFlow models.

Advanced Features

Coming soon! We will provide further details on how to use the advanced features of NNCodec. Meanwhile, you will find details on the usage of advanced features in the Examples-Section.

Class Definitions

Model Executer

The class ModelExecute provides an interface, which needs to be implemented in order to enable data-based functionalities like fine tuning, local scaling adaptation or inference-based optimization. This class is part of the module nnc_core (see below):

nnc_core.nnr_model.ModelExecute(ABC)

All inherited classes shall implement the following interface:

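from abc import ABC, abstractmethod


class ModelExecute(ABC):
    def eval_model(self,
                   parameters,
                   verbose=False,
                   ):
        # To be implemented by the inherited class.
        raise NotImplementedError

    def test_model(self,
                   parameters,
                   verbose=False,
                   ):
        # To be implemented by the inherited class.
        raise NotImplementedError

    def tune_model(self,
                   parameters,
                   param_types,
                   lsa_flag,
                   ft_flag,
                   verbose=False,
                   ):
        # To be implemented by the inherited class.
        raise NotImplementedError

    @abstractmethod
    def has_eval(self):
        return False

    @abstractmethod
    def has_test(self):
        return False

    @abstractmethod
    def has_tune_ft(self):
        return False

    @abstractmethod
    def has_tune_lsa(self):
        return False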

Functions and Parameters

Important: All function parameters defined by the interface are mandatory; any other function parameters shall be optional.

def eval_model(self,
               parameters,
               verbose=False,
              ):

The function eval_model evaluates the model performance by inference on an evaluation dataset. Usually, the evaluation dataset is a reduced dataset (e.g. a subset of the training set) and different from the validation test set. The function shall return a tuple of values, where the first value of the tuple is a scalar that denotes or is related to the model performance (e.g. Top1-Accuracy). A larger value means better performance and a smaller value means worse performance. This value is evaluated by tools like inference-optimized quantization. The returned tuple shall at least contain this value. Other values are optional.

Parameter Description
parameters Required, Type: Dict, Default: -. Specifies a python dict which represents the parameter state dictionary. The keys are strings which denote the names of the parameter tensors and the values represent the tensors as numpy arrays (ndarrays). The numpy arrays must be of type numpy.float32 or numpy.int32.
verbose Optional, Type: Boolean, Default: False. Activate verbose output. Shows a progress bar when activated.
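
The return contract of eval_model (a tuple whose first entry is a larger-is-better scalar) matters when the underlying metric is a loss, where smaller is better. A minimal sketch of one way to satisfy the contract is shown below. Both helper names are hypothetical, and the comparison in better_candidate is only an assumption about how a caller could rank results, not a description of the NNCodec internals.

```python
def as_eval_result(loss, *extras):
    """Wrap a lower-is-better loss so that the first tuple entry is larger-is-better."""
    return (-float(loss),) + tuple(extras)


def better_candidate(result_a, result_b):
    """Pick the result with the larger leading score (assumed ranking rule)."""
    return result_a if result_a[0] >= result_b[0] else result_b


coarse = as_eval_result(0.92, "coarse quantization")
fine = as_eval_result(0.35, "fine quantization")
best = better_candidate(coarse, fine)
print(best)
```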
def test_model(self,
               parameters,
               verbose=False,
              ):

The function test_model evaluates the model performance by inference on a validation test set, which is usually different from the evaluation dataset used by eval_model. The function shall return a tuple of values, which denote or are related to the model performance (e.g. Top1-Accuracy, Top5-Accuracy, etc.). The returned tuple shall contain at least one value. Other values are optional. The output of this function is not required for any functionality and is thus purely informative. It can be used, e.g., to evaluate the effect of the quantization on the performance after decompression.

Parameter Description
parameters Required, Type: Dict, Default: -. Specifies a python dict which represents the parameter state dictionary. The keys are strings which denote the names of the parameter tensors and the values represent the tensors as numpy arrays (ndarrays). The numpy arrays must be of type numpy.float32 or numpy.int32.
verbose Optional, Type: Boolean, Default: False. Activate verbose output. Shows a progress bar when activated.
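
A test_model implementation typically reports metrics such as Top-1 and Top-5 accuracy. The sketch below shows how such a tuple could be computed from per-sample class scores. It uses plain Python lists as stand-ins for framework tensors, and the function top_k_accuracy is a hypothetical helper.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted by descending score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i],
                        reverse=True)
        hits += label in ranked[:k]
    return 100.0 * hits / len(labels)


# Three samples, three classes (made-up scores).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]

# Shape of a possible test_model return value: (Top-1, Top-2) here.
test_result = (top_k_accuracy(scores, labels, 1), top_k_accuracy(scores, labels, 2))
print(test_result)
```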
def tune_model(self,
               parameters,
               param_types,
               lsa_flag,
               ft_flag,
               verbose=False,
              ):

The function tune_model tunes (trains) the non-weight parameters (fine tuning of e.g. biases, batch-norm parameters, etc.) and/or the local scaling parameters (local scaling adaptation) on a parameter tuning set, which usually is a subset of the training set. The function shall return a tuple with at least two values. The first value is a dict which contains only the local scaling parameters. The keys are strings with the name of the tensor (usually the name of the related weight tensor with "_scaling" attached to it) and the values are the tensors as numpy arrays of type numpy.int32 or numpy.float32. Whenever LSA is disabled (lsa_flag==False) or no LSA parameters are present, the returned parameter dictionary shall be empty. The second value of the returned tuple is a dict which contains only the non-weight parameters. The keys are strings with the name of the tensor and the values are the tensors as numpy arrays of type numpy.int32 or numpy.float32. Whenever fine tuning is disabled (ft_flag==False) or no non-weight parameters are present, the returned parameter dictionary shall be empty. Additional values in the returned tuple are optional.

Parameter Description
parameters Required, Type: Dict, Default: -. Specifies a python dict which represents the parameter state dictionary (including the LSA parameters, if LSA is enabled). The keys are strings which denote the names of the parameter tensors and the values represent the tensors as numpy arrays (ndarrays). The numpy arrays must be of type numpy.float32 or numpy.int32.
param_types Required, Type: Dict, Default: -. A python dict which specifies the parameter types for each tensor in the parameter dict. The keys are strings which denote the name of the tensor and shall match exactly the names in the parameter dictionary. The values are strings which specify the parameter type and can be any of the following:

'weight' - weights
'weight.ls' - local scaling parameters
'bias' - biases (non-weights)
'bn.beta' - batch-norm parameter (non-weights)
'bn.gamma' - batch-norm parameter (non-weights)
'bn.mean' - batch-norm parameter (non-weights)
'bn.var' - batch-norm parameter (non-weights)
'unspecified' - others or not specified (treated as non-weights)

lsa_flag Required, Type: Boolean, Default: -. Activate tuning of local scaling parameters, if present.
ft_flag Required, Type: Boolean, Default: -. Activate fine tuning of non-weight parameters, if present.
verbose Optional, Type: Boolean, Default: False. Activate verbose output. Shows a progress bar and additional information about the training process when activated.
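
The two dictionaries returned by tune_model can be assembled from the tuned parameters with the help of param_types. The sketch below only illustrates that final step (the training itself is omitted). The helper split_tuned_parameters is hypothetical, and plain lists stand in for the numpy arrays.

```python
# Parameter types treated as non-weights, as listed for param_types.
NON_WEIGHT_TYPES = {"bias", "bn.beta", "bn.gamma", "bn.mean", "bn.var", "unspecified"}


def split_tuned_parameters(parameters, param_types, lsa_flag, ft_flag):
    """Return (local scaling parameters, non-weight parameters) as two dicts."""
    lsa_params = {}
    ft_params = {}
    for name, tensor in parameters.items():
        ptype = param_types[name]
        if lsa_flag and ptype == "weight.ls":
            lsa_params[name] = tensor
        elif ft_flag and ptype in NON_WEIGHT_TYPES:
            ft_params[name] = tensor
    return lsa_params, ft_params


# Made-up tensor names and values for illustration.
parameters = {
    "conv.weight": [[0.5, -0.25]],
    "conv.weight_scaling": [1.0],
    "conv.bias": [0.1],
}
param_types = {
    "conv.weight": "weight",
    "conv.weight_scaling": "weight.ls",
    "conv.bias": "bias",
}

lsa_and_ft = split_tuned_parameters(parameters, param_types, lsa_flag=True, ft_flag=True)
ft_only = split_tuned_parameters(parameters, param_types, lsa_flag=False, ft_flag=True)
print(lsa_and_ft, ft_only)
```

With lsa_flag set to False the first dictionary is empty, as required by the description of tune_model.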
    @abstractmethod
    def has_eval(self):
        return False

The function has_eval denotes whether eval_model is implemented or not, and thus whether the functionality for evaluation on the evaluation dataset is available or not. If eval_model is implemented the return value shall be 'True', otherwise it shall be 'False'.

    @abstractmethod
    def has_test(self):
        return False

The function has_test denotes whether test_model is implemented or not, and thus whether the functionality for inference on the validation test set is available or not. If test_model is implemented the return value shall be 'True', otherwise it shall be 'False'.

    @abstractmethod
    def has_tune_ft(self):
        return False

The function has_tune_ft denotes whether tune_model is implemented for fine tuning of non-weight parameters or not, and thus whether the functionality for fine tuning of non-weight parameters is available or not. If tune_model implements fine tuning the return value shall be 'True', otherwise it shall be 'False'.

    @abstractmethod
    def has_tune_lsa(self):
        return False

The function has_tune_lsa denotes whether tune_model is implemented for fine tuning of local scaling parameters or not, and thus whether the functionality for local scaling adaptation is available or not. If tune_model implements local scaling adaptation the return value shall be 'True', otherwise it shall be 'False'.
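
To make the interface more concrete, here is a minimal sketch of an inherited class that only supports test_model. So that the snippet runs without NNCodec installed, it defines a local stand-in for nnc_core.nnr_model.ModelExecute that mirrors the interface above. The class InferenceOnlyExecuter and its metric_fn argument are hypothetical.

```python
from abc import ABC, abstractmethod


class ModelExecute(ABC):
    """Local stand-in mirroring the interface described above (assumption)."""

    def eval_model(self, parameters, verbose=False):
        raise NotImplementedError

    def test_model(self, parameters, verbose=False):
        raise NotImplementedError

    def tune_model(self, parameters, param_types, lsa_flag, ft_flag, verbose=False):
        raise NotImplementedError

    @abstractmethod
    def has_eval(self):
        return False

    @abstractmethod
    def has_test(self):
        return False

    @abstractmethod
    def has_tune_ft(self):
        return False

    @abstractmethod
    def has_tune_lsa(self):
        return False


class InferenceOnlyExecuter(ModelExecute):
    """Hypothetical executer that implements test_model only."""

    def __init__(self, metric_fn):
        # metric_fn maps a parameter dict to a scalar score (user supplied).
        self.metric_fn = metric_fn

    def test_model(self, parameters, verbose=False):
        score = self.metric_fn(parameters)
        if verbose:
            print("score:", score)
        return (score,)  # tuple with at least one value

    def has_eval(self):
        return False

    def has_test(self):
        return True

    def has_tune_ft(self):
        return False

    def has_tune_lsa(self):
        return False


# Toy metric: number of tensors in the parameter dict.
executer = InferenceOnlyExecuter(lambda params: float(len(params)))
result = executer.test_model({"a": [1.0], "b": [2.0]})
print(result, executer.has_test(), executer.has_eval())
```

All four has_* methods have to be overridden because they are declared abstract; otherwise the class cannot be instantiated.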

Examples

This section provides several examples on how to use the software and specific features.

Basic Features

Compressing a model loaded from a file

The model file is stored at 'example/squeezenet1_1_pytorch_zoo.pt' (SqueezeNet1_1 originally downloaded from the torchvision model zoo). The compressed bitstream is written to 'example/bitstream_squeezenet1_1.nnc'. After decompression, the reconstructed model is stored at 'example/reconstructed_squeezenet1_1.pt'.

import nnc

nnc.compress_model('./example/squeezenet1_1_pytorch_zoo.pt', bitstream_path='./example/bitstream_squeezenet1_1.nnc')
nnc.decompress_model('./example/bitstream_squeezenet1_1.nnc', model_path='./example/reconstructed_squeezenet1_1.pt' )

Here is an analogous example for a TensorFlow model, stored at 'example/densenet_121_tensorflow_zoo.h5' (DenseNet121 downloaded from the keras model zoo).

import nnc

nnc.compress_model('example/densenet_121_tensorflow_zoo.h5', bitstream_path='./example/bitstream_densenet_121.nnc')
nnc.decompress_model('./example/bitstream_densenet_121.nnc', model_path='./example/reconstructed_densenet_121.pt' )

Compressing a model from a model object

PyTorch:

import nnc
import torchvision

model = torchvision.models.squeezenet1_1(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_squeezenet1_1.nnc')
nnc.decompress_model('./example/bitstream_squeezenet1_1.nnc', model_path='./example/reconstructed_squeezenet1_1.pt' )

TensorFlow:

import nnc
from tensorflow import keras

model = keras.applications.DenseNet121()

nnc.compress_model( model, bitstream_path='./example/bitstream_densenet_121.nnc')
nnc.decompress_model('./example/bitstream_densenet_121.nnc', model_path='./example/reconstructed_densenet_121.pt' )

Changing the quantization parameter (QP)

By default the quantization parameter (QP) is set to -38. Increasing the QP yields a lower bitrate, but usually a higher performance degradation. Decreasing the QP yields a higher bitrate, but usually a lower performance degradation.

import nnc
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_mobilenet_v2_qp-38.nnc', model_path='./example/reconstructed_mobilenet_v2_qp-38.pt')

With the default QP value -38, MobileNetV2 achieves a compression ratio of 20.1% (compressed bitstream size 2.845741 MB) and a Top-1 accuracy of 71.622% on ImageNet.

import nnc
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_qp-34.nnc', qp=-34)
nnc.decompress_model('./example/bitstream_mobilenet_v2_qp-34.nnc', model_path='./example/reconstructed_mobilenet_v2_qp-34.pt')

Increasing the QP value to -34 achieves a compression ratio of 16.93% (compressed bitstream size 2.395967 MB) and a Top-1 accuracy of 71.306% on ImageNet.

import nnc
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_qp-30.nnc', qp=-30)
nnc.decompress_model('./example/bitstream_mobilenet_v2_qp-30.nnc', model_path='./example/reconstructed_mobilenet_v2_qp-30.pt')

Increasing the QP value to -30 achieves a compression ratio of 13.86% (compressed bitstream size 1.962087 MB) and a Top-1 accuracy of 69.432% on ImageNet.
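
The three MobileNetV2 results can be put side by side with a few lines of arithmetic. The sketch below assumes that "compression ratio" means compressed size divided by original size; under that assumption each row implies an original model size of roughly 14.2 MB. The accuracy drop is measured relative to the QP -38 result, not relative to the uncompressed model.

```python
# (QP, compression ratio in %, bitstream size in MB, Top-1 accuracy in %),
# copied from the MobileNetV2 results reported above.
REPORTED = [
    (-38, 20.1, 2.845741, 71.622),
    (-34, 16.93, 2.395967, 71.306),
    (-30, 13.86, 1.962087, 69.432),
]

baseline_top1 = REPORTED[0][3]  # accuracy at the default QP

rows = []
for qp, ratio_percent, size_mb, top1 in REPORTED:
    # Assumption: ratio = compressed size / original size.
    implied_original_mb = size_mb / (ratio_percent / 100.0)
    top1_drop = baseline_top1 - top1
    rows.append((qp, implied_original_mb, top1_drop))

for qp, implied_original_mb, top1_drop in rows:
    print(f"QP {qp}: implied original size {implied_original_mb:.2f} MB, "
          f"Top-1 drop vs. QP -38: {top1_drop:.3f} points")
```

Going from QP -38 to -30 reduces the bitstream by about 0.88 MB at the cost of about 2.19 percentage points of Top-1 accuracy.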

Dependent scalar quantization (DQ) and uniform quantization

By default a vector quantization scheme called dependent scalar quantization (DQ) is applied, which usually achieves lower bitrates at a given performance. However, there might be cases where DQ is not suitable and does not achieve good performance. In such cases DQ can be deactivated, and a (simple) uniform quantizer is used instead. In order to achieve a similar performance the QP must be adjusted. For details refer to the parameter 'use_dq' at Functions and Parameters.
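
For readers unfamiliar with the term, the sketch below shows what a plain uniform quantizer does: every value is mapped to the index of the nearest multiple of a step size. This is a generic textbook illustration, not the NNCodec implementation. The step size is an arbitrary value chosen for the example and is not derived from any QP.

```python
import math


def uniform_quantize(value, step_size):
    """Return the integer index of the nearest reconstruction level."""
    return math.floor(value / step_size + 0.5)


def uniform_dequantize(index, step_size):
    """Map an integer index back to its reconstruction value."""
    return index * step_size


step_size = 0.05  # arbitrary illustrative value
weights = [0.26, -0.131, 0.0049, 0.75]

indices = [uniform_quantize(w, step_size) for w in weights]
reconstructed = [uniform_dequantize(i, step_size) for i in indices]
max_error = max(abs(w - r) for w, r in zip(weights, reconstructed))

print(indices)
print(max_error <= step_size / 2)
```

The reconstruction error of each value is bounded by half the step size, so a larger step size gives smaller indices (fewer bits) but a larger error.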

The following examples show how to use model compression with (enabled by default) and without DQ, respectively:

import nnc
import torchvision

model = torchvision.models.resnet50(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_resnet50_qp-38.nnc', model_path='./example/reconstructed_resnet50_qp-38.pt')

With the default QP value -38 and DQ enabled, ResNet50 achieves a compression ratio of 13.84% (compressed bitstream size 14.173388 MB) and a Top-1 accuracy of 75.96% on ImageNet.

import nnc
import torchvision

model = torchvision.models.resnet50(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_noDQ_qp-35.nnc', qp=-35, use_dq=False)
nnc.decompress_model('./example/bitstream_resnet50_noDQ_qp-35.nnc', model_path='./example/reconstructed_resnet50_noDQ_qp-35.pt')

With DQ disabled, ResNet50 achieves a compression ratio of 14.48% (compressed bitstream size 14.832298 MB) and a Top-1 accuracy of 75.952% on ImageNet. Note: The QP value has been increased to -35 in order to achieve a comparable bitstream size and quantization error.

Similar results can be obtained using ResNet50 from keras:

import nnc
from tensorflow import keras

model = keras.applications.ResNet50()

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_keras_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_resnet50_keras_qp-38.nnc', model_path='./example/reconstructed_resnet50_keras_qp-38.pt')

Here, with the default QP value -38 and DQ enabled, ResNet50 achieves a compression ratio of 14.13% (compressed bitstream size 14.488576 MB) and a Top-1 accuracy of 74.814% on ImageNet.

import nnc
from tensorflow import keras

model = keras.applications.ResNet50()

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_keras_noDQ_qp-35.nnc', qp=-35, use_dq=False)
nnc.decompress_model('./example/bitstream_resnet50_keras_noDQ_qp-35.nnc', model_path='./example/reconstructed_resnet50_keras_noDQ_qp-35.pt')

With DQ disabled, ResNet50 achieves a compression ratio of 14.79% (compressed bitstream size 15.169963 MB) and a Top-1 accuracy of 74.806% on ImageNet.

Batch-norm folding (BNF)

Batch-norm folding requires the tensors to be shaped such that the first dimension specifies the number of output channels. This is usually the case for PyTorch models but not for TensorFlow. Consequently, the presented examples refer to PyTorch. However, BNF can be applied to correctly shaped TensorFlow models in the same fashion.
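
The requirement on the first tensor dimension follows from the folding identity itself: batch-norm acts per output channel, so it can be replaced by one scale and one offset per row of the weight tensor. The sketch below checks this generic identity numerically with plain Python lists. It is not the NNCodec implementation; the names alpha and delta and the value of eps are assumptions made for the example.

```python
import math


def fold_batch_norm(bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm into a per-output-channel scale (alpha) and offset (delta)."""
    alpha = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    delta = [(b - m) * a + bt for b, m, a, bt in zip(bias, mean, alpha, beta)]
    return alpha, delta


def linear(weight, x):
    """Matrix-vector product; weight is laid out as [output channel][input]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Toy layer with 3 output channels (first dimension) and 2 inputs.
weight = [[1.0, -2.0], [0.5, 0.25], [3.0, 1.0]]
bias = [0.1, -0.2, 0.0]
gamma = [1.5, 0.8, 1.0]
beta = [0.0, 0.3, -0.1]
mean = [0.2, -0.1, 0.5]
var = [0.9, 1.2, 0.4]
eps = 1e-5
x = [2.0, -1.0]

# Reference: layer output followed by batch-norm.
pre = [y + b for y, b in zip(linear(weight, x), bias)]
reference = [g * (p - m) / math.sqrt(v + eps) + bt
             for g, p, m, v, bt in zip(gamma, pre, mean, var, beta)]

# Folded: one scale and one offset per output channel.
alpha, delta = fold_batch_norm(bias, gamma, beta, mean, var, eps)
folded = [a * y + d for a, y, d in zip(alpha, linear(weight, x), delta)]

print(max(abs(r - f) for r, f in zip(reference, folded)))
```

If the output channels were stored in a different dimension, alpha and delta could not be applied row by row as done here.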

Example for MobileNetV2:

import nnc
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True)

block_id_and_param_type = nnc.compress_model( model, bitstream_path="./example/mobilenet_v2_pytorch_bnf.nnc", qp=-38, bnf=True, return_model_data=True )
nnc.decompress_model( "./example/mobilenet_v2_pytorch_bnf.nnc", model_path="./example/rec_mobilenet_v2_pytorch_bnf.pt", block_id_and_param_type=block_id_and_param_type)

MobileNetV2 with batch-norm folding enabled and the default QP value of -38 achieves a compression ratio of 19.84% (compressed bitstream size 2.809033 MB) and a Top-1 accuracy of 71.604% on ImageNet.

Example for ResNet50:

import nnc
import torchvision

model = torchvision.models.resnet50(pretrained=True)

block_id_and_param_type = nnc.compress_model( model, bitstream_path="./example/resnet50_pytorch_bnf.nnc", qp=-38, bnf=True, return_model_data=True )
nnc.decompress_model( "./example/resnet50_pytorch_bnf.nnc", model_path="./example/rec_resnet50_pytorch_bnf.pt", block_id_and_param_type=block_id_and_param_type ) 

ResNet50 with batch-norm folding enabled and the default QP value of -38 achieves a compression ratio of 13.76% (compressed bitstream size 14.098468 MB) and a Top-1 accuracy of 75.962% on ImageNet.

In both cases the software internally derives block_id_and_param_type, which can then be provided to the decoder.

Testing the model by inference on ImageNet

PyTorch:

import nnc
import torchvision

dataset_path = "/path/to/ImageNet"

model = torchvision.models.resnet50(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_test_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_resnet50_test_qp-38.nnc', model_path='./example/reconstructed_resnet50_test_qp-38.pt', dataset_path=dataset_path, model_struct=model, test_model=True)

TensorFlow:

import nnc
from tensorflow import keras

dataset_path = "/path/to/ImageNet"

model = keras.applications.ResNet50()

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_keras_test_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_resnet50_keras_test_qp-38.nnc', model_path='./example/reconstructed_resnet50_keras_test_qp-38.pt', dataset_path=dataset_path, model_struct=model, test_model=True, model_name="ResNet50")

Local scaling adaptation (LSA)

Fine-tuning (FT)

Inference-optimized quantization (IOQ)
