MMDetection supports compression of models with NNCF (Neural Network Compression Framework).
MMDetection is validated with NNCF 2.0.0; please use this version of NNCF in case of any issues.
NNCF supports multiple compression algorithms, but at the moment only a subset of them (e.g. int8 quantization and filter pruning) can be used in MMDetection.
To compress a model, NNCF takes a pre-trained model and wraps the whole PyTorch model and the PyTorch classes used by the model (e.g. Conv2d) with its own classes. After that, to perform the compression, training (fine-tuning) of the model should be started; typically, this is done by the same code that was used to train the original model. During the fine-tuning the wrapped classes perform additional operations on each training step (e.g. for int8 quantization the output of each convolution layer is quantized, for filter pruning a special technique is applied to reduce the number of filters in each convolution, etc.).
The result of such fine-tuning is a compressed model that may be exported to OpenVINO™.
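MMDetection performs this wrapping for you when compression is enabled (see below). For illustration only, here is a minimal sketch of the wrapping step using the raw NNCF 2.x PyTorch API on a toy model; the model, input shape, and the omitted initialization data loader are placeholders, not MMDetection code:

```python
import torch
from nncf import NNCFConfig
from nncf.torch import create_compressed_model

# A toy stand-in for a pre-trained model.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, kernel_size=3), torch.nn.ReLU())

nncf_config = NNCFConfig.from_dict(dict(
    input_info=dict(sample_size=[1, 3, 32, 32]),
    compression=dict(algorithm='quantization'),
))

# The model graph is traced and wrapped: Conv2d and other supported layers
# get NNCF counterparts that quantize their inputs/outputs during fine-tuning.
# (A real setup would first register an initialization data loader, e.g. via
# nncf.torch.register_default_init_args.)
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# ... fine-tune compressed_model with the usual training loop ...
```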
Please note that MMDetection does not require the NNCF framework to be installed for usual training (without compression).
If you want to compress models using NNCF, you can install NNCF together with all the packages required by NNCF with the command

```shell
pip install -r requirements/nncf_compression.txt
```
Integration of MMDetection with the NNCF framework is done in a transparent way:
- If NNCF parameters are not set in the config file, the NNCF framework is not used and MMDetection works in the usual way, no matter whether NNCF is installed or not.
- If the config file of a model contains the parameter `nncf_config`, and the parameter is a non-empty dict, NNCF will be used for the model compression:
  - If NNCF is not installed, a corresponding exception will be raised.
  - If NNCF is installed, the dict that is the value of the parameter `nncf_config` will be passed to the NNCF framework as its config without changes.
Example of the NNCF parameter `nncf_config` that may be used for int8 quantization of `ssd300_coco`:
```python
nncf_config = dict(
    input_info=dict(
        sample_size=[1, 3, 1000, 600]
    ),
    compression=dict(
        algorithm='quantization',
        initializer=dict(
            range=dict(
                num_init_steps=10
            ),
            batchnorm_adaptation=dict(
                num_bn_adaptation_steps=30,
            )
        )
    ),
    log_dir=work_dir
)
```
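Roughly, per the NNCF documentation: `input_info.sample_size` is the shape of the input tensor that NNCF uses to trace the model graph, `num_init_steps` is the number of data batches used to initialize the quantization ranges, `num_bn_adaptation_steps` is the number of batches used to adapt the batch-norm statistics of the compressed model, and `log_dir` is the directory for NNCF's own logs (here, the work directory of the training).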
See details on NNCF config parameters in the documentation on NNCF config files.
You can also find the parameters of different NNCF compression algorithms in the documentation on NNCF algorithms.
Examples of config files can be found in the folder `configs/nncf_compression`.
Typically, the pipeline of compression with NNCF looks as follows:
- Get a usual (i.e. uncompressed) model: it may be trained without compression, or a pre-trained model may be downloaded.
- The model is fine-tuned with compression (the parameter `nncf_config` is set).
  The result of this step is a checkpoint with the compressed model.
- (Optional) The compression process may be resumed: the model may be fine-tuned with compression a bit more (with the same compression parameters `nncf_config` in the config file).
  The result of this step is a checkpoint with the fine-tuned compressed model.
- The compressed model may be tested (with the same compression parameters `nncf_config` in the config file).
  The result of this step is the quality metrics of the compressed model.
- The compressed model may be exported to ONNX/OpenVINO™ (with the same compression parameters `nncf_config` in the config file).
  The result of this step is an ONNX/OpenVINO™ compressed model.
- The model exported to ONNX/OpenVINO™ may be tested.
  The result of this step is the quality metrics of the exported model.
To load an uncompressed model for compression you can use (as usual) the config parameter `load_from` to pass a pre-trained checkpoint; since the checkpoint was not trained with compression, NNCF will initialize its compression inner structures for the model and will start training with compression.

To load a compressed model's checkpoint to resume compression you can use the same config parameter `load_from` or the config parameter `resume_from`: if the checkpoint was produced by training with compression, NNCF will not re-initialize the compression inner structures and the fine-tuning will simply continue.
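As an illustrative sketch (the checkpoint paths and work directory below are hypothetical), the relevant config fragments may look as follows:

```python
# (a) Start compression from a checkpoint trained WITHOUT compression:
#     NNCF will initialize its compression structures and start fine-tuning.
load_from = 'checkpoints/ssd300_coco.pth'  # hypothetical path

# (b) Or resume compression from a checkpoint trained WITH compression
#     (NNCF then skips re-initialization of its inner structures):
# resume_from = 'work_dirs/ssd300_coco_int8/latest.pth'  # hypothetical path

# In both cases the same non-empty nncf_config must stay in the config file:
nncf_config = dict(
    input_info=dict(sample_size=[1, 3, 1000, 600]),
    compression=dict(algorithm='quantization'),
    log_dir='work_dirs/ssd300_coco_int8',
)
```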
Note that after NNCF compression is applied to a model, the model's checkpoints should be loaded with a config file with the same NNCF parameters: the value of the dict `nncf_config` in the config file should not be changed at all, or should be changed carefully, since after some changes the compressed model won't be loaded.
At the moment, the configuration parameter `nncf_compress_postprocessing` may also be set.
This parameter chooses whether NNCF should try to compress the whole model graph, including postprocessing (`nncf_compress_postprocessing=True`), or only the part of the model without postprocessing (`nncf_compress_postprocessing=False`).
Our primary goal is to apply NNCF compression to as large a part of the model as possible, so `nncf_compress_postprocessing=True` is our primary choice, whereas `nncf_compress_postprocessing=False` is our fallback decision.
When we manage to enable NNCF compression for sufficiently many models, only one choice will be kept.
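For example, the parameter can be set in the model's config file next to `nncf_config`:

```python
# Try to compress the whole model graph, including postprocessing (the
# primary choice); set to False as a fallback to compress only the part
# of the model without postprocessing.
nncf_compress_postprocessing = True
```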
The code connecting the NNCF framework with MMDetection is placed in the folder `mmdet/integration/nncf/`.