Added PyTorch FFNet model, added INT4 to several models
Added the following new model: PyTorch FFNet
Added INT4 quantization support to the following models:
- PyTorch Classification (regnet_x_3_2gf, resnet18, resnet50)
- PyTorch HRNet Posenet
- PyTorch HRNet
- PyTorch EfficientNet Lite0
- PyTorch DeeplabV3-MobileNetV2

Signed-off-by: Bharath Ramaswamy <quic_bharathr@quicinc.com>
quic-bharathr committed Nov 16, 2022
1 parent b7a21a0 commit b5869cb
Showing 15 changed files with 550 additions and 401 deletions.
40 changes: 24 additions & 16 deletions README.md
@@ -73,7 +73,7 @@ An original FP32 source model is quantized either using post-training quantizati
<td>RetinaNet</td>
<td><a href="https://github.com/fizyr/keras-retinanet">GitHub Repo</a></td>
<td><a href="https://github.com/fizyr/keras-retinanet/releases/download/0.5.1/resnet50_coco_best_v2.1.0.h5">Pretrained Model</a></td>
<td><a href="zoo_tensorflow/examples/retinanet_quanteval.py">See Example</a></td>
<td><a href="zoo_tensorflow/examples/retinanet/retinanet_quanteval.py">See Example</a></td>
<td> (COCO) mAP <br> FP32: 0.35 <br> INT8: 0.349 <br><a href="#retinanet"> Detailed Results</a></td>
<td><a href="zoo_tensorflow/Docs/RetinaNet.md">RetinaNet.md</a></td>
<td>1.15</td>
@@ -250,46 +250,46 @@ An original FP32 source model is quantized either using post-training quantizati
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="/../../releases/tag/torchvision_classification_INT4%2F8">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 69.75%<br> INT8: 69.54%<br></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 69.75%<br> INT8: 69.54%<br> INT4: 69.1% <br></td>
<td><a href="zoo_torch/Docs/Classification.md">Classification.md</a></td>
</tr>
<tr>
<td>Resnet50</td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="/../../releases/tag/torchvision_classification_INT4%2F8">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 76.14%<br> INT8: 75.81%<br></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 76.14%<br> INT8: 75.81%<br> INT4: 75.63% <br></td>
<td><a href="zoo_torch/Docs/Classification.md">Classification.md</a></td>
</tr>
<tr>
<td>Regnet_x_3_2gf</td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="/../../releases/tag/torchvision_classification_INT4%2F8">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 78.36%<br> INT8: 78.10%<br></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 78.36%<br> INT8: 78.10%<br> INT4: 77.70% <br></td>
<td><a href="zoo_torch/Docs/Classification.md">Classification.md</a></td>
</tr>
<tr>
<td>EfficientNet-lite0</td>
<td><a href="https://github.com/rwightman/gen-efficientnet-pytorch">GitHub Repo</a></td>
<td><a href="https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_lite0_ra-37913777.pth">Pretrained Model</a></td>
<td><a href="/../../releases/download/pt-effnet-checkpoint/adaround_efficient_lite.pth">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br> FP32: 75.40%<br> INT8: 75.36%</td>
<td><a href="/../../releases/tag/pt-effnet-checkpoint">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br> FP32: 75.40%<br> INT8: 75.36%<br> INT4: 74.46%</td>
<td><a href="zoo_torch/Docs/EfficientNet-lite0.md">EfficientNet-lite0.md</a></td>
</tr>
<tr>
<td>DeepLabV3+</td>
<td><a href="https://github.com/jfzhang95/pytorch-deeplab-xception">GitHub Repo</a></td>
<td><a href="https://drive.google.com/file/d/1G9mWafUAj09P4KvGSRVzIsV_U5OqFLdt/view">Pretrained Model</a></td>
<td><a href="/../../releases/download/torch_dlv3_w8a8_pc/deeplabv3+w8a8_tfe_perchannel.pth">Quantized Model</a></td>
<td>(PascalVOC) mIOU <br>FP32: 72.91%<br> INT8: 72.44%</td>
<td><a href="/../../releases/tag/torch_dlv3_w8a8_pc">Quantized Model</a></td>
<td>(PascalVOC) mIOU <br>FP32: 72.91%<br> INT8: 72.44%<br> INT4: 72.18%</td>
<td><a href="zoo_torch/Docs/DeepLabV3.md">DeepLabV3.md</a></td>
</tr>
<tr>
<td>MobileNetV2-SSD-Lite</td>
<td><a href="https://github.com/qfgaohao/pytorch-ssd">GitHub Repo</a></td>
<td><a href="https://storage.googleapis.com/models-hao/mb2-ssd-lite-mp-0_686.pth">Pretrained Model</a></td>
<td><a href="/../../releases/download/MV2SSD-Lite-Torch/adaround_mv2ssd_model_new.tar.gz">Quantized Model</a></td>
<td><a href="/../../releases/tag/MV2SSD-Lite-Torch">Quantized Model</a></td>
<td>(PascalVOC) mAP<br> FP32: 68.7%<br> INT8: 68.6%</td>
<td><a href="zoo_torch/Docs/MobileNetV2-SSD-lite.md">MobileNetV2-SSD-lite.md</a></td>
</tr>
@@ -306,7 +306,7 @@ An original FP32 source model is quantized either using post-training quantizati
<td><a href="https://github.com/leoxiaobin/deep-high-resolution-net.pytorch">Based on Ref.</a></td>
<td><a href="/../../releases/tag/hrnet-posenet">FP32 Model</a></td>
<td><a href="/../../releases/tag/hrnet-posenet">Quantized Model</a></td>
<td>(COCO) mAP<br>FP32: 0.765 <br>INT8: 0.763 <br> mAR <br> FP32: 0.793<br> INT8: 0.792</td>
<td>(COCO) mAP<br>FP32: 0.765 <br>INT8: 0.763 <br> INT4: 0.762 <br> mAR <br> FP32: 0.793<br> INT8: 0.792 <br> INT4: 0.791 </td>
<td><a href="zoo_torch/Docs/Hrnet-posenet.md">Hrnet-posenet.md</a></td>
</tr>
<td>SRGAN</td>
@@ -320,8 +320,8 @@ An original FP32 source model is quantized either using post-training quantizati
<td>DeepSpeech2</td>
<td><a href="https://github.com/SeanNaren/deepspeech.pytorch">GitHub Repo</a></td>
<td><a href="https://github.com/SeanNaren/deepspeech.pytorch/releases/download/v2.0/librispeech_pretrained_v2.pth">Pretrained Model</a></td>
<td><a href="zoo_torch/examples/deepspeech2_quanteval.py">See Example</a></td>
<td>(Librispeech Test Clean) WER <br> FP32<br> 9.92%<br> INT8: 10.22%</td>
<td><a href="zoo_torch/examples/deepspeech2/deepspeech2_quanteval.py">See Example</a></td>
<td>(Librispeech Test Clean) WER <br> FP32: 9.92%<br> INT8: 10.22%</td>
<td><a href="zoo_torch/Docs/DeepSpeech2.md">DeepSpeech2.md</a></td>
</tr>
<tr>
@@ -353,25 +353,33 @@ An original FP32 source model is quantized either using post-training quantizati
<td><a href="https://github.com/HRNet/HRNet-Semantic-Segmentation/tree/pytorch-v1.1">GitHub Repo</a></td>
<td>Original model weights not available</td>
<td><a href="zoo_torch/examples/hrnet-w48/hrnet-w48_quanteval.py">See Example</a></td>
<td>(Cityscapes) mIOU <br> FP32<br> 81.04%<br> INT8: 80.78%</td>
<td>(Cityscapes) mIOU <br> FP32: 81.04%<br> INT8: 80.65%<br> INT4: 80.07%</td>
<td><a href="zoo_torch/Docs/HRNet-w48.md">HRNet-w48.md</a></td>
</tr>
<tr>
<td>InverseForm (HRNet-16-Slim-IF)</td>
<td><a href="https://github.com/Qualcomm-AI-research/InverseForm">GitHub Repo</a></td>
<td><a href="https://github.com/Qualcomm-AI-research/InverseForm/releases/download/v1.0/hr16s_4k_slim.pth">Pretrained Model</a></td>
<td><a href="zoo_torch/examples/inverseform/inverseform_quanteval.py">See Example</a></td>
<td>(Cityscapes) mIOU <br> FP32<br> 77.81%<br> INT8: 77.17%</td>
<td>(Cityscapes) mIOU <br> FP32: 77.81%<br> INT8: 77.17%</td>
<td><a href="zoo_torch/Docs/InverseForm.md">InverseForm.md</a></td>
</tr>
<tr>
<td>InverseForm (OCRNet-48)</td>
<td><a href="https://github.com/Qualcomm-AI-research/InverseForm">GitHub Repo</a></td>
<td><a href="https://github.com/Qualcomm-AI-research/InverseForm/releases/download/v1.0/hrnet48_OCR_IF_checkpoint.pth">Pretrained Model</a></td>
<td><a href="zoo_torch/examples/inverseform/inverseform_quanteval.py">See Example</a></td>
<td>(Cityscapes) mIOU <br> FP32<br> 86.31%<br> INT8: 86.21%</td>
<td>(Cityscapes) mIOU <br> FP32: 86.31%<br> INT8: 86.21%</td>
<td><a href="zoo_torch/Docs/InverseForm.md">InverseForm.md</a></td>
</tr>
<tr>
<td>FFNets</td>
<td><a href="https://github.com/Qualcomm-AI-research/FFNet"> Github Repo</a></td>
<td><a href="/../../releases/tag/torch_segmentation_ffnet">Prepared Models (5 in total)</a></td>
<td><a href="zoo_torch/examples/ffnet/ffnet_quanteval.py">See Example</a></td>
<td>(Cityscapes) mIOU <br> segmentation_ffnet78S_dBBB_mobile<br> FP32: 81.3% INT8: 80.7%<br> segmentation_ffnet54S_dBBB_mobile<br> FP32: 80.8% INT8: 80.1%<br> segmentation_ffnet40S_dBBB_mobile<br> FP32: 79.2% INT8: 78.9%<br> segmentation_ffnet78S_BCC_mobile_pre_down<br> FP32: 80.6% INT8: 80.4%<br> segmentation_ffnet122NS_CCC_mobile_pre_down<br> FP32: 79.3% INT8: 79.0%</td>
<td><a href="zoo_torch/Docs/FFNet.md">FFNet.md</a></td>
</tr>
</table>

*<sup>[1]</sup>* Original FP32 model source
@@ -479,7 +487,7 @@ All results below used a *Scaling factor (LR-to-HR upscaling) of 2x* and the *Se
### Install AIMET
Before you can run the example script for a specific model, you need to install the AI Model Efficiency ToolKit (AIMET) software. Please see this [Getting Started](https://github.com/quic/aimet#getting-started) page for an overview. Then install AIMET and its dependencies using these [Installation instructions](https://github.com/quic/aimet/blob/develop/packaging/install.md).

> **NOTE:** To obtain the exact version of AIMET software that was used to test this model zoo, please install release [1.13.0](https://github.com/quic/aimet/releases/tag/1.13.0) when following the above instructions *except where specified otherwise within the individual model documentation markdown file*.
> **NOTE:** To obtain the exact version of AIMET software that was used to test this model zoo, please install release [1.22.2](https://github.com/quic/aimet/releases/tag/1.22.2) when following the above instructions *except where specified otherwise within the individual model documentation markdown file*.
### Running the scripts
Download the necessary datasets and code required to run the example for the model of interest. The examples run quantized evaluation and if necessary apply AIMET techniques to improve quantized model performance. They generate the final accuracy results noted in the table above. Refer to the Docs for [TensorFlow](zoo_tensorflow/Docs) or [PyTorch](zoo_torch/Docs) folder to access the documentation and procedures for a specific model.
2 changes: 1 addition & 1 deletion zoo_tensorflow/Docs/SRGAN.md
@@ -26,7 +26,7 @@ pip install tensorflow-gpu==2.4.0

## Model Weights
- The original SRGAN model is available at:
- [krasserm](https://github.com/krasserm/super-resolution")
- [krasserm](https://github.com/krasserm/super-resolution)

## Usage
```bash
60 changes: 44 additions & 16 deletions zoo_torch/Docs/Classification.md
@@ -1,36 +1,51 @@
# PyTorch Classification models
This document describes evaluation of optimized checkpoints for Resnet18, Resnet50 and Regnet_x_3_2gf.

## AIMET installation and setup
Please [install and setup AIMET](https://github.com/quic/aimet/blob/release-aimet-1.21/packaging/install.md) (*Torch GPU* variant) before proceeding further.

**NOTE**
- All AIMET releases are available here: https://github.com/quic/aimet/releases
- This model has been tested using AIMET version *1.21.0* (i.e. set `release_tag="1.21.0"` in the above instructions).
- This model is compatible with the PyTorch GPU variant of AIMET (i.e. set `AIMET_VARIANT="torch_gpu"` in the above instructions).
## Setup AI Model Efficiency Toolkit (AIMET)
Please [install and setup AIMET](https://github.com/quic/aimet/blob/release-aimet-1.22/packaging/install.md) before proceeding further.
This model was tested with the `torch_gpu` variant of AIMET 1.22.2.

## Additional Setup Dependencies
```
sudo -H pip install torchvision==0.11.2 --no-deps
sudo -H chmod 777 -R <path_to_python_package>/dist-packages/*
```

## Obtaining model checkpoint, ImageNet validation dataset and calibration dataset
- [Pytorch Torchvision hub](https://pytorch.org/vision/0.11/models.html#classification) instances of Resnet18, Resnet50 and Regnet_x_3_2gf are used as refernce FP32 models. These instances are optimized using AIMET to obtain quantized optimized checkpoints.
- Optimized Resnet18, Resnet50 and Regnet_x_3_2gf checkpoint can be downloaded from the [Releases](/../../releases) page.
- ImageNet can be downloaded from here:
- http://www.image-net.org/
- Use standard validation set of ImageNet dataset (50k images set) for evaluting performance of FP32 and quantized models.
## Obtain the Original Model for Comparison
- [Pytorch Torchvision hub](https://pytorch.org/vision/0.11/models.html#classification) instances of Resnet18, Resnet50 and Regnet_x_3_2gf are used as reference FP32 models. These instances are optimized using AIMET to obtain quantized optimized checkpoints.

## Experiment setup
```bash
export PYTHONPATH=$PYTHONPATH:<path to parent>/aimet-model-zoo
```

For the quantization task, we require the model path, the evaluation dataset path and the calibration dataset path - the latter is a subset of the validation dataset used for computing the encodings and AdaRound optimization.
## Dataset
This evaluation was designed for the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC2012), which can be obtained from: http://www.image-net.org/
The dataset directory is expected to have 3 subdirectories: train, valid, and test (only the valid subdirectory is used, so it is fine if the others are missing).
Each of the {train, valid, test} directories is then expected to have 1000 subdirectories, each containing the images from the 1000 classes present in the ILSVRC2012 dataset, such as in the example below:

```
train/
├── n01440764
│ ├── n01440764_10026.JPEG
│ ├── n01440764_10027.JPEG
│ ├── ......
├── ......
val/
├── n01440764
│ ├── ILSVRC2012_val_00000293.JPEG
│ ├── ILSVRC2012_val_00002138.JPEG
│ ├── ......
├── ......
```
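
This layout is what `torchvision.datasets.ImageFolder` expects, so a validation loader can be built from it directly. The snippet below is a minimal sketch for illustration only; the dataset path, batch size and preprocessing values are assumptions, not taken from the evaluation script.

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet preprocessing (values assumed for illustration).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Each class subdirectory (n01440764, ...) automatically becomes one label.
val_dataset = datasets.ImageFolder("/data/imagenet/val", transform=preprocess)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64,
                                         shuffle=False, num_workers=4)
```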

## Usage
- To run evaluation with QuantSim in AIMET, use the following
To run evaluation with QuantSim in AIMET, use the following
```bash
cd classification
python classification_quanteval.py\
--fp32-model <name of the fp32 torchvision model - resnet18/resnet50/regnet_x_3_2gf> \
--default-param-bw <weight bitwidth for quantization - 8 for INT8> \
--default-param-bw <weight bitwidth for quantization - 8 for INT8, 4 for INT4> \
--default-output-bw <output bitwidth for quantization - 8 for INT8> \
--use-cuda <boolean for using cuda> \
--evaluation-dataset <path to Imagenet validation dataset>
@@ -40,6 +55,8 @@ python classification_quanteval.py --fp32-model=resnet18 --default-weight-bw=8 -
```
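
Internally, scripts like `classification_quanteval.py` build an AIMET `QuantizationSimModel`. The sketch below outlines that flow under stated assumptions - the torchvision ResNet18, the `val_loader` from the dataset sketch above, and a hypothetical `evaluate()` accuracy helper - and is not the script itself.

```python
import torch
from torchvision import models
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = models.resnet18(pretrained=True).cuda().eval()

# Simulate W4A8 (4-bit weights, 8-bit activations); use 8/8 for the INT8 results.
sim = QuantizationSimModel(model,
                           dummy_input=torch.rand(1, 3, 224, 224).cuda(),
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=4,
                           default_output_bw=8)

def pass_calibration_data(sim_model, _):
    """Run ~2000 calibration images through the model so AIMET can compute encodings."""
    sim_model.eval()
    with torch.no_grad():
        for i, (images, _) in enumerate(val_loader):   # val_loader: see the dataset sketch
            sim_model(images.cuda())
            if (i + 1) * val_loader.batch_size >= 2000:
                break

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=None)

top1 = evaluate(sim.model, val_loader)   # evaluate() is a hypothetical accuracy helper
```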

## Quantization Configuration
INT8 optimization

The following configuration has been used for the above models for INT8 quantization:
- Weight quantization: 8 bits, symmetric quantization
- Bias parameters are not quantized
@@ -48,3 +65,14 @@ The following configuration has been used for the above models for INT8 quantiza
- 2000 images from the calibration dataset were used for computing encodings
- TF_enhanced was used as quantization scheme
- Cross layer equalization and Adaround in per channel mode have been applied for all the models to get the best INT8 optimized checkpoint

INT4 optimization

The following configuration has been used for the above models for INT4 quantization:
- Weight quantization: 4 bits, symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- 2000 images from the calibration dataset were used for computing encodings
- TF_enhanced was used as quantization scheme
- Cross layer equalization and Adaround in per channel mode have been applied for all the models to get the best INT4 optimized checkpoint (see the sketch below)
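
The cross-layer equalization and AdaRound steps named above are typically applied with AIMET before building the simulation model. Below is a minimal sketch, assuming an FP32 `model` and a hypothetical `calib_loader`, with per-channel mode left to the quantsim configuration file; the paths and batch counts are illustrative only.

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

dummy_input = torch.rand(1, 3, 224, 224).cuda()

# Cross-layer equalization rescales weights across consecutive layers in place.
equalize_model(model, input_shapes=(1, 3, 224, 224))

# AdaRound learns per-weight rounding decisions from calibration data.
params = AdaroundParameters(data_loader=calib_loader, num_batches=32)
model = Adaround.apply_adaround(model, dummy_input, params,
                                path="./adaround",
                                filename_prefix="resnet18",
                                default_param_bw=4,   # 8 for the INT8 recipe
                                default_quant_scheme=QuantScheme.post_training_tf_enhanced)

# The saved encodings are then loaded into the simulation before evaluation:
#   sim.set_and_freeze_param_encodings("./adaround/resnet18.encodings")
```
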
15 changes: 14 additions & 1 deletion zoo_torch/Docs/DeepLabV3.md
@@ -47,11 +47,24 @@ python deeplabv3_quanteval.py \
--batch-size <Number of images per batch, default 4>
```

## Quantization Configuration (INT8)
## Quantization Configuration
INT8 optimization
The following configuration has been used for the above model for INT8 quantization:
- Weight quantization: 8 bits, per tensor symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- TF-Enhanced was used as quantization scheme
- Cross layer equalization and Adaround have been applied on the optimized checkpoint
- Data Free Quantization has been performed on the optimized checkpoint

INT4 optimization
The following configuration has been used for the above model for W4A8 quantization:
- Weight quantization: 4 bits, per channel symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- TF-Enhanced was used as quantization scheme
- Cross layer equalization and Adaround have been applied on the optimized checkpoint
- Data Free Quantization has been performed on the optimized checkpoint
- Quantization Aware Training has been performed on the optimized checkpoint (see the sketch after this list)
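
The quantization-aware training step above amounts to fine-tuning `sim.model` with an ordinary PyTorch loop, since AIMET inserts the quantization ops directly into the module. A rough sketch, assuming `sim`, `train_loader` and `criterion` already exist (all hypothetical here) and with the learning rate, epoch count and input size chosen only for illustration:

```python
import torch

optimizer = torch.optim.SGD(sim.model.parameters(), lr=1e-5, momentum=0.9)
sim.model.train()

for epoch in range(5):                       # a few epochs of fine-tuning
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(sim.model(images), labels)
        loss.backward()                      # gradients flow through the quant ops
        optimizer.step()

# Export the fine-tuned model and its encodings for deployment.
sim.export(path="./qat_export", filename_prefix="deeplabv3_w4a8",
           dummy_input=torch.rand(1, 3, 513, 513))
```
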
5 changes: 3 additions & 2 deletions zoo_torch/Docs/EfficientNet-lite0.md
@@ -43,20 +43,21 @@ Each of the {train, valid, test} directories is then expected to have 1000 subdi
To run evaluation with QuantSim in AIMET, use the following
```bash
python3 efficientnetlite0_quanteval.py \
--default-param-bw <weight bitwidth for quantization - 8 for INT8, 4 for INT4> \
--dataset-path < path to validation dataset> \
--batch-size <batch size as an integer value> \
--use-cuda <use GPU or CPU>

```

## Quantization Configuration
- Weight quantization: 8 bits per channel symmetric quantization
- Weight quantization: 8 or 4 bits per channel symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- TF_enhanced was used for weight quantization scheme
- TF was used for activation quantization scheme
- Batch norm folding and Adaround have been applied on optimized efficientnet-lite checkpoint
- [Conv - Relu6] layer pairs have been fused as one operation via manual configuration (see the note below)
- 2K Images from ImageNet validation dataset (2 images per class) are used as calibration dataset
- 4K Images from ImageNet training dataset (4 images per class) are used as calibration dataset
- The standard ImageNet validation dataset is used as the evaluation dataset
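
The batch-norm folding step in the recipe above can be reproduced with AIMET's utility before building the simulation; `model` below is a hypothetical FP32 EfficientNet-lite0 instance, so this is a sketch rather than the evaluation script. The [Conv - Relu6] fusion is typically expressed through the quantsim configuration JSON passed to `QuantizationSimModel` rather than in code.

```python
from aimet_torch.batch_norm_fold import fold_all_batch_norms

# Fold BatchNorm layers into the preceding convolutions so that weight
# quantization operates on the effective (folded) weights.
folded_pairs = fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))
print(f"Folded {len(folded_pairs)} Conv/BN pairs")
```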