Added PyTorch FFNet model, added INT4 to several models
Added the following new model: PyTorch FFNet
Added INT4 quantization support to the following models:
- PyTorch Classification (regnet_x_3_2gf, resnet18, resnet50)
- PyTorch HRNet Posenet
- PyTorch HRNet
- PyTorch EfficientNet Lite0
- PyTorch DeeplabV3-MobileNetV2

Signed-off-by: Bharath Ramaswamy <quic_bharathr@quicinc.com>
quic-bharathr committed Nov 16, 2022
1 parent b7a21a0 commit b5869cb
Showing 15 changed files with 550 additions and 401 deletions.
40 changes: 24 additions & 16 deletions README.md
@@ -73,7 +73,7 @@ An original FP32 source model is quantized either using post-training quantizati
<td>RetinaNet</td>
<td><a href="https://github.com/fizyr/keras-retinanet">GitHub Repo</a></td>
<td><a href="https://github.com/fizyr/keras-retinanet/releases/download/0.5.1/resnet50_coco_best_v2.1.0.h5">Pretrained Model</a></td>
<td><a href="zoo_tensorflow/examples/retinanet_quanteval.py">See Example</a></td>
<td><a href="zoo_tensorflow/examples/retinanet/retinanet_quanteval.py">See Example</a></td>
<td> (COCO) mAP <br> FP32: 0.35 <br> INT8: 0.349 <br><a href="#retinanet"> Detailed Results</a></td>
<td><a href="zoo_tensorflow/Docs/RetinaNet.md">RetinaNet.md</a></td>
<td>1.15</td>
@@ -250,46 +250,46 @@ An original FP32 source model is quantized either using post-training quantizati
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="/../../releases/tag/torchvision_classification_INT4%2F8">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 69.75%<br> INT8: 69.54%<br></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 69.75%<br> INT8: 69.54%<br> INT4: 69.1% <br></td>
<td><a href="zoo_torch/Docs/Classification.md">Classification.md</a></td>
</tr>
<tr>
<td>Resnet50</td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="/../../releases/tag/torchvision_classification_INT4%2F8">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 76.14%<br> INT8: 75.81%<br></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 76.14%<br> INT8: 75.81%<br> INT4: 75.63% <br></td>
<td><a href="zoo_torch/Docs/Classification.md">Classification.md</a></td>
</tr>
<tr>
<td>Regnet_x_3_2gf</td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="https://pytorch.org/vision/0.11/models.html#classification">Pytorch Torchvision </a></td>
<td><a href="/../../releases/tag/torchvision_classification_INT4%2F8">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 78.36%<br> INT8: 78.10%<br></td>
<td>(ImageNet) Top-1 Accuracy <br>FP32: 78.36%<br> INT8: 78.10%<br> INT4: 77.70% <br></td>
<td><a href="zoo_torch/Docs/Classification.md">Classification.md</a></td>
</tr>
<tr>
<td>EfficientNet-lite0</td>
<td><a href="https://github.com/rwightman/gen-efficientnet-pytorch">GitHub Repo</a></td>
<td><a href="https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_lite0_ra-37913777.pth">Pretrained Model</a></td>
<td><a href="/../../releases/download/pt-effnet-checkpoint/adaround_efficient_lite.pth">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br> FP32: 75.40%<br> INT8: 75.36%</td>
<td><a href="/../../releases/tag/pt-effnet-checkpoint">Quantized Model</a></td>
<td>(ImageNet) Top-1 Accuracy <br> FP32: 75.40%<br> INT8: 75.36%<br> INT4: 74.46%</td>
<td><a href="zoo_torch/Docs/EfficientNet-lite0.md">EfficientNet-lite0.md</a></td>
</tr>
<tr>
<td>DeepLabV3+</td>
<td><a href="https://github.com/jfzhang95/pytorch-deeplab-xception">GitHub Repo</a></td>
<td><a href="https://drive.google.com/file/d/1G9mWafUAj09P4KvGSRVzIsV_U5OqFLdt/view">Pretrained Model</a></td>
<td><a href="/../../releases/download/torch_dlv3_w8a8_pc/deeplabv3+w8a8_tfe_perchannel.pth">Quantized Model</a></td>
<td>(PascalVOC) mIOU <br>FP32: 72.91%<br> INT8: 72.44%</td>
<td><a href="/../../releases/tag/torch_dlv3_w8a8_pc">Quantized Model</a></td>
<td>(PascalVOC) mIOU <br>FP32: 72.91%<br> INT8: 72.44%<br> INT4: 72.18%</td>
<td><a href="zoo_torch/Docs/DeepLabV3.md">DeepLabV3.md</a></td>
</tr>
<tr>
<td>MobileNetV2-SSD-Lite</td>
<td><a href="https://github.com/qfgaohao/pytorch-ssd">GitHub Repo</a></td>
<td><a href="https://storage.googleapis.com/models-hao/mb2-ssd-lite-mp-0_686.pth">Pretrained Model</a></td>
<td><a href="/../../releases/download/MV2SSD-Lite-Torch/adaround_mv2ssd_model_new.tar.gz">Quantized Model</a></td>
<td><a href="/../../releases/tag/MV2SSD-Lite-Torch">Quantized Model</a></td>
<td>(PascalVOC) mAP<br> FP32: 68.7%<br> INT8: 68.6%</td>
<td><a href="zoo_torch/Docs/MobileNetV2-SSD-lite.md">MobileNetV2-SSD-lite.md</a></td>
</tr>
@@ -306,7 +306,7 @@ An original FP32 source model is quantized either using post-training quantizati
<td><a href="https://github.com/leoxiaobin/deep-high-resolution-net.pytorch">Based on Ref.</a></td>
<td><a href="/../../releases/tag/hrnet-posenet">FP32 Model</a></td>
<td><a href="/../../releases/tag/hrnet-posenet">Quantized Model</a></td>
<td>(COCO) mAP<br>FP32: 0.765 <br>INT8: 0.763 <br> mAR <br> FP32: 0.793<br> INT8: 0.792</td>
<td>(COCO) mAP<br>FP32: 0.765 <br>INT8: 0.763 <br> INT4: 0.762 <br> mAR <br> FP32: 0.793<br> INT8: 0.792 <br> INT4: 0.791 </td>
<td><a href="zoo_torch/Docs/Hrnet-posenet.md">Hrnet-posenet.md</a></td>
</tr>
<td>SRGAN</td>
@@ -320,8 +320,8 @@ An original FP32 source model is quantized either using post-training quantizati
<td>DeepSpeech2</td>
<td><a href="https://github.com/SeanNaren/deepspeech.pytorch">GitHub Repo</a></td>
<td><a href="https://github.com/SeanNaren/deepspeech.pytorch/releases/download/v2.0/librispeech_pretrained_v2.pth">Pretrained Model</a></td>
<td><a href="zoo_torch/examples/deepspeech2_quanteval.py">See Example</a></td>
<td>(Librispeech Test Clean) WER <br> FP32<br> 9.92%<br> INT8: 10.22%</td>
<td><a href="zoo_torch/examples/deepspeech2/deepspeech2_quanteval.py">See Example</a></td>
<td>(Librispeech Test Clean) WER <br> FP32: 9.92%<br> INT8: 10.22%</td>
<td><a href="zoo_torch/Docs/DeepSpeech2.md">DeepSpeech2.md</a></td>
</tr>
<tr>
@@ -353,25 +353,33 @@ An original FP32 source model is quantized either using post-training quantizati
<td><a href="https://github.com/HRNet/HRNet-Semantic-Segmentation/tree/pytorch-v1.1">GitHub Repo</a></td>
<td>Original model weights not available</td>
<td><a href="zoo_torch/examples/hrnet-w48/hrnet-w48_quanteval.py">See Example</a></td>
<td>(Cityscapes) mIOU <br> FP32<br> 81.04%<br> INT8: 80.78%</td>
<td>(Cityscapes) mIOU <br> FP32: 81.04%<br> INT8: 80.65%<br> INT4: 80.07%</td>
<td><a href="zoo_torch/Docs/HRNet-w48.md">HRNet-w48.md</a></td>
</tr>
<tr>
<td>InverseForm (HRNet-16-Slim-IF)</td>
<td><a href="https://github.com/Qualcomm-AI-research/InverseForm">GitHub Repo</a></td>
<td><a href="https://github.com/Qualcomm-AI-research/InverseForm/releases/download/v1.0/hr16s_4k_slim.pth">Pretrained Model</a></td>
<td><a href="zoo_torch/examples/inverseform/inverseform_quanteval.py">See Example</a></td>
<td>(Cityscapes) mIOU <br> FP32<br> 77.81%<br> INT8: 77.17%</td>
<td>(Cityscapes) mIOU <br> FP32: 77.81%<br> INT8: 77.17%</td>
<td><a href="zoo_torch/Docs/InverseForm.md">InverseForm.md</a></td>
</tr>
<tr>
<td>InverseForm (OCRNet-48)</td>
<td><a href="https://github.com/Qualcomm-AI-research/InverseForm">GitHub Repo</a></td>
<td><a href="https://github.com/Qualcomm-AI-research/InverseForm/releases/download/v1.0/hrnet48_OCR_IF_checkpoint.pth">Pretrained Model</a></td>
<td><a href="zoo_torch/examples/inverseform/inverseform_quanteval.py">See Example</a></td>
<td>(Cityscapes) mIOU <br> FP32<br> 86.31%<br> INT8: 86.21%</td>
<td>(Cityscapes) mIOU <br> FP32: 86.31%<br> INT8: 86.21%</td>
<td><a href="zoo_torch/Docs/InverseForm.md">InverseForm.md</a></td>
</tr>
<tr>
<td>FFNets</td>
<td><a href="https://github.com/Qualcomm-AI-research/FFNet"> Github Repo</a></td>
<td><a href="/../../releases/tag/torch_segmentation_ffnet">Prepared Models (5 in total)</a></td>
<td><a href="zoo_torch/examples/ffnet/ffnet_quanteval.py">See Example</a></td>
<td>(Cityscapes) mIOU <br> segmentation_ffnet78S_dBBB_mobile<br> FP32: 81.3% INT8: 80.7%<br> segmentation_ffnet54S_dBBB_mobile<br> FP32: 80.8% INT8: 80.1%<br> segmentation_ffnet40S_dBBB_mobile<br> FP32: 79.2% INT8: 78.9%<br> segmentation_ffnet78S_BCC_mobile_pre_down<br> FP32: 80.6% INT8: 80.4%<br> segmentation_ffnet122NS_CCC_mobile_pre_down<br> FP32: 79.3% INT8: 79.0%</td>
<td><a href="zoo_torch/Docs/FFNet.md">FFNet.md</a></td>
</tr>
</table>

*<sup>[1]</sup>* Original FP32 model source
@@ -479,7 +487,7 @@ All results below used a *Scaling factor (LR-to-HR upscaling) of 2x* and the *Se
### Install AIMET
Before you can run the example script for a specific model, you need to install the AI Model Efficiency ToolKit (AIMET) software. Please see this [Getting Started](https://github.com/quic/aimet#getting-started) page for an overview. Then install AIMET and its dependencies using these [Installation instructions](https://github.com/quic/aimet/blob/develop/packaging/install.md).

> **NOTE:** To obtain the exact version of AIMET software that was used to test this model zoo, please install release [1.13.0](https://github.com/quic/aimet/releases/tag/1.13.0) when following the above instructions *except where specified otherwise within the individual model documentation markdown file*.
> **NOTE:** To obtain the exact version of AIMET software that was used to test this model zoo, please install release [1.22.2](https://github.com/quic/aimet/releases/tag/1.22.2) when following the above instructions *except where specified otherwise within the individual model documentation markdown file*.
### Running the scripts
Download the necessary datasets and code required to run the example for the model of interest. The examples run quantized evaluation and if necessary apply AIMET techniques to improve quantized model performance. They generate the final accuracy results noted in the table above. Refer to the Docs for [TensorFlow](zoo_tensorflow/Docs) or [PyTorch](zoo_torch/Docs) folder to access the documentation and procedures for a specific model.
2 changes: 1 addition & 1 deletion zoo_tensorflow/Docs/SRGAN.md
@@ -26,7 +26,7 @@ pip install tensorflow-gpu==2.4.0

## Model Weights
- The original SRGAN model is available at:
- [krasserm](https://github.com/krasserm/super-resolution")
- [krasserm](https://github.com/krasserm/super-resolution)

## Usage
```bash
60 changes: 44 additions & 16 deletions zoo_torch/Docs/Classification.md
@@ -1,36 +1,51 @@
# PyTorch Classification models
This document describes evaluation of optimized checkpoints for Resnet18, Resnet50 and Regnet_x_3_2gf.

## AIMET installation and setup
Please [install and setup AIMET](https://github.com/quic/aimet/blob/release-aimet-1.21/packaging/install.md) (*Torch GPU* variant) before proceeding further.

**NOTE**
- All AIMET releases are available here: https://github.com/quic/aimet/releases
- This model has been tested using AIMET version *1.21.0* (i.e. set `release_tag="1.21.0"` in the above instructions).
- This model is compatible with the PyTorch GPU variant of AIMET (i.e. set `AIMET_VARIANT="torch_gpu"` in the above instructions).
## Setup AI Model Efficiency Toolkit (AIMET)
Please [install and setup AIMET](https://github.com/quic/aimet/blob/release-aimet-1.22/packaging/install.md) before proceeding further.
This model was tested with the `torch_gpu` variant of AIMET 1.22.2.

## Additional Setup Dependencies
```
sudo -H pip install torchvision==0.11.2 --no-deps
sudo -H chmod 777 -R <path_to_python_package>/dist-packages/*
```

## Obtaining model checkpoint, ImageNet validation dataset and calibration dataset
- [Pytorch Torchvision hub](https://pytorch.org/vision/0.11/models.html#classification) instances of Resnet18, Resnet50 and Regnet_x_3_2gf are used as refernce FP32 models. These instances are optimized using AIMET to obtain quantized optimized checkpoints.
- Optimized Resnet18, Resnet50 and Regnet_x_3_2gf checkpoint can be downloaded from the [Releases](/../../releases) page.
- ImageNet can be downloaded from here:
- http://www.image-net.org/
- Use standard validation set of ImageNet dataset (50k images set) for evaluting performance of FP32 and quantized models.
## Obtain the Original Model for Comparison
- [Pytorch Torchvision hub](https://pytorch.org/vision/0.11/models.html#classification) instances of Resnet18, Resnet50 and Regnet_x_3_2gf are used as reference FP32 models. These instances are optimized using AIMET to obtain quantized optimized checkpoints.

## Experiment setup
```bash
export PYTHONPATH=$PYTHONPATH:<path to parent>/aimet-model-zoo
```

For the quantization task, we require the model path, the evaluation dataset path and the calibration dataset path - the latter is a subset of the validation dataset used for computing the encodings and AdaRound optimization.
## Dataset
This evaluation was designed for the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC2012), which can be obtained from: http://www.image-net.org/
The dataset directory is expected to have 3 subdirectories: train, valid, and test (only the valid subdirectory is used, so it is fine if the others are missing).
Each of the {train, valid, test} directories is then expected to have 1000 subdirectories, each containing the images from the 1000 classes present in the ILSVRC2012 dataset, such as in the example below:

```
train/
├── n01440764
│ ├── n01440764_10026.JPEG
│ ├── n01440764_10027.JPEG
│ ├── ......
├── ......
val/
├── n01440764
│ ├── ILSVRC2012_val_00000293.JPEG
│ ├── ILSVRC2012_val_00002138.JPEG
│ ├── ......
├── ......
```
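
This layout is what `torchvision.datasets.ImageFolder` expects, so a validation loader can be built from it directly. The snippet below is a minimal sketch for illustration only; the dataset path, batch size and preprocessing values are assumptions, not taken from the evaluation script.

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet preprocessing (values assumed for illustration).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Each class subdirectory (n01440764, ...) automatically becomes one label.
val_dataset = datasets.ImageFolder("/data/imagenet/val", transform=preprocess)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64,
                                         shuffle=False, num_workers=4)
```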

## Usage
- To run evaluation with QuantSim in AIMET, use the following
To run evaluation with QuantSim in AIMET, use the following
```bash
cd classification
python classification_quanteval.py\
--fp32-model <name of the fp32 torchvision model - resnet18/resnet50/regnet_x_3_2gf> \
--default-param-bw <weight bitwidth for quantization - 8 for INT8> \
--default-param-bw <weight bitwidth for quantization - 8 for INT8, 4 for INT4> \
--default-output-bw <output bitwidth for quantization - 8 for INT8> \
--use-cuda <boolean for using cuda> \
--evaluation-dataset <path to Imagenet validation dataset>
@@ -40,6 +55,8 @@ python classification_quanteval.py --fp32-model=resnet18 --default-weight-bw=8 -
```
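
Internally, scripts like `classification_quanteval.py` build an AIMET `QuantizationSimModel`. The sketch below outlines that flow under stated assumptions - the torchvision ResNet18, the `val_loader` from the dataset sketch above, and a hypothetical `evaluate()` accuracy helper - and is not the script itself.

```python
import torch
from torchvision import models
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = models.resnet18(pretrained=True).cuda().eval()

# Simulate W4A8 (4-bit weights, 8-bit activations); use 8/8 for the INT8 results.
sim = QuantizationSimModel(model,
                           dummy_input=torch.rand(1, 3, 224, 224).cuda(),
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=4,
                           default_output_bw=8)

def pass_calibration_data(sim_model, _):
    """Run ~2000 calibration images through the model so AIMET can compute encodings."""
    sim_model.eval()
    with torch.no_grad():
        for i, (images, _) in enumerate(val_loader):   # val_loader: see the dataset sketch
            sim_model(images.cuda())
            if (i + 1) * val_loader.batch_size >= 2000:
                break

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=None)

top1 = evaluate(sim.model, val_loader)   # evaluate() is a hypothetical accuracy helper
```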

## Quantization Configuration
INT8 optimization

The following configuration has been used for the above models for INT8 quantization:
- Weight quantization: 8 bits, symmetric quantization
- Bias parameters are not quantized
@@ -48,3 +65,14 @@ The following configuration has been used for the above models for INT8 quantiza
- 2000 images from the calibration dataset were used for computing encodings
- TF_enhanced was used as quantization scheme
- Cross layer equalization and Adaround in per channel mode have been applied for all the models to get the best INT8 optimized checkpoint

INT4 optimization

The following configuration has been used for the above models for INT4 quantization:
- Weight quantization: 4 bits, symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- 2000 images from the calibration dataset were used for computing encodings
- TF_enhanced was used as quantization scheme
- Cross layer equalization and Adaround in per channel mode have been applied for all the models to get the best INT4 optimized checkpoint (see the sketch below)
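
The cross-layer equalization and AdaRound steps named above are typically applied with AIMET before building the simulation model. Below is a minimal sketch, assuming an FP32 `model` and a hypothetical `calib_loader`, with per-channel mode left to the quantsim configuration file; the paths and batch counts are illustrative only.

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

dummy_input = torch.rand(1, 3, 224, 224).cuda()

# Cross-layer equalization rescales weights across consecutive layers in place.
equalize_model(model, input_shapes=(1, 3, 224, 224))

# AdaRound learns per-weight rounding decisions from calibration data.
params = AdaroundParameters(data_loader=calib_loader, num_batches=32)
model = Adaround.apply_adaround(model, dummy_input, params,
                                path="./adaround",
                                filename_prefix="resnet18",
                                default_param_bw=4,   # 8 for the INT8 recipe
                                default_quant_scheme=QuantScheme.post_training_tf_enhanced)

# The saved encodings are then loaded into the simulation before evaluation:
#   sim.set_and_freeze_param_encodings("./adaround/resnet18.encodings")
```
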
15 changes: 14 additions & 1 deletion zoo_torch/Docs/DeepLabV3.md
@@ -47,11 +47,24 @@ python deeplabv3_quanteval.py \
--batch-size <Number of images per batch, default 4>
```

## Quantization Configuration (INT8)
## Quantization Configuration
INT8 optimization
The following configuration has been used for the above model for INT8 quantization:
- Weight quantization: 8 bits, per tensor symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- TF-Enhanced was used as quantization scheme
- Cross layer equalization and Adaround have been applied on the optimized checkpoint
- Data Free Quantization has been performed on the optimized checkpoint

INT4 optimization
The following configuration has been used for the above model for W4A8 quantization:
- Weight quantization: 4 bits, per channel symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- TF-Enhanced was used as quantization scheme
- Cross layer equalization and Adaround have been applied on the optimized checkpoint
- Data Free Quantization has been performed on the optimized checkpoint
- Quantization Aware Training has been performed on the optimized checkpoint (see the sketch after this list)
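
The quantization-aware training step above amounts to fine-tuning `sim.model` with an ordinary PyTorch loop, since AIMET inserts the quantization ops directly into the module. A rough sketch, assuming `sim`, `train_loader` and `criterion` already exist (all hypothetical here) and with the learning rate, epoch count and input size chosen only for illustration:

```python
import torch

optimizer = torch.optim.SGD(sim.model.parameters(), lr=1e-5, momentum=0.9)
sim.model.train()

for epoch in range(5):                       # a few epochs of fine-tuning
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(sim.model(images), labels)
        loss.backward()                      # gradients flow through the quant ops
        optimizer.step()

# Export the fine-tuned model and its encodings for deployment.
sim.export(path="./qat_export", filename_prefix="deeplabv3_w4a8",
           dummy_input=torch.rand(1, 3, 513, 513))
```
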
5 changes: 3 additions & 2 deletions zoo_torch/Docs/EfficientNet-lite0.md
@@ -43,20 +43,21 @@ Each of the {train, valid, test} directories is then expected to have 1000 subdi
To run evaluation with QuantSim in AIMET, use the following
```bash
python3 efficientnetlite0_quanteval.py \
--default-param-bw <weight bitwidth for quantization - 8 for INT8, 4 for INT4> \
--dataset-path < path to validation dataset> \
--batch-size <batch size as an integer value> \
--use-cuda <use GPU or CPU>

```

## Quantization Configuration
- Weight quantization: 8 bits per channel symmetric quantization
- Weight quantization: 8 or 4 bits per channel symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- TF_enhanced was used for weight quantization scheme
- TF was used for activation quantization scheme
- Batch norm folding and Adaround have been applied on optimized efficientnet-lite checkpoint
- [Conv - Relu6] layer pairs have been fused as one operation via manual configuration (see the note below)
- 2K Images from ImageNet validation dataset (2 images per class) are used as calibration dataset
- 4K Images from ImageNet training dataset (4 images per class) are used as calibration dataset
- The standard ImageNet validation dataset is used as the evaluation dataset
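
The batch-norm folding step in the recipe above can be reproduced with AIMET's utility before building the simulation; `model` below is a hypothetical FP32 EfficientNet-lite0 instance, so this is a sketch rather than the evaluation script. The [Conv - Relu6] fusion is typically expressed through the quantsim configuration JSON passed to `QuantizationSimModel` rather than in code.

```python
from aimet_torch.batch_norm_fold import fold_all_batch_norms

# Fold BatchNorm layers into the preceding convolutions so that weight
# quantization operates on the effective (folded) weights.
folded_pairs = fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))
print(f"Folded {len(folded_pairs)} Conv/BN pairs")
```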