First readmes
Giuseppe5 committed Apr 4, 2023
1 parent 792e11d commit 16717a8
Showing 3 changed files with 94 additions and 42 deletions.
48 changes: 6 additions & 42 deletions src/brevitas_examples/imagenet_classification/README.md
# ImageNet Examples

This folder contains examples of how to leverage the quantized layers and quantization flows offered by Brevitas.

There are currently two main categories of examples:
- QAT (Quantization-Aware Training): examples of how to run inference on a small set of pre-trained quantized networks obtained through QAT. For each model, the corresponding quantized model definition is also provided.
- PTQ (Post-Training Quantization): examples of how to use the Brevitas PTQ flow to quantize a subset of torchvision models.

For more details, check the corresponding folders.
43 changes: 43 additions & 0 deletions src/brevitas_examples/imagenet_classification/ptq/README.md
# Post-Training Quantization

This folder contains an example of how to use the Brevitas PTQ flow to quantize a subset of torchvision models, as well as how to calibrate models that use Brevitas quantized modules.

Starting from floating-point torchvision models, Brevitas can automatically derive the corresponding quantized model by leveraging torch.fx transformations (a minimal sketch follows the configuration list below).

Currently, this transformation adheres to the following quantization configuration:
- Weights and activations are quantized to 8 bit
- Biases are quantized to 16 bit, with scale factors equal to scale<sub>input</sub> * scale<sub>weight</sub>
- Scale factors are computed per-tensor for both weights and activations
- Scale factors are restricted to power-of-two values (i.e., we are quantizing to fixed point)
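
As a rough illustration of this flow, the sketch below quantizes a torchvision model through the Brevitas graph quantization entry points. The `preprocess_for_quantize` and `quantize` names from `brevitas.graph.quantize` reflect the API at the time of writing, and their defaults may differ across Brevitas versions, so treat this as a sketch rather than the exact code used by the example script.

```python
from torchvision import models

# Assumed Brevitas entry points for the torch.fx based graph quantization flow;
# module paths and default arguments may change across Brevitas releases.
from brevitas.graph.quantize import preprocess_for_quantize, quantize

# Start from a pre-trained floating-point torchvision model.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()

# Trace and normalize the model with torch.fx, then rewrite its layers into
# their quantized counterparts following the configuration listed above.
model = preprocess_for_quantize(model)
model = quantize(model)
```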


Brevitas supports several PTQ techniques that the user can enable or disable (a calibration and bias correction sketch follows this list); in particular:
- Bias Correction[<sup>1 </sup>]
- Graph Equalization[<sup>1 </sup>]
- If Graph Equalization is enabled, it is possible to use the _merge\_bias_ technique.[<sup>2 </sup>] [<sup>3 </sup>]
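
A possible calibration loop for such a quantized model is sketched below. The `calibration_mode` and `bias_correction_mode` context managers from `brevitas.graph.calibrate` are the names exposed at the time of writing, while `calibrate` and `calib_loader` are purely illustrative, so the actual example script may differ in the details.

```python
import torch
from brevitas.graph.calibrate import bias_correction_mode, calibration_mode


def calibrate(quant_model, calib_loader, device='cuda:0'):
    """Illustrative loop: `calib_loader` is assumed to yield batches of
    ImageNet calibration images and labels."""
    quant_model = quant_model.eval().to(device)
    with torch.no_grad():
        # First pass: collect activation statistics to set the scale factors.
        with calibration_mode(quant_model):
            for images, _ in calib_loader:
                quant_model(images.to(device))
        # Optional second pass: bias correction on top of the calibrated model.
        with bias_correction_mode(quant_model):
            for images, _ in calib_loader:
                quant_model(images.to(device))
    return quant_model
```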

The example provided has two main flows:
- The first will iterate through a subset of pre-trained torchvision models and quantize them using the Brevitas graph quantization flow;
- The second will use a few pre-defined quantized model definitions and load the corresponding pre-trained floating-point weights[<sup>4 </sup>].

All models are quantized to 8 bit.
The pre-defined quantized models use floating-point scale factors, with a mix of per-tensor and per-channel strategies.

To evaluate a PTQ-quantized model on ImageNet:

- Make sure you have Brevitas installed and the ImageNet dataset in a PyTorch-friendly format.
- Run the script, passing as the `--git-hash` argument a string that will be used as an identifier for the `.csv` output file.

For example, to run the script on GPU 0:
```bash
brevitas_ptq_imagenet_val --imagenet-dir /path/to/imagenet --gpu 0 --git-hash xxxxx
```
The script assumes the presence of a `/path/to/imagenet/train` folder, from which the calibration samples will be taken (configurable with the `--calibration-samples` argument), and
a `/path/to/imagenet/val` folder that will be used for validation.
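
For instance, keeping the same flags as above, calibration could be restricted to 1000 images from the `train` folder (the value here is only illustrative):
```bash
brevitas_ptq_imagenet_val --imagenet-dir /path/to/imagenet --gpu 0 --git-hash xxxxx --calibration-samples 1000
```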

After launching the script, a `RESULT.md` markdown file will be generated, containing two tables corresponding to the two types of quantization flow.

[<sup>1 </sup>]: https://arxiv.org/abs/1906.04721
[<sup>2 </sup>]: https://github.com/Xilinx/Vitis-AI/blob/50da04ddae396d10a1545823aca30b3abb24a276/src/vai_quantizer/vai_q_pytorch/nndct_shared/optimization/commander.py#L450
[<sup>3 </sup>]: https://github.com/openppl-public/ppq/blob/master/ppq/quantization/algorithm/equalization.py
[<sup>4 </sup>]: https://github.com/osmr/imgclsmob
45 changes: 45 additions & 0 deletions src/brevitas_examples/imagenet_classification/qat/README.md
# QAT Examples

The models provided in this folder are meant to showcase how to leverage the quantized layers provided by Brevitas; a direct mapping to hardware should by no means be assumed.

The table below lists the example pretrained models made available for reference.

| Name | Cfg | Scaling Type | First layer weights | Weights | Activations | Avg pool | Top1 | Top5 | Pretrained model | Retrained from |
|--------------|-----------------------|----------------------------|---------------------|---------|-------------|----------|-------|-------|-------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| MobileNet V1 | quant_mobilenet_v1_4b | Floating-point per channel | 8 bit | 4 bit | 4 bit | 4 bit | 71.14 | 90.10 | [Download](https://github.com/Xilinx/brevitas/releases/download/quant_mobilenet_v1_4b-r1/quant_mobilenet_v1_4b-0100a667.pth) | [link](https://github.com/osmr/imgclsmob/tree/master/pytorch) |
| ProxylessNAS Mobile14 w/ Hadamard classifier | quant_proxylessnas_mobile14_hadamard_4b | Floating-point per channel | 8 bit | 4 bit | 4 bit | 4 bit | 73.52 | 91.46 | [Download](https://github.com/Xilinx/brevitas/releases/download/quant_proxylessnas_mobile14_hadamard_4b-r0/quant_proxylessnas_mobile14_hadamard_4b-4acbfa9f.pth) | [link](https://github.com/osmr/imgclsmob/tree/master/pytorch) |
| ProxylessNAS Mobile14 | quant_proxylessnas_mobile14_4b | Floating-point per channel | 8 bit | 4 bit | 4 bit | 4 bit | 74.42 | 92.04 | [Download](https://github.com/Xilinx/brevitas/releases/download/quant_proxylessnas_mobile14_4b-r0/quant_proxylessnas_mobile14_4b-e10882e1.pth) | [link](https://github.com/osmr/imgclsmob/tree/master/pytorch) |
| ProxylessNAS Mobile14 | quant_proxylessnas_mobile14_4b5b | Floating-point per channel | 8 bit | 4 bit, 5 bit | 4 bit, 5 bit | 4 bit | 75.01 | 92.33 | [Download](https://github.com/Xilinx/brevitas/releases/download/quant_proxylessnas_mobile14_4b5b-r0/quant_proxylessnas_mobile14_4b5b-2bdf7f8d.pth) | [link](https://github.com/osmr/imgclsmob/tree/master/pytorch) |


To evaluate a pretrained quantized model on ImageNet:

- Make sure you have Brevitas installed and the ImageNet dataset in a PyTorch-friendly format (following this [script](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh)).
- Pass the name of the model as an input to the evaluation script. The required checkpoint will be downloaded automatically.

For example, for *quant_mobilenet_v1_4b* evaluated on GPU 0:

```bash
brevitas_imagenet_val --imagenet-dir /path/to/imagenet --model quant_mobilenet_v1_4b --gpu 0 --pretrained
```

## MobileNet V1

The reduced-precision implementation of MobileNet V1 makes the following assumptions:
- Floating point per-channel scale factors can be implemented by the target hardware, e.g. using FINN-style thresholds.
- Input preprocessing is modified to have a single scale factor rather than a per-channel one, so that it can be propagated through the first convolution to thresholds.
- Weights of the first layer are always quantized to 8 bit.
- Padding in the first convolution is removed, so that the input's mean can be propagated through the first convolution to thresholds.
- Biases and batch-norm can be merged into FINN-style thresholds, and as such are left unquantized. The only exception is the bias of the fully connected layer, which is quantized.
- Scaling of the fully connected layer is per-layer, so that the output of the network doesn't require rescaling.
- Per-channel scale factors before depthwise convolution layers can be propagated through the convolution.
- Quantized avg pool performs a sum followed by a truncation to the specified bit-width (in place of a division); a small numerical sketch of this behaviour follows the list.
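
For intuition only, the snippet below is a toy numerical model of this sum-then-truncate behaviour. It is not the Brevitas implementation; the function name and the use of `avg_pool2d` to recover the integer windowed sum are illustrative choices.

```python
import math

import torch
import torch.nn.functional as F


def trunc_avg_pool2d(x_int: torch.Tensor, kernel_size: int, bit_width: int) -> torch.Tensor:
    """Toy model of the quantized avg pool described above: sum each window,
    then truncate back to `bit_width` instead of dividing by the window size."""
    # Recover the exact integer sum over each pooling window.
    acc = F.avg_pool2d(x_int.float(), kernel_size) * kernel_size ** 2
    acc = acc.round().to(torch.int64)
    # The accumulator needs extra bits to hold the windowed sum without overflow.
    acc_bit_width = bit_width + math.ceil(math.log2(kernel_size ** 2))
    # Dropping the extra LSBs approximates the division by the window size
    # with a division by a power of two.
    return acc >> (acc_bit_width - bit_width)
```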

## VGG

The reduced-precision implementation of VGG makes the following assumptions:
- Floating point per-channel scale factors can be implemented by the target hardware, e.g. using FINN-style thresholds.
- Biases and batch-norm can be merged into FINN-style thresholds, and as such are left unquantized.
- Quantizing avg pooling requires propagating scaling factors along the forward pass, which adds some verbosity.
To keep things simple, this particular example leaves avg pooling unquantized.
