First readmes
Giuseppe5 committed Apr 4, 2023
1 parent 792e11d commit 16717a8
Showing 3 changed files with 94 additions and 42 deletions.
48 changes: 6 additions & 42 deletions src/brevitas_examples/imagenet_classification/README.md
# ImageNet Examples

This folder contains examples of how to leverage the quantized layers and quantization flows offered by Brevitas.

There are currently two main categories of examples:
- QAT (Quantization-Aware Training): examples of how to run inference on a small set of pre-trained quantized networks obtained through QAT. For each model, the corresponding quantized model definition is also provided.
- PTQ (Post-Training Quantization): examples of how to use the Brevitas PTQ flow to quantize a subset of torchvision models.

For more details, check the corresponding folders.
43 changes: 43 additions & 0 deletions src/brevitas_examples/imagenet_classification/ptq/README.md
# Post-Training Quantization

This folder contains an example of how to use the Brevitas PTQ flow to quantize a subset of torchvision models, as well as how to calibrate models that use Brevitas quantized modules.

Starting from floating-point torchvision models, Brevitas can automatically derive the corresponding quantized model by leveraging torch.fx transformations (a minimal sketch follows the configuration list below).

Currently, this transformation adheres to the following quantization configuration:
- Weights and activations are quantized to 8 bit
- Biases are quantized to 16 bit, with scale factors equal to scale<sub>input</sub> * scale<sub>weight</sub>
- Scale factors are computed per-tensor for both weights and activations
- Scale factors are restricted to power-of-two values (i.e., we are quantizing to fixed point)
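
As a rough illustration of this flow, the sketch below quantizes a torchvision model through the Brevitas graph quantization entry points. The `preprocess_for_quantize` and `quantize` names from `brevitas.graph.quantize` reflect the API at the time of writing, and their defaults may differ across Brevitas versions, so treat this as a sketch rather than the exact code used by the example script.

```python
from torchvision import models

# Assumed Brevitas entry points for the torch.fx based graph quantization flow;
# module paths and default arguments may change across Brevitas releases.
from brevitas.graph.quantize import preprocess_for_quantize, quantize

# Start from a pre-trained floating-point torchvision model.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()

# Trace and normalize the model with torch.fx, then rewrite its layers into
# their quantized counterparts following the configuration listed above.
model = preprocess_for_quantize(model)
model = quantize(model)
```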


Brevitas supports several PTQ techniques that the user can enable or disable (a calibration and bias correction sketch follows this list); in particular:
- Bias Correction[<sup>1 </sup>]
- Graph Equalization[<sup>1 </sup>]
- If Graph Equalization is enabled, it is possible to use the _merge\_bias_ technique.[<sup>2 </sup>] [<sup>3 </sup>]
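
A possible calibration loop for such a quantized model is sketched below. The `calibration_mode` and `bias_correction_mode` context managers from `brevitas.graph.calibrate` are the names exposed at the time of writing, while `calibrate` and `calib_loader` are purely illustrative, so the actual example script may differ in the details.

```python
import torch
from brevitas.graph.calibrate import bias_correction_mode, calibration_mode


def calibrate(quant_model, calib_loader, device='cuda:0'):
    """Illustrative loop: `calib_loader` is assumed to yield batches of
    ImageNet calibration images and labels."""
    quant_model = quant_model.eval().to(device)
    with torch.no_grad():
        # First pass: collect activation statistics to set the scale factors.
        with calibration_mode(quant_model):
            for images, _ in calib_loader:
                quant_model(images.to(device))
        # Optional second pass: bias correction on top of the calibrated model.
        with bias_correction_mode(quant_model):
            for images, _ in calib_loader:
                quant_model(images.to(device))
    return quant_model
```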

The example provided has two main flows:
- The first will iterate through a subset of pre-trained torchvision models and quantize them using the Brevitas graph quantization flow;
- The second will use a few pre-defined quantized model definitions and load the corresponding pre-trained floating-point weights[<sup>4 </sup>].

All models are quantized to 8 bit.
The pre-defined quantized models use floating-point scale factors, with a mix of per-tensor and per-channel strategies.

To evaluate a PTQ-quantized model on ImageNet:

- Make sure you have Brevitas installed and the ImageNet dataset in a PyTorch-friendly format.
- Run the script, passing as the `--git-hash` argument a string that will be used as an identifier for the `.csv` output file.

For example, to run the script on GPU 0:
```bash
brevitas_ptq_imagenet_val --imagenet-dir /path/to/imagenet --gpu 0 --git-hash xxxxx
```
The script assumes the presence of a `/path/to/imagenet/train` folder, from which the calibration samples will be taken (configurable with the `--calibration-samples` argument), and
a `/path/to/imagenet/val` folder that will be used for validation.
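
For instance, keeping the same flags as above, calibration could be restricted to 1000 images from the `train` folder (the value here is only illustrative):
```bash
brevitas_ptq_imagenet_val --imagenet-dir /path/to/imagenet --gpu 0 --git-hash xxxxx --calibration-samples 1000
```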

After launching the script, a `RESULT.md` markdown file will be generated, containing two tables corresponding to the two types of quantization flow.

[<sup>1 </sup>]: https://arxiv.org/abs/1906.04721
[<sup>2 </sup>]: https://github.com/Xilinx/Vitis-AI/blob/50da04ddae396d10a1545823aca30b3abb24a276/src/vai_quantizer/vai_q_pytorch/nndct_shared/optimization/commander.py#L450
[<sup>3 </sup>]: https://github.com/openppl-public/ppq/blob/master/ppq/quantization/algorithm/equalization.py
[<sup>4 </sup>]: https://github.com/osmr/imgclsmob
45 changes: 45 additions & 0 deletions src/brevitas_examples/imagenet_classification/qat/README.md
# QAT Examples

The models provided in this folder are meant to showcase how to leverage the quantized layers provided by Brevitas; a direct mapping to hardware should by no means be assumed.

The table below lists the example pretrained models made available for reference.

| Name | Cfg | Scaling Type | First layer weights | Weights | Activations | Avg pool | Top1 | Top5 | Pretrained model | Retrained from |
|--------------|-----------------------|----------------------------|---------------------|---------|-------------|----------|-------|-------|-------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| MobileNet V1 | quant_mobilenet_v1_4b | Floating-point per channel | 8 bit | 4 bit | 4 bit | 4 bit | 71.14 | 90.10 | [Download](https://github.com/Xilinx/brevitas/releases/download/quant_mobilenet_v1_4b-r1/quant_mobilenet_v1_4b-0100a667.pth) | [link](https://github.com/osmr/imgclsmob/tree/master/pytorch) |
| ProxylessNAS Mobile14 w/ Hadamard classifier | quant_proxylessnas_mobile14_hadamard_4b | Floating-point per channel | 8 bit | 4 bit | 4 bit | 4 bit | 73.52 | 91.46 | [Download](https://github.com/Xilinx/brevitas/releases/download/quant_proxylessnas_mobile14_hadamard_4b-r0/quant_proxylessnas_mobile14_hadamard_4b-4acbfa9f.pth) | [link](https://github.com/osmr/imgclsmob/tree/master/pytorch) |
| ProxylessNAS Mobile14 | quant_proxylessnas_mobile14_4b | Floating-point per channel | 8 bit | 4 bit | 4 bit | 4 bit | 74.42 | 92.04 | [Download](https://github.com/Xilinx/brevitas/releases/download/quant_proxylessnas_mobile14_4b-r0/quant_proxylessnas_mobile14_4b-e10882e1.pth) | [link](https://github.com/osmr/imgclsmob/tree/master/pytorch) |
| ProxylessNAS Mobile14 | quant_proxylessnas_mobile14_4b5b | Floating-point per channel | 8 bit | 4 bit, 5 bit | 4 bit, 5 bit | 4 bit | 75.01 | 92.33 | [Download](https://github.com/Xilinx/brevitas/releases/download/quant_proxylessnas_mobile14_4b5b-r0/quant_proxylessnas_mobile14_4b5b-2bdf7f8d.pth) | [link](https://github.com/osmr/imgclsmob/tree/master/pytorch) |


To evaluate a pretrained quantized model on ImageNet:

- Make sure you have Brevitas installed and the ImageNet dataset in a PyTorch-friendly format (following this [script](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh)).
- Pass the name of the model as an input to the evaluation script. The required checkpoint will be downloaded automatically.

For example, for *quant_mobilenet_v1_4b* evaluated on GPU 0:

```bash
brevitas_imagenet_val --imagenet-dir /path/to/imagenet --model quant_mobilenet_v1_4b --gpu 0 --pretrained
```

## MobileNet V1

The reduced-precision implementation of MobileNet V1 makes the following assumptions:
- Floating point per-channel scale factors can be implemented by the target hardware, e.g. using FINN-style thresholds.
- Input preprocessing is modified to have a single scale factor rather than a per-channel one, so that it can be propagated through the first convolution to thresholds.
- Weights of the first layer are always quantized to 8 bit.
- Padding in the first convolution is removed, so that the input's mean can be propagated through the first convolution to thresholds.
- Biases and batch-norm can be merged into FINN-style thresholds, and as such are left unquantized. The only exception is the bias of the fully connected layer, which is quantized.
- Scaling of the fully connected layer is per-layer, so that the output of the network doesn't require rescaling.
- Per-channel scale factors before depthwise convolution layers can be propagated through the convolution.
- Quantized avg pool performs a sum followed by a truncation to the specified bit-width (in place of a division); a small numerical sketch of this behaviour follows the list.
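
For intuition only, the snippet below is a toy numerical model of this sum-then-truncate behaviour. It is not the Brevitas implementation; the function name and the use of `avg_pool2d` to recover the integer windowed sum are illustrative choices.

```python
import math

import torch
import torch.nn.functional as F


def trunc_avg_pool2d(x_int: torch.Tensor, kernel_size: int, bit_width: int) -> torch.Tensor:
    """Toy model of the quantized avg pool described above: sum each window,
    then truncate back to `bit_width` instead of dividing by the window size."""
    # Recover the exact integer sum over each pooling window.
    acc = F.avg_pool2d(x_int.float(), kernel_size) * kernel_size ** 2
    acc = acc.round().to(torch.int64)
    # The accumulator needs extra bits to hold the windowed sum without overflow.
    acc_bit_width = bit_width + math.ceil(math.log2(kernel_size ** 2))
    # Dropping the extra LSBs approximates the division by the window size
    # with a division by a power of two.
    return acc >> (acc_bit_width - bit_width)
```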

## VGG

The reduced-precision implementation of VGG makes the following assumptions:
- Floating point per-channel scale factors can be implemented by the target hardware, e.g. using FINN-style thresholds.
- Biases and batch-norm can be merged into FINN-style thresholds, and as such are left unquantized.
- Quantizing avg pooling requires propagating scaling factors along the forward pass, which adds some verbosity.
To keep things simple, this particular example leaves avg pooling unquantized.
