From 30d45e0f24722bf48a4b391118638a3e46d4a0a6 Mon Sep 17 00:00:00 2001
From: humu789
Date: Thu, 19 Jan 2023 21:33:51 +0800
Subject: [PATCH 1/4] add quantization user guide

---
 docs/en/user_guides/index.rst                 |   4 +
 .../en/user_guides/quantization_user_guide.md | 219 ++++++++++++++++++
 2 files changed, 223 insertions(+)
 create mode 100644 docs/en/user_guides/quantization_user_guide.md

diff --git a/docs/en/user_guides/index.rst b/docs/en/user_guides/index.rst
index 622987867..b61c87be8 100644
--- a/docs/en/user_guides/index.rst
+++ b/docs/en/user_guides/index.rst
@@ -13,3 +13,7 @@ Train & Test
 Useful Tools
 ************
 please refer to upstream applied repositories' docs
+
+Quantization
+************
+quantization_user_guide.md
diff --git a/docs/en/user_guides/quantization_user_guide.md b/docs/en/user_guides/quantization_user_guide.md
new file mode 100644
index 000000000..4f0a82b27
--- /dev/null
+++ b/docs/en/user_guides/quantization_user_guide.md
@@ -0,0 +1,219 @@
# Quantization

## Introduction

MMRazor's quantization is OpenMMLab's quantization toolkit. It works end to end with OpenMMLab task models and model deployment, so we can quickly quantize pre-trained OpenMMLab models and deploy them to a specified backend. It also makes implementing custom quantization algorithms easier.

### Major features

- **Ease of use**. Benefiting from PyTorch FX, we can quantize a model without modifying the original model code, using only a user-friendly config.
- **Multiple backend deployment support**. Because of the specificity of each backend, a performance gap usually exists between a model before and after deployment. We provide deployment support for several common backends to reduce this gap as much as possible.
- **Multiple task repo support.** Benefiting from OpenMMLab 2.0, our quantization supports all OpenMMLab task repos without extra code.
- **Compatibility with PyTorch's core quantization modules**. Some core modules in PyTorch can be used directly in mmrazor, such as `Observer`, `FakeQuantize`, `BackendConfig` and so on.

## Quick run

MMRazor's quantization is based on `torch==1.13`. Other requirements are the same as MMRazor's.

Model quantization lives in mmrazor, while quantized-model deployment lives in mmdeploy, so we need the following two branches:

mmrazor: https://github.com/open-mmlab/mmrazor/tree/quantize

mmdeploy: https://github.com/humu789/mmdeploy/tree/adapt_razor_quantize

1. Quantize the float model in mmrazor.

```Shell
# For QAT (Quantization Aware Training)
python tools/train.py ${CONFIG_FILE} [optional arguments]

# For PTQ (Post-training quantization)
python tools/ptq.py ${CONFIG_FILE} [optional arguments]
```

2. Convert the quantized model checkpoint in mmrazor. (required by model deployment)

```Shell
python tools/model_converters/convert_quant_ckpt.py ${CKPT_PATH}
```

3. Export the quantized model to a specific backend in mmdeploy. (required by model deployment)

```Shell
python ./tools/deploy.py \
    ${DEPLOY_CFG_PATH} \
    ${MODEL_CFG_PATH} \
    ${MODEL_CHECKPOINT_PATH} \
    ${INPUT_IMG} \
    [optional arguments]
```

This step is the same as exporting any OpenMMLab model to a specific backend. For more details, please refer to [How to convert model](https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/02-how-to-run/convert_model.md)

4. Evaluate the exported model. (optional)

```Shell
python tools/test.py \
    ${DEPLOY_CFG} \
    ${MODEL_CFG} \
    --model ${BACKEND_MODEL_FILES} \
    [optional arguments]
```

This step is the same as evaluating backend models. For more details, please refer to [How to evaluate model](https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/02-how-to-run/profile_model.md)

## How to quantize your own model quickly

If you want to quantize your own model quickly, you just need to learn how to modify our provided configs.

**Case 1: the model you want to quantize is in our provided configs.**

You can refer to the previous chapter, Quick run.

**Case 2: the model you want to quantize is not in our provided configs.**

Let us take `resnet50` as an example to show how to handle case 2.

```Python
_base_ = ['mmcls::resnet/resnet18_8xb32_in1k.py']

train_dataloader = dict(batch_size=32)

test_cfg = dict(
    type='mmrazor.PTQLoop',
    calibrate_dataloader=train_dataloader,
    calibrate_steps=32,
)

global_qconfig = dict(
    w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'),
    a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'),
    w_fake_quant=dict(type='mmrazor.FakeQuantize'),
    a_fake_quant=dict(type='mmrazor.FakeQuantize'),
    w_qscheme=dict(
        qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True),
    a_qscheme=dict(
        qdtype='quint8', bit=8, is_symmetry=True, averaging_constant=0.1),
)

float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth'  # noqa: E501

model = dict(
    _delete_=True,
    type='mmrazor.MMArchitectureQuant',
    data_preprocessor=dict(
        type='mmcls.ClsDataPreprocessor',
        num_classes=1000,
        # RGB format normalization parameters
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        # convert image from BGR to RGB
        to_rgb=True),
    architecture=_base_.model,
    float_checkpoint=float_checkpoint,
    quantizer=dict(
        type='mmrazor.OpenVINOQuantizer',
        global_qconfig=global_qconfig,
        tracer=dict(
            type='mmrazor.CustomTracer',
            skipped_methods=[
                'mmcls.models.heads.ClsHead._get_loss',
                'mmcls.models.heads.ClsHead._get_predictions'
            ])))

model_wrapper_cfg = dict(type='mmrazor.MMArchitectureQuantDDP', )
```

This is a config that quantizes `resnet18` for the OpenVINO backend. You just need to modify two args: `_base_` and `float_checkpoint`.

```Python
# before
_base_ = ['mmcls::resnet/resnet18_8xb32_in1k.py']
float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth'

# after
_base_ = ['mmcls::resnet/resnet50_8xb32_in1k.py']
float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth'
```

- `_base_` will be resolved from mmcls by mmengine, so you can use the configs provided by mmcls directly. Other repos work similarly.
- `float_checkpoint` is a pre-trained float checkpoint provided by OpenMMLab. You can find it in the corresponding repo.

After modifying the required args, we can use this config the same way as in case 1, as sketched below.
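
Putting the two modifications together, a complete case-2 config file could look like the following sketch. The file path in the comment is hypothetical, and the trailing comment stands in for the blocks reused verbatim from the resnet18 config shown above.

```Python
# Hypothetical file: configs/quantization/ptq/ptq_openvino_resnet50_8xb32_in1k.py
_base_ = ['mmcls::resnet/resnet50_8xb32_in1k.py']

float_checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth'  # noqa: E501

# train_dataloader, test_cfg, global_qconfig, model and model_wrapper_cfg
# are copied unchanged from the resnet18 OpenVINO config above.
```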

## How to improve your quantization performance

If the quantization performance obtained by applying our provided configs to your own model does not satisfy you, you can try our various quantization schemes by modifying `global_qconfig`.

```Python
global_qconfig = dict(
    w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'),
    a_observer=dict(type='mmrazor.MovingAverageMinMaxObserver'),
    w_fake_quant=dict(type='mmrazor.FakeQuantize'),
    a_fake_quant=dict(type='mmrazor.FakeQuantize'),
    w_qscheme=dict(
        qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True),
    a_qscheme=dict(
        qdtype='quint8', bit=8, is_symmetry=True, averaging_constant=0.1),
)
```

As shown above, `global_qconfig` contains several common core args as follows:

- Observers

> In `forward`, they will update the statistics of the observed Tensor. And they should provide a `calculate_qparams` function that computes the quantization parameters given the collected statistics.

Whether it is per-channel quantization depends on whether `PerChannel` is in the observer name.

Because mmrazor's quantization is compatible with PyTorch's observers, we can use both PyTorch's observers and our custom observers.

Observers supported in PyTorch:

```Python
FixedQParamsObserver
HistogramObserver
MinMaxObserver
MovingAverageMinMaxObserver
MovingAveragePerChannelMinMaxObserver
NoopObserver
ObserverBase
PerChannelMinMaxObserver
PlaceholderObserver
RecordingObserver
ReuseInputObserver
UniformQuantizationObserverBase
```
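
To make the observer contract concrete, here is a minimal sketch of a custom observer built on PyTorch's `MinMaxObserver`. The outlier-clipping behaviour and the class name are illustrative, and the commented-out registration assumes MMRazor exposes an `OBSERVERS` registry; check the codebase before relying on that name.

```Python
import torch
from torch.ao.quantization.observer import MinMaxObserver

# Hypothetical registration, assuming an `OBSERVERS` registry in mmrazor:
# from mmrazor.registry import OBSERVERS
# @OBSERVERS.register_module()
class ClippedMinMaxObserver(MinMaxObserver):
    """Toy observer that clips outliers before updating min/max stats."""

    def __init__(self, clip_ratio=0.99, **kwargs):
        super().__init__(**kwargs)
        self.clip_ratio = clip_ratio

    def forward(self, x_orig):
        # Update min/max statistics on a clipped copy of the input;
        # an observer must return its input unchanged.
        x = x_orig.detach().abs().flatten()
        k = max(1, int(x.numel() * self.clip_ratio))
        clip_val = x.kthvalue(k).values
        super().forward(x_orig.clamp(-clip_val, clip_val))
        return x_orig


obs = ClippedMinMaxObserver()
obs(torch.randn(16, 8))                      # collect statistics
scale, zero_point = obs.calculate_qparams()  # qparams from the statistics
```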

- Fake quants

> In `forward`, they will update the statistics of the observed Tensor and fake quantize the input. They should also provide a `calculate_qparams` function that computes the quantization parameters given the collected statistics.

Because mmrazor's quantization is compatible with PyTorch's fakequants, we can use both PyTorch's fakequants and our custom fakequants.

Fakequants supported in PyTorch:

```Python
FakeQuantize
FakeQuantizeBase
FixedQParamsFakeQuantize
FusedMovingAvgObsFakeQuantize
```
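
The snippet below drives PyTorch's stock `FakeQuantize` directly to illustrate that contract. It is a plain PyTorch sketch, roughly mirroring the `a_observer`/`a_fake_quant` pair from `global_qconfig`, not MMRazor-specific code.

```Python
import torch
from torch.ao.quantization import FakeQuantize, MovingAverageMinMaxObserver

# FakeQuantize wraps an observer class; extra kwargs are forwarded to it.
fq = FakeQuantize(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=255,
    dtype=torch.quint8,
    averaging_constant=0.1)

x = torch.randn(2, 3, 8, 8)
y = fq(x)  # updates the statistics and returns a fake-quantized tensor
scale, zero_point = fq.calculate_qparams()
```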

- Qschemes

> Include some basic quantization configurations.
>
> `qdtype`: to specify whether the quantized data type is signed or unsigned. It can be chosen from \[ 'qint8', 'quint8' \]
>
> `bit`: to specify the quantized bit width. It can be chosen from \[1 ~ 16\].
>
> `is_symmetry`: to specify whether to use symmetric quantization. It can be chosen from \[ True, False \]

The specified qscheme is actually implemented by the observers, so how to configure other args, such as `is_symmetric_range` and `averaging_constant`, depends on the given observers.
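
For example, here is a hypothetical `global_qconfig` variant that swaps the moving-average activation observer for a histogram-based one. The `mmrazor.HistogramObserver` name assumes MMRazor registers PyTorch's observers under its own scope, as the compatibility note above suggests, and whether a given backend quantizer accepts this combination should be verified.

```Python
global_qconfig = dict(
    # per-channel symmetric weights, histogram-calibrated activations
    w_observer=dict(type='mmrazor.PerChannelMinMaxObserver'),
    a_observer=dict(type='mmrazor.HistogramObserver'),
    w_fake_quant=dict(type='mmrazor.FakeQuantize'),
    a_fake_quant=dict(type='mmrazor.FakeQuantize'),
    w_qscheme=dict(
        qdtype='qint8', bit=8, is_symmetry=True, is_symmetric_range=True),
    a_qscheme=dict(qdtype='quint8', bit=8, is_symmetry=True),
)
```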

## How to customize your quantization algorithm

If you want to customize your own quantization algorithm, you can refer to the following link for more details.

[Customize Quantization algorithms](https://github.com/open-mmlab/mmrazor/blob/quantize/docs/en/advanced_guides/customize_quantization_algorithms.md)

From 2099e24d1c28a7ff0df8697aa047c9733dd0203b Mon Sep 17 00:00:00 2001
From: humu789
Date: Thu, 19 Jan 2023 21:47:52 +0800
Subject: [PATCH 2/4] fix layout

---
 docs/en/user_guides/index.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/en/user_guides/index.rst b/docs/en/user_guides/index.rst
index b61c87be8..fc3de61d1 100644
--- a/docs/en/user_guides/index.rst
+++ b/docs/en/user_guides/index.rst
@@ -16,4 +16,8 @@ Useful Tools

 Quantization
 ************
-quantization_user_guide.md
+
+.. toctree::
+   :maxdepth: 1
+
+   quantization_user_guide.md

From 90591a509f557c19622f63ace4b6a3f73e279e70 Mon Sep 17 00:00:00 2001
From: humu789
Date: Thu, 19 Jan 2023 22:03:14 +0800
Subject: [PATCH 3/4] fix layout

---
 docs/en/user_guides/index.rst                 |  9 ++++----
 .../en/user_guides/quantization_user_guide.md | 22 +++++++++++--------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/docs/en/user_guides/index.rst b/docs/en/user_guides/index.rst
index fc3de61d1..96ebc0a6e 100644
--- a/docs/en/user_guides/index.rst
+++ b/docs/en/user_guides/index.rst
@@ -10,10 +10,6 @@ Train & Test
    3_train_with_different_devices.md
    4_test_a_model.md

-Useful Tools
-************
-please refer to upstream applied repositories' docs
-
 Quantization
 ************

@@ -21,3 +17,8 @@ Quantization
    :maxdepth: 1

    quantization_user_guide.md
+
+Useful Tools
+************
+
+please refer to upstream applied repositories' docs
diff --git a/docs/en/user_guides/quantization_user_guide.md b/docs/en/user_guides/quantization_user_guide.md
index 4f0a82b27..d645d8451 100644
--- a/docs/en/user_guides/quantization_user_guide.md
+++ b/docs/en/user_guides/quantization_user_guide.md
@@ -13,7 +13,9 @@ MMRazor's quantization is OpenMMLab's quantization toolkit. It works end to end

 ## Quick run

+```{note}
 MMRazor's quantization is based on `torch==1.13`. Other requirements are the same as MMRazor's.
+```

 Model quantization lives in mmrazor, while quantized-model deployment lives in mmdeploy, so we need the following two branches:

@@ -162,9 +164,11 @@ As shown above, `global_qconfig` contains several common core args as follows:

 - Observers

-> In `forward`, they will update the statistics of the observed Tensor. And they should provide a `calculate_qparams` function that computes the quantization parameters given the collected statistics.
+In `forward`, they will update the statistics of the observed Tensor. And they should provide a `calculate_qparams` function that computes the quantization parameters given the collected statistics.

+```{note}
 Whether it is per-channel quantization depends on whether `PerChannel` is in the observer name.
+```

 Because mmrazor's quantization is compatible with PyTorch's observers, we can use both PyTorch's observers and our custom observers.

@@ -187,7 +191,7 @@ UniformQuantizationObserverBase

 - Fake quants

-> In `forward`, they will update the statistics of the observed Tensor and fake quantize the input. They should also provide a `calculate_qparams` function that computes the quantization parameters given the collected statistics.
+In `forward`, they will update the statistics of the observed Tensor and fake quantize the input. They should also provide a `calculate_qparams` function that computes the quantization parameters given the collected statistics.

 Because mmrazor's quantization is compatible with PyTorch's fakequants, we can use both PyTorch's fakequants and our custom fakequants.

 - Qschemes

Include some basic quantization configurations.

`qdtype`: to specify whether the quantized data type is signed or unsigned. It can be chosen from \[ 'qint8', 'quint8' \]

`bit`: to specify the quantized bit width. It can be chosen from \[1 ~ 16\].

`is_symmetry`: to specify whether to use symmetric quantization. It can be chosen from \[ True, False \]

 The specified qscheme is actually implemented by the observers, so how to configure other args, such as `is_symmetric_range` and `averaging_constant`, depends on the given observers.

From 3fe5d89b629daff822d47873e1c3413af1a6ccd3 Mon Sep 17 00:00:00 2001
From: humu789
Date: Thu, 19 Jan 2023 22:12:55 +0800
Subject: [PATCH 4/4] update README

---
 README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 6a96d0372..ee5be4bc8 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@
 [![PyPI](https://img.shields.io/pypi/v/mmrazor)](https://pypi.org/project/mmrazor)
-[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmrazor.readthedocs.io/en/dev-1.x/)
+[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mmrazor.readthedocs.io/en/quantize/)
 [![badge](https://github.com/open-mmlab/mmrazor/workflows/build/badge.svg)](https://github.com/open-mmlab/mmrazor/actions)
 [![codecov](https://codecov.io/gh/open-mmlab/mmrazor/branch/master/graph/badge.svg)](https://codecov.io/gh/open-mmlab/mmrazor)
 [![license](https://img.shields.io/github/license/open-mmlab/mmrazor.svg)](https://github.com/open-mmlab/mmrazor/blob/master/LICENSE)

[📘Documentation](https://mmrazor.readthedocs.io/en/quantize/) |
[🛠️Installation](https://mmrazor.readthedocs.io/en/quantize/get_started/installation.html) |
[👀Model Zoo](https://mmrazor.readthedocs.io/en/quantize/get_started/model_zoo.html) |
[🤔Reporting Issues](https://github.com/open-mmlab/mmrazor/issues/new/choose)

@@ -54,7 +54,7 @@ MMRazor is a model compression toolkit for model slimming and AutoML, which incl
 - Neural Architecture Search (NAS)
 - Pruning
 - Knowledge Distillation (KD)
-- Quantization (come soon)
+- Quantization

 It is a part of the [OpenMMLab](https://openmmlab.com/) project.

With better modular design, developers can implement new model compression algorithms with only a few lines of code, or even by simply modifying config files.

Below is an overview of MMRazor's design and implementation, please refer to [tutorials](https://mmrazor.readthedocs.io/en/quantize/get_started/overview.html) for more details.

@@ -150,7 +150,7 @@ Please refer to [installation.md](/docs/en/get_started/installation.md) for more

 ## Getting Started

Please refer to [user guides](https://mmrazor.readthedocs.io/en/quantize/user_guides/index.html) for the basic usage of MMRazor. There are also [advanced guides](https://mmrazor.readthedocs.io/en/quantize/advanced_guides/index.html):

 ## Contributing