Update for API 3.0 online doc (#1940)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
NeoZhangJianyu and arthw authored Jul 23, 2024
1 parent b787940 commit efcb293
Showing 45 changed files with 219 additions and 160 deletions.
52 changes: 26 additions & 26 deletions README.md
@@ -39,21 +39,21 @@ pip install neural-compressor[pt]
# Install 2.X API + Framework extension API + TensorFlow dependency
pip install neural-compressor[tf]
```
> **Note**:
> Further installation methods can be found in the [Installation Guide](https://github.com/intel/neural-compressor/blob/master/docs/source/installation_guide.md). Check out our [FAQ](https://github.com/intel/neural-compressor/blob/master/docs/source/faq.md) for more details.
## Getting Started

Setting up the environment:
```bash
pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
```
After successfully installing these packages, try your first quantization program.

### Weight-Only Quantization (LLMs)
The following example code demonstrates weight-only quantization on LLMs. It supports Intel CPU, Intel Gaudi2 AI Accelerator, and NVIDIA GPU; the best available device is selected automatically.

To try it on Intel Gaudi2, a Docker image with the Gaudi software stack is recommended; refer to the following script for environment setup. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
```bash
# Run a container with an interactive shell
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest
@@ -91,9 +91,9 @@ woq_conf = PostTrainingQuantConfig(
)
quantized_model = fit(model=float_model, conf=woq_conf, calib_dataloader=dataloader)
```
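The weight-only snippet above is truncated by the diff view. As an illustration of the idea behind weight-only quantization (not the Neural Compressor API — the round-to-nearest scheme and the 4-bit range here are assumptions for the sketch, while `PostTrainingQuantConfig` drives a far more sophisticated version of the same process over a whole model):

```python
def quantize_int4(weights):
    """Symmetric round-to-nearest 4-bit quantization of one weight row.

    Maps floats onto integers in [-8, 7] with a single per-row scale;
    activations stay in floating point, which is what makes it weight-only.
    """
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid a zero scale for all-zero rows
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for the matmul."""
    return [v * scale for v in q]

row = [0.12, -1.5, 0.8, 0.0, 2.1]
q, scale = quantize_int4(row)
restored = dequantize(q, scale)
# round-to-nearest keeps every restored value within half a scale step of the original
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(row, restored))
```

The memory win comes from storing `q` in 4 bits per weight plus one scale per row, at the cost of the reconstruction error bounded above.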
**Note:**

To try INT4 model inference, please directly use [Intel Extension for Transformers](https://github.com/intel/intel-extension-for-transformers), which leverages Intel Neural Compressor for model quantization.

### Static Quantization (Non-LLMs)

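The static-quantization snippet in this section is truncated by the diff view. Conceptually, static (post-training) quantization first calibrates activation ranges on sample data, then fixes an affine scale and zero-point that are reused at every inference. A toy uint8 sketch of that calibrate-then-quantize flow (illustrative only; the names and the simple min/max calibrator are assumptions, not the `PostTrainingQuantConfig` internals):

```python
def calibrate(samples):
    """Observe the activation range on calibration data (the role of calib_dataloader)."""
    rmin = min(0.0, min(samples))  # include 0 so zero stays exactly representable
    rmax = max(0.0, max(samples))
    return rmin, rmax

def affine_qparams(rmin, rmax, qmin=0, qmax=255):
    """Fixed scale/zero-point computed once — hence 'static' quantization."""
    scale = (rmax - rmin) / (qmax - qmin) or 1.0
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine-quantize one activation value, clamping to the integer range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))

calib_data = [-1.0, 0.5, 3.0, 2.2]
scale, zp = affine_qparams(*calibrate(calib_data))
q = [quantize(x, scale, zp) for x in calib_data]
assert q[0] == 0 and q[2] == 255  # range endpoints map to the quant extremes
```

Because the range is frozen after calibration, out-of-range activations at inference time are simply clamped — which is why representative calibration data matters.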
@@ -121,10 +121,10 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="2" align="center"><a href="./docs/source/3x/design.md#architecture">Architecture</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/design.md#workflow">Workflow</a></td>
<td colspan="2" align="center"><a href="https://intel.github.io/neural-compressor/latest/docs/source/api-doc/apis.html">APIs</a></td>
<td colspan="1" align="center"><a href="./docs/source/3x/llm_recipes.md">LLMs Recipes</a></td>
<td colspan="1" align="center">Examples</td>
</tr>
</tbody>
@@ -135,15 +135,15 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="2" align="center"><a href="./docs/source/3x/PyTorch.md">Overview</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_StaticQuant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_DynamicQuant.md">Dynamic Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_SmoothQuant.md">Smooth Quantization</a></td>
</tr>
<tr>
<td colspan="4" align="center"><a href="./docs/source/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MXQuant.md">MX Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
</tr>
</tbody>
<thead>
@@ -153,9 +153,9 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="3" align="center"><a href="./docs/source/3x/TensorFlow.md">Overview</a></td>
<td colspan="3" align="center"><a href="./docs/source/3x/TF_Quant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/TF_SQ.md">Smooth Quantization</a></td>
</tr>
</tbody>
<thead>
@@ -165,24 +165,24 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="4" align="center"><a href="./docs/source/3x/autotune.md">Auto Tune</a></td>
<td colspan="4" align="center"><a href="./docs/source/3x/benchmark.md">Benchmark</a></td>
</tr>
</tbody>
</table>

> **Note**:
> Starting with the 3.0 release, we recommend using the 3.X API. Training-time compression techniques such as QAT, pruning, and distillation are currently available only in the [2.X API](https://github.com/intel/neural-compressor/blob/master/docs/source/2x_user_guide.md).
## Selected Publications/Events
* Blog by Intel: [Neural Compressor: Boosting AI Model Efficiency](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Neural-Compressor-Boosting-AI-Model-Efficiency/post/1604740) (June 2024)
* Blog by Intel: [Optimization of Intel AI Solutions for Alibaba Cloud’s Qwen2 Large Language Models](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-accelerate-alibaba-qwen2-llms.html) (June 2024)
* Blog by Intel: [Accelerate Meta* Llama 3 with Intel AI Solutions](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-meta-llama3-with-intel-ai-solutions.html) (Apr 2024)
* EMNLP'2023 (Under Review): [TEQ: Trainable Equivalent Transformation for Quantization of LLMs](https://openreview.net/forum?id=iaI8xEINAf&referrer=%5BAuthor%20Console%5D) (Sep 2023)
* arXiv: [Efficient Post-training Quantization with FP8 Formats](https://arxiv.org/abs/2309.14592) (Sep 2023)
* arXiv: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs](https://arxiv.org/abs/2309.05516) (Sep 2023)

> **Note**:
> View [Full Publication List](https://github.com/intel/neural-compressor/blob/master/docs/source/publication_list.md).
## Additional Content
Expand All @@ -192,8 +192,8 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
* [Legal Information](./docs/source/legal_information.md)
* [Security Policy](SECURITY.md)

## Communication
- [GitHub Issues](https://github.com/intel/neural-compressor/issues): mainly for bug reports, new feature requests, and questions.
- [Email](mailto:inc.maintainers@intel.com): welcome to raise any interesting research ideas on model compression techniques by email for collaborations.
- [Discord Channel](https://discord.com/invite/Wxk3J3ZJkU): join the Discord channel for more flexible technical discussion.
- [WeChat group](/docs/source/imgs/wechat_group.jpg): scan the QR code to join the technical discussion.
88 changes: 0 additions & 88 deletions docs/3x/get_started.md

This file was deleted.

14 changes: 9 additions & 5 deletions docs/build_docs/build.sh
@@ -84,17 +84,18 @@ cp -rf ../docs/ ./source
cp -f "../README.md" "./source/docs/source/Welcome.md"
cp -f "../SECURITY.md" "./source/docs/source/SECURITY.md"


all_md_files=`find ./source/docs -name "*.md"`
for md_file in ${all_md_files}
do
sed -i 's/.md/.html/g' ${md_file}
done


# sed -i 's/.\/docs\/source\/_static/./g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
#sed -i 's/.md/.html/g; s/.\/docs\/source\//.\//g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
#sed -i 's/\/examples\/README.html/https:\/\/github.com\/intel\/neural-compressor\/blob\/master\/examples\/README.md/g' ./source/docs/source/user_guide.md
#sed -i 's/https\:\/\/intel.github.io\/neural-compressor\/lates.\/api-doc\/apis.html/https\:\/\/intel.github.io\/neural-compressor\/latest\/docs\/source\/api-doc\/apis.html/g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md

sed -i 's/examples\/README.html/https:\/\/github.com\/intel\/neural-compressor\/blob\/master\/examples\/README.md/g' ./source/docs/source/Welcome.md
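The loop above rewrites every intra-repo `.md` link to `.html` for the published site with `sed 's/.md/.html/g'`. A small Python sketch of the same substitution (note that the unescaped `.` in the sed pattern is a regex wildcard, so any character followed by `md` matches — the version below deliberately reproduces that behavior):

```python
import re

def md_to_html(text):
    # Same pattern the build script feeds to sed: the '.' matches any character.
    return re.sub(r".md", ".html", text)

assert md_to_html("design.md#workflow") == "design.html#workflow"
# the wildcard dot also fires on strings like "amd64" — a quirk inherited from
# the sed expression; escaping the dot (r"\.md") would match only literal ".md"
```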

@@ -130,6 +131,8 @@ if [[ ${UPDATE_VERSION_FOLDER} -eq 1 ]]; then
cp -r ${SRC_FOLDER}/* ${DST_FOLDER}
python update_html.py ${DST_FOLDER} ${VERSION}
cp -r ./source/docs/source/imgs ${DST_FOLDER}/docs/source
cp -r ./source/docs/source/3x/imgs ${DST_FOLDER}/docs/source/3x


cp source/_static/index.html ${DST_FOLDER}
else
@@ -143,6 +146,7 @@ if [[ ${UPDATE_LATEST_FOLDER} -eq 1 ]]; then
cp -r ${SRC_FOLDER}/* ${LATEST_FOLDER}
python update_html.py ${LATEST_FOLDER} ${VERSION}
cp -r ./source/docs/source/imgs ${LATEST_FOLDER}/docs/source
cp -r ./source/docs/source/3x/imgs ${LATEST_FOLDER}/docs/source/3x
cp source/_static/index.html ${LATEST_FOLDER}
else
echo "skip to create ${LATEST_FOLDER}"
@@ -152,7 +156,7 @@ echo "Create document is done"

if [[ ${CHECKOUT_GH_PAGES} -eq 1 ]]; then
git clone -b gh-pages --single-branch https://github.com/intel/neural-compressor.git ${RELEASE_FOLDER}

if [[ ${UPDATE_VERSION_FOLDER} -eq 1 ]]; then
python update_version.py ${ROOT_DST_FOLDER} ${VERSION}
cp -rf ${DST_FOLDER} ${RELEASE_FOLDER}
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
29 changes: 29 additions & 0 deletions docs/source/api-doc/api_2.rst
@@ -0,0 +1,29 @@
2.0 API
#######

**User facing APIs:**

.. toctree::
   :maxdepth: 1

   quantization.rst
   mix_precision.rst
   training.rst
   benchmark.rst
   config.rst
   objective.rst


**Advanced APIs:**

.. toctree::
   :maxdepth: 1

   compression.rst
   strategy.rst
   model.rst

**API document example:**

.. toctree::

   api_doc_example.rst
27 changes: 27 additions & 0 deletions docs/source/api-doc/api_3.rst
@@ -0,0 +1,27 @@
3.0 API
#######

**PyTorch Extension API:**

.. toctree::
   :maxdepth: 1

   torch_quantization_common.rst
   torch_quantization_config.rst
   torch_quantization_autotune.rst

**TensorFlow Extension API:**

.. toctree::
   :maxdepth: 1

   tf_quantization_common.rst
   tf_quantization_config.rst
   tf_quantization_autotune.rst

**Other Modules:**

.. toctree::
   :maxdepth: 1

   benchmark.rst
21 changes: 2 additions & 19 deletions docs/source/api-doc/apis.rst
@@ -1,29 +1,12 @@
APIs
####

.. toctree::
   :maxdepth: 1

   api_3.rst

.. toctree::
   :maxdepth: 1

   api_2.rst
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_autotune.rst
@@ -0,0 +1,6 @@
TensorFlow Quantization AutoTune
================================

.. autoapisummary::

   neural_compressor.tensorflow.quantization.autotune
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_common.rst
@@ -0,0 +1,6 @@
TensorFlow Quantization Base API
#################################

.. autoapisummary::

   neural_compressor.tensorflow.quantization.quantize
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_config.rst
@@ -0,0 +1,6 @@
TensorFlow Quantization Config
==============================

.. autoapisummary::

   neural_compressor.tensorflow.quantization.config
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_autotune.rst
@@ -0,0 +1,6 @@
PyTorch Quantization AutoTune
=============================

.. autoapisummary::

   neural_compressor.torch.quantization.autotune
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_common.rst
@@ -0,0 +1,6 @@
PyTorch Quantization Base API
#################################

.. autoapisummary::

   neural_compressor.torch.quantization.quantize
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_config.rst
@@ -0,0 +1,6 @@
PyTorch Quantization Config
===========================

.. autoapisummary::

   neural_compressor.torch.quantization.config