This sample is a getting-started tutorial for Intel® Neural Compressor (INC) and demonstrates how to perform INT8 quantization on a Hugging Face BERT model. It shows how to achieve performance boosts on Intel hardware.
Area | Description |
---|---|
What you will learn | How to quantize a BERT model using Intel® Neural Compressor |
Time to complete | 20 minutes |
Category | Code Optimization |
Intel® Neural Compressor comes with many options for deep learning model compression; one of them is INT8 quantization. Quantization helps reduce the size of the model, which enables faster inference. The approach trades some accuracy for the reduced size; however, Intel® Neural Compressor provides automated accuracy-driven tuning recipes that let you quantize your model while still meeting your accuracy goals.
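To make the size/accuracy trade-off concrete, here is a minimal plain-Python sketch of symmetric, per-tensor INT8 quantization. It illustrates the general technique only and is not Intel® Neural Compressor's implementation; the function names and sample weights are hypothetical. Each FP32 value is mapped to an 8-bit integer using a scale derived from the largest magnitude, so storage drops from 4 bytes to 1 byte per value, and dequantizing introduces a rounding error of at most half the scale.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric per-tensor quantization, q = round(x / scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude onto 127; use 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to the symmetric INT8 range [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Hypothetical helper: recover approximate FP32 values, x_hat = q * scale."""
    return [qi * scale for qi in q]


weights = [0.12, -0.5, 0.33, 0.01, -0.27]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# FP32 stores 4 bytes per value; INT8 stores 1 byte per value (plus one shared scale).
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(q)
print(q, scale)
print(f"size: {fp32_bytes} -> {int8_bytes} bytes, max abs error: {max_error:.4f}")
```

Real toolkits refine this with per-channel scales, zero points for asymmetric ranges, and calibration data, but the 4x storage reduction and the bounded rounding error shown here are the core of the trade-off.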
The sample starts by loading a BERT model from Hugging Face. After loading the model, we set up an evaluation function, built on the PyTorch* Dataset and DataLoader classes, that measures the metric we care about. Using this evaluation function, Intel® Neural Compressor can perform both post-training static and dynamic quantization to achieve the speedups.
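Accuracy-driven tuning can be pictured as a loop: measure the FP32 baseline with the evaluation function, try candidate quantization configurations, and accept the first one whose accuracy stays within a tolerable loss. The sketch below is a simplified, pure-Python illustration of that idea, not Intel® Neural Compressor's actual tuning strategy or API. The `tune` function, the candidate names, the 1% relative tolerance, and the accuracy numbers are all hypothetical stand-ins for running the real evaluation function over a DataLoader.

```python
from typing import Callable, Iterable, Optional


def tune(
    eval_fn: Callable[[str], float],
    candidates: Iterable[str],
    tolerable_relative_loss: float = 0.01,
) -> Optional[str]:
    """Hypothetical tuner: return the first config meeting the accuracy goal, else None."""
    baseline = eval_fn("fp32")  # accuracy of the unquantized model
    target = baseline * (1.0 - tolerable_relative_loss)
    for config in candidates:
        accuracy = eval_fn(config)
        if accuracy >= target:
            return config  # accuracy goal met: stop tuning here
    return None  # no candidate met the accuracy goal


# Made-up accuracies standing in for evaluating each quantized model.
fake_results = {
    "fp32": 0.900,
    "int8_static_all_layers": 0.870,
    "int8_dynamic_all_layers": 0.895,
    "int8_dynamic_fallback_attention": 0.899,
}
candidates = [
    "int8_static_all_layers",
    "int8_dynamic_all_layers",
    "int8_dynamic_fallback_attention",
]
best = tune(fake_results.__getitem__, candidates)
print(best)
```

With a 1% relative tolerance the target here is 0.891, so the static candidate is rejected and the first dynamic candidate is accepted. Passing the evaluation function as a callable mirrors the idea in the text: the tuner does not need to know how accuracy is computed, only how to ask for it.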
Optimized for | Description |
---|---|
OS | Ubuntu* 20.04 (or newer) |
Hardware | Intel® Xeon® Scalable processor family |
Software | Intel® Neural Compressor, Intel® Extension for PyTorch* |
Note: AI and Analytics samples are validated on AI Tools Offline Installer. For the full list of validated platforms, refer to Platform Validation.
The sample contains one Jupyter Notebook and one Python script. It can be run using Jupyter notebooks or the offline installer.
Notebook | Description |
---|---|
quantize_with_inc.ipynb | Get started tutorial for using Intel® Neural Compressor for PyTorch* |
Script | Description |
---|---|
dataset.py | The script provides a PyTorch* Dataset class that tokenizes text data |
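A map-style PyTorch* Dataset only has to implement `__len__` and `__getitem__`. The sketch below shows the general shape of a tokenizing dataset without depending on PyTorch or Hugging Face, so it is not the contents of dataset.py. The class name, the whitespace tokenizer, the toy vocabulary, the special-token IDs, and `max_length` are hypothetical stand-ins for a real BERT tokenizer.

```python
from typing import Dict, List, Sequence, Union


class ToyTokenizedDataset:
    """Hypothetical map-style dataset: tokenizes each text and pads it to a fixed length."""

    def __init__(self, texts: Sequence[str], labels: Sequence[int],
                 vocab: Dict[str, int], max_length: int = 8) -> None:
        if len(texts) != len(labels):
            raise ValueError("each text needs exactly one label")
        self.texts = list(texts)
        self.labels = list(labels)
        self.vocab = vocab
        self.max_length = max_length
        self.pad_id = 0  # assumed padding token ID
        self.unk_id = 1  # assumed unknown-word token ID

    def __len__(self) -> int:
        return len(self.texts)

    def __getitem__(self, idx: int) -> Dict[str, Union[List[int], int]]:
        # Lowercase whitespace tokenization; unknown words map to unk_id.
        ids = [self.vocab.get(tok, self.unk_id) for tok in self.texts[idx].lower().split()]
        ids = ids[: self.max_length]  # truncate long inputs
        n_pad = self.max_length - len(ids)
        # 1 marks real tokens, 0 marks padding, as in BERT-style attention masks.
        attention_mask = [1] * len(ids) + [0] * n_pad
        input_ids = ids + [self.pad_id] * n_pad
        return {"input_ids": input_ids, "attention_mask": attention_mask, "label": self.labels[idx]}


vocab = {"[PAD]": 0, "[UNK]": 1, "great": 2, "movie": 3, "boring": 4}
ds = ToyTokenizedDataset(["Great movie", "boring plot"], [1, 0], vocab, max_length=4)
print(len(ds), ds[0], ds[1])
```

A real implementation would typically return tensors from `__getitem__` so that `torch.utils.data.DataLoader` can batch the examples for evaluation and calibration.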
You will need to download and install the following toolkits, tools, and components to use the sample.
1. Get Intel® AI Tools
Required AI Tools: Intel® Neural Compressor, Intel® Extension for PyTorch*.
If you have not already done so, select and install these tools via AI Tools Selector. AI and Analytics samples are validated on AI Tools Offline Installer, so we recommend selecting the Offline Installer option in AI Tools Selector.
2. Install dependencies
pip install -r requirements.txt
Install Jupyter Notebook by running pip install notebook. Alternatively, see Installing Jupyter for detailed installation instructions.
Note: Before running the sample, make sure Environment Setup is completed.
Go to the section that corresponds to the installation method you chose in AI Tools Selector to see the relevant instructions:
AI Tools Offline Installer (Validated):
- If you have not already done so, activate the AI Tools bundle base environment. If you used the default location to install AI Tools, open a terminal and type the following:
source $HOME/intel/oneapi/intelpython/bin/activate
If you used a separate location, open a terminal and type the following:
source <custom_path>/bin/activate
- Activate the Conda environment:
conda activate pytorch
- Clone the GitHub repository:
git clone https://github.com/oneapi-src/oneAPI-samples.git
cd oneAPI-samples/AI-and-Analytics/Getting-Started-Samples
- Launch Jupyter Notebook:
Note: You might need to register the Conda environment as a Jupyter Notebook kernel; see the instructions for doing so.
jupyter notebook --ip=0.0.0.0
- Follow the instructions to open the URL with the token in your browser.
- Select the Notebook:
quantize_with_inc.ipynb
- Change the kernel to pytorch.
- Run every cell in the Notebook in sequence.
Conda/PIP:
Note: Make sure your Conda/Python environment with AI Tools installed is activated.
- Clone the GitHub repository:
git clone https://github.com/oneapi-src/oneAPI-samples.git
cd oneAPI-samples/AI-and-Analytics/Getting-Started-Samples
- Launch Jupyter Notebook:
Note: You might need to register the Conda environment as a Jupyter Notebook kernel; see the instructions for doing so.
jupyter notebook --ip=0.0.0.0
- Follow the instructions to open the URL with the token in your browser.
- Select the Notebook:
quantize_with_inc.ipynb
- Run every cell in the Notebook in sequence.
Docker:
AI Tools Docker images already have Get Started samples pre-installed. Refer to Working with Preset Containers to learn how to run the Docker container and the samples.
You should see an image showing the performance comparison and analysis between FP32 and INT8.
Note: The image shown below is an example of a general performance comparison for inference speedup obtained by quantization. (Your results might be different.)
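The FP32 vs. INT8 comparison comes down to timing the same workload through both models and taking the ratio of average latencies. The sketch below shows that arithmetic with `time.perf_counter`; it is not the notebook's benchmarking code. The two "model" functions are hypothetical stand-ins for the FP32 and INT8 models' forward passes, and the warmup and iteration counts are arbitrary.

```python
import time
from typing import Callable


def mean_latency_s(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Average wall-clock seconds per call, measured after a few untimed warmup calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


def speedup(fp32_latency: float, int8_latency: float) -> float:
    """How many times faster INT8 is than FP32 (a value above 1.0 means INT8 is faster)."""
    return fp32_latency / int8_latency


# Hypothetical stand-ins for the FP32 and INT8 models' forward passes.
def fake_fp32_model() -> int:
    return sum(i * i for i in range(20_000))


def fake_int8_model() -> int:
    return sum(i * i for i in range(5_000))


fp32 = mean_latency_s(fake_fp32_model)
int8 = mean_latency_s(fake_int8_model)
print(f"FP32 {fp32 * 1e3:.2f} ms, INT8 {int8 * 1e3:.2f} ms, speedup {speedup(fp32, int8):.2f}x")
```

Warmup calls are excluded from the timing because the first few inferences often include one-time costs (memory allocation, kernel selection) that would distort the average.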
Related Samples:
- Fine-tuning Text Classification Model with Intel® Neural Compressor (INC)
- Optimize PyTorch* Models using Intel® Extension for PyTorch* (IPEX)
Code samples are licensed under the MIT license. See License.txt for details.
Third-party program licenses can be found in third-party-programs.txt.
*Other names and brands may be claimed as the property of others. Trademarks