# User Guide

Intel® Neural Compressor provides popular model compression techniques, such as quantization, pruning (sparsity), distillation, and neural architecture search, to help users optimize their models. The documents below introduce the concepts and modules of Intel® Neural Compressor and show how to use its APIs to apply quantization, pruning (sparsity), distillation, and neural architecture search on mainstream frameworks.

## Overview

This part gives a quick understanding of the design structure and workflow of Intel® Neural Compressor. Broad examples are provided to help users get started.

- Architecture
- Workflow
- APIs
- Notebook
- Examples
- Results
- Intel oneAPI AI Analytics Toolkit

## Python-based APIs

This part covers the functional APIs of Intel® Neural Compressor in more detail, explaining the mechanism of each function and providing tutorials to help users apply them to their own cases. Please note that the Intel Neural Compressor 1.X API will no longer be supported in the future, so a comprehensive migration document, Code Migration, is provided to help users update their code from the 1.X API to the new 2.X API. In the 2.X API, creating the DataLoader and Metric for your examples is essential, so detailed introductions are provided for both; a minimal quantization sketch follows the list below.

- Quantization
- Advanced Mixed Precision
- Pruning (Sparsity)
- Distillation
- Orchestration
- Benchmarking
- Distributed Compression
- Model Export
- Code Migration from Intel® Neural Compressor 1.X to Intel® Neural Compressor 2.X
- DataLoader
- Metric
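
As a rough illustration of the 2.X API, the sketch below runs default post-training quantization on a PyTorch model using a built-in dummy dataset for calibration; the model choice and the dummy-data shape are placeholder assumptions, so refer to the Quantization, DataLoader, and Metric documents for authoritative usage.

```python
import torchvision
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.data import DataLoader, Datasets

# FP32 model to quantize; resnet18 is only a placeholder here.
model = torchvision.models.resnet18(weights=None)

# A built-in dummy dataset drives calibration in this sketch;
# replace it with a real calibration DataLoader in practice.
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(framework="pytorch", dataset=dataset)

# Default post-training static quantization configuration.
conf = PostTrainingQuantConfig()
q_model = quantization.fit(model=model, conf=conf, calib_dataloader=calib_dataloader)
q_model.save("./quantized_model")
```

In real projects, an evaluation function or Metric is usually supplied as well, so the accuracy-driven tuning loop can compare the quantized model against the FP32 baseline.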

## Neural Coder (Zero-code Optimization)

Neural Coder is our zero-code optimization innovation, which helps users quickly apply Intel® Neural Compressor optimizations without writing code; a hedged API sketch follows the list below.

- Launcher
- JupyterLab Extension
- Visual Studio Code Extension
- Supported Matrix
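
The sketch below shows one way Launcher-style optimization can be driven from Python; the `enable` entry point, its arguments, and the feature name are assumptions based on typical Neural Coder usage rather than a verified API reference, so consult the Launcher and Supported Matrix documents for the definitive interface.

```python
# Assumed Neural Coder Python API; verify names against the Launcher document.
from neural_coder import enable

# Ask Neural Coder to patch an existing script (a hypothetical run_glue.py)
# with INC static quantization and then run the patched script.
enable(
    code="run_glue.py",                        # hypothetical target script
    features=["pytorch_inc_static_quant_fx"],  # assumed feature name
    run=True,
)
```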

## Advanced Topics

This part provides advanced topics that help users dive deeper into Intel® Neural Compressor; a short SmoothQuant configuration sketch follows the list below.

- Adaptor
- Strategy
- Objective
- Calibration
- Diagnosis
- Add New Data Type
- Add New Adaptor
- Distillation for Quantization
- SmoothQuant
- Weight-Only Quantization
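
As a taste of these topics, the sketch below enables SmoothQuant through the recipes field of the post-training quantization config; the recipe keys and the alpha value are assumptions for illustration, so check the SmoothQuant document for the supported options.

```python
import torchvision
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.data import DataLoader, Datasets

# Placeholder model and dummy calibration data, as in the earlier sketch.
model = torchvision.models.resnet18(weights=None)
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(framework="pytorch", dataset=dataset)

# Enable the SmoothQuant recipe; the key names and alpha value are assumptions.
conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}}
)
q_model = quantization.fit(model=model, conf=conf, calib_dataloader=calib_dataloader)
```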

## Innovations for Productivity

We continue to create user-friendly applications to improve productivity. Starting from v2.2, Neural Solution supports distributed quantization and Neural Insights supports quantization accuracy debugging.

- Neural Solution
- Neural Insights