diff --git a/Docs/install/index.rst b/Docs/install/index.rst index 2e7f75bf2c6..badff25ee45 100644 --- a/Docs/install/index.rst +++ b/Docs/install/index.rst @@ -104,7 +104,7 @@ The following pre-requisites apply to all variants. The GPU variants may need ad .. parsed-literal:: # ONNX 1.16 GPU with CUDA 11.x - python3 -m pip install |download_url|\ |version|/aimet_onnx-\ |version|\+cu117\ |whl_suffix| -f |torch_pkg_url| + python3 -m pip install |download_url|\ |version|/aimet_onnx-\ |version|\+cu118\ |whl_suffix| -f |torch_pkg_url| # ONNX 1.16 CPU python3 -m pip install |download_url|\ |version|/aimet_onnx-\ |version|\+cpu\ |whl_suffix| -f |torch_pkg_url| diff --git a/Examples/README.md b/Examples/README.md index a6faf2d7d77..5a0838a80c0 100755 --- a/Examples/README.md +++ b/Examples/README.md @@ -1,12 +1,12 @@ ![Qualcomm Innovation Center, Inc.](../Docs/images/logo-quic-on@h68.png) # AIMET Examples -AIMET Examples provide reference code (in the form of *scripts* and *Jupyter Notebooks*) to learn how to load models, apply AIMET quantization and compression features, fine tune and save your models. It is also a quick way to become familiar with AIMET usage and APIs. For more details on each of the features and APIs please reference the _[user guide](https://quic.github.io/aimet-pages/releases/1.19.1/user_guide/index.html#api-documentation-and-usage-examples)_. +AIMET Examples provide reference code (in the form of *scripts* and *Jupyter Notebooks*) to learn how to load models, apply AIMET quantization and compression features, fine tune and save your models. It is also a quick way to become familiar with AIMET usage and APIs. For more details on each of the features and APIs please reference the _[user guide](https://quic.github.io/aimet-pages/releases/latest/user_guide/index.html)_. 
## Table of Contents - [Installation](#installation-instructions) - [Code Layout](#code-layout) -- [Supported Examples](#supported-examples) +- [Overview](#overview) - [Running Examples via Jupyter Notebook](#running-examples-via-jupyter-notebook) - [Running Examples via Command Line](#running-examples-via-command-line) diff --git a/Jenkins/Jenkinsfile b/Jenkins/Jenkinsfile index 052d17aa65b..bb182b17aca 100755 --- a/Jenkins/Jenkinsfile +++ b/Jenkins/Jenkinsfile @@ -1,3 +1,4 @@ +// Jenkinsfile to run pull-request status checks pipeline { parameters { string(name: 'PROJECT_NAME', defaultValue: 'aimet', description: 'project name') @@ -45,8 +46,8 @@ pipeline { } stage("Check Commits") { - agent { label "${params.BUILD_LABEL_CPU}" } - + agent { label "${params.BUILD_LABEL_CPU}" } + steps { //Set up a TF-CPU docker container to run commit checks script on cleanWs() @@ -76,12 +77,12 @@ pipeline { } } sh "bash -l -c \"rm -rf commit_checks_repo\"" - } + } } stage('Pipelines start') { - + matrix { axes { axis { @@ -106,12 +107,12 @@ pipeline { } } - agent { label "docker-build-aimet-pr-${PROC_TYPE}" } + agent { label "docker-build-aimet-pr-${PROC_TYPE}" } stages { - + stage('Start') { - + steps { script { stage("${ML_FMWORK}-${PROC_TYPE}".toUpperCase()) { @@ -121,7 +122,7 @@ pipeline { } } - + stage('Setup') { steps { @@ -133,7 +134,7 @@ pipeline { stage('Build') { - + steps { echo 'Building code (and generating Docs and pip packages)...' script { @@ -143,21 +144,27 @@ pipeline { } stage('Code violations') { - + // Works with newer jenkins instances that support the warnings-ng plugin (https://plugins.jenkins.io/warnings-ng) + when { + expression { + env.QCInternalValidation == "false" + } + } steps { echo 'Running code violations...' 
script { runStage("${ML_FMWORK}-${PROC_TYPE}", "-v") } } + // TODO: Following code needs to be updated to conform to this plugin: https://plugins.jenkins.io/warnings-ng // post { // always { // step([ - // $class : 'WarningsPublisher', + // $class : 'WarningsNgPublisher', // parserConfigurations : [[ // parserName: 'PYLint', // pattern : "**/**/**/*pylint_results.out" - // ]], + // ]], // failedTotalHigh : THRESHOLD_OBJ.pylint_fail_thresholds.high_priority, // failedTotalNormal : THRESHOLD_OBJ.pylint_fail_thresholds.normal_priority, // failedTotalLow : THRESHOLD_OBJ.pylint_fail_thresholds.low_priority, @@ -171,7 +178,45 @@ pipeline { // } // } // } - // } + // } + } + + stage('Code violations Legacy') { + // Works with older jenkins instances that support the warnings plugin (https://plugins.jenkins.io/warnings) + when { + expression { + env.QCInternalValidation == "true" + } + } + steps { + echo 'Running code violations...' + script { + runStage("${ML_FMWORK}-${PROC_TYPE}", "-v") + } + } + post { + always { + // NOTE: Works only with https://plugins.jenkins.io/warnings/ (deprecated) + step([ + $class : 'WarningsPublisher', + parserConfigurations : [[ + parserName: 'PYLint', + pattern: "**/**/**/*pylint_results.out" + ]], + failedTotalHigh : THRESHOLD_OBJ.pylint_fail_thresholds.high_priority, + failedTotalNormal : THRESHOLD_OBJ.pylint_fail_thresholds.normal_priority, + failedTotalLow : THRESHOLD_OBJ.pylint_fail_thresholds.low_priority, + usePreviousBuildAsReference : true + ]) + script { + if (currentBuild.currentResult.equals("FAILURE")) { + // the plugin won't fail the stage. 
it only sets the build status, so we have to fail it + // manually + sh "exit 1" + } + } + } + } } stage('Unit tests') { @@ -217,14 +262,13 @@ pipeline { runStage("${ML_FMWORK}-${PROC_TYPE}", "-s | true") } } - } - + } + } - } + } } - stage("AIMET extra ALL STAGES") { @@ -235,8 +279,8 @@ pipeline { callAimetExtra(env.CHANGE_TARGET) } } - } - + } + } } post { @@ -291,7 +335,7 @@ def callAimetExtra(target_branch) { // setting USE LINARO value to EMPTY to rebuild docker image using_linaro="" } - + if (target_branch.startsWith("release-aimet")) { echo "Running AIMET additional stages on ${CHANGE_TARGET} branch ..." build job: "AIMET-Extra", parameters: [string(name: 'AIMET_GIT_COMMIT', value: "${CHANGE_BRANCH}"), string(name: 'PROJECT_BRANCH', value: target_branch), string(name: 'USE_LINARO', value: "${using_linaro}"), string(name: 'PREBUILT_DOCKER_IMAGE_URL', value: "${params.PREBUILT_DOCKER_IMAGE_URL}"), string(name: 'AIMETPRO_BRANCH', value: target_branch)] @@ -299,10 +343,9 @@ def callAimetExtra(target_branch) { else if (target_branch != "develop") { echo "Running AIMET additional stages on ${CHANGE_TARGET} branch ..." build job: "AIMET-Extra", parameters: [string(name: 'AIMET_GIT_COMMIT', value: "${CHANGE_BRANCH}"), string(name: 'PROJECT_BRANCH', value: target_branch), string(name: 'USE_LINARO', value: "${using_linaro}"), string(name: 'PREBUILT_DOCKER_IMAGE_URL', value: "${params.PREBUILT_DOCKER_IMAGE_URL}")] - } + } else { echo "Running AIMET additional stages on develop branch ..." 
build job: "AIMET-Extra", parameters: [string(name: 'AIMET_GIT_COMMIT', value: "${CHANGE_BRANCH}"), string(name: 'USE_LINARO', value: "${using_linaro}"), string(name: 'PREBUILT_DOCKER_IMAGE_URL', value: "${params.PREBUILT_DOCKER_IMAGE_URL}")] } } - diff --git a/README.md b/README.md index 8f47a761b90..5a5d962b5a1 100644 --- a/README.md +++ b/README.md @@ -3,32 +3,26 @@ [![AIMET on GitHub Pages](Docs/images/button-overview.png)](https://quic.github.io/aimet-pages/index.html) [![Documentation](Docs/images/button-docs.png)](https://quic.github.io/aimet-pages/releases/latest/user_guide/index.html) -[![Install instructions](Docs/images/button-install.png)](#installation-instructions) -[![Discussion Forums](Docs/images/button-forums.png)](https://forums.quicinc.com) +[![Install instructions](Docs/images/button-install.png)](#quick-installation) +[![Discussion Forums](Docs/images/button-forums.png)](https://github.com/quic/aimet/discussions) [![What's New](Docs/images/button-whats-new.png)](#whats-new) # AI Model Efficiency Toolkit (AIMET) -AIMET is a library that provides advanced model quantization -and compression techniques for trained neural network models. -It provides features that have been proven to improve run-time performance of deep learning neural network models with -lower compute and memory requirements and minimal impact to task accuracy. - +AIMET is a library that provides advanced model quantization and compression techniques for trained neural network models. It provides features that have been proven to improve run-time performance of deep learning neural network models with lower compute and memory requirements and minimal impact to task accuracy. ![How AIMET works](Docs/images/how-it-works.png) AIMET is designed to work with [PyTorch](https://pytorch.org), [TensorFlow](https://tensorflow.org) and [ONNX](https://onnx.ai) models. 
-We also host the [AIMET Model Zoo](https://github.com/quic/aimet-model-zoo) - a collection of popular neural network models optimized for 8-bit inference. -We also provide recipes for users to quantize floating point models using AIMET. +We also host the [AIMET Model Zoo](https://github.com/quic/aimet-model-zoo) - a collection of popular neural network models optimized for 8-bit inference. We also provide recipes for users to quantize floating point models using AIMET. ## Table of Contents +- [Installation](#quick-installation) - [Why AIMET?](#why-aimet) -- [Quick Installation](#quick-install) - [Supported features](#supported-features) - [What's New](#whats-new) - [Results](#results) -- [Installation](#installation-instructions) - [Resources](#resources) - [Contributions](#contributions) - [Team](#team) @@ -42,7 +36,7 @@ The AIMET PyTorch GPU PyPI packages are available for environments that meet the * Linux Ubuntu 22.04 LTS [Python 3.10] or Linux Ubuntu 20.04 LTS [Python 3.8] * Torch 1.13+cu117 -#### Installation +### Installation ``` apt-get install liblapacke python3 -m pip install aimet-torch @@ -57,21 +51,15 @@ To install other AIMET variants and versions, please follow one of the links bel ![Benefits of AIMET](Docs/images/AImodelEfficency.png) -* **Supports advanced quantization techniques**: Inference using integer runtimes is significantly faster than using floating-point runtimes. For example, models run -5x-15x faster on the Qualcomm Hexagon DSP than on the Qualcomm Kyro CPU. In addition, 8-bit precision models have a 4x -smaller footprint than 32-bit precision models. However, maintaining model accuracy when quantizing ML models is often -challenging. AIMET solves this using novel techniques like Data-Free Quantization that provide state-of-the-art INT8 results on -several popular models. +* **Supports advanced quantization techniques**: Inference using integer runtimes is significantly faster than using floating-point runtimes. 
For example, models run 5x-15x faster on the Qualcomm Hexagon DSP than on the Qualcomm Kyro CPU. In addition, 8-bit precision models have a 4x smaller footprint than 32-bit precision models. However, maintaining model accuracy when quantizing ML models is often challenging. AIMET solves this using novel techniques like Data-Free Quantization that provide state-of-the-art INT8 results on several popular models. * **Supports advanced model compression techniques** that enable models to run faster at inference-time and require less memory -* **AIMET is designed to automate optimization** of neural networks avoiding time-consuming and tedious manual tweaking. -AIMET also provides user-friendly APIs that allow users to make calls directly from their [TensorFlow](https://tensorflow.org) -or [PyTorch](https://pytorch.org) pipelines. +* **AIMET is designed to automate optimization** of neural networks avoiding time-consuming and tedious manual tweaking. AIMET also provides user-friendly APIs that allow users to make calls directly from their [TensorFlow](https://tensorflow.org) or [PyTorch](https://pytorch.org) pipelines. Please visit the [AIMET on Github Pages](https://quic.github.io/aimet-pages/index.html) for more details. 
## Supported Features -#### Quantization +### Quantization * *Cross-Layer Equalization*: Equalize weight tensors to reduce amplitude variation across channels * *Bias Correction*: Corrects shift in layer outputs introduced due to quantization @@ -79,13 +67,13 @@ Please visit the [AIMET on Github Pages](https://quic.github.io/aimet-pages/inde * *Quantization Simulation*: Simulate on-target quantized inference accuracy * *Quantization-aware Training*: Use quantization simulation to train the model further to improve accuracy -#### Model Compression +### Model Compression * *Spatial SVD*: Tensor decomposition technique to split a large layer into two smaller ones * *Channel Pruning*: Removes redundant input channels from a layer and reconstructs layer weights * *Per-layer compression-ratio selection*: Automatically selects how much to compress each layer in the model -#### Visualization +### Visualization * *Weight ranges*: Inspect visually if a model is a candidate for applying the Cross Layer Equalization technique. And the effect after applying the technique * *Per-layer compression sensitivity*: Visually get feedback about the sensitivity of any given layer in the model to compression @@ -96,14 +84,12 @@ Some recently added features include * Quantization-aware Training (QAT) for recurrent models (including with RNNs, LSTMs and GRUs) ## Results - AIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning.
Configuration | @@ -153,8 +138,7 @@ AdaRound can recover the accuracy to within 1% of the FP32 accuracy.
---|
-For some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to
-4-bit precision without a significant drop in accuracy.
+For some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to 4-bit precision without a significant drop in accuracy.
Configuration | @@ -176,9 +160,7 @@ AdaRound can recover the accuracy to within 1% of the FP32 accuracy.
---|