Sparsify One-Shot Experiment Guide

If you're just getting started with Sparsify, we recommend you try out this One-Shot Experiment pathway first. We also have Sparse-Transfer and Training-Aware Experiments, which you can explore in the Next Steps section of this guide.

Table of Contents

  1. Experiment Overview
  2. CLI Quickstart
  3. Examples
  4. Next Steps
  5. Resources

Experiment Overview

| Sparsity | Sparsification Speed | Accuracy |
|----------|----------------------|----------|
| ++       | +++++                | +++      |

One-Shot Experiments are the quickest way to create a faster and smaller version of your model. The algorithms are applied to the model post-training, utilizing a calibration dataset, so they result in no further training time and much faster sparsification times compared with Training-Aware Experiments.

Generally, One-Shot Experiments result in a 3-5x speedup with minimal accuracy loss. They are ideal for when you want to quickly sparsify your model and have limited time to spend on the sparsification process.

The CLI Quickstart below will walk you through the steps to run a One-Shot Experiment on your model. To utilize the cloud pathways for One-Shot Experiments, review the Cloud User Guide.

CLI Quickstart

Now that you understand what a One-Shot Experiment is and its benefits, including short optimization times thanks to post-training algorithms, you can use the CLI to run one effectively.

Before you run a One-Shot Experiment, confirm you are logged into the Sparsify CLI. For installation and setup instructions, review the Install and Setup Section in the Sparsify README.
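If you have not yet authenticated, the README documents a login command that takes your Sparsify API key as an argument; it is typically of the form below (refer to the README for the exact syntax), where API_KEY is the key copied from your Sparsify account:

sparsify.login API_KEY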

One-Shot Experiments use the following general command:

sparsify.run one-shot --use-case USE_CASE --model MODEL --data DATA --optim-level OPTIM_LEVEL*

* optional arguments
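For example, a hypothetical invocation for an image classification model, using placeholder paths for the model and calibration data, could look like:

sparsify.run one-shot --use-case cv-classification --model ./model.onnx --data ./calibration-data --optim-level 0.5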

The description, rules, and possible values for each of the arguments are described below:

USE_CASE

The generally supported use cases for Sparsify are:

  • cv-classification
  • cv-detection
  • cv-segmentation
  • nlp-question_answering
  • nlp-text_classification
  • nlp-sentiment_analysis
  • nlp-token_classification
  • nlp-named_entity_recognition

Note that other aliases are recognized for these use cases, such as image-classification for cv-classification. Sparsify will automatically recognize these aliases and apply the correct use case.

For One-Shot Experiments, both the CLIs and APIs always support custom use cases. To use one, run a One-Shot Experiment with --use-case set to the desired custom use case; the value can be any ASCII string.

MODEL

One-Shot Experiments require the provided model to be in ONNX format. The ONNX model must be exported with static input shapes and must not contain custom ONNX operators. For guidance on how to convert a PyTorch model to ONNX, read our ONNX Export User Guide.

In the near future, support for more formats, including PyTorch, will be added for One-Shot Experiments.
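As a quick illustration, a minimal sketch of exporting a PyTorch model to ONNX with static input shapes might look like the following; the model, input shape, and file name are hypothetical placeholders, and the ONNX Export User Guide remains the authoritative reference:

import torch
import torchvision

# Hypothetical example: a torchvision ResNet-50 (untrained weights for brevity).
model = torchvision.models.resnet50().eval()

# A fixed-shape dummy input; no dynamic_axes are declared, so the exported
# ONNX model keeps static input shapes as One-Shot requires.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)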

DATA

For One-Shot Experiments, Sparsify utilizes the .npz format for data storage, a file format based on the popular NumPy library. In the future, support for more formats will be added for One-Shot Experiments.

Specifically, the following structure is expected for the dataset:

data
├── input1.npz
├── input2.npz
├── input3.npz

Each input#.npz file contains a single data sample. The sample is structured as a dictionary mapping each input name from the ONNX specification to a NumPy array whose shape matches that input's shape without the batch dimension. For example, a BERT-style model running with a sequence length of 128 would have the following data sample:

{
  "input_ids": ndarray(128,), 
  "attention_mask": ndarray(128,), 
  "token_type_ids": ndarray(128,)
}
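As a rough sketch of producing such files (the values below are dummy data; real samples would come from tokenizing your calibration dataset), NumPy's savez stores each keyword argument as a named array inside the .npz archive, so the keys must match the ONNX input names:

import os
import numpy as np

os.makedirs("data", exist_ok=True)

# One hypothetical BERT-style sample with a sequence length of 128 and no batch dimension.
sample = {
    "input_ids": np.zeros(128, dtype=np.int64),
    "attention_mask": np.ones(128, dtype=np.int64),
    "token_type_ids": np.zeros(128, dtype=np.int64),
}

# Each data sample is written to its own input#.npz file under the data directory.
np.savez("data/input1.npz", **sample)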

For more information on the specs and guides for creating the NPZ format, read the NPZ Dataset Guide.

OPTIM_LEVEL

When using Sparsify, the optim (sparsification) level is one of the most important arguments to decide on. Specifically, it controls how much sparsification is applied to your model, with higher values resulting in faster and more compressed models. At the top of the range, though, you may see a drop in accuracy.

Because One-Shot is applied post-training, its sparsity ranges are lower than those for Sparse-Transfer or Training-Aware Experiments to avoid accuracy drops. The current ranges are the following (subject to change):

  • optim-level == 0.0: no sparsification is applied and the input model is returned as a baseline test case.
  • optim-level < 0.3: INT8 quantization of the model (activations and weights) is applied.
  • optim-level >= 0.3: unstructured pruning (sparsity) is applied to the weights of the model from 40% for 0.3 to 80% for 1.0 with linear scaling between. Additionally, INT8 quantization of the model is applied.

The default of 0.5 will result in a ~50% sparse model with INT8 quantization.
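As a quick sanity check on these ranges, a hypothetical helper (not part of the Sparsify API) that reproduces the documented mapping could look like this:

def approx_sparsity(optim_level: float) -> float:
    # Hypothetical helper based on the documented, subject-to-change ranges.
    if optim_level < 0.3:
        return 0.0  # INT8 quantization only (or the unmodified baseline at exactly 0.0)
    # Linear scaling from 40% sparsity at optim-level 0.3 to 80% at 1.0.
    return 0.40 + (optim_level - 0.3) / (1.0 - 0.3) * 0.40

print(approx_sparsity(0.5))  # ~0.51, matching the ~50% sparsity of the default level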

Examples

Check back soon for walkthroughs and examples of One-Shot Experiments applied to various popular models and use cases.

Next Steps

Now that you have successfully run a One-Shot Experiment, check out the following guides to continue your Sparsify journey:

  • Sparse-Transfer Experiment Guide
  • Training-Aware Experiment Guide

Resources

To learn more about Sparsify and the available pathways other than One-Shot Experiments, refer to the Sparsify README.