
Quantization and Inference

Description

This project focuses on model quantization techniques that reduce model size and can improve inference speed. Currently, it supports quantization with the quantized weights stored in a packed format.
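
To illustrate the idea of a packed format (the exact layout used by pack_quantized.py may differ; this is a generic sketch, not the project's code), eight 4-bit values fit in a single 32-bit word:

    import numpy as np

    def pack_4bit(values: np.ndarray) -> np.ndarray:
        # Pack 4-bit values (0..15) into 32-bit words, 8 values per word.
        # Illustrative only; the repository's layout may differ.
        assert values.ndim == 1 and len(values) % 8 == 0
        values = values.astype(np.uint32) & 0xF        # keep only the low 4 bits
        packed = np.zeros(len(values) // 8, dtype=np.uint32)
        for i in range(8):                             # place each value in its nibble
            packed |= values[i::8] << (4 * i)
        return packed.view(np.int32)

    def unpack_4bit(packed: np.ndarray) -> np.ndarray:
        # Inverse of pack_4bit: recover the eight 4-bit values from each word.
        packed = packed.view(np.uint32)
        out = np.empty(len(packed) * 8, dtype=np.uint8)
        for i in range(8):
            out[i::8] = (packed >> (4 * i)) & 0xF
        return out

Stored this way, 4-bit weights take one eighth the space of float32 weights, which is where the size reduction comes from.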

Current Status

  • Quantization: Implemented GPTQ and tested 4-bit quantization of the Llama 3.3 4B-parameter model, storing the weights in a packed format.
  • Inference: Inference with the quantized models has not yet been run.
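
For context, 4-bit quantization of the kind GPTQ produces maps each weight onto one of 16 levels through a scale and zero point. A minimal round-trip sketch of generic asymmetric 4-bit quantization (not the repository's implementation, which applies GPTQ's error-compensating updates on top of this idea):

    import torch

    def quantize_4bit(w: torch.Tensor):
        # Asymmetric 4-bit quantization, one scale/zero per output row.
        # Illustrative sketch only.
        wmin = w.min(dim=1, keepdim=True).values
        wmax = w.max(dim=1, keepdim=True).values
        scale = ((wmax - wmin) / 15.0).clamp_min(1e-8)  # 16 levels: 0..15
        zero = torch.round(-wmin / scale)
        q = torch.clamp(torch.round(w / scale) + zero, 0, 15)
        return q.to(torch.uint8), scale, zero

    def dequantize_4bit(q, scale, zero):
        # Recover an approximation of the original weights.
        return (q.float() - zero) * scale

GPTQ improves on this naive rounding by quantizing weights column by column and updating the remaining unquantized weights to compensate for the error introduced at each step.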

Key Files

  • Run_quantization.ipynb: Jupyter notebook demonstrating the quantization process.
  • quantize.py: Core script for performing model quantization.
  • pack_quantized.py: Script for packing the quantized model components.
  • helpers.py: Utility functions supporting the quantization process.
  • load_datasets.py: Script for loading datasets used during quantization.
  • requirements.txt: Lists the necessary Python packages for this project.
  • hooks.py: Contains hooks used during the quantization process.
  • test.py: Contains tests for the project.

Setup

  1. Clone the repository.
  2. Create a Python virtual environment (recommended):
    python -m venv myenv
    source myenv/bin/activate  # On Windows use `myenv\Scripts\activate`
  3. Install the required packages:
    pip install -r requirements.txt

Usage

Running Quantization

The primary way to run quantization is through the Run_quantization.ipynb notebook. This notebook provides a step-by-step guide and execution environment for quantizing models.

Alternatively, the scripts (quantize.py, pack_quantized.py) can be invoked directly, though the notebook is the recommended starting point; a hypothetical invocation is sketched below.
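
As a rough sketch only, a direct run might look like the following; the flags shown are hypothetical, so check each script's argument parser for the real interface:

    # Hypothetical flags -- consult quantize.py and pack_quantized.py for the actual arguments.
    python quantize.py --model <model-name-or-path> --bits 4
    python pack_quantized.py --input <quantized-model-dir> --output <packed-model-dir>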

Future Work

  • Run inference for the quantized models.
  • Test and validate performance on a wider range of models and datasets.
