This project focuses on model quantization techniques to reduce model size and potentially improve inference speed. Currently, it supports quantization in a packed format.
- Quantization: Implemented GPTQ and tested 4-bit quantization of a Llama 3.3 4B-parameter model, storing the weights in a packed format (see the sketch after this list).
- Inference: Inference with the quantized models has not been run yet.
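The project's actual quantization logic lives in `quantize.py` and the notebook. Purely as a conceptual sketch (the names below are hypothetical, not the repo's API), symmetric per-channel 4-bit round-to-nearest quantization looks like this; GPTQ improves on plain rounding by using second-order (Hessian) information to compensate quantization error, but it produces codes in the same 4-bit range:

```python
import torch

def quantize_4bit_per_channel(w: torch.Tensor):
    """Symmetric per-output-channel 4-bit round-to-nearest quantization.

    Illustrative sketch only. w has shape (out_features, in_features);
    returns integer codes in [-8, 7] plus one scale per output channel.
    """
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0  # map the largest weight to ±7
    scale = scale.clamp(min=1e-8)                    # guard against all-zero rows
    q = torch.round(w / scale).clamp(-8, 7)          # signed 4-bit range [-8, 7]
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
q, scale = quantize_4bit_per_channel(w)
print((w - dequantize(q, scale)).abs().mean())  # mean rounding error
```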
- `Run_quantization.ipynb`: Jupyter notebook demonstrating the quantization process.
- `quantize.py`: Core script for performing model quantization.
- `pack_quantized.py`: Script for packing the quantized model components.
- `helpers.py`: Utility functions supporting the quantization process.
- `load_datasets.py`: Script for loading datasets used during quantization.
- `requirements.txt`: Lists the necessary Python packages for this project.
- `hooks.py`: Contains hooks used during the quantization process.
- `test.py`: Contains tests for the project.
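`pack_quantized.py` implements the project's actual packed format. As a general illustration only (hypothetical helpers, not the script's API), two signed 4-bit codes can be stored per byte like this:

```python
import torch

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    """Pack pairs of signed 4-bit codes (values in [-8, 7]) into one uint8 each."""
    assert q.shape[-1] % 2 == 0
    u = (q.to(torch.int16) & 0xF).to(torch.uint8)  # two's-complement nibbles
    return u[..., 0::2] | (u[..., 1::2] << 4)      # even index -> low nibble

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_int4: recover signed 4-bit codes as int8."""
    low = (packed & 0xF).to(torch.int8)
    high = (packed >> 4).to(torch.int8)
    q = torch.stack((low, high), dim=-1).flatten(-2)
    return torch.where(q >= 8, q - 16, q)          # restore the sign bit

q = torch.randint(-8, 8, (4, 8), dtype=torch.int8)
assert torch.equal(unpack_int4(pack_int4(q)), q)   # round-trips exactly
```

This halves storage relative to int8 and cuts it to a quarter of fp16, which is the size reduction the packed format is after.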
- Clone the repository.
- Create a Python virtual environment (recommended):
```bash
python -m venv myenv
source myenv/bin/activate  # On Windows use `myenv\Scripts\activate`
```
- Install the required packages:
```bash
pip install -r requirements.txt
```
The primary way to run quantization is the `Run_quantization.ipynb` notebook, which walks through the process step by step.
Alternatively, the scripts (`quantize.py`, `pack_quantized.py`) can be run directly, though the notebook is the recommended starting point.
- Run inference with the quantized models (a first validation sketch follows this list).
- Test and validate performance on a wider range of models and datasets.
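As a starting point for the inference and validation work, here is a hedged, self-contained sketch (hypothetical names; plain round-to-nearest rather than GPTQ) that quantizes one linear layer to 4 bits, dequantizes it, and measures how much its outputs drift. Real GPTQ weights should drift less at the same bit width:

```python
import torch

# Quantize a float linear layer to signed 4-bit codes (per-channel, symmetric),
# dequantize, and compare outputs against the original float layer.
lin = torch.nn.Linear(512, 512, bias=False)
w = lin.weight.data                               # shape (out, in)
scale = w.abs().amax(dim=1, keepdim=True) / 7.0
q = torch.round(w / scale).clamp(-8, 7)
w_hat = q * scale                                 # dequantized 4-bit weights

x = torch.randn(16, 512)
err = (x @ w.t() - x @ w_hat.t()).abs().mean()
print(f"mean |output drift| = {err.item():.4f}")
```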