Quantizing LLMs with llama.cpp

Learn how to efficiently quantize Large Language Models (LLMs) using llama.cpp. Quantization yields a significant reduction in model size and inference time, making it possible to deploy state-of-the-art models on devices with limited resources.

Introduction

Model quantization is a crucial step in optimizing AI models for performance and efficiency. By reducing the precision of the numbers used in a model's weights and activations, we can achieve smaller model sizes and faster inference times. This repository provides a step-by-step guide and the necessary tools to quantize any LLM available on Hugging Face using llama.cpp.
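For a rough sense of the savings: a 7-billion-parameter model stored as 16-bit floats needs about 7 × 2 ≈ 14 GB for the weights alone, while a 4-bit quantization such as llama.cpp's Q4_K_M brings that down to roughly 4 to 5 GB (exact figures depend on the quantization scheme and metadata overhead).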

Setup

This tutorial is designed to be accessible across various platforms, including Google Colab and local setups such as MacBooks with Apple Silicon.

Requirements

Python 3.x
CUDA-compatible GPU or Apple Silicon (optional, for hardware-accelerated inference)
Hugging Face account (for model access)

Installation

Clone this repository to get started:

git clone https://github.com/<your-username>/quantize-llm-llama.cpp
cd quantize-llm-llama.cpp

Follow the instructions in the notebook for detailed setup and execution steps.
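If you want to set up llama.cpp by hand outside the notebook, a typical build looks roughly like the sketch below. This is an illustration rather than the notebook's exact steps; build commands, backend flags, and binary names vary between llama.cpp releases.

# Fetch llama.cpp and build it with CMake
# (optional: add -DGGML_CUDA=ON for NVIDIA GPUs; flag names differ in older releases)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release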

Usage

The main content of this repository is a Jupyter Notebook that outlines the process of downloading, converting, quantizing, and testing a model from Hugging Face Hub. To use this notebook:

Open the notebook in Google Colab or your local Jupyter environment.
Ensure a supported GPU is available if you want hardware acceleration (CUDA on NVIDIA hardware, Metal on Apple Silicon); CPU-only runs work too, just more slowly.
Follow the steps in the notebook, which will guide you through the process of quantizing a model.
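In outline, the download-convert-quantize-test workflow looks something like the sketch below. This is not the notebook's literal code: the model ID is a placeholder, and the converter script and binary names differ between llama.cpp versions (older releases ship convert.py and quantize instead of the names shown here).

# 1. Download the original model weights from Hugging Face Hub (placeholder repo ID)
huggingface-cli download <org>/<model-name> --local-dir ./original-model

# 2. Convert the Hugging Face checkpoint to GGUF at 16-bit precision
python convert_hf_to_gguf.py ./original-model --outtype f16 --outfile model-f16.gguf

# 3. Quantize the GGUF file, e.g. to 4-bit with the Q4_K_M scheme
./build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# 4. Smoke-test the quantized model with a short prompt
./build/bin/llama-cli -m model-Q4_K_M.gguf -p "Explain quantization in one sentence."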

Hugging Face Authentication

To download models from Hugging Face Hub, you'll need to authenticate using your Hugging Face token. In Google Colab, use the "Secrets" feature to securely store your token. Locally, you can set the HF_TOKEN environment variable or use the Hugging Face CLI.
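For example, on a local machine either of the following typically works (the token value shown is a placeholder):

# Option 1: log in interactively; the token is cached for later CLI and library calls
huggingface-cli login

# Option 2: export the token for the current shell session
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx

In Colab, a secret stored under Secrets (named HF_TOKEN here for illustration) can be read in Python with google.colab.userdata.get('HF_TOKEN').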

Contributing

I welcome contributions and suggestions! Feel free to fork this repository, make your improvements, and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Note: When uploading the model, be sure to replace the placeholder username with your actual Hugging Face username and adjust any paths or URLs as necessary.
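A minimal sketch of such an upload with the Hugging Face CLI, assuming the quantized file is named model-Q4_K_M.gguf and using a placeholder repo name (huggingface-cli upload requires a reasonably recent huggingface_hub release):

# Create the destination repo (once) and push the quantized GGUF file
huggingface-cli repo create <your-username>/<model-name>-GGUF
huggingface-cli upload <your-username>/<model-name>-GGUF ./model-Q4_K_M.gguf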
