Quantizers is a library that provides an easy-to-use interface for quantizing LLMs into various formats using YAML configs.
Supported operating systems:
- Linux
- Windows
- macOS
Supported and planned quantization formats:
- GGUF
- ExLlamaV2
- GPTQ
- AWQ
- AQLM
- QuIP
- QuIP#
- HQQ
- HQQ+
- SqueezeLLM
- Marlin
- EETQ
- SmoothQuant
- bitsandbytes
- TensorRT-LLM
To get started, clone the repo along with its submodules and install:
git clone https://github.com/PygmalionAI/quantizers.git
cd quantizers
git submodule update --init --recursive
python3 -m pip install -e .
python3 -m pip install -r requirements.txt
To build with GPU support (currently used only for imatrix computation), run this instead:
LLAMA_CUBLAS=1 python3 -m pip install -e .
Only GGUF is supported for now. You will need a YAML config file; an example is provided in the examples directory.
Once you've filled out your YAML file, run:
quantizers examples/gguf/config.yaml
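A config for a GGUF run might look roughly like the following. The field names below are illustrative assumptions, not the library's confirmed schema — consult examples/gguf/config.yaml for the actual keys it expects:

```yaml
# Hypothetical GGUF quantization config — key names are assumptions,
# not the library's confirmed schema; see examples/gguf/config.yaml.
model: /path/to/hf-model      # source model directory (Hugging Face format)
output_dir: ./quantized       # where to write the resulting .gguf file
quantization:
  format: gguf
  type: Q4_K_M                # a common llama.cpp quantization type
  imatrix: false              # importance matrix (requires the GPU build)
```

The general pattern is one config per quantization job, passed as the sole argument to the `quantizers` command as shown above.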
At the moment, we aren't accepting feature contributions until we've finished supporting all of the planned quantization methods. PRs for bug fixes and OS support are welcome!