docs/source/features/quantization/auto_awq.md (1 addition & 1 deletion)

@@ -6,7 +6,7 @@ To create a new 4-bit quantized model, you can leverage [AutoAWQ](https://github
Quantization reduces the model's precision from BF16/FP16 to INT4 which effectively reduces the total model memory footprint.
The main benefits are lower latency and memory usage.

-You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?sort=trending&search=awq).
+You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?search=awq).

```console
pip install autoawq
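For context on the docs touched above, quantizing a model with AutoAWQ itself is a short script. The sketch below assumes AutoAWQ's `AutoAWQForCausalLM` API; the model name, output path, and 4-bit settings are illustrative, not taken from this diff.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative source checkpoint
quant_path = "mistral-7b-instruct-awq"             # where the quantized model is written

# Typical 4-bit AWQ settings: zero-point quantization, group size 128, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)  # runs AWQ calibration and packs weights to INT4
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```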
docs/source/features/quantization/bitblas.md (2 additions & 2 deletions)

@@ -20,8 +20,8 @@ vLLM reads the model's config file and supports pre-quantized checkpoints.

You can find pre-quantized models on:

-- [Hugging Face (BitBLAS)](https://huggingface.co/models?other=bitblas)
-- [Hugging Face (GPTQ)](https://huggingface.co/models?other=gptq)
+- [Hugging Face (BitBLAS)](https://huggingface.co/models?search=bitblas)
+- [Hugging Face (GPTQ)](https://huggingface.co/models?search=gptq)

Usually, these repositories have a `quantize_config.json` file that includes a `quantization_config` section.

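As a rough sketch of how such a pre-quantized checkpoint is consumed, loading one of these repositories with vLLM's Python API might look like the following; the repository name is a placeholder and `quantization="bitblas"` is assumed to be the relevant backend flag.

```python
import torch
from vllm import LLM

# Placeholder repo: any checkpoint whose config carries a BitBLAS/GPTQ quantization_config section.
model_id = "hxbgsyxh/llama-13b-4bit-g-1-bitblas"

llm = LLM(model=model_id, dtype=torch.bfloat16, quantization="bitblas")
outputs = llm.generate(["What is quantization?"])
print(outputs[0].outputs[0].text)
```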
docs/source/features/quantization/bnb.md (1 addition & 1 deletion)

@@ -14,7 +14,7 @@ pip install bitsandbytes>=0.45.3

vLLM reads the model's config file and supports both in-flight quantization and pre-quantized checkpoint.

-You can find bitsandbytes quantized models on <https://huggingface.co/models?other=bitsandbytes>.
+You can find bitsandbytes quantized models on <https://huggingface.co/models?search=bitsandbytes>.
And usually, these repositories have a config.json file that includes a quantization_config section.

## Read quantized checkpoint
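As a hedged example of the "read quantized checkpoint" path mentioned here, the sketch below loads a pre-quantized bitsandbytes 4-bit model with vLLM's Python API; the model name is illustrative.

```python
import torch
from vllm import LLM

# Illustrative pre-quantized bitsandbytes checkpoint from the Hub.
model_id = "unsloth/tinyllama-bnb-4bit"

llm = LLM(model=model_id, dtype=torch.bfloat16, quantization="bitsandbytes")
outputs = llm.generate(["What is quantization?"])
print(outputs[0].outputs[0].text)
```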
docs/source/features/quantization/gptqmodel.md (1 addition & 1 deletion)

@@ -18,7 +18,7 @@ for more details on this and other advanced features.

## Installation

-You can quantize your own models by installing [GPTQModel](https://github.com/ModelCloud/GPTQModel) or picking one of the [5000+ models on Huggingface](https://huggingface.co/models?sort=trending&search=gptq).
+You can quantize your own models by installing [GPTQModel](https://github.com/ModelCloud/GPTQModel) or picking one of the [5000+ models on Huggingface](https://huggingface.co/models?search=gptq).

```console
pip install -U gptqmodel --no-build-isolation -v
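For reference, producing such a GPTQ checkpoint with GPTQModel is also a short script. The sketch below assumes the `GPTQModel.load` / `quantize` / `save` API; the source model, output path, and calibration set are illustrative.

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"         # illustrative source checkpoint
quant_path = "Llama-3.2-1B-Instruct-gptqmodel-4bit"   # output directory

# A small calibration set; 512-1024 short texts is a common starting point.
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration, batch_size=2)  # runs GPTQ calibration and packs weights to INT4
model.save(quant_path)
```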
docs/source/features/quantization/torchao.md (1 addition & 2 deletions)

@@ -30,5 +30,4 @@ tokenizer.push_to_hub(hub_repo)
quantized_model.push_to_hub(hub_repo, safe_serialization=False)
```

-Alternatively, you can use the TorchAO Quantization space for quantizing models with a simple UI.
-See: https://huggingface.co/spaces/medmekk/TorchAO_Quantization
+Alternatively, you can use the [TorchAO Quantization space](https://huggingface.co/spaces/medmekk/TorchAO_Quantization) for quantizing models with a simple UI.
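Once pushed, the quantized repository can be served like any other checkpoint. Assuming vLLM picks the TorchAO settings up from the checkpoint's `quantization_config` (as the docs above describe for the other methods), loading it back might look like this sketch; the repo name is a placeholder for the `hub_repo` used above.

```python
from vllm import LLM

# Placeholder: the hub_repo the quantized model was pushed to above.
llm = LLM(model="your-username/llama-3.2-1b-torchao-int8", dtype="bfloat16")
outputs = llm.generate(["What is quantization?"])
print(outputs[0].outputs[0].text)
```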