From cb8a746834ac2c6ed5ddb93bcc0b3af46db428c6 Mon Sep 17 00:00:00 2001
From: reidliu41
Date: Fri, 25 Apr 2025 19:29:25 +0800
Subject: [PATCH] [doc] update wrong hf model links

Signed-off-by: reidliu41
---
 docs/source/features/quantization/auto_awq.md  | 2 +-
 docs/source/features/quantization/bitblas.md   | 4 ++--
 docs/source/features/quantization/bnb.md       | 2 +-
 docs/source/features/quantization/gptqmodel.md | 2 +-
 docs/source/features/quantization/torchao.md   | 3 +--
 5 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/docs/source/features/quantization/auto_awq.md b/docs/source/features/quantization/auto_awq.md
index d8773dbd7b20..b4ac597f5a79 100644
--- a/docs/source/features/quantization/auto_awq.md
+++ b/docs/source/features/quantization/auto_awq.md
@@ -6,7 +6,7 @@ To create a new 4-bit quantized model, you can leverage [AutoAWQ](https://github

 Quantization reduces the model's precision from BF16/FP16 to INT4 which effectively reduces the total model memory footprint. The main benefits are lower latency and memory usage.

-You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?sort=trending&search=awq).
+You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?search=awq).

 ```console
 pip install autoawq
diff --git a/docs/source/features/quantization/bitblas.md b/docs/source/features/quantization/bitblas.md
index 2901f760d3e4..d0b2bf858c9b 100644
--- a/docs/source/features/quantization/bitblas.md
+++ b/docs/source/features/quantization/bitblas.md
@@ -20,8 +20,8 @@ vLLM reads the model's config file and supports pre-quantized checkpoints.

 You can find pre-quantized models on:

-- [Hugging Face (BitBLAS)](https://huggingface.co/models?other=bitblas)
-- [Hugging Face (GPTQ)](https://huggingface.co/models?other=gptq)
+- [Hugging Face (BitBLAS)](https://huggingface.co/models?search=bitblas)
+- [Hugging Face (GPTQ)](https://huggingface.co/models?search=gptq)

 Usually, these repositories have a `quantize_config.json` file that includes a `quantization_config` section.

diff --git a/docs/source/features/quantization/bnb.md b/docs/source/features/quantization/bnb.md
index e356b99d85cd..1843a33a3dfd 100644
--- a/docs/source/features/quantization/bnb.md
+++ b/docs/source/features/quantization/bnb.md
@@ -14,7 +14,7 @@ pip install bitsandbytes>=0.45.3

 vLLM reads the model's config file and supports both in-flight quantization and pre-quantized checkpoint.

-You can find bitsandbytes quantized models on .
+You can find bitsandbytes quantized models on .
 And usually, these repositories have a config.json file that includes a quantization_config section.

 ## Read quantized checkpoint
diff --git a/docs/source/features/quantization/gptqmodel.md b/docs/source/features/quantization/gptqmodel.md
index 0a1cb0c3d349..dafaac842795 100644
--- a/docs/source/features/quantization/gptqmodel.md
+++ b/docs/source/features/quantization/gptqmodel.md
@@ -18,7 +18,7 @@ for more details on this and other advanced features.

 ## Installation

-You can quantize your own models by installing [GPTQModel](https://github.com/ModelCloud/GPTQModel) or picking one of the [5000+ models on Huggingface](https://huggingface.co/models?sort=trending&search=gptq).
+You can quantize your own models by installing [GPTQModel](https://github.com/ModelCloud/GPTQModel) or picking one of the [5000+ models on Huggingface](https://huggingface.co/models?search=gptq).

 ```console
 pip install -U gptqmodel --no-build-isolation -v
diff --git a/docs/source/features/quantization/torchao.md b/docs/source/features/quantization/torchao.md
index 9a85f0bab9ec..82100c6ddcac 100644
--- a/docs/source/features/quantization/torchao.md
+++ b/docs/source/features/quantization/torchao.md
@@ -30,5 +30,4 @@ tokenizer.push_to_hub(hub_repo)
 quantized_model.push_to_hub(hub_repo, safe_serialization=False)
 ```

-Alternatively, you can use the TorchAO Quantization space for quantizing models with a simple UI.
-See: https://huggingface.co/spaces/medmekk/TorchAO_Quantization
+Alternatively, you can use the [TorchAO Quantization space](https://huggingface.co/spaces/medmekk/TorchAO_Quantization) for quantizing models with a simple UI.
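For context on the docs being touched here, a minimal sketch (not part of the patch) of loading a pre-quantized checkpoint found via one of the corrected Hugging Face search links, using vLLM's Python API. The model ID below is only an example placeholder; any AWQ-quantized repository surfaced by the updated `?search=awq` link should work the same way.

```python
# Minimal sketch: serve an AWQ-quantized checkpoint with vLLM offline inference.
# The model ID is an example placeholder, not one mandated by this patch.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What does AWQ quantization change about a model?"], sampling_params)
print(outputs[0].outputs[0].text)
```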