diff --git a/docs/source/features/quantization/bitblas.md b/docs/source/features/quantization/bitblas.md
index aff917f90ec2..2901f760d3e4 100644
--- a/docs/source/features/quantization/bitblas.md
+++ b/docs/source/features/quantization/bitblas.md
@@ -1,7 +1,15 @@
+(bitblas)=
+
 # BitBLAS
 
 vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.
 
+:::{note}
+Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
+Most recent NVIDIA GPUs support `float16`, while `bfloat16` is typically available on newer architectures such as Ampere and Hopper.
+For details, see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
+:::
+
 Below are the steps to utilize BitBLAS with vLLM.
 
 ```console
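
For reviewers, here is a minimal sketch of what the note's `dtype` guidance looks like when loading a model through vLLM's `LLM` entry point. The checkpoint name is hypothetical, and `quantization="bitblas"` is assumed from this page's topic; treat it as an illustration under those assumptions, not the doc's verbatim steps.

```python
# Sketch: loading a BitBLAS-quantized model with an explicit dtype.
import torch

from vllm import LLM

llm = LLM(
    model="some-org/llama-13b-4bit-bitblas",  # hypothetical checkpoint name
    dtype=torch.bfloat16,  # fall back to torch.float16 on GPUs without bfloat16 support
    quantization="bitblas",  # assumed from this page's topic
)

outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```

Keeping `dtype` explicit rather than relying on `"auto"` makes the hardware requirement from the note visible at the call site.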