[Doc] Add documentation for FP8 W8A8 #5388

mgoin · 2024-06-10T18:18:09Z

Initial documentation for the FP8 W8A8 quantization feature, detailing how to produce quantized model checkpoints ahead of time using AutoFP8, in addition to the dynamic quantization available within vLLM.
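
For readers unfamiliar with the term, dynamic quantization here means computing the scaling factor at runtime from the tensor itself rather than reading it from a checkpoint. The sketch below is only an illustration of that idea in plain Python, assuming per-tensor scaling and the FP8 E4M3 format (maximum finite value 448). The helper names `per_tensor_scale` and `scale_and_clamp` are hypothetical and are not part of vLLM or AutoFP8, and rounding onto the FP8 grid is deliberately omitted.

```python
# Illustrative only: per-tensor dynamic scaling for FP8 E4M3.
# These helpers are hypothetical and do not mirror vLLM's or AutoFP8's code.

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def per_tensor_scale(values):
    """Return the scale that maps the largest magnitude onto FP8_E4M3_MAX."""
    amax = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor, which would give a zero scale.
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def scale_and_clamp(values, scale):
    """Divide by the scale and clamp into the E4M3 range (no FP8 rounding)."""
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]


weights = [0.5, -2.0, 1.25]
scale = per_tensor_scale(weights)         # computed at runtime from the data
scaled = scale_and_clamp(weights, scale)  # values now span the E4M3 range
restored = [v * scale for v in scaled]    # multiplying by the scale undoes it
```

An ahead-of-time checkpoint would instead store such scales alongside the weights, so they do not have to be recomputed when the model is loaded.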

comaniac

Overall LGTM. Once we have decided where to put FP8 checkpoints, we can add links to this document too.

docs/source/quantization/fp8.rst

robertgshaw2-neuralmagic

Approved. Can you please add a note that AutoFP8 is in early beta and subject to change?

mgoin · 2024-06-11T00:55:09Z

> AutoFP8 is in early beta and subject to change

Sure, I'll add it to the repo itself.

* upstream/main: (126 commits)
  [Bugfix][Frontend] Cleanup "fix chat logprobs" (vllm-project#5026)
  [Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs (vllm-project#5312)
  [Misc] Various simplifications and typing fixes (vllm-project#5368)
  [ci] Fix Buildkite agent path (vllm-project#5392)
  [Doc] Add documentation for FP8 W8A8 (vllm-project#5388)
  Bump version to v0.5.0 (vllm-project#5384)
  [Docs] Alphabetically sort sponsors (vllm-project#5386)
  [Docs] Add Docs on Limitations of VLM Support (vllm-project#5383)
  [ci] Mount buildkite agent on Docker container to upload benchmark results (vllm-project#5330)
  [ci] Use small_cpu_queue for doc build (vllm-project#5331)
  [Bugfix] Fix LLaVA-NeXT (vllm-project#5380)
  [Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest (vllm-project#5319)
  [Model] Initial support for LLaVA-NeXT (vllm-project#4199)
  [Misc] Improve error message when LoRA parsing fails (vllm-project#5194)
  [misc][typo] fix typo (vllm-project#5372)
  [Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server (vllm-project#5374)
  [Misc] Update to comply with the new `compressed-tensors` config (vllm-project#5350)
  [Bugfix] Fix KeyError: 1 When Using LoRA adapters (vllm-project#5164)
  [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (vllm-project#5047)
  [mis][ci/test] fix flaky test in test_sharded_state_loader.py (vllm-project#5361)
  ...

Add documentation for FP8 W8A8

b87244e

comaniac approved these changes Jun 10, 2024

mgoin added 3 commits June 10, 2024 15:16

Update fp8.rst

09488ab

Add FP8 to toctree

cb78405

Fix inline code and add checkpoint format

555334d

robertgshaw2-neuralmagic self-requested a review June 11, 2024 00:48

robertgshaw2-neuralmagic approved these changes Jun 11, 2024

mgoin merged commit 77c87be into vllm-project:main Jun 11, 2024
53 checks passed

mgoin deleted the fp8-quantization-doc branch June 11, 2024 00:55

robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 12, 2024

[Doc] Add documentation for FP8 W8A8 (vllm-project#5388)

1535153

joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024

[Doc] Add documentation for FP8 W8A8 (vllm-project#5388)

463860f

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024

[Doc] Add documentation for FP8 W8A8 (vllm-project#5388)

ade989a

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024

[Doc] Add documentation for FP8 W8A8 (vllm-project#5388)

f1e96b0

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

[Doc] Add documentation for FP8 W8A8 (vllm-project#5388)

867b27a

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024

[Doc] Add documentation for FP8 W8A8 (vllm-project#5388)

10cb1c2
