
[Bug]: FlashMLA V1 with FP8 KV cache not yet supported! #18887

@tensorflowt

Description


Your current environment

Launching with the command below fails with the error "FlashMLA V1 with FP8 KV cache not yet supported!":

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NCCL_SOCKET_IFNAME=bond0 \
GLOO_SOCKET_IFNAME=bond0 \
VLLM_USE_V1=1 \
VLLM_USE_MODELSCOPE=true \
vllm serve /data/models/huggingface.co/deepseek-ai/DeepSeek-R1/DeepSeek-R1-Hzz1 \
--served-model-name deepseek-r1 \
--gpu-memory-utilization 0.8 \
--tensor-parallel-size 8  \
--trust-remote-code \
--enable-chunked-prefill \
--port 8000 \
--kv-cache-dtype fp8 \
--enable-expert-parallel

Result: (screenshot not preserved)

GPU devices: 8 x H800, 140GB
CUDA driver version: 550.127.08
vLLM version: 0.8.5

pip list

Package                                  Version
---------------------------------------- --------------------
accelerate                               0.34.0
aiofiles                                 23.2.1
aiohappyeyeballs                         2.4.0
aiohttp                                  3.10.5
aiohttp-cors                             0.7.0
aioprometheus                            23.12.0
aiosignal                                1.3.1
airportsdata                             20241001
aliyun-python-sdk-core                   2.16.0
aliyun-python-sdk-kms                    2.16.5
altair                                   5.5.0
annotated-types                          0.7.0
antlr4-python3-runtime                   4.9.3
anyascii                                 0.3.2
anyio                                    4.4.0
argcomplete                              3.5.3
astor                                    0.8.1
async-timeout                            4.0.3
attrdict                                 2.0.1
attrs                                    24.2.0
audioread                                3.0.1
auto_gptq                                0.7.1
autoawq                                  0.2.5
autoawq_kernels                          0.0.9
av                                       14.0.1
babel                                    2.16.0
bcrypt                                   4.2.1
beautifulsoup4                           4.12.3
bitsandbytes                             0.45.1
black                                    24.10.0
blake3                                   1.0.4
blinker                                  1.9.0
boto3                                    1.28.64
botocore                                 1.31.85
cached_path                              1.6.7
cachetools                               5.5.1
cdifflib                                 1.2.9
certifi                                  2019.11.28
cffi                                     1.17.1
chardet                                  3.0.4
charset-normalizer                       3.3.2
chattts                                  0.2.1
click                                    8.1.7
cloudpickle                              3.0.0
cn2an                                    0.5.23
colorama                                 0.4.6
coloredlogs                              15.0.1
colorful                                 0.5.6
compressed-tensors                       0.9.3
conformer                                0.3.2
contourpy                                1.3.1
controlnet-aux                           0.0.7
crcmod                                   1.7
cryptography                             44.0.0
cuda-python                              12.6.2.post1
cupy-cuda12x                             13.4.0
cycler                                   0.12.1
Cython                                   3.0.11
dataclass-wizard                         0.35.0
datamodel-code-generator                 0.26.5
datasets                                 2.21.0
dateparser                               1.1.8
dbus-python                              1.2.16
decorator                                5.1.1
decord                                   0.6.0
DeepCache                                0.1.1
Deprecated                               1.2.15
depyf                                    0.18.0
diffusers                                0.32.2
dill                                     0.3.8
diskcache                                5.6.3
Distance                                 0.1.3
distlib                                  0.3.9
distro                                   1.9.0
distro-info                              0.23+ubuntu1.1
dnspython                                2.7.0
docopt                                   0.6.2
ecdsa                                    0.19.0
editdistance                             0.8.1
einops                                   0.8.0
einx                                     0.3.0
email_validator                          2.2.0
encodec                                  0.1.1
eva-decord                               0.6.1
exceptiongroup                           1.2.2
fastapi                                  0.115.11
fastapi-cli                              0.0.7
fastrlock                                0.8.3
ffmpy                                    0.5.0
filelock                                 3.17.0
FlagEmbedding                            1.2.11
flashinfer                               0.1.6+cu124torch2.4
Flask                                    3.1.0
flatbuffers                              25.1.21
fonttools                                4.55.5
frozendict                               2.4.6
frozenlist                               1.4.1
fsspec                                   2024.6.1
fugashi                                  1.4.0
funasr                                   1.1.16
fvcore                                   0.1.5.post20221221
g2p-en                                   2.1.0
gdown                                    5.2.0
gekko                                    1.2.1
genson                                   1.3.0
gguf                                     0.16.3
google-api-core                          2.24.0
google-auth                              2.38.0
google-cloud-core                        2.4.1
google-cloud-storage                     2.19.0
google-crc32c                            1.6.0
google-resumable-media                   2.7.2
googleapis-common-protos                 1.66.0
gradio                                   4.26.0
gradio_client                            0.15.1
grpcio                                   1.70.0
gruut                                    2.4.0
gruut-ipa                                0.13.0
gruut-lang-de                            2.0.1
gruut-lang-en                            2.0.1
gruut-lang-es                            2.0.1
gruut-lang-fr                            2.0.2
h11                                      0.14.0
h2                                       4.2.0
hf_transfer                              0.1.8
hf-xet                                   1.0.2
hiredis                                  3.1.0
hpack                                    4.1.0
httpcore                                 1.0.5
httptools                                0.6.1
httpx                                    0.27.2
huggingface-hub                          0.30.1
humanfriendly                            10.0
hydra-core                               1.3.2
Hypercorn                                0.17.3
hyperframe                               6.1.0
HyperPyYAML                              1.2.2
idna                                     2.8
imageio                                  2.37.0
imageio-ffmpeg                           0.6.0
importlib_metadata                       8.0.0
importlib_resources                      6.5.2
inflect                                  5.6.2
iniconfig                                2.0.0
interegular                              0.3.3
iopath                                   0.1.10
isort                                    5.13.2
itsdangerous                             2.2.0
jaconv                                   0.4.0
jamo                                     0.4.1
jieba                                    0.42.1
Jinja2                                   3.1.5
jiter                                    0.5.0
jj-pytorchvideo                          0.1.5
jmespath                                 0.10.0
joblib                                   1.4.2
jsonlines                                1.2.0
jsonschema                               4.23.0
jsonschema-specifications                2023.12.1
kaldiio                                  2.18.0
kiwisolver                               1.4.8
lark                                     1.2.2
lazy_loader                              0.4
libnacl                                  2.1.0
librosa                                  0.10.2.post1
lightning                                2.5.0.post0
lightning-utilities                      0.11.9
linkify-it-py                            2.0.3
llama_cpp_python                         0.3.4
llguidance                               0.7.13
llvmlite                                 0.44.0
lm-format-enforcer                       0.10.11
loguru                                   0.7.3
loralib                                  0.1.2
markdown-it-py                           3.0.0
MarkupSafe                               2.1.5
matplotlib                               3.10.0
mdit-py-plugins                          0.4.2
mdurl                                    0.1.2
mecab-python3                            1.0.10
memray                                   1.15.0
mistral_common                           1.5.4
modelscope                               1.18.1
mpmath                                   1.3.0
msgpack                                  1.0.8
msgspec                                  0.18.6
multidict                                6.0.5
multiprocess                             0.70.16
mypy-extensions                          1.0.0
nanobind                                 2.6.1
narwhals                                 1.23.0
natsort                                  8.4.0
nemo_text_processing                     1.0.2
nest-asyncio                             1.6.0
networkx                                 3.3
ninja                                    1.11.1.3
nltk                                     3.9.1
num2words                                0.5.14
numba                                    0.61.2
numpy                                    1.26.4
nvidia-cublas-cu12                       12.4.5.8
nvidia-cuda-cupti-cu12                   12.4.127
nvidia-cuda-nvrtc-cu12                   12.4.127
nvidia-cuda-runtime-cu12                 12.4.127
nvidia-cudnn-cu12                        9.1.0.70
nvidia-cufft-cu12                        11.2.1.3
nvidia-curand-cu12                       10.3.5.147
nvidia-cusolver-cu12                     11.6.1.9
nvidia-cusparse-cu12                     12.3.1.170
nvidia-cusparselt-cu12                   0.6.2
nvidia-ml-py                             12.560.30
nvidia-nccl-cu12                         2.21.5
nvidia-nvjitlink-cu12                    12.4.127
nvidia-nvtx-cu12                         12.4.127
omegaconf                                2.3.0
onnxruntime                              1.20.1
onnxruntime-gpu                          1.16.0
openai                                   1.60.0
opencensus                               0.11.4
opencensus-context                       0.1.3
opencv-contrib-python-headless           4.11.0.86
opencv-python                            4.11.0.86
opencv-python-headless                   4.11.0.86
opentelemetry-api                        1.26.0
opentelemetry-exporter-otlp              1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc   1.26.0
opentelemetry-exporter-otlp-proto-http   1.26.0
opentelemetry-proto                      1.26.0
opentelemetry-sdk                        1.26.0
opentelemetry-semantic-conventions       0.47b0
opentelemetry-semantic-conventions-ai    0.4.6
optimum                                  1.23.3
orjson                                   3.10.15
ormsgpack                                1.7.0
oss2                                     2.19.1
outlines                                 0.1.11
outlines_core                            0.1.26
packaging                                24.1
pandas                                   2.2.2
parameterized                            0.9.0
partial-json-parser                      0.2.1.1.post4
passlib                                  1.7.4
pathspec                                 0.12.1
peft                                     0.14.0
pillow                                   10.4.0
pip                                      24.3.1
platformdirs                             4.3.6
pluggy                                   1.5.0
pooch                                    1.8.2
portalocker                              3.1.1
prettytable                              3.14.0
priority                                 2.0.0
proces                                   0.1.7
prometheus_client                        0.20.0
prometheus-fastapi-instrumentator        7.0.0
proto-plus                               1.25.0
protobuf                                 4.25.7
psutil                                   6.0.0
py-cpuinfo                               9.0.0
py-spy                                   0.4.0
pyairports                               2.1.1
pyarrow                                  17.0.0
pyasn1                                   0.6.1
pyasn1_modules                           0.4.1
pybase16384                              0.3.7
pybind11                                 2.13.6
pycountry                                24.6.1
pycparser                                2.22
pycryptodome                             3.21.0
pydantic                                 2.10.6
pydantic_core                            2.27.2
pydub                                    0.25.1
Pygments                                 2.19.1
PyGObject                                3.36.0
pykakasi                                 2.3.0
pynini                                   2.1.5
pynndescent                              0.5.13
pyparsing                                3.2.1
pypinyin                                 0.53.0
PySocks                                  1.7.1
pytest                                   8.3.4
python-apt                               2.0.1+ubuntu0.20.4.1
python-crfsuite                          0.9.11
python-dateutil                          2.9.0.post0
python-dotenv                            1.0.1
python-jose                              3.3.0
python-json-logger                       3.2.1
python-multipart                         0.0.20
python-rapidjson                         1.20
pytorch-lightning                        2.5.0.post0
pytorch-wpe                              0.0.1
pytz                                     2024.1
PyYAML                                   6.0.2
pyzmq                                    26.2.0
quantile-python                          1.1
Quart                                    0.20.0
qwen-vl-utils                            0.0.8
ray                                      2.43.0
redis                                    5.2.1
referencing                              0.35.1
regex                                    2024.7.24
requests                                 2.32.3
requests-unixsocket                      0.2.0
rich                                     13.9.4
rich-toolkit                             0.13.2
rouge                                    1.0.1
rpds-py                                  0.20.0
rsa                                      4.9
ruamel.yaml                              0.18.10
ruamel.yaml.clib                         0.2.12
ruff                                     0.9.3
s3transfer                               0.7.0
sacremoses                               0.1.1
safetensors                              0.4.4
scikit-image                             0.25.0
scikit-learn                             1.6.1
scipy                                    1.15.1
semantic-version                         2.10.0
sentence-transformers                    3.4.0
sentencepiece                            0.2.0
setproctitle                             1.3.4
setuptools                               75.8.0
sglang                                   0.4.1.post7
shellingham                              1.5.4
silero-vad                               5.1.2
six                                      1.17.0
smart-open                               7.1.0
sniffio                                  1.3.1
soundfile                                0.13.0
soupsieve                                2.6
soxr                                     0.5.0.post1
sse-starlette                            2.1.3
starlette                                0.46.0
sympy                                    1.13.1
tabulate                                 0.9.0
taskgroup                                0.2.2
tblib                                    3.0.0
tensorboardX                             2.6.2.2
tensorizer                               2.9.1
termcolor                                2.5.0
text-generation                          0.7.0
textual                                  1.0.0
threadpoolctl                            3.5.0
tifffile                                 2025.1.10
tiktoken                                 0.7.0
timm                                     1.0.14
tokenizers                               0.21.1
toml                                     0.10.2
tomli                                    2.2.1
tomlkit                                  0.12.0
torch                                    2.6.0
torch-complex                            0.4.4
torchao                                  0.10.0
torchaudio                               2.6.0
torchdiffeq                              0.2.5
torchmetrics                             1.6.1
torchvision                              0.21.0
tqdm                                     4.66.5
transformers                             4.51.3
transformers-stream-generator            0.0.5
triton                                   3.2.0
tritonclient                             2.54.0
typer                                    0.15.2
typing_extensions                        4.12.2
tzdata                                   2024.1
tzlocal                                  5.2
uc-micro-py                              1.0.3
umap-learn                               0.5.7
unattended-upgrades                      0.1
unidic-lite                              1.0.8
urllib3                                  2.0.7
uvicorn                                  0.30.6
uvloop                                   0.20.0
vector-quantize-pytorch                  1.17.3
verovio                                  4.5.1
virtualenv                               20.29.2
vllm                                     0.8.5
vllm-flash-attn                          2.6.1
vocos                                    0.1.0
watchfiles                               0.24.0
wcwidth                                  0.2.13
websockets                               11.0.3
Werkzeug                                 3.1.3
WeTextProcessing                         1.0.3
wget                                     3.2
wheel                                    0.34.2
wrapt                                    1.17.2
wsproto                                  1.2.0
x-transformers                           1.44.6
xformers                                 0.0.29.post2
xgrammar                                 0.1.18
xinference                               1.2.2
xoscar                                   0.4.6
xxhash                                   3.5.0
yacs                                     0.1.8
yarl                                     1.9.9
zipp                                     3.20.1
zstandard                                0.23.0

🐛 Describe the bug

FlashMLA V1 with FP8 KV cache not yet supported!
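A possible interim workaround, not a confirmed fix: the failing check is specific to the FP8 KV cache, so the same deployment can be sketched without `--kv-cache-dtype fp8`. vLLM then uses its default `auto` KV cache dtype, which follows the model's dtype. Expect less KV cache capacity at the same `--gpu-memory-utilization`. The `launch_vllm` helper and the `LAUNCH` variable are hypothetical conveniences for this sketch, not vLLM settings. By default the script only prints the command, so the flags can be checked before the model is loaded.

```shell
#!/usr/bin/env bash
# Sketch: the original launch, minus "--kv-cache-dtype fp8".
# Assumption: omitting the FP8 KV cache avoids the FlashMLA V1 check;
# the KV cache then uses the default "auto" dtype (more memory per token).
set -euo pipefail

# Environment copied from the original command.
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NCCL_SOCKET_IFNAME=bond0
export GLOO_SOCKET_IFNAME=bond0
export VLLM_USE_V1=1
export VLLM_USE_MODELSCOPE=true

# Same flags as the original command; "--kv-cache-dtype fp8" is left out.
VLLM_ARGS=(
  serve /data/models/huggingface.co/deepseek-ai/DeepSeek-R1/DeepSeek-R1-Hzz1
  --served-model-name deepseek-r1
  --gpu-memory-utilization 0.8
  --tensor-parallel-size 8
  --trust-remote-code
  --enable-chunked-prefill
  --port 8000
  --enable-expert-parallel
)

# Hypothetical helper: prints the command by default and only
# starts the server when LAUNCH=1 is set in the environment.
launch_vllm() {
  if [[ "${LAUNCH:-0}" == "1" ]]; then
    exec vllm "${VLLM_ARGS[@]}"
  fi
  printf 'vllm %s\n' "${VLLM_ARGS[*]}"
}

launch_vllm
```

To start the server, run the script with `LAUNCH=1` set, for example `LAUNCH=1 bash launch_deepseek_r1.sh` (the file name is arbitrary).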


Labels: bug (Something isn't working)