Open
Labels
bug: Something isn't working
Description
Your current environment
FlashMLA V1 with FP8 KV cache not yet supported!
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NCCL_SOCKET_IFNAME=bond0 \
GLOO_SOCKET_IFNAME=bond0 \
VLLM_USE_V1=1 \
VLLM_USE_MODELSCOPE=true \
vllm serve /data/models/huggingface.co/deepseek-ai/DeepSeek-R1/DeepSeek-R1-Hzz1 \
--served-model-name deepseek-r1 \
--gpu-memory-utilization 0.8 \
--tensor-parallel-size 8 \
--trust-remote-code \
--enable-chunked-prefill \
--port 8000 \
--kv-cache-dtype fp8 \
--enable-expert-parallel
result:
GPU devices: 8 x H800 (140GB)
CUDA driver version: 550.127.08
vLLM version: 0.8.5
pip list
Package Version
---------------------------------------- --------------------
accelerate 0.34.0
aiofiles 23.2.1
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiohttp-cors 0.7.0
aioprometheus 23.12.0
aiosignal 1.3.1
airportsdata 20241001
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
altair 5.5.0
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyascii 0.3.2
anyio 4.4.0
argcomplete 3.5.3
astor 0.8.1
async-timeout 4.0.3
attrdict 2.0.1
attrs 24.2.0
audioread 3.0.1
auto_gptq 0.7.1
autoawq 0.2.5
autoawq_kernels 0.0.9
av 14.0.1
babel 2.16.0
bcrypt 4.2.1
beautifulsoup4 4.12.3
bitsandbytes 0.45.1
black 24.10.0
blake3 1.0.4
blinker 1.9.0
boto3 1.28.64
botocore 1.31.85
cached_path 1.6.7
cachetools 5.5.1
cdifflib 1.2.9
certifi 2019.11.28
cffi 1.17.1
chardet 3.0.4
charset-normalizer 3.3.2
chattts 0.2.1
click 8.1.7
cloudpickle 3.0.0
cn2an 0.5.23
colorama 0.4.6
coloredlogs 15.0.1
colorful 0.5.6
compressed-tensors 0.9.3
conformer 0.3.2
contourpy 1.3.1
controlnet-aux 0.0.7
crcmod 1.7
cryptography 44.0.0
cuda-python 12.6.2.post1
cupy-cuda12x 13.4.0
cycler 0.12.1
Cython 3.0.11
dataclass-wizard 0.35.0
datamodel-code-generator 0.26.5
datasets 2.21.0
dateparser 1.1.8
dbus-python 1.2.16
decorator 5.1.1
decord 0.6.0
DeepCache 0.1.1
Deprecated 1.2.15
depyf 0.18.0
diffusers 0.32.2
dill 0.3.8
diskcache 5.6.3
Distance 0.1.3
distlib 0.3.9
distro 1.9.0
distro-info 0.23+ubuntu1.1
dnspython 2.7.0
docopt 0.6.2
ecdsa 0.19.0
editdistance 0.8.1
einops 0.8.0
einx 0.3.0
email_validator 2.2.0
encodec 0.1.1
eva-decord 0.6.1
exceptiongroup 1.2.2
fastapi 0.115.11
fastapi-cli 0.0.7
fastrlock 0.8.3
ffmpy 0.5.0
filelock 3.17.0
FlagEmbedding 1.2.11
flashinfer 0.1.6+cu124torch2.4
Flask 3.1.0
flatbuffers 25.1.21
fonttools 4.55.5
frozendict 2.4.6
frozenlist 1.4.1
fsspec 2024.6.1
fugashi 1.4.0
funasr 1.1.16
fvcore 0.1.5.post20221221
g2p-en 2.1.0
gdown 5.2.0
gekko 1.2.1
genson 1.3.0
gguf 0.16.3
google-api-core 2.24.0
google-auth 2.38.0
google-cloud-core 2.4.1
google-cloud-storage 2.19.0
google-crc32c 1.6.0
google-resumable-media 2.7.2
googleapis-common-protos 1.66.0
gradio 4.26.0
gradio_client 0.15.1
grpcio 1.70.0
gruut 2.4.0
gruut-ipa 0.13.0
gruut-lang-de 2.0.1
gruut-lang-en 2.0.1
gruut-lang-es 2.0.1
gruut-lang-fr 2.0.2
h11 0.14.0
h2 4.2.0
hf_transfer 0.1.8
hf-xet 1.0.2
hiredis 3.1.0
hpack 4.1.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.2
huggingface-hub 0.30.1
humanfriendly 10.0
hydra-core 1.3.2
Hypercorn 0.17.3
hyperframe 6.1.0
HyperPyYAML 1.2.2
idna 2.8
imageio 2.37.0
imageio-ffmpeg 0.6.0
importlib_metadata 8.0.0
importlib_resources 6.5.2
inflect 5.6.2
iniconfig 2.0.0
interegular 0.3.3
iopath 0.1.10
isort 5.13.2
itsdangerous 2.2.0
jaconv 0.4.0
jamo 0.4.1
jieba 0.42.1
Jinja2 3.1.5
jiter 0.5.0
jj-pytorchvideo 0.1.5
jmespath 0.10.0
joblib 1.4.2
jsonlines 1.2.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kaldiio 2.18.0
kiwisolver 1.4.8
lark 1.2.2
lazy_loader 0.4
libnacl 2.1.0
librosa 0.10.2.post1
lightning 2.5.0.post0
lightning-utilities 0.11.9
linkify-it-py 2.0.3
llama_cpp_python 0.3.4
llguidance 0.7.13
llvmlite 0.44.0
lm-format-enforcer 0.10.11
loguru 0.7.3
loralib 0.1.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.10.0
mdit-py-plugins 0.4.2
mdurl 0.1.2
mecab-python3 1.0.10
memray 1.15.0
mistral_common 1.5.4
modelscope 1.18.1
mpmath 1.3.0
msgpack 1.0.8
msgspec 0.18.6
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
nanobind 2.6.1
narwhals 1.23.0
natsort 8.4.0
nemo_text_processing 1.0.2
nest-asyncio 1.6.0
networkx 3.3
ninja 1.11.1.3
nltk 3.9.1
num2words 0.5.14
numba 0.61.2
numpy 1.26.4
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
omegaconf 2.3.0
onnxruntime 1.20.1
onnxruntime-gpu 1.16.0
openai 1.60.0
opencensus 0.11.4
opencensus-context 0.1.3
opencv-contrib-python-headless 4.11.0.86
opencv-python 4.11.0.86
opencv-python-headless 4.11.0.86
opentelemetry-api 1.26.0
opentelemetry-exporter-otlp 1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc 1.26.0
opentelemetry-exporter-otlp-proto-http 1.26.0
opentelemetry-proto 1.26.0
opentelemetry-sdk 1.26.0
opentelemetry-semantic-conventions 0.47b0
opentelemetry-semantic-conventions-ai 0.4.6
optimum 1.23.3
orjson 3.10.15
ormsgpack 1.7.0
oss2 2.19.1
outlines 0.1.11
outlines_core 0.1.26
packaging 24.1
pandas 2.2.2
parameterized 0.9.0
partial-json-parser 0.2.1.1.post4
passlib 1.7.4
pathspec 0.12.1
peft 0.14.0
pillow 10.4.0
pip 24.3.1
platformdirs 4.3.6
pluggy 1.5.0
pooch 1.8.2
portalocker 3.1.1
prettytable 3.14.0
priority 2.0.0
proces 0.1.7
prometheus_client 0.20.0
prometheus-fastapi-instrumentator 7.0.0
proto-plus 1.25.0
protobuf 4.25.7
psutil 6.0.0
py-cpuinfo 9.0.0
py-spy 0.4.0
pyairports 2.1.1
pyarrow 17.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.1
pybase16384 0.3.7
pybind11 2.13.6
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.10.6
pydantic_core 2.27.2
pydub 0.25.1
Pygments 2.19.1
PyGObject 3.36.0
pykakasi 2.3.0
pynini 2.1.5
pynndescent 0.5.13
pyparsing 3.2.1
pypinyin 0.53.0
PySocks 1.7.1
pytest 8.3.4
python-apt 2.0.1+ubuntu0.20.4.1
python-crfsuite 0.9.11
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-jose 3.3.0
python-json-logger 3.2.1
python-multipart 0.0.20
python-rapidjson 1.20
pytorch-lightning 2.5.0.post0
pytorch-wpe 0.0.1
pytz 2024.1
PyYAML 6.0.2
pyzmq 26.2.0
quantile-python 1.1
Quart 0.20.0
qwen-vl-utils 0.0.8
ray 2.43.0
redis 5.2.1
referencing 0.35.1
regex 2024.7.24
requests 2.32.3
requests-unixsocket 0.2.0
rich 13.9.4
rich-toolkit 0.13.2
rouge 1.0.1
rpds-py 0.20.0
rsa 4.9
ruamel.yaml 0.18.10
ruamel.yaml.clib 0.2.12
ruff 0.9.3
s3transfer 0.7.0
sacremoses 0.1.1
safetensors 0.4.4
scikit-image 0.25.0
scikit-learn 1.6.1
scipy 1.15.1
semantic-version 2.10.0
sentence-transformers 3.4.0
sentencepiece 0.2.0
setproctitle 1.3.4
setuptools 75.8.0
sglang 0.4.1.post7
shellingham 1.5.4
silero-vad 5.1.2
six 1.17.0
smart-open 7.1.0
sniffio 1.3.1
soundfile 0.13.0
soupsieve 2.6
soxr 0.5.0.post1
sse-starlette 2.1.3
starlette 0.46.0
sympy 1.13.1
tabulate 0.9.0
taskgroup 0.2.2
tblib 3.0.0
tensorboardX 2.6.2.2
tensorizer 2.9.1
termcolor 2.5.0
text-generation 0.7.0
textual 1.0.0
threadpoolctl 3.5.0
tifffile 2025.1.10
tiktoken 0.7.0
timm 1.0.14
tokenizers 0.21.1
toml 0.10.2
tomli 2.2.1
tomlkit 0.12.0
torch 2.6.0
torch-complex 0.4.4
torchao 0.10.0
torchaudio 2.6.0
torchdiffeq 0.2.5
torchmetrics 1.6.1
torchvision 0.21.0
tqdm 4.66.5
transformers 4.51.3
transformers-stream-generator 0.0.5
triton 3.2.0
tritonclient 2.54.0
typer 0.15.2
typing_extensions 4.12.2
tzdata 2024.1
tzlocal 5.2
uc-micro-py 1.0.3
umap-learn 0.5.7
unattended-upgrades 0.1
unidic-lite 1.0.8
urllib3 2.0.7
uvicorn 0.30.6
uvloop 0.20.0
vector-quantize-pytorch 1.17.3
verovio 4.5.1
virtualenv 20.29.2
vllm 0.8.5
vllm-flash-attn 2.6.1
vocos 0.1.0
watchfiles 0.24.0
wcwidth 0.2.13
websockets 11.0.3
Werkzeug 3.1.3
WeTextProcessing 1.0.3
wget 3.2
wheel 0.34.2
wrapt 1.17.2
wsproto 1.2.0
x-transformers 1.44.6
xformers 0.0.29.post2
xgrammar 0.1.18
xinference 1.2.2
xoscar 0.4.6
xxhash 3.5.0
yacs 0.1.8
yarl 1.9.9
zipp 3.20.1
zstandard 0.23.0
🐛 Describe the bug
Launching the server with the command above fails with the error: FlashMLA V1 with FP8 KV cache not yet supported!
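The failing combination in this report is VLLM_USE_V1=1 together with --kv-cache-dtype fp8. A minimal sketch of a preflight check for that combination is below. The function name uses_fp8_kv_with_v1 is hypothetical and not part of vLLM. It only inspects the flags and does not know whether the model actually goes through FlashMLA. Treating every dtype that starts with "fp8" as affected is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vLLM): succeeds when a launch combines
# the V1 engine with an FP8 KV cache, the combination reported in this issue.
# Usage: uses_fp8_kv_with_v1 <VLLM_USE_V1 value> <vllm serve arguments...>
uses_fp8_kv_with_v1() {
  local use_v1="$1"; shift
  local dtype="auto" prev="" arg
  for arg in "$@"; do
    # Accept both "--kv-cache-dtype=fp8" and "--kv-cache-dtype fp8".
    case "$arg" in
      --kv-cache-dtype=*) dtype="${arg#*=}" ;;
    esac
    if [ "$prev" = "--kv-cache-dtype" ]; then
      dtype="$arg"
    fi
    prev="$arg"
  done
  # Assumption: any dtype beginning with "fp8" counts as an FP8 KV cache.
  if [ "$use_v1" = "1" ] && [ "${dtype#fp8}" != "$dtype" ]; then
    return 0
  fi
  return 1
}

# Example: the relevant flags from the command in this report.
if uses_fp8_kv_with_v1 1 --tensor-parallel-size 8 --kv-cache-dtype fp8; then
  echo "warning: V1 engine with FP8 KV cache requested" >&2
fi
```

If the check fires, one thing to try is leaving out --kv-cache-dtype fp8 so the default (auto) is used. This is untested here and gives up the memory savings of an FP8 KV cache.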
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
