Your current environment
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
tcpxResult_t tcpxInit(tcpxDebugLogger_t):379 NET/GPUDirectTCPX : GPUDirectTCPX enable: 1
Warning: please use at least NVCC 12.9 for the best DeepGEMM performance
tcpxResult_t tcpxInit(tcpxDebugLogger_t):379 NET/GPUDirectTCPX : GPUDirectTCPX enable: 1
Warning: please use at least NVCC 12.9 for the best DeepGEMM performance
tcpxResult_t tcpxInit(tcpxDebugLogger_t):379 NET/GPUDirectTCPX : GPUDirectTCPX enable: 1
Warning: please use at least NVCC 12.9 for the best DeepGEMM performance
tcpxResult_t tcpxInit(tcpxDebugLogger_t):379 NET/GPUDirectTCPX : GPUDirectTCPX enable: 1
Warning: please use at least NVCC 12.9 for the best DeepGEMM performance
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] WorkerProc hit an exception.
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] Traceback (most recent call last):
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 591, in worker_busy_loop
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] output = func(*args, **kwargs)
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] return func(*args, **kwargs)
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] return func(*args, **kwargs)
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1726, in execute_model
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] RuntimeError: CUDA error: device-side assert triggered
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(VllmWorker TP0 pid=406) ERROR 08-16 01:07:56 [multiproc_executor.py:596]
🐛 Describe the bug
Running a Qwen3MoE model with the FlashInfer (FI) sampler. The error goes away after `export VLLM_USE_FLASHINFER_SAMPLER=0`.
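For reference, a minimal sketch of the workaround: disable the FlashInfer sampler via the environment variable before vLLM is imported. Only `VLLM_USE_FLASHINFER_SAMPLER` comes from this report; the model id, tensor-parallel size, and the optional `CUDA_LAUNCH_BLOCKING` debug setting below are placeholder assumptions for whatever Qwen3MoE deployment is being run.

```python
import os

# Workaround from this report: fall back to the default sampler instead of FlashInfer.
# Must be set before vllm is imported so the flag is picked up at initialization time.
os.environ["VLLM_USE_FLASHINFER_SAMPLER"] = "0"

# Optional while debugging: surface CUDA errors at the offending kernel launch,
# as suggested by the traceback above.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM, SamplingParams

# Placeholder model id / TP size -- substitute the actual Qwen3MoE checkpoint and setup.
llm = LLM(model="Qwen/Qwen3-30B-A3B", tensor_parallel_size=4)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

The same effect can be achieved from the shell (e.g. `VLLM_USE_FLASHINFER_SAMPLER=0 vllm serve ...`). This only bypasses the FlashInfer sampler path; it does not address the underlying out-of-bounds scatter/gather assert.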
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.