[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa #18175
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small and essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Thank you!
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Merged from main to resolve the failing V1 spec test. (Should be fixed by #18223)
Tests are still failing, but for a different reason than before.
@DarkLight1337 Yes. It seems to be a deeper bug related to Medusa. I have tried to reproduce the bug locally, but it looks a bit different from https://buildkite.com/vllm/ci/builds/20196#0196d7c4-85a1-404e-8459-e4b430c434fd. May I know how the CI environment is set up? Is running pytest locally equivalent to the CI test in the cloud? Thanks!

Local reproduction command and results:

pytest -v -s tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness_with_preemption

>       run_equality_correctness_test(vllm_runner,
common_llm_kwargs,
per_test_common_llm_kwargs,
baseline_llm_kwargs,
test_llm_kwargs,
batch_size,
max_output_len=output_len,
seed=seed,
temperature=0.0)
tests/spec_decode/e2e/test_medusa_correctness.py:249:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/spec_decode/e2e/conftest.py:211: in run_equality_correctness_test
with vllm_runner(**sd_args) as vllm_model:
tests/conftest.py:1037: in __exit__
cleanup_dist_env_and_memory()
vllm/distributed/parallel_state.py:1225: in cleanup_dist_env_and_memory
torch.cuda.empty_cache()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def empty_cache() -> None:
r"""Release all unoccupied cached memory currently held by the caching
allocator so that those can be used in other GPU application and visible in
`nvidia-smi`.
.. note::
:func:`~torch.cuda.empty_cache` doesn't increase the amount of GPU
memory available for PyTorch. However, it may help reduce fragmentation
of GPU memory in certain cases. See :ref:`cuda-memory-management` for
more details about GPU memory management.
"""
if is_initialized():
> torch._C._cuda_emptyCache()
E RuntimeError: CUDA error: device-side assert triggered
E CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
E For debugging consider passing CUDA_LAUNCH_BLOCKING=1
E Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
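Since the assert only surfaces later at `torch.cuda.empty_cache()`, a minimal sketch of a local re-run with synchronous kernel launches may help localize it (the test node id is the one quoted above; using `CUDA_LAUNCH_BLOCKING=1` is a general PyTorch/CUDA debugging technique, not something prescribed by this PR):

# Illustrative helper, not part of the vLLM test suite: re-run the failing
# test with CUDA_LAUNCH_BLOCKING=1 so kernel launches become synchronous and
# the device-side assert is reported at the offending kernel rather than at a
# later API call such as torch.cuda.empty_cache().
import os
import subprocess

TEST_ID = (
    "tests/spec_decode/e2e/test_medusa_correctness.py"
    "::test_medusa_e2e_greedy_correctness_with_preemption"
)

env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
subprocess.run(["pytest", "-v", "-s", TEST_ID], env=env, check=False)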
The environment should be the same as
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Head branch was pushed to by a user without write access
@DarkLight1337 The latest commit passed the medusa check locally. PTAL. Thank you!
It seems all the V0/Spec decode tests have passed. The failing V1 test should be fixed by #18169.
…vllm-project#18175) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Fix #18166 to get CI working again.
cc: @WoosukKwon @LiuXiaoxuanPKU @robertgshaw2-redhat
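For context, the equality check exercised by these e2e suites boils down to comparing greedy outputs with and without speculative decoding. A minimal sketch of that idea follows; the function name and runner interface here are illustrative assumptions, not vLLM's actual `run_equality_correctness_test` helper.

from typing import Callable, Sequence

# Illustrative sketch only: assumes each runner maps prompts to lists of
# generated token IDs. With temperature=0.0, speculative decoding (eagle,
# medusa, ...) must be lossless, so outputs have to match token for token.
def assert_greedy_equality(
    run_baseline: Callable[[Sequence[str]], list[list[int]]],
    run_spec_decode: Callable[[Sequence[str]], list[list[int]]],
    prompts: Sequence[str],
) -> None:
    baseline_outputs = run_baseline(prompts)
    spec_outputs = run_spec_decode(prompts)
    assert len(baseline_outputs) == len(spec_outputs)
    for i, (ref, out) in enumerate(zip(baseline_outputs, spec_outputs)):
        assert ref == out, f"prompt {i}: spec decode diverges from the baseline"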