Conversation

@wwl2755 (Contributor) commented May 15, 2025

Fixes #18166 to get CI working again.

cc: @WoosukKwon @LiuXiaoxuanPKU @robertgshaw2-redhat

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@robertgshaw2-redhat (Collaborator)

Thank you!

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
robertgshaw2-redhat enabled auto-merge (squash) May 16, 2025 01:48
github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label May 16, 2025
@wwl2755 (Contributor, Author) commented May 16, 2025

Merged main into this branch to resolve the failing V1 spec test (should be fixed by #18223).

@DarkLight1337 (Member)

Tests are still failing, but for a different reason than before.

@wwl2755 (Contributor, Author) commented May 16, 2025

> Tests are still failing, but for a different reason than before.

@DarkLight1337 Yes. It seems to be a deeper bug related to Medusa. I have tried to reproduce it locally, but the failure looks a bit different from https://buildkite.com/vllm/ci/builds/20196#0196d7c4-85a1-404e-8459-e4b430c434fd.

Could you tell me how the CI environment is set up? Is running pytest locally equivalent to the CI test in the cloud? Thanks!

Local reproduction code and results:

pytest -v -s tests/spec_decode/e2e/test_medusa_correctness.py::test_medusa_e2e_greedy_correctness_with_preemption
>       run_equality_correctness_test(vllm_runner,
                                      common_llm_kwargs,
                                      per_test_common_llm_kwargs,
                                      baseline_llm_kwargs,
                                      test_llm_kwargs,
                                      batch_size,
                                      max_output_len=output_len,
                                      seed=seed,
                                      temperature=0.0)

tests/spec_decode/e2e/test_medusa_correctness.py:249: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/spec_decode/e2e/conftest.py:211: in run_equality_correctness_test
    with vllm_runner(**sd_args) as vllm_model:
tests/conftest.py:1037: in __exit__
    cleanup_dist_env_and_memory()
vllm/distributed/parallel_state.py:1225: in cleanup_dist_env_and_memory
    torch.cuda.empty_cache()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def empty_cache() -> None:
        r"""Release all unoccupied cached memory currently held by the caching
        allocator so that those can be used in other GPU application and visible in
        `nvidia-smi`.
    
        .. note::
            :func:`~torch.cuda.empty_cache` doesn't increase the amount of GPU
            memory available for PyTorch. However, it may help reduce fragmentation
            of GPU memory in certain cases. See :ref:`cuda-memory-management` for
            more details about GPU memory management.
        """
        if is_initialized():
>           torch._C._cuda_emptyCache()
E           RuntimeError: CUDA error: device-side assert triggered
E           CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
E           For debugging consider passing CUDA_LAUNCH_BLOCKING=1
E           Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
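(For reference, one way to chase a device-side assert like this is to follow the hint in the error and force synchronous kernel launches, so the assert is reported at the failing launch rather than at the later torch.cuda.empty_cache() call. A minimal sketch, assuming the same test node ID as in the reproduction above; the environment variable has to be set before torch initializes CUDA:)

# Hypothetical debug runner: CUDA_LAUNCH_BLOCKING makes kernel launches
# synchronous, so the device-side assert surfaces at the offending call.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import pytest

raise SystemExit(pytest.main([
    "-v", "-s",
    "tests/spec_decode/e2e/test_medusa_correctness.py"
    "::test_medusa_e2e_greedy_correctness_with_preemption",
]))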

@DarkLight1337 (Member)

The environment should be the same as requirements/test.txt. Have you tried running the same command as in .buildkite/test-pipeline.yaml? Perhaps it's a problem with cleanup when multiple tests run in sequence.
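(If cleanup between sequential tests is the culprit, one way to probe it locally is an autouse fixture that applies the same teardown the vLLM runner performs on __exit__ after every test, so leftover state from an earlier test can be told apart from a genuine bug in the failing one. A rough sketch, not the project's actual CI setup:)

# conftest.py sketch (hypothetical): reuse the cleanup helper seen in the
# traceback above to reset distributed state and cached GPU memory after
# each test in the module.
import pytest

from vllm.distributed.parallel_state import cleanup_dist_env_and_memory


@pytest.fixture(autouse=True)
def _cleanup_between_tests():
    yield
    # Same call that tests/conftest.py makes when the vllm_runner context
    # manager exits.
    cleanup_dist_env_and_memory()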

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
auto-merge was automatically disabled May 18, 2025 22:49

Head branch was pushed to by a user without write access

wwl2755 changed the title from "[Spec Decode][V0] Fix spec decode correctness test in V0 eagle" to "[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa" May 18, 2025
@wwl2755 (Contributor, Author) commented May 18, 2025

@DarkLight1337 The latest commit passes the Medusa check locally. PTAL. Thank you!

@wwl2755 (Contributor, Author) commented May 19, 2025

It seems all of the V0 spec decode tests have passed. The failing V1 test should be fixed by #18169.

vllm-bot merged commit 9da1095 into vllm-project:main May 19, 2025
66 of 68 checks passed
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
…vllm-project#18175)

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
wwl2755 deleted the fix_v0_spec_test branch September 10, 2025 05:36

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding

Development

Successfully merging this pull request may close these issues.

[HELP WANTED] Fix Failing Spec Decoding Test

5 participants