Conversation

@QiliangCui (Contributor) commented Jul 10, 2025

Purpose

Temporarily remove google/gemma-3-1b-it to unblock testing.

Because

  1. Buildkite sometimes shows a timeout.
  2. A local run always gives me this error (a sketch of the kind of check that produces it follows this list): AssertionError: Expected: 0.25 | Measured: 0.13191811978771797
  3. This PR added the test, but it seems the test didn't actually pass in the Buildkite run.
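
For context, here is a minimal sketch of the kind of threshold check that produces the message in item 2, assuming an lm-eval-style accuracy score; only the 0.25 threshold and the "Expected: ... | Measured: ..." format come from the error above, and all names are illustrative:

```python
# Hypothetical sketch of the failing assertion's shape; names are
# illustrative, only the threshold and message format come from the error.
EXPECTED_VALUE = 0.25


def check_accuracy(measured_value: float) -> None:
    assert measured_value >= EXPECTED_VALUE, (
        f"Expected: {EXPECTED_VALUE} | Measured: {measured_value}")
```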

Test Plan

Run the test and check the result on fastcheck.

Test Result

The test passes in fastcheck
[screenshot of the passing fastcheck run]

@github-actions (bot) commented:

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which starts with a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment


Summary of Changes

Hello @QiliangCui, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request temporarily removes the 'google/gemma-3-1b-it' model from the accuracy tests to address timeout issues and assertion errors encountered during testing. This change aims to unblock the testing process and allow for further investigation of the model's behavior.

Highlights

  • Temporary Removal: Temporarily removed 'google/gemma-3-1b-it' from the list of models to unblock testing due to timeout and assertion errors.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
-------------------- | ------------------- | -----------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request temporarily disables a failing test for the google/gemma-3-1b-it model by commenting it out to unblock testing. My review focuses on improving the maintainability of this temporary change. I've suggested adding TODO comments to provide context for why the test is disabled and to ensure it's re-enabled correctly in the future. This will help other developers understand the state of the tests at a glance.
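
A minimal sketch of what the suggested TODO annotation could look like, assuming the models are kept in a plain Python list; both entries here are illustrative, not the actual file contents:

```python
# Hypothetical model list; entries are illustrative.
MODEL_NAMES = [
    "Qwen/Qwen2-1.5B-Instruct",
    # TODO(QiliangCui): re-enable once the TPU timeout/accuracy failure is
    # understood and fixed.
    # "google/gemma-3-1b-it",
]
```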

@QiliangCui changed the title from "Temporally suspend google/gemma-3-1b-it." to "Temporarily suspend google/gemma-3-1b-it." on Jul 10, 2025
@DarkLight1337 (Member) commented Jul 10, 2025

We should only skip this test for TPU, since the regular CI passes

@QiliangCui force-pushed the update_test2 branch 2 times, most recently from dfc4587 to 32016fa, on July 10, 2025 at 14:46
@QiliangCui (Contributor, Author) commented:

> We should only skip this test for TPU, since the regular CI passes

I don't think any other CI is using this.

If it were, I think it would fail today as well, because the assertion of Expected: 0.25 is no longer true even for GPU.

And as I pointed out in the comment, this test was added in the first place because of a TPU change.

@QiliangCui force-pushed the update_test2 branch 2 times, most recently from 9089fb2 to c22082d, on July 10, 2025 at 23:34
@vanbasten23 (Collaborator) commented:

Thanks @QiliangCui for looking into it.

Some anecdote: at the time that PR was merged, the test passed locally but failed in the CI. That CI failure was caused by a breaking change, which was fixed a few days later in #16059.

What's the current failure in the CI? If it is https://buildkite.com/vllm/ci/builds/23569/steps/canvas?jid=0197f302-6b71-4b36-bb01-9b250a0349b0, i.e.

INFO 07-10 06:49:59 [weight_utils.py:292] Using model weights format ['*.safetensors']

Fatal Python error: Segmentation fault

Current thread 0x00007b986ea00700 (most recent call first):
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 627 in xet_get
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1710 in _download_to_tmp_and_move
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1161 in _hf_hub_download_to_cache_dir
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1008 in hf_hub_download
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114 in _inner_fn
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/_snapshot_download.py", line 301 in _inner_hf_hub_download
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/usr/local/lib/python3.10/threading.py", line 953 in run

then the issue seems to be that Hugging Face failed to download the model. If that's the case, the issue will likely recur even if we suspend this model. But I'm curious whether other vLLM folks have seen this error (Hugging Face failing to download the model).

@DarkLight1337 (Member) commented:

According to Buildkite, the test is being run on main CI currently, and passes: https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests/650bf63b-9929-8ff0-8ddb-791931888d24?period=1day&tags=scm.branch%3Amain

@QiliangCui (Contributor, Author) commented:

> Thanks @QiliangCui for looking into it.
>
> [... quoting the full comment above ...] then the issue seems to be that Hugging Face failed to download the model. If that's the case, the issue will likely recur even if we suspend this model. But I'm curious whether other vLLM folks have seen this error (Hugging Face failing to download the model).

Right! There are two common ways it fails today:

  1. Segmentation fault, which is the lucky case.
  2. Waiting 4 hours and then timing out, which is worse; for example, this.

Since this is a known blocking issue, I think it is better to disable it for now, get the TPU V1 Test green, investigate the problem, and then re-enable it. I don't think leaving it failing is good.

Do you have a better idea?
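
For context, one possible mitigation for the download flakiness discussed above would be to pre-fetch the weights with retries before the test starts, so a flaky download fails fast instead of hanging the 4-hour job. A hedged sketch using huggingface_hub.snapshot_download (a real API); the retry wrapper, its parameters, and the file patterns are illustrative and not part of this PR:

```python
import time

from huggingface_hub import snapshot_download


def prefetch_weights(repo_id: str, attempts: int = 3) -> None:
    """Download model weights up front, retrying transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            # Fetch only the files the test needs; patterns are illustrative.
            snapshot_download(repo_id=repo_id,
                              allow_patterns=["*.safetensors", "*.json"])
            return
        except Exception:
            # Note: this cannot catch a hard segmentation fault, only
            # ordinary download errors raised as Python exceptions.
            if attempt == attempts:
                raise
            time.sleep(10 * attempt)  # simple linear backoff


prefetch_weights("google/gemma-3-1b-it")
```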

@QiliangCui (Contributor, Author) commented:

> According to Buildkite, the test is being run on main CI currently, and passes: https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests/650bf63b-9929-8ff0-8ddb-791931888d24?period=1day&tags=scm.branch%3Amain

Thank you for finding this. :)
Since it's passing there, I'll just skip it for TPU for now.

@DarkLight1337 (Member) reviewed on Jul 11, 2025


Use pytest.skip
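
A minimal sketch of that suggestion, skipping the model only on TPU, assuming the test is parametrized over model names and that vllm.platforms.current_platform exposes is_tpu() (as in the vLLM codebase); the test body and the other model entry are illustrative:

```python
import pytest

from vllm.platforms import current_platform

# Illustrative model list; only gemma-3-1b-it is taken from this PR.
MODEL_NAMES = [
    "Qwen/Qwen2-1.5B-Instruct",
    "google/gemma-3-1b-it",
]


@pytest.mark.parametrize("model", MODEL_NAMES)
def test_lm_eval_accuracy(model: str) -> None:
    # Skip only on TPU; the regular CI run of this test passes.
    if current_platform.is_tpu() and model == "google/gemma-3-1b-it":
        pytest.skip("Temporarily disabled on TPU due to flaky downloads "
                    "and an accuracy regression; to be re-enabled.")
    ...  # run the lm-eval accuracy check for `model`
```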

Signed-off-by: Qiliang Cui <derrhein@gmail.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) July 11, 2025 06:12
@github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Jul 11, 2025
@DarkLight1337 DarkLight1337 merged commit b4f0b5f into vllm-project:main Jul 11, 2025
44 of 47 checks passed
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>