Conversation

@QiliangCui (Contributor) commented Jul 10, 2025

Purpose

Temporarily remove google/gemma-3-1b-it to unblock testing.

Because

  1. Buildkite sometimes shows a timeout.
  2. A local run always gives me this error (a sketch of the kind of check that produces it follows this list): AssertionError: Expected: 0.25 | Measured: 0.13191811978771797
  3. This PR added the test, but it seems the test didn't actually pass in the Buildkite run.
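
For context, here is a minimal sketch of the kind of threshold check that produces the message in item 2, assuming an lm-eval-style accuracy score; only the 0.25 threshold and the "Expected: ... | Measured: ..." format come from the error above, and all names are illustrative:

```python
# Hypothetical sketch of the failing assertion's shape; names are
# illustrative, only the threshold and message format come from the error.
EXPECTED_VALUE = 0.25


def check_accuracy(measured_value: float) -> None:
    assert measured_value >= EXPECTED_VALUE, (
        f"Expected: {EXPECTED_VALUE} | Measured: {measured_value}")
```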

Test Plan

Run the test and check the result on fastcheck.

Test Result

The test passes in fastcheck
[screenshot of the passing fastcheck run]

@github-actions (bot) commented:

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which starts with a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment


Summary of Changes

Hello @QiliangCui, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request temporarily removes the 'google/gemma-3-1b-it' model from the accuracy tests to address timeout issues and assertion errors encountered during testing. This change aims to unblock the testing process and allow for further investigation of the model's behavior.

Highlights

  • Temporary Removal: Temporarily removed 'google/gemma-3-1b-it' from the list of models to unblock testing due to timeout and assertion errors.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
-------------------- | ------------------- | -----------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request temporarily disables a failing test for the google/gemma-3-1b-it model by commenting it out to unblock testing. My review focuses on improving the maintainability of this temporary change. I've suggested adding TODO comments to provide context for why the test is disabled and to ensure it's re-enabled correctly in the future. This will help other developers understand the state of the tests at a glance.
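
A minimal sketch of what the suggested TODO annotation could look like, assuming the models are kept in a plain Python list; both entries here are illustrative, not the actual file contents:

```python
# Hypothetical model list; entries are illustrative.
MODEL_NAMES = [
    "Qwen/Qwen2-1.5B-Instruct",
    # TODO(QiliangCui): re-enable once the TPU timeout/accuracy failure is
    # understood and fixed.
    # "google/gemma-3-1b-it",
]
```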

@QiliangCui changed the title from "Temporally suspend google/gemma-3-1b-it." to "Temporarily suspend google/gemma-3-1b-it." on Jul 10, 2025
@DarkLight1337 (Member) commented Jul 10, 2025

We should only skip this test for TPU, since the regular CI passes

@QiliangCui force-pushed the update_test2 branch 2 times, most recently from dfc4587 to 32016fa, on July 10, 2025 at 14:46
@QiliangCui (Contributor, Author) commented:

> We should only skip this test for TPU, since the regular CI passes

I don't think any other CI is using this.

If it were, I think it would fail today as well, because the assertion of Expected: 0.25 is no longer true even for GPU.

And as I pointed out in the comment, this test was added in the first place because of a TPU change.

@QiliangCui force-pushed the update_test2 branch 2 times, most recently from 9089fb2 to c22082d, on July 10, 2025 at 23:34
@vanbasten23 (Collaborator) commented:

Thanks @QiliangCui for looking into it.

Some anecdote: at the time that PR was merged, the test passed locally but failed in the CI. That CI failure was caused by a breaking change, which was fixed a few days later in #16059.

What's the current failure in the CI? If it is https://buildkite.com/vllm/ci/builds/23569/steps/canvas?jid=0197f302-6b71-4b36-bb01-9b250a0349b0, i.e.

INFO 07-10 06:49:59 [weight_utils.py:292] Using model weights format ['*.safetensors']

Fatal Python error: Segmentation fault

Current thread 0x00007b986ea00700 (most recent call first):
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 627 in xet_get
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1710 in _download_to_tmp_and_move
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1161 in _hf_hub_download_to_cache_dir
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1008 in hf_hub_download
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114 in _inner_fn
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/_snapshot_download.py", line 301 in _inner_hf_hub_download
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/usr/local/lib/python3.10/threading.py", line 953 in run

then the issue seems to be that Hugging Face failed to download the model. If that's the case, the issue will likely recur even if we suspend this model. But I'm curious whether other vLLM folks have seen this error (Hugging Face failing to download the model).

@DarkLight1337 (Member) commented:

According to Buildkite, the test is being run on main CI currently, and passes: https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests/650bf63b-9929-8ff0-8ddb-791931888d24?period=1day&tags=scm.branch%3Amain

@QiliangCui (Contributor, Author) commented:

> Thanks @QiliangCui for looking into it.
>
> [... quoting the full comment above ...] then the issue seems to be that Hugging Face failed to download the model. If that's the case, the issue will likely recur even if we suspend this model. But I'm curious whether other vLLM folks have seen this error (Hugging Face failing to download the model).

Right! There are two common ways it fails today:

  1. Segmentation fault, which is the lucky case.
  2. Waiting 4 hours and then timing out, which is worse; for example, this.

Since this is a known blocking issue, I think it is better to disable it for now, get the TPU V1 Test green, investigate the problem, and then re-enable it. I don't think leaving it failing is good.

Do you have a better idea?
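
For context, one possible mitigation for the download flakiness discussed above would be to pre-fetch the weights with retries before the test starts, so a flaky download fails fast instead of hanging the 4-hour job. A hedged sketch using huggingface_hub.snapshot_download (a real API); the retry wrapper, its parameters, and the file patterns are illustrative and not part of this PR:

```python
import time

from huggingface_hub import snapshot_download


def prefetch_weights(repo_id: str, attempts: int = 3) -> None:
    """Download model weights up front, retrying transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            # Fetch only the files the test needs; patterns are illustrative.
            snapshot_download(repo_id=repo_id,
                              allow_patterns=["*.safetensors", "*.json"])
            return
        except Exception:
            # Note: this cannot catch a hard segmentation fault, only
            # ordinary download errors raised as Python exceptions.
            if attempt == attempts:
                raise
            time.sleep(10 * attempt)  # simple linear backoff


prefetch_weights("google/gemma-3-1b-it")
```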

@QiliangCui (Contributor, Author) commented:

> According to Buildkite, the test is being run on main CI currently, and passes: https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests/650bf63b-9929-8ff0-8ddb-791931888d24?period=1day&tags=scm.branch%3Amain

Thank you for finding this. :)
Since it's passing there, I'll just skip it for TPU for now.

@DarkLight1337 (Member) reviewed on Jul 11, 2025


Use pytest.skip
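
A minimal sketch of that suggestion, skipping the model only on TPU, assuming the test is parametrized over model names and that vllm.platforms.current_platform exposes is_tpu() (as in the vLLM codebase); the test body and the other model entry are illustrative:

```python
import pytest

from vllm.platforms import current_platform

# Illustrative model list; only gemma-3-1b-it is taken from this PR.
MODEL_NAMES = [
    "Qwen/Qwen2-1.5B-Instruct",
    "google/gemma-3-1b-it",
]


@pytest.mark.parametrize("model", MODEL_NAMES)
def test_lm_eval_accuracy(model: str) -> None:
    # Skip only on TPU; the regular CI run of this test passes.
    if current_platform.is_tpu() and model == "google/gemma-3-1b-it":
        pytest.skip("Temporarily disabled on TPU due to flaky downloads "
                    "and an accuracy regression; to be re-enabled.")
    ...  # run the lm-eval accuracy check for `model`
```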

Signed-off-by: Qiliang Cui <derrhein@gmail.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) July 11, 2025 06:12
@github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Jul 11, 2025
@DarkLight1337 DarkLight1337 merged commit b4f0b5f into vllm-project:main Jul 11, 2025
44 of 47 checks passed
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025
Signed-off-by: Qiliang Cui <derrhein@gmail.com>