[CI Failure]: Quantized Models Test - models/quantization/test_gguf.py::test_models[1-5-32-half-model0] #19458

### Name of failing test

`models/quantization/test_gguf.py::test_models[1-5-32-half-model0]`

### Basic information

- [x] Flaky test
- [ ] Can reproduce locally
- [ ] Caused by external libraries (e.g. bug in `transformers`)

### 🧪 Describe the failing test

The Llama 1B GGUF model test has been failing consistently across multiple PRs. Example build: https://buildkite.com/vllm/ci/builds/21800/steps/waterfall?jid=01975af4-f581-4d43-a1e5-7175d960b2b7#01975af4-f581-4d43-a1e5-7175d960b2b7/212-6971

```
[2025-06-10T18:40:56Z] FAILED models/quantization/test_gguf.py::test_models[1-5-32-half-model0] - AssertionError: Test0:
[2025-06-10T18:40:56Z] Matched tokens:	[4897, 596, 4495, 13, 650, 4178, 44, 13656, 369]
[2025-06-10T18:40:56Z] original:	"That's correct. VLLM stands for Vision and Language Model, which is a type of large language model designed for both inference and serving. It's a"	{31541: Logprob(logprob=-1.6094070672988892, rank=1, decoded_token='ĠVision'), 28968: Logprob(logprob=-2.0000319480895996, rank=2, decoded_token='ĠVari'), 8519: Logprob(logprob=-2.5000319480895996, rank=3, decoded_token='ĠVideo'), 21382: Logprob(logprob=-2.6562819480895996, rank=4, decoded_token='ĠVirtual'), 20796: Logprob(logprob=-2.7187819480895996, rank=5, decoded_token='ĠVisual')}
[2025-06-10T18:40:56Z] gguf:	"That's correct. VLLM stands for Virtual Language Learning Model, which is a type of large language model designed for high-throughput and memory-efficient inference and"	{21382: Logprob(logprob=-1.9463169574737549, rank=1, decoded_token='ĠVirtual'), 330: Logprob(logprob=-2.274441957473755, rank=2, decoded_token='Ġ"'), 15668: Logprob(logprob=-2.383816957473755, rank=3, decoded_token='ĠVery'), 4196: Logprob(logprob=-2.446316957473755, rank=4, decoded_token='ĠVal'), 28968: Logprob(logprob=-2.540066957473755, rank=5, decoded_token='ĠVari')}
```
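
To make the log easier to read, here is a minimal sketch of the kind of check that appears to be failing. It assumes the test requires that, at the first position where the two outputs diverge, each model's chosen token appears among the other model's top-5 logprob candidates. The helper name `tokens_are_close` is hypothetical and is not taken from the vLLM test utilities; the token IDs are copied from the log above.

```python
# Top-5 candidate token IDs at the first mismatching position,
# copied from the failure log above.
original_top5 = {31541, 28968, 8519, 21382, 20796}
gguf_top5 = {21382, 330, 15668, 4196, 28968}

original_choice = 31541  # 'ĠVision', rank 1 for the original model
gguf_choice = 21382      # 'ĠVirtual', rank 1 for the GGUF model


def tokens_are_close(choice_a, top_b, choice_b, top_a):
    """Hypothetical helper (an assumption, not vLLM's API):
    each model's pick must be among the other model's top-k candidates."""
    return choice_a in top_b and choice_b in top_a


# Check each direction separately to see which side breaks the condition.
gguf_pick_in_original = gguf_choice in original_top5
original_pick_in_gguf = original_choice in gguf_top5

print("GGUF pick in original top-5:", gguf_pick_in_original)
print("Original pick in GGUF top-5:", original_pick_in_gguf)
print(
    "Close:",
    tokens_are_close(original_choice, gguf_top5, gguf_choice, original_top5),
)
```

Under this assumption, the GGUF model's pick (`ĠVirtual`) is in the original model's top 5, but the original model's pick (`ĠVision`) is missing from the GGUF model's top 5, which would be enough to trip the assertion.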

### 📝 History of failing test

The earliest failure I found was on Mon 26 May at 8:27 AM:
[CI/Build] Split pooling and generation extended language models tests in CI (#18705)
https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests/94a54396-ec5f-8d47-8b48-6c88a2d4e5cb?period=28days&tags=scm.branch%3Amain&execution_id=01970c90-0b2c-7f2b-b3ad-d7bcc06f340b

### CC List.

_No response_
