.buildkite/test-amd.yaml: 20 changes (10 additions, 10 deletions)
@@ -48,8 +48,8 @@ steps:
   commands:
   - bash standalone_tests/pytorch_nightly_dependency.sh

-- label: Async Engine, Inputs, Utils, Worker Test # 36min
-  timeout_in_minutes: 50
+- label: Async Engine, Inputs, Utils, Worker Test # 10min
+  timeout_in_minutes: 15
Comment on lines +51 to +52

P1: Restore realistic timeout for Async Engine test

This step previously carried a 36‑minute runtime annotation and a 50‑minute timeout. The change drops both to 10 and 15 minutes while the commands still execute the full multimodal and utils_ pytest suites. Nothing in this commit reduces the workload, so on current hardware the job will time out before completion and consistently fail AMD CI.
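
If the goal is to enable the production mirror without shrinking the budget, a minimal sketch of the step with the historical values restored (annotation and timeout taken from the removed lines; the step's source_file_dependencies and commands are elided here):

    - label: Async Engine, Inputs, Utils, Worker Test # 36min
      timeout_in_minutes: 50
      mirror_hardwares: [amdexperimental, amdproduction]
      agent_pool: mi325_1
      # grade: Blocking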

   mirror_hardwares: [amdexperimental, amdproduction]
   agent_pool: mi325_1
   # grade: Blocking
@@ -616,9 +616,9 @@ steps:
   - uv pip install --system torchao==0.13.0
   - VLLM_TEST_FORCE_LOAD_FORMAT=auto pytest -v -s quantization/ --ignore quantization/test_blackwell_moe.py

-- label: LM Eval Small Models # 53min
-  timeout_in_minutes: 75
-  mirror_hardwares: [amdexperimental]
+- label: LM Eval Small Models # 15min
+  timeout_in_minutes: 20
+  mirror_hardwares: [amdexperimental, amdproduction]
Comment on lines +619 to +621

P1: Avoid shrinking LM Eval timeout below historical runtime

LM Eval Small Models was previously documented to run for ~53 minutes with a 75‑minute timeout. The new configuration reduces the timeout to 20 minutes without altering the invoked pytest command. Unless the test workload was dramatically reduced elsewhere, this will cause deterministic timeouts when enabling the job on AMD production hardware.
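
A sketch of the suggested shape, keeping the new amdproduction mirror while retaining the documented ~53-minute runtime and 75-minute ceiling (values from the removed lines; remaining step fields elided):

    - label: LM Eval Small Models # 53min
      timeout_in_minutes: 75
      mirror_hardwares: [amdexperimental, amdproduction]
      agent_pool: mi325_1
      # grade: Blocking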

   agent_pool: mi325_1
   # grade: Blocking
   source_file_dependencies:
@@ -627,8 +627,8 @@ steps:
   commands:
   - pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-small.txt --tp-size=1

-- label: OpenAI API correctness # 22min
-  timeout_in_minutes: 30
+- label: OpenAI API correctness # 10min
+  timeout_in_minutes: 15
   mirror_hardwares: [amdexperimental, amdproduction]
Comment on lines +630 to 632

P1: Prevent OpenAI API correctness job from timing out

The OpenAI API correctness step ran in ~22 minutes and previously had a 30‑minute timeout. The change lowers the timeout to 15 minutes while the step continues to run the same entrypoints/openai/correctness tests. Without removing tests, this halved timeout will likely abort the job and break AMD CI.
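
For reference, a sketch of the step with the prior budget restored, assuming no tests are removed (annotation and timeout from the removed lines; the other fields mirror the surrounding diff context, with the rest of the step elided):

    - label: OpenAI API correctness # 22min
      timeout_in_minutes: 30
      mirror_hardwares: [amdexperimental, amdproduction]
      agent_pool: mi325_1
      # grade: Blocking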

   agent_pool: mi325_1
   # grade: Blocking
@@ -859,10 +859,10 @@ steps:
   - pytest -v -s models/multimodal -m core_model --ignore models/multimodal/generation/test_whisper.py --ignore models/multimodal/processing
   - cd .. && VLLM_WORKER_MULTIPROC_METHOD=spawn pytest -v -s tests/models/multimodal/generation/test_whisper.py -m core_model # Otherwise, mp_method="spawn" doesn't work

-- label: Multi-Modal Accuracy Eval (Small Models) # 50min
-  mirror_hardwares: [amdexperimental]
+- label: Multi-Modal Accuracy Eval (Small Models) # 10min
+  mirror_hardwares: [amdexperimental, amdproduction]
   agent_pool: mi325_1
-  timeout_in_minutes: 70
+  timeout_in_minutes: 15
Comment on lines +862 to +865

P1: Multi-Modal accuracy eval timeout too short for workload

This step previously recorded a ~50 minute runtime with a 70‑minute timeout. It now expects 10 minutes and times out after 15 minutes while executing the same test_lm_eval_correctness.py invocation. No accompanying changes speed up the job, so enabling it on production agents will cause routine timeouts.
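
A sketch with the historical annotation and timeout restored alongside the new production mirror (values from the removed lines; working_dir as in the surrounding context, dependencies and commands elided):

    - label: Multi-Modal Accuracy Eval (Small Models) # 50min
      mirror_hardwares: [amdexperimental, amdproduction]
      agent_pool: mi325_1
      timeout_in_minutes: 70
      working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"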

   working_dir: "/vllm-workspace/.buildkite/lm-eval-harness"
   source_file_dependencies:
   - vllm/multimodal/