[V0 Deprecation] Remove VLLM_USE_V1 from tests
#26341
Conversation
Code Review
This pull request effectively removes the VLLM_USE_V1 flag from a large number of test files, which is a crucial step in deprecating the V0 engine and solidifying V1 as the default. The changes are consistent and well-executed across the entire test suite, correctly removing environment variable settings, conditional logic, and fixtures related to the V0 engine. This simplifies the tests and aligns them with the V1-only architecture. I've reviewed the changes thoroughly and found no issues of high or critical severity. The cleanup is well done.
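To make the fixture-style cleanup the review mentions concrete: a V0-era test module might have pinned the engine with an autouse fixture along these lines (a hypothetical minimal sketch, not an excerpt from the vLLM tree):

```python
import pytest


# Hypothetical V0-era pattern (illustrative only): force the V1 engine
# for every test in the module by setting the engine-selector env var.
@pytest.fixture(autouse=True)
def enable_v1_engine(monkeypatch: pytest.MonkeyPatch):
    monkeypatch.setenv("VLLM_USE_V1", "1")
    yield


# With V1 as the only engine, the fixture (and every VLLM_USE_V1
# reference like it) can simply be deleted.
```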
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 @pytest.mark.parametrize("model", MODEL_NAMES)
-def test_lm_eval_accuracy_v1_engine(model, monkeypatch: pytest.MonkeyPatch):
+def test_lm_eval_accuracy_v1_engine(model):
     """Run with the V1 Engine."""

-    with monkeypatch.context() as m:
-        m.setenv("VLLM_USE_V1", "1")
-
-        more_args = None
-        if current_platform.is_tpu():
-            # Limit compilation time for TPU V1
-            more_args = "max_model_len=2048,max_num_seqs=64"
-
-        # Add TP test (if provided)
-        if TPU_TP_TEST_STR:
-            more_args += ",{}".format(TPU_TP_TEST_STR)
-
-        run_test(model, more_args)
+    more_args = None
+    if current_platform.is_tpu():
+        # Limit compilation time for TPU V1
+        more_args = "max_model_len=2048,max_num_seqs=64"
+
+    # Add TP test (if provided)
+    if TPU_TP_TEST_STR:
+        more_args += ",{}".format(TPU_TP_TEST_STR)
+
+    run_test(model, more_args)


-@pytest.mark.skipif(
-    not current_platform.is_cuda() and not current_platform.is_tpu(),
-    reason="V1 is currently only supported on CUDA and TPU",
-)
 @pytest.mark.parametrize("model", FP8_KV_MODEL_NAMES)
-def test_lm_eval_accuracy_v1_engine_fp8_kv_cache(
-    model, monkeypatch: pytest.MonkeyPatch
-):
+def test_lm_eval_accuracy_v1_engine_fp8_kv_cache(model):
     """Run with the V1 Engine."""

-    with monkeypatch.context() as m:
-        m.setenv("VLLM_USE_V1", "1")
-
-        more_args = None
-        if current_platform.is_tpu():
-            # Limit compilation time for TPU V1
-            more_args = "max_model_len=2048,max_num_seqs=128,kv_cache_dtype=fp8"
+    more_args = None
+    if current_platform.is_tpu():
```
Reinstate platform guard for LM-Eval accuracy tests
The two LM-Eval accuracy tests no longer have the @pytest.mark.skipif guard that previously skipped them when CUDA/TPU are unavailable. These tests spin up vLLM and run gsm8k with 500 concurrent requests, which assumes GPU or TPU support; running them on a CPU-only runner will make CI fail rather than skip as before. Please restore the skip condition (or an equivalent platform gate) so CPU environments don’t execute GPU-only workloads.
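A minimal sketch of the guard being requested, reusing the skipif condition visible in the removed code above (the reason string here is illustrative):

```python
import pytest

from vllm.platforms import current_platform


# Gate the accuracy test so CPU-only CI runners skip it instead of failing.
@pytest.mark.skipif(
    not current_platform.is_cuda() and not current_platform.is_tpu(),
    reason="LM-Eval accuracy tests require CUDA or TPU",
)
def test_lm_eval_accuracy_v1_engine():
    ...
```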
```diff
 def test_lm_eval_accuracy_v1_engine():
     """Run with the V1 Engine."""

-    with monkeypatch.context() as m:
-        m.setenv("VLLM_USE_V1", "1")
-
-        more_args = []
-
-        # Limit compilation time for V1
-        if current_platform.is_tpu():
-            more_args = ["--max-num-seqs", "64"]
+    more_args = []
+
+    # Limit compilation time for V1
+    if current_platform.is_tpu():
+        more_args = ["--max-num-seqs", "64"]
```
Accuracy correctness test now runs on unsupported platforms
The OpenAI correctness LM-Eval test also had its @pytest.mark.skipif removed and now runs unconditionally. It still launches a RemoteOpenAIServer and evaluates gsm8k with 500 concurrent requests, which requires CUDA/XPU/TPU. Executing this on CPU-only CI will fail rather than skip as intended earlier. A platform guard should be reintroduced to avoid running this GPU-only test on unsupported hardware.
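One idiomatic way to gate the whole module rather than each test (a sketch, not the PR's code; CUDA, XPU, and TPU are included because the comment names all three):

```python
import pytest

from vllm.platforms import current_platform


# Module-level gate: pytest applies this mark to every test in the file,
# so the server-based correctness test is skipped on unsupported hardware.
pytestmark = pytest.mark.skipif(
    not (current_platform.is_cuda() or current_platform.is_xpu()
         or current_platform.is_tpu()),
    reason="LM-Eval correctness test requires CUDA, XPU, or TPU",
)
```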
hmellor left a comment
LGTM! One nit about a TODO which can be removed now that monkeypatch has been removed.
we can remove them.
Force-pushed from 665bc50 to 24747f3.
Purpose
Remove the `VLLM_USE_V1` flag from all tests, except for:

- `tests/basic_correctness/test_cumem.py`: Should we delete the V0-specific code or just comment it out? cc @youkaichao @ywang96
- ~~`tests/v1/test_oracle.py`: For obvious reasons~~

Updates:

- `tests/v1/core/test_scheduler_e2e.py` was previously not run. ~~Updated the CPU model runner to avoid the `assert num_tokens <= self.scheduler_config.max_num_batched_tokens` assertion error in the dummy run. However, the test now causes the process to be terminated, so I'm skipping it for CPU. cc @bigPYJ1151~~ Running this in the GPU CI instead of the CPU CI.
- `tests/v1/sample/test_sampling_params_e2e.py` was previously not run. Removed `test_priority` because priority scheduling is now supported as of [Core] feat: Implement Priority Scheduling in V1 Engine #19057.

Test Plan
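The PR's actual test plan is not captured in this page; as a purely illustrative sanity check (hypothetical, not from the PR), one could scan the test tree for leftover references to the flag:

```python
# Hypothetical verification snippet: list test files that still mention
# VLLM_USE_V1 after the cleanup. Only the intentional exceptions
# (e.g. tests/basic_correctness/test_cumem.py) should remain.
from pathlib import Path

leftover = sorted(
    str(path)
    for path in Path("tests").rglob("*.py")
    if "VLLM_USE_V1" in path.read_text(encoding="utf-8", errors="ignore")
)
print("\n".join(leftover) or "no remaining references")
```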
Test Result