[CI-Build] Fixed numeric issue of test_prefix_prefill on AMD #29019
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either:
- Add the ready label to the PR
- Enable auto-merge

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request addresses a numerical precision issue on AMD hardware by relaxing the test tolerances for prefix prefill operations. The changes correctly introduce platform-specific logic to increase the absolute tolerance (atol) for assertions when running on ROCm, which is a standard approach for handling hardware-dependent numerical differences. My review identifies an area for improvement regarding code duplication. The logic for adjusting the tolerance is repeated in two separate test functions. I've suggested refactoring this into a shared helper function to enhance maintainability and prevent potential inconsistencies in the future.
```python
if current_platform.is_rocm() and op is chunked_prefill_paged_decode:
    rocm_atol = 5e-3 if "fp8" in kv_cache_dtype else 2e-3
    atol = max(atol, rocm_atol)
```
This logic block for adjusting tolerance on ROCm is duplicated from lines 317-323. This duplication, including the hardcoded tolerance values, can lead to maintenance issues, such as forgetting to update both locations if the tolerances need to be changed.
To improve maintainability, consider extracting this logic into a helper function. For example:
```python
def _adjust_atol_for_rocm(atol: float, op: Callable, kv_cache_dtype: str) -> float:
    if current_platform.is_rocm() and op is chunked_prefill_paged_decode:
        rocm_atol = 5e-3 if "fp8" in kv_cache_dtype else 2e-3
        return max(atol, rocm_atol)
    return atol
```

Then you could simplify the code at both locations to a single line:
```python
atol = _adjust_atol_for_rocm(atol, op, kv_cache_dtype)
```
This is a good suggestion
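For context, a minimal sketch of where such an adjusted tolerance typically ends up — the `output`/`ref_output` names and the baseline tolerance values are illustrative placeholders, not the exact identifiers in test_prefix_prefill.py:

```python
import torch

atol, rtol = 1e-4, 0.0  # illustrative baseline tolerances
atol = _adjust_atol_for_rocm(atol, op, kv_cache_dtype)
# The widened atol only takes effect on ROCm for chunked_prefill_paged_decode;
# every other platform/op combination keeps the strict baseline.
torch.testing.assert_close(output, ref_output, atol=atol, rtol=rtol)
```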
```python
# compared to the CUDA kernel (see GH-28490). Relax the tolerance to
# avoid spurious failures while still keeping the assertion strict on
# other platforms.
rocm_atol = 5e-3 if "fp8" in kv_cache_dtype else 2e-3
```
Do we need to relax the non-fp8 case as well?
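If it turns out only fp8 needs the extra headroom, a minimal variant (hypothetical, not what this PR does) would widen the bound for fp8 KV caches only:

```python
# Hypothetical: keep the default atol for non-fp8 KV caches on ROCm,
# widening the tolerance only when an fp8 KV-cache dtype is in use.
if current_platform.is_rocm() and op is chunked_prefill_paged_decode:
    if "fp8" in kv_cache_dtype:
        atol = max(atol, 5e-3)
```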
```python
if current_platform.is_rocm() and op is chunked_prefill_paged_decode:
    rocm_atol = 5e-3 if "fp8" in kv_cache_dtype else 2e-3
    atol = max(atol, rocm_atol)
```
This is a good suggestion
This pull request has merge conflicts that must be resolved before it can be merged.
Resolves issue #28490.
Purpose
Test Plan
```
pytest -s -v 'tests/kernels/attention/test_prefix_prefill.py'
```
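To focus on the cases this PR touches, pytest's `-k` filter can narrow the run, e.g. `-k fp8` (assuming the parametrized kv_cache_dtype values appear in the generated test IDs).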
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.