Mamba V2 Test not Asserting Failures. #21379
Conversation
Code Review
This pull request correctly fixes an issue where tests were not asserting failures by replacing torch.allclose with torch.testing.assert_close. My main feedback is on the error tolerances, which have been significantly increased to make the tests pass. In one case, the tolerance is so high (50%) that it may render the test less effective at catching meaningful regressions. I've added comments suggesting ways to handle this, such as using conditional tolerances for different data types and adding TODOs to ensure they are tracked and addressed in the future.
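For context, a minimal standalone sketch (not the vLLM test code; tensor values are made up) of why the replacement matters: torch.allclose only returns a bool, so an unasserted call can never fail a test, whereas torch.testing.assert_close raises on mismatch.

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.5])

# torch.allclose returns a bool; without an `assert`, a mismatch is silently ignored.
print(torch.allclose(a, b, atol=1e-3, rtol=1e-3))  # False, but nothing fails

# torch.testing.assert_close raises AssertionError when
# |a - b| > atol + rtol * |b| for any element, so the failure is surfaced.
try:
    torch.testing.assert_close(a, b, atol=1e-3, rtol=1e-3)
except AssertionError as exc:
    print(f"mismatch detected: {exc}")
```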
The tolerances in this test have been increased significantly. While this might be necessary to make the test pass with float16 on newer hardware, it's a very loose check, especially for float32. A 5% deviation can mask meaningful numerical issues. Could you please check if float32 can pass with a tighter tolerance? If so, it would be best to use different tolerances based on the itype.
Suggested change:

-   torch.testing.assert_close(Y[:, -1], Y_min[:, -1], atol=5e-2, rtol=5e-2)
+   if itype == torch.float16:
+       y_atol, y_rtol = 5e-2, 5e-2
+       state_atol, state_rtol = 1e-2, 5e-2
+   else:
+       # Use original, tighter tolerances for float32
+       y_atol, y_rtol = 1e-3, 1e-3
+       state_atol, state_rtol = 1e-3, 1e-3
+   torch.testing.assert_close(Y[:, -1], Y_min[:, -1], atol=y_atol, rtol=y_rtol)
+   torch.testing.assert_close(final_state[:, -1],
+                              final_state_min[:, -1].to(torch.float32),
+                              atol=state_atol,
+                              rtol=state_rtol)
addressed in #21379
The new tolerance values atol=5e-1 and rtol=5e-1 are extremely high, allowing for a 50% relative difference, which may mask significant numerical discrepancies or bugs in the kernel implementation. Add a TODO comment in the code explaining why the tolerance is so high and that it needs to be investigated and tightened.
# TODO: Investigate and tighten tolerance. Current values are masking potential numerical issues.
torch.testing.assert_close(Y_eg, Y_min_eg, atol=5e-1, rtol=5e-1)
addressed in #21379
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
cc @tdoublep
Some of the high tolerances need to be investigated as a follow-up, but having asserts with a high tolerance is better than having no asserts at all.
Essential Elements of an Effective PR Description Checklist
(Optional) Documentation update, such as updating supported_models.md and examples for a new model.

Purpose
The test_mamba_ssm_ssd test introduced in #10909 is currently problematic: it does not assert on failures, so the checks were never actually enforced and some were in fact failing. The tolerance in test_mamba_chunk_scan_cont_batch is now set quite high to make it pass, but the plan is to revisit this later.

cc: @tdoublep @tlrmchlsmth @cyang49
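As a hedged illustration of the antipattern being fixed (placeholder names, not the actual test code):

```python
import torch

def buggy_check(Y: torch.Tensor, Y_min: torch.Tensor) -> None:
    # Antipattern: the boolean result is discarded, so this line can
    # never fail the test even when the two outputs diverge.
    torch.allclose(Y, Y_min, atol=1e-3, rtol=1e-3)

def fixed_check(Y: torch.Tensor, Y_min: torch.Tensor) -> None:
    # Fix: assert_close raises AssertionError whenever
    # |Y - Y_min| > atol + rtol * |Y_min| for some element.
    torch.testing.assert_close(Y, Y_min, atol=1e-3, rtol=1e-3)
```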
Test Plan
Test Result
(Optional) Documentation Update