tests: Add batch size 1 cases to test_trtllm_gen_attention.py that fail, marked xfail #1897
📌 Description
Trtllm-gen's attention kernels have been found to fail tests when the batch size is 1.
This PR adds batch size 1 cases to:
- test_trtllm_gen_prefill_deepseek: triggers an IMA (illegal memory access) with the newly added parameters
- test_trtllm_batch_decode: produces incorrect outputs with the newly added parameters

These test cases have been marked with pytest.xfail(). To avoid combinatorial growth of test parameter combinations, the batch size 1 cases are defined as separate test functions.

B200 status before PR:
2052 passed, 264 skipped in 177.80s (0:02:57)

B200 status after PR:
2052 passed, 264 skipped, 3 xfailed in 195.14s (0:03:15)

Status tracked in Issue 1898
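
The reasoning behind separate test functions can be illustrated with a small counting sketch. The parameter axes and values below are hypothetical placeholders, not the ones in test_trtllm_gen_attention.py; only the count of three dedicated cases is chosen to mirror the 3 xfailed results reported above.

```python
from itertools import product

# Hypothetical parameter axes, for illustration only; the real test file
# defines its own axes and values.
existing_grid = {
    "batch_size": [4, 128, 256],
    "page_size": [16, 32, 64],
    "num_kv_heads": [2, 4],
    "dtype": ["bf16", "fp8"],
}


def count_cases(grid):
    """Return how many cases a full cross product of the axes generates."""
    return len(list(product(*grid.values())))


# Option A: append batch size 1 to the shared batch_size axis.
# Every other axis is multiplied against the new value.
widened_grid = dict(existing_grid, batch_size=existing_grid["batch_size"] + [1])
added_by_widening = count_cases(widened_grid) - count_cases(existing_grid)

# Option B: a dedicated test function with a short, explicit case list.
# Each tuple is (batch_size, page_size, num_kv_heads, dtype), all hypothetical.
bs1_cases = [
    (1, 16, 2, "bf16"),
    (1, 32, 4, "bf16"),
    (1, 64, 4, "fp8"),
]
added_by_separate_function = len(bs1_cases)

print("cases added by widening the shared grid:", added_by_widening)
print("cases added by a separate function:", added_by_separate_function)
```

Widening a shared axis adds one case per combination of all the other axes, while a dedicated function adds only the cases listed explicitly, which keeps the set of expected failures small and easy to track.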
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
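- I have installed pre-commit by running pip install pre-commit (or used your preferred method).
- I have installed the hooks with pre-commit install.
- I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.
🧪 Tests
- All tests are passing (unittest, etc.).
Reviewer Notes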