Revert "[KVCACHE] Improved schedule for prefill attention" #17466
Conversation
This reverts commit 79abc03.
@krishnaraj36 Hi! Thank you for the great contribution on the prefill attention improvement. Unfortunately we just ran into a correctness issue caused by this PR and have decided to temporarily revert it. In particular, the prefill kernel produces incorrect results when `num_attention_heads` is 28. Here is how you can reproduce the issue:

Then it should show an error like:

I think we are good to go with the improved kernels once the correctness issue is fixed. Would you mind taking a look at this issue? Thanks a lot in advance.
BTW, some more information on the error: the kernel produces nondeterministic results; that is to say, if I run the test multiple times, the kernel produces different results each time.
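A minimal sketch of this kind of determinism check, in plain NumPy; `run_prefill_attention` and the closure arguments below are hypothetical placeholders for the actual compiled kernel and its inputs, not names from this repository:

```python
import numpy as np

def is_deterministic(kernel_fn, n_runs=5):
    """Run a no-argument callable that returns an ndarray several times
    and check that every run matches the first run numerically."""
    baseline = kernel_fn()
    return all(np.allclose(kernel_fn(), baseline) for _ in range(n_runs - 1))

# Usage sketch: wrap the compiled prefill kernel and fixed q/k/v inputs
# in a no-argument closure (names here are hypothetical):
#   assert is_deterministic(lambda: run_prefill_attention(q, k, v))
```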
@MasterJH5574: Thanks for reporting this issue. Sure, we will look into it.
This PR reverts #17432, as we observe a correctness issue when `num_attention_heads` is 28. The correctness issue leads to incorrect end-to-end results in LLM inference.
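For illustration, a minimal NumPy sketch of the kind of reference check involved: compute causal multi-head attention directly on the host with `num_attention_heads = 28` and compare the kernel's output against it. The shapes and the reference function below are illustrative assumptions, not the actual test in the repository.

```python
import numpy as np

num_heads, seq_len, head_dim = 28, 64, 128  # illustrative shapes
rng = np.random.default_rng(0)
q = rng.standard_normal((num_heads, seq_len, head_dim), dtype=np.float32)
k = rng.standard_normal((num_heads, seq_len, head_dim), dtype=np.float32)
v = rng.standard_normal((num_heads, seq_len, head_dim), dtype=np.float32)

def reference_prefill_attention(q, k, v):
    """Causal multi-head attention computed directly in NumPy."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum("hqd,hkd->hqk", q, k) * scale
    # Mask out future positions (causal mask for prefill attention).
    mask = np.triu(np.full((q.shape[1], q.shape[1]), -np.inf, dtype=np.float32), 1)
    scores = scores + mask
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,hkd->hqd", probs, v)

expected = reference_prefill_attention(q, k, v)
# actual = <output of the prefill kernel under test on the same q, k, v>
# np.testing.assert_allclose(actual, expected, rtol=1e-3, atol=1e-3)
```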