[KVCACHE] Improved schedule for prefill attention #17482
Merged
Improvements:
- Added a transpose to K for better vectorization during the matmul.
- Improved the load schedule.

Performance improved by a bit more than 2x in most cases (see the sketch after the table below).
Llama-2 7B observations:

| kernel | baseline | optimized |
| ------ | -------- | --------- |
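To illustrate the K-transpose idea, here is a minimal sketch (not the PR's actual schedule; the shapes, block name, and vector width are assumptions): storing K as `[head_dim, kv_len]` makes consecutive KV positions contiguous in memory, so the innermost loop of the Q·Kᵀ matmul can be vectorized.

```python
import tvm
from tvm.script import tir as T

# Hypothetical shapes: 64 query positions, head_dim 128, 1024 KV positions.
@T.prim_func
def qk_scores(
    Q: T.Buffer((64, 128), "float16"),
    KT: T.Buffer((128, 1024), "float16"),  # K stored transposed: [head_dim, kv_len]
    S: T.Buffer((64, 1024), "float32"),
):
    for i, j, r in T.grid(64, 1024, 128):
        with T.block("S"):
            vi, vj, vr = T.axis.remap("SSR", [i, j, r])
            with T.init():
                S[vi, vj] = T.float32(0)
            # Consecutive vj values read consecutive KT elements, enabling vector loads.
            S[vi, vj] = S[vi, vj] + T.Cast("float32", Q[vi, vr]) * T.Cast("float32", KT[vr, vj])

sch = tvm.tir.Schedule(qk_scores)
i, j, r = sch.get_loops(sch.get_block("S"))
jo, ji = sch.split(j, factors=[None, 8])  # 8-wide vectors: an assumption
sch.reorder(i, jo, r, ji)                 # move the vectorizable axis innermost
sch.vectorize(ji)                         # ji binds only a spatial iter, so this is legal
print(sch.mod.script())
```

With K in its original `[kv_len, head_dim]` layout, the same inner loop would stride across rows, which defeats vectorized loads.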
This PR fixes the issue addressed in PR #17446. The correctness issue was caused by incorrect code generation during the unroll phase, so we removed the explicit unroll and observed little to no performance degradation.
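The shape of the change can be pictured with a toy schedule (illustrative only; `toy` stands in for the real attention kernel, and the loop structure is an assumption):

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def toy(A: T.Buffer((64,), "float32"), B: T.Buffer((64,), "float32")):
    for i in T.serial(64):
        with T.block("copy"):
            vi = T.axis.spatial(64, i)
            B[vi] = A[vi] * T.float32(2)

sch = tvm.tir.Schedule(toy)
(i,) = sch.get_loops(sch.get_block("copy"))
io, ii = sch.split(i, factors=[None, 4])

# Before: the inner loop carried an explicit unroll annotation.
# sch.unroll(ii)
# After: the annotation is dropped; ii stays a serial loop, so the unroll
# phase that produced the bad pointer offsets is never exercised.
print(sch.mod.script())
```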
We generated OpenCL kernels by extracting the built modules, setting `num_qo_heads=28` in https://github.qualcomm.com/gpgpu/apache-tvm/blob/85e15d494d5a42360859941cbc972c4f175c3b94/tests/python/relax/test_runtime_builtin_paged_attention_kv_cache_flashinfer.py#L36.
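For reference, a sketch of how the generated OpenCL source can be dumped for inspection (assuming `mod` is the IRModule produced by the test above; the variable names are illustrative):

```python
import tvm

# `mod` is assumed to be the paged-attention IRModule built by the test.
lib = tvm.build(mod, target=tvm.target.Target("opencl"))
# The OpenCL source lives in the imported device module.
print(lib.imported_modules[0].get_source())
```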
Original PR Codegen
In the `O_store` block, we noticed that large, incorrect pointer offsets were generated during subsequent stages of unrolling. This can be observed indirectly as zero elements in the output and as compute instability. Fusing the unroll loops so that they unroll together does not appear to resolve this (a structural sketch of that attempt follows).
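The fusion attempt, sketched on a toy kernel (hypothetical; only the fuse-then-unroll structure mirrors what we tried on the real attention kernel):

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def toy2d(A: T.Buffer((8, 4), "float32"), B: T.Buffer((8, 4), "float32")):
    for i, j in T.grid(8, 4):
        with T.block("copy"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj]

sch = tvm.tir.Schedule(toy2d)
i, j = sch.get_loops(sch.get_block("copy"))
fused = sch.fuse(i, j)
sch.unroll(fused)  # in the real kernel this still produced the incorrect offsets
print(sch.mod.script())
```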
Oddly enough, the initial test case doesn't seem to trigger the issue and works as intended.