perf: Fail fast on empty query for BatchPrefillWithPagedKVCacheKernel #377

Draft
wants to merge 1 commit into
base: main

Conversation

Yard1
Contributor

@Yard1 Yard1 commented Jul 17, 2024

Small optimization for CUDA graph use cases. According to profiling, this shaves off ~10% of kernel execution time for empty queries.
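The diff itself is not part of this conversation, so the following is only a rough host-side sketch of what a fail-fast guard for empty queries might look like, not the actual change. It assumes each request's query rows are delimited by an offsets array (called `qo_indptr` here); `process_request` and `run_batch` are hypothetical names that model one thread block and a fixed launch grid on the CPU. The relevance to CUDA graphs is that launch dimensions are fixed at capture time, so padded slots with zero-length queries are still visited on replay.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU model of one "block" of a batched prefill kernel.
// Assumption: qo_indptr[i] .. qo_indptr[i + 1] delimits request i's query rows.
// Returns the number of query rows this block actually processed.
int32_t process_request(const std::vector<int32_t>& qo_indptr, int32_t request_idx) {
  const int32_t qo_begin = qo_indptr[request_idx];
  const int32_t qo_len = qo_indptr[request_idx + 1] - qo_begin;

  // Fail fast: an empty query produces no output rows, so return before
  // any setup work (index computation, loads, and so on) is done.
  if (qo_len <= 0) {
    return 0;
  }

  int32_t rows_done = 0;
  for (int32_t row = 0; row < qo_len; ++row) {
    // Placeholder for the attention computation of one query row.
    ++rows_done;
  }
  return rows_done;
}

// Models a fixed grid: every request slot is visited, including padded
// (empty) ones, as happens when a captured graph is replayed.
int32_t run_batch(const std::vector<int32_t>& qo_indptr) {
  int32_t total = 0;
  const int32_t num_requests = static_cast<int32_t>(qo_indptr.size()) - 1;
  for (int32_t i = 0; i < num_requests; ++i) {
    total += process_request(qo_indptr, i);
  }
  return total;
}
```

In a real kernel the guard would sit at the top of the device function, and all threads of a block would need to take the same branch before any block-wide synchronization.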

@Yard1
Contributor Author

Yard1 commented Jul 17, 2024

Actually it looks like setting query length to 0 causes illegal memory access under some circumstances, with or without this PR...

@Yard1 Yard1 marked this pull request as draft July 17, 2024 20:20
@yzh119
Collaborator

yzh119 commented Jul 19, 2024

Thanks for doing this, I'm now looking at the illegal memory access issue.
