
Conversation


@sanyalington commented Oct 4, 2024

Enable 128K context length in custom paged attention (PA).
Enable custom PA to write fp8 output with scaling, and enable this perf optimization for Llama. The optimization is only active for the ROCm custom PA kernel when chunked prefill is disabled and the environment variable VLLM_USE_ROCM_CUSTOM_PAGED_ATTN_FP8_OUT=1 is set.
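The gating condition described above (env-var opt-in plus chunked prefill disabled) can be sketched as follows. This is a minimal illustration, not the PR's actual code: `should_use_fp8_out` is a hypothetical helper, and only the environment-variable name and the two conditions come from the PR description.

```python
import os


def should_use_fp8_out(chunked_prefill_enabled: bool) -> bool:
    """Hypothetical sketch: return True only when the ROCm custom
    paged-attention kernel should write fp8 output with scaling.

    Per the PR description, the optimization requires both:
      1. VLLM_USE_ROCM_CUSTOM_PAGED_ATTN_FP8_OUT=1 in the environment
      2. chunked prefill disabled
    """
    flag = os.environ.get("VLLM_USE_ROCM_CUSTOM_PAGED_ATTN_FP8_OUT", "0")
    return flag == "1" and not chunked_prefill_enabled
```

In practice a user would opt in by exporting the variable before launching vLLM, e.g. `VLLM_USE_ROCM_CUSTOM_PAGED_ATTN_FP8_OUT=1`.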

@sanyalington requested a review from gshtras, Oct 4, 2024 15:41
@shajrawi (Collaborator) left a comment


ship it

@shajrawi merged commit b51fe69 into main on Oct 8, 2024
16 of 17 checks passed


3 participants