@MasterJH5574
Contributor

This PR enhances the `AttentionWithFusedQKV` function of `PagedKVCache` so that it now accepts input `qkv_data` and `o_data` that are padded along the sequence dimension.

This gives callers of `PagedKVCache` the flexibility to decide whether or not to pad the input qkv/o NDArrays.
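
To make "padding along the sequence dimension" concrete, here is a minimal pure-Python sketch of the data layout. It is not the implementation in this PR. It assumes the padding sits in trailing rows after the real tokens and that each row stores the fused `[q | k | v]` features back to back. The helper names `strip_sequence_padding` and `split_fused_qkv`, and the widths used, are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = List[float]  # one token's fused [q | k | v] features, flattened


def strip_sequence_padding(rows: Sequence[Row], valid_len: int) -> List[Row]:
    """Keep the first `valid_len` rows; trailing rows are assumed to be padding."""
    if not 0 <= valid_len <= len(rows):
        raise ValueError("valid_len must lie within the padded sequence length")
    return list(rows[:valid_len])


def split_fused_qkv(row: Row, q_width: int, kv_width: int) -> Tuple[Row, Row, Row]:
    """Split one fused row into its q, k and v parts (hypothetical layout)."""
    if len(row) != q_width + 2 * kv_width:
        raise ValueError("row width does not match q_width + 2 * kv_width")
    q = row[:q_width]
    k = row[q_width:q_width + kv_width]
    v = row[q_width + kv_width:]
    return q, k, v


# Hypothetical input: 3 real tokens padded to 5 rows along the sequence dimension.
padded_qkv = [
    [1.0, 1.0, 2.0, 3.0],
    [4.0, 4.0, 5.0, 6.0],
    [7.0, 7.0, 8.0, 9.0],
    [0.0, 0.0, 0.0, 0.0],  # padding row
    [0.0, 0.0, 0.0, 0.0],  # padding row
]

valid_qkv = strip_sequence_padding(padded_qkv, valid_len=3)
q0, k0, v0 = split_fused_qkv(valid_qkv[0], q_width=2, kv_width=1)
```

Under these assumptions, a caller can allocate `qkv_data` and `o_data` at a fixed padded length once and reuse them across calls, with only the leading rows carrying real tokens.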
