input_pos_maxp1 as a Python integer #2016

Merged: 4 commits into Lightning-AI:main on May 6, 2025

Andrei-Aksionov
Contributor

Hi there 👋

While profiling the code to see whether I could speed up speculative decoding, I noticed two odd things:

  1. A relatively long KV-cache call (that's a whole other story)
  2. A cudaStreamSynchronize call in CausalSelfAttention.forward in model.py

Note

All numbers were measured with Qwen2.5-7B-Instruct on an Nvidia L4.

Screenshot 2025-04-15 at 8 11 42 PM

The synchronization is caused by an implicit call to .item() when slicing with a tensor as the slice bound.

litgpt/litgpt/model.py

Lines 418 to 421 in 3d66f32

if input_pos_maxp1 is not None:
    # Subselect along sequence dimension
    k = k[..., :input_pos_maxp1, :]
    v = v[..., :input_pos_maxp1, :]
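A minimal sketch of the mechanism, written without torch so it runs anywhere. `DeviceScalar` is a hypothetical stand-in for a 0-dim integer tensor on a GPU, not a litgpt or PyTorch class. It only illustrates that using such an object as a slice bound triggers a value read on every slice, while converting it to a Python integer once pays that cost a single time.

```python
class DeviceScalar:
    """Hypothetical stand-in for a 0-dim integer tensor living on a GPU."""

    def __init__(self, value):
        self._value = value
        self.sync_count = 0  # counts simulated device-to-host reads

    def item(self):
        # On a real CUDA tensor, this read is where the host waits for the device.
        self.sync_count += 1
        return self._value

    def __index__(self):
        # Python calls __index__ when the object is used as a slice bound.
        return self.item()


keys = list(range(10))
bound = DeviceScalar(4)

# Tensor-like bound: every slice triggers an implicit read.
for _ in range(3):
    _ = keys[:bound]
calls_with_tensor_like = bound.sync_count

# Python int bound: one explicit read up front, none during slicing.
bound_int = bound.item()
for _ in range(3):
    _ = keys[:bound_int]
calls_with_int = bound.sync_count - calls_with_tensor_like

print(calls_with_tensor_like, calls_with_int)
```

In the attention code, the same slice is taken for both k and v on every forward call, so the implicit read repeats each time unless the bound is already a plain integer.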

Admittedly, when @mseeger added input_pos_maxp1, he initially used a Python integer, but I insisted on changing it to a tensor so that it would work properly with the rest of the code (for example, with a function that moves arguments to a device in the sequential generation code).
Now it's time to roll that back 😊.


In a quick benchmark, repeated several times, that generates 500 new tokens, throughput improved by ~1 token/sec (16.19 in this PR vs. 15.10 on the main branch).
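For scale, the arithmetic behind those two figures can be sketched as follows. The only inputs are the throughput numbers and the 500-token generation length quoted above. The variable names are illustrative, and the wall-clock times are derived from the throughputs rather than measured.

```python
# Throughput figures quoted in the benchmark (tokens per second).
tokens_per_sec_main = 15.10
tokens_per_sec_pr = 16.19
new_tokens = 500

# Absolute and relative improvement.
gain = tokens_per_sec_pr - tokens_per_sec_main
speedup_pct = 100 * gain / tokens_per_sec_main

# Implied wall-clock time to generate 500 tokens at each rate.
seconds_main = new_tokens / tokens_per_sec_main
seconds_pr = new_tokens / tokens_per_sec_pr

print(f"gain: {gain:.2f} tok/s ({speedup_pct:.1f}%)")
print(f"500 tokens: {seconds_main:.1f}s on main vs {seconds_pr:.1f}s in the PR")
```

That works out to roughly a 7% throughput improvement.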

@mseeger
Contributor

mseeger commented Apr 16, 2025

Hello, note that #1934 redoes the whole KV caching and removes input_pos_maxp1 altogether, which I now recognize as a bit of an ugly hack on my part.

@Andrei-Aksionov
Contributor Author

Hello
Thanks for pointing this out.
If that PR lands first, this one can be closed.

@Borda Borda merged commit 78c2171 into Lightning-AI:main May 6, 2025
15 checks passed
@Andrei-Aksionov Andrei-Aksionov deleted the input_pos_maxp1_as_int branch May 6, 2025 11:48