torch CUDA graphs with HF generate #27837

Open
tsengalb99 opened this issue Dec 4, 2023 · 11 comments
Labels: WIP

Comments

@tsengalb99

Feature request

In my experiments, I cannot get torch CUDA graphs to work with HF `generate`. CUDA graphs work fine when calling the model's forward pass, but stream capture fails when calling `.generate()`, either because CUDA graphs require static input/output sizes or for some other reason. Can support for torch CUDA graphs be added?
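To make the "static input/output sizes" point concrete, here is a minimal sketch of capturing a single forward pass with `torch.cuda.graph`, following the warm-up/capture/replay pattern from the PyTorch CUDA graphs documentation. `make_graphed_forward` is a hypothetical helper, not part of transformers; it falls back to eager execution when the input is not on a GPU so the sketch still runs on CPU. Every replay must reuse the captured shape, because the graph replays fixed kernels on fixed memory buffers.

```python
import torch
import torch.nn as nn


def make_graphed_forward(model: nn.Module, example_input: torch.Tensor):
    """Return a callable that runs `model` on inputs shaped like `example_input`.

    Hypothetical helper. On a GPU, the forward pass is captured once into a CUDA
    graph; later calls only copy new data into the captured input buffer and
    replay the graph. On CPU there are no CUDA graphs, so this falls back to
    eager execution but enforces the same fixed-shape contract.
    """
    static_shape = tuple(example_input.shape)

    def check_shape(x: torch.Tensor) -> None:
        # A captured graph replays fixed kernels on fixed buffers: shapes cannot change.
        if tuple(x.shape) != static_shape:
            raise ValueError(f"captured for shape {static_shape}, got {tuple(x.shape)}")

    if not example_input.is_cuda:
        def eager(x: torch.Tensor) -> torch.Tensor:
            check_shape(x)
            with torch.no_grad():
                return model(x)
        return eager

    static_input = example_input.clone()

    # Warm up on a side stream before capture, as the PyTorch CUDA graphs docs recommend.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side), torch.no_grad():
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(side)

    # Capture one forward pass; static_output is the graph's fixed output buffer.
    graph = torch.cuda.CUDAGraph()
    with torch.no_grad(), torch.cuda.graph(graph):
        static_output = model(static_input)

    def replay(x: torch.Tensor) -> torch.Tensor:
        check_shape(x)
        static_input.copy_(x)  # overwrite the captured input buffer in place
        graph.replay()
        return static_output.clone()

    return replay
```

With a dynamic KV cache, the cached key/value tensors grow at every decoding step, so a single capture like this cannot simply be replayed across the steps of `.generate()`.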

Motivation

LLMs issue a lot of kernel launches, and CUDA graphs can remove most of the launch overhead. In my experiments with just the forward call, the CUDA graph version of a model can be twice as fast as the non-CUDA-graph version of the same model.

Your contribution

n/a

@ArthurZucker
Collaborator

This is more or less planned: we want to support static caching so that models can be compiled for faster inference 😉 cc @gante. This might already have been asked in other issues as well.

@gante
Member

gante commented Dec 7, 2023

@tsengalb99 as Arthur wrote, we are working on it :D Expect to see updates soon

@tsengalb99
Author

Are there any updates on this? And what is the main reason CUDA graphs don't work right now?

@ArthurZucker
Collaborator

Follow PR #27931 for updates; the dynamic KV cache is the issue.
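To illustrate why a dynamic KV cache gets in the way, here is a minimal sketch contrasting the two strategies. `dynamic_append` and `ToyStaticCache` are hypothetical names for illustration, not the transformers classes. Concatenating onto the cache allocates a new, longer tensor at every decoding step, so both the shape and the memory address change; a buffer preallocated to `max_len` and written in place keeps both fixed, which is what CUDA graph capture needs.

```python
import torch


def dynamic_append(cache, new_kv):
    """Dynamic cache (hypothetical helper): torch.cat allocates a new, longer tensor each step."""
    return new_kv if cache is None else torch.cat([cache, new_kv], dim=2)


class ToyStaticCache:
    """Hypothetical static cache: one preallocated buffer of shape
    (batch, heads, max_len, head_dim), written in place at each decoding position."""

    def __init__(self, batch, heads, max_len, head_dim, dtype=torch.float32):
        self.buf = torch.zeros(batch, heads, max_len, head_dim, dtype=dtype)

    def update(self, new_kv, positions):
        # index_copy_ writes along the sequence dimension (dim=2) in place, so the
        # buffer's shape and storage address stay the same across decoding steps.
        self.buf.index_copy_(2, positions, new_kv)
        return self.buf
```

Because the static buffer always has length `max_len`, attention over it must mask the positions that have not been written yet; this sketch leaves the masking out.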

@ArthurZucker
Collaborator

The PR is still very much active and now supports CUDA graphs.

@tsengalb99
Copy link
Author

tsengalb99 commented Feb 2, 2024 via email

@ArthurZucker
Collaborator

It only needs a final review, so it should land this week 😉

@tsengalb99
Copy link
Author

tsengalb99 commented Feb 9, 2024 via email

@ArthurZucker
Collaborator

Hey! Here is how I used it: https://gist.github.com/ArthurZucker/af34221def212259b43d55a2811d2dbb.
I used it with `torch.compile`, so I'm not 100% sure how an explicit CUDA graph call will work. Feel free to reach out if it doesn't work!
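For reference, a minimal sketch of the compiled path, assuming a transformers release that ships the static cache work from #27931, a CUDA GPU, and a causal LM that supports the static cache. This is not the contents of the gist above; `generate_with_static_cache` and `model_id` are hypothetical. The pattern follows the transformers documentation: set `cache_implementation = "static"` on the generation config and compile `model.forward` with `mode="reduce-overhead"`, the `torch.compile` mode that uses CUDA graphs.

```python
def generate_with_static_cache(model_id: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Hypothetical helper: static KV cache plus torch.compile (CUDA graphs) for generate."""
    # Lazy imports so the sketch can be defined without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

    # Preallocate the KV cache so tensor shapes stay fixed across decoding steps.
    model.generation_config.cache_implementation = "static"
    # "reduce-overhead" lets torch.compile capture CUDA graphs to cut kernel-launch overhead.
    model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The first few calls are slow because compilation and graph capture happen then; the speedup only shows up on later calls with the same shapes.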


github-actions bot commented Mar 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ArthurZucker added the WIP label on Mar 7, 2024
@ArthurZucker
Collaborator

A PR is coming for this! #29374
