torch CUDA graphs with HF generate #27837

tsengalb99 · 2023-12-04T18:39:35Z

Feature request

In my experiments, I cannot get torch CUDA graphs to work with HF generate. CUDA graphs work fine when calling the forward pass of a model, but stream capture fails when calling .generate(), either because CUDA graphs require static input/output sizes or for some other reason. Can support for torch CUDA graphs be added?

Motivation

LLMs issue many kernel launches, and CUDA graphs can remove most of the launch overhead. In my experiments with just the forward call, the CUDA graph version can be twice as fast as the non-graph version of the same model.

Your contribution

n/a
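
For readers unfamiliar with the forward-pass capture mentioned in the request above, here is a minimal sketch using `torch.cuda.CUDAGraph`. It is not the code from the original experiments: the helper name `make_graphed_forward`, the toy `Sequential` model, and the tensor sizes are all assumptions made for illustration. The sketch falls back to a plain eager call when no GPU is present.

```python
import torch


def make_graphed_forward(model, example_input, warmup_iters=3):
    """Capture model(example_input) in a CUDA graph and return a replay function.

    Hypothetical helper for illustration. Without CUDA it returns an eager call.
    """
    if not torch.cuda.is_available():
        return lambda x: model(x)

    # The graph reads from and writes to fixed buffers, so shapes must stay static.
    static_input = example_input.clone()

    # Warm up on a side stream before capturing.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side), torch.no_grad():
        for _ in range(warmup_iters):
            model(static_input)
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph), torch.no_grad():
        static_output = model(static_input)

    def replay(x):
        # Copy new data into the captured input buffer, then replay the kernels.
        static_input.copy_(x)
        graph.replay()
        return static_output.clone()

    return replay


device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64)
).to(device).eval()

x = torch.randn(8, 64, device=device)
graphed = make_graphed_forward(model, x)

with torch.no_grad():
    expected = model(x)
out = graphed(x)
```

Replay only works if every call uses an input of the captured shape, which is the static-size constraint the request refers to.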

ArthurZucker · 2023-12-05T17:05:02Z

This is kind of planned, as we want to support static caching so that models can be compiled for faster inference 😉 cc @gante. This might have already been asked in other issues as well.

gante · 2023-12-07T15:52:11Z

@tsengalb99 as Arthur wrote, we are working on it :D Expect to see updates soon

tsengalb99 · 2023-12-28T01:48:48Z

Are there any updates on this? And what is the main reason why cuda graphs don't work right now?

ArthurZucker · 2024-01-02T17:38:34Z

Follow PR #27931 for updates. The dynamic KV cache is an issue.
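
To illustrate why a dynamic KV cache gets in the way of graph capture, here is a simplified sketch that contrasts a cache grown by concatenation with a preallocated one. This is not the transformers implementation: the function names, the `(heads, sequence, head_dim)` layout, and the sizes are assumptions chosen for illustration.

```python
import torch

num_heads, head_dim, max_len = 2, 4, 8


def dynamic_append(cache, new_kv):
    # Concatenation allocates a new tensor whose sequence length grows each step.
    return torch.cat([cache, new_kv], dim=1)


def static_write(cache, new_kv, position):
    # In-place write into a preallocated buffer: shape and storage never change.
    cache.index_copy_(1, position, new_kv)
    return cache


dynamic_cache = torch.zeros(num_heads, 0, head_dim)
static_cache = torch.zeros(num_heads, max_len, head_dim)
static_ptr = static_cache.data_ptr()

dynamic_shapes, static_shapes = [], []
for step in range(3):
    new_kv = torch.full((num_heads, 1, head_dim), float(step + 1))
    dynamic_cache = dynamic_append(dynamic_cache, new_kv)
    static_cache = static_write(static_cache, new_kv, torch.tensor([step]))
    dynamic_shapes.append(tuple(dynamic_cache.shape))
    static_shapes.append(tuple(static_cache.shape))

print(dynamic_shapes)
print(static_shapes)
```

A captured graph replays fixed kernels on fixed buffers, so the growing tensor in the first variant cannot be replayed as-is, while the fixed-shape buffer in the second can.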

ArthurZucker · 2024-01-30T08:46:15Z

The PR is still very much active and now supports CUDA graphs.

tsengalb99 · 2024-02-02T21:51:29Z

Great, looking forward to seeing it merged! Do you have an ETA on when that will happen?

ArthurZucker · 2024-02-05T02:24:02Z

It only needs a final review, so this week 😉

tsengalb99 · 2024-02-09T07:00:34Z

Hi Arthur, I saw the PR got merged in - what is the recommended way to use cuda graphs during generation? I am wrapping the entire model with a torch cuda graph wrapper right now and am getting the same graph breaking errors as before. Thanks, Albert

ArthurZucker · 2024-02-12T05:44:11Z

Hey! Here is how I used it: https://gist.github.com/ArthurZucker/af34221def212259b43d55a2811d2dbb.
I used a compiled model, so I am not 100% sure how the explicit call will work! Feel free to reach out if it does not work!

github-actions · 2024-03-07T08:06:10Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ArthurZucker · 2024-03-07T09:25:44Z

A PR is coming for this! #29374

tsengalb99 mentioned this issue Dec 6, 2023

Add QuIP# support oobabooga/text-generation-webui#4803

Merged

huggingface deleted a comment from github-actions bot Jan 30, 2024

ArthurZucker added the WIP label Mar 7, 2024

ArthurZucker mentioned this issue Mar 7, 2024

DO NOT MERGE: generate compatible with torch.compile(fullgraph=True) #29374

Closed
