Skip device placement for past key values in decoder models #23919

sgugger · 2023-05-31T18:25:10Z

What does this PR do?

This PR skips device placement for the past_key_values in big models, which accounts for a lot of lost time according to the analysis in this issue. The idea is that the past key values (one per layer) are all generated on the device of the layer they correspond to, so they never need to be moved.
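To make the idea concrete, here is a minimal, library-free sketch of skipping device placement for selected keyword arguments. It is not the Accelerate or Transformers implementation: `FakeTensor`, `move_inputs`, and the `skip_keys` parameter name are hypothetical stand-ins chosen for illustration, and devices are plain strings so the example runs without a GPU. It accepts either a single key or a list of keys, matching the string-or-array behavior mentioned later in this thread.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Iterable, Optional, Union


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: it only tracks its device."""
    name: str
    device: str

    def to(self, device: str) -> "FakeTensor":
        # Mimic a device move by returning a copy tagged with the new device.
        return replace(self, device=device)


def move_inputs(
    kwargs: Dict[str, Any],
    device: str,
    skip_keys: Optional[Union[str, Iterable[str]]] = None,
) -> Dict[str, Any]:
    """Move every value in kwargs to `device`, except keys in `skip_keys`."""
    # Accept a single key or several, normalizing to a set.
    if skip_keys is None:
        skip = set()
    elif isinstance(skip_keys, str):
        skip = {skip_keys}
    else:
        skip = set(skip_keys)

    moved = {}
    for key, value in kwargs.items():
        if key in skip or not hasattr(value, "to"):
            # Skipped (or non-movable) values are passed through untouched.
            moved[key] = value
        else:
            moved[key] = value.to(device)
    return moved


# Inputs to a layer that lives on "cuda:1": the hidden states arrive from the
# previous device, while the cache was created on this layer's own device.
layer_inputs = {
    "hidden_states": FakeTensor("hidden_states", "cuda:0"),
    "past_key_values": FakeTensor("past_key_values", "cuda:1"),
}
moved = move_inputs(layer_inputs, "cuda:1", skip_keys="past_key_values")
```

In this sketch the hidden states are moved to the layer's device, while the cached entry is returned as the same object, so no per-call placement work is spent on it.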

Needs to be tested with Accelerate at huggingface/accelerate#1491 (until this PR is merged).

HuggingFaceDocBuilderDev · 2023-05-31T18:44:46Z

The documentation is not available anymore as the PR was closed or merged.

LysandreJik

LGTM! Should this be an array in case we want to skip several? That would make it more future-proof in case a model needs to skip more than one variable.

sgugger · 2023-05-31T19:32:17Z

Accelerate handles both strings and arrays :-)

LysandreJik · 2023-05-31T19:32:31Z

thanks boss!

Skip device placement for past key values in decoder models

9c743f2

sgugger requested review from LysandreJik and younesbelkada May 31, 2023 18:25

LysandreJik approved these changes May 31, 2023

sgugger merged commit fabe17a into main May 31, 2023

sgugger deleted the skip_keys_big_model branch May 31, 2023 19:32

sheonhan pushed a commit to sheonhan/transformers that referenced this pull request Jun 1, 2023

Skip device placement for past key values in decoder models (huggingf…

4438921

…ace#23919)

gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023

Skip device placement for past key values in decoder models (huggingf…

790e661

…ace#23919)

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023

Skip device placement for past key values in decoder models (huggingf…

dfcb3a1

…ace#23919)
