
[RFC] Adding overrides for max cache seq length #1449

Merged 26 commits into pytorch:main on Sep 16, 2024

Conversation

@SalmanMohammadi (Collaborator) commented on Aug 29, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

#1364

Changelog

This PR:

  • Adds support for overriding the maximum sequence length used when setting up KV-caches.
  • Adds support for correctly setting up caches for self-attention, cross-attention, and fusion layers by exposing encoder and decoder max_seq_len args. These arguments are exposed on the top-level transformer classes (i.e. TransformerDecoder, DeepFusionModel) so that the API remains the same for all models: model.setup_caches(bsz, dtype, encoder_max_seq_len, decoder_max_seq_len). See the first sketch below.
  • Removes the use of input_pos to update and retrieve from the KV-cache; instead, the KV-cache tracks its own position. See the second sketch below.
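To make the first bullet concrete, here is a minimal sketch of the overridden cache setup. The `llama2_7b` builder and the exact keyword defaults are illustrative assumptions; see the torchtune docs for the authoritative signature.

```python
import torch
from torchtune.models.llama2 import llama2_7b

# Builder choice is illustrative; any TransformerDecoder-based model works.
model = llama2_7b()

# Before this PR, caches were always allocated at the model's full
# max_seq_len. With the override, a short decoding run can allocate a
# much smaller cache:
model.setup_caches(
    batch_size=2,
    dtype=torch.bfloat16,
    decoder_max_seq_len=512,  # override: cache only 512 positions
)

# For fusion models (e.g. DeepFusionModel), encoder_max_seq_len controls
# the cross-attention cache length in the same call.
```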
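And for the second bullet, a toy sketch of a cache that maintains its own write position, so callers no longer thread input_pos through update and retrieval. This is not torchtune's actual KVCache implementation, just an illustration of the idea:

```python
import torch


class SimpleKVCache(torch.nn.Module):
    """Toy KV-cache that tracks its own current position.

    Illustrative only: torchtune's actual KVCache differs in its details
    (buffer registration, masking, resets).
    """

    def __init__(
        self,
        batch_size: int,
        num_heads: int,
        max_seq_len: int,
        head_dim: int,
        dtype: torch.dtype,
    ) -> None:
        super().__init__()
        shape = (batch_size, num_heads, max_seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype))
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype))
        self.size = 0  # current write position, advanced on every update

    def update(self, k: torch.Tensor, v: torch.Tensor):
        # k, v: [batch, num_heads, new_seq_len, head_dim] for the new tokens.
        # Note there is no input_pos argument: the cache knows where to write.
        new_seq_len = k.shape[2]
        self.k_cache[:, :, self.size : self.size + new_seq_len] = k
        self.v_cache[:, :, self.size : self.size + new_seq_len] = v
        self.size += new_seq_len
        # Return the full caches; attention masks out unwritten positions.
        return self.k_cache, self.v_cache

    def reset(self) -> None:
        self.size = 0
```

During decoding, each forward pass simply calls cache.update(k, v) with the new tokens' keys and values, and reset() rewinds the cache between prompts.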

Test plan

Please make sure to do each of the following if applicable to your PR. (If you're not sure about any of these, just ask and we'll happily help. We also have a contributing page for guidance on contributing.)

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Example of docstring:
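Since the original example isn't reproduced here, below is a hedged sketch of what the updated signature and docstring might look like. Parameter names follow the changelog above; the defaults and docstring wording are assumptions.

```python
from typing import Optional

import torch


def setup_caches(
    self,
    batch_size: int,
    dtype: torch.dtype,
    *,
    encoder_max_seq_len: Optional[int] = None,
    decoder_max_seq_len: Optional[int] = None,
) -> None:
    """Set up key-value caches for attention layers.

    Args:
        batch_size (int): batch size for the cache entries.
        dtype (torch.dtype): dtype for the cache tensors.
        encoder_max_seq_len (Optional[int]): maximum cache length for
            cross-attention layers; if None, falls back to the model's
            encoder max sequence length.
        decoder_max_seq_len (Optional[int]): maximum cache length for
            self-attention layers; if None, falls back to the model's
            ``max_seq_len``.
    """
    ...
```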


Example in our docs: https://pytorch.org/torchtune/main/tutorials/qat_finetune.html#applying-qat-to-llama3-models

  • I did not change any public API;
  • I have added an example to docs or docstrings;

pytorch-bot bot commented Aug 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1449

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3fc1135 with merge base bc2c013:
💚 Looks good so far! There are no failures yet. 💚


@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 29, 2024
@SalmanMohammadi SalmanMohammadi changed the title [WIP] Refactoring KV-cache setup, adding overrides for max cache seq length [WIP][RFC] Refactoring KV-cache setup, adding overrides for max cache seq length Aug 29, 2024
@codecov-commenter commented on Aug 29, 2024

Codecov Report

Attention: Patch coverage is 81.40704% with 37 lines in your changes missing coverage. Please review.

Project coverage is 72.90%. Comparing base (726abb0) to head (3fc1135).
Report is 7 commits behind head on main.

Files with missing lines               Patch %   Missing lines
torchtune/modules/transformer.py       55.88%    30
torchtune/models/gemma/transformer.py   0.00%     6
recipes/eleuther_eval.py                0.00%     1
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1449      +/-   ##
==========================================
+ Coverage   70.72%   72.90%   +2.18%     
==========================================
  Files         288      289       +1     
  Lines       14213    14336     +123     
==========================================
+ Hits        10052    10452     +400     
+ Misses       4161     3884     -277     


@SalmanMohammadi SalmanMohammadi changed the title [WIP][RFC] Refactoring KV-cache setup, adding overrides for max cache seq length [RFC] Refactoring KV-cache setup, adding overrides for max cache seq length Sep 4, 2024
@SalmanMohammadi SalmanMohammadi marked this pull request as ready for review September 4, 2024 22:02
@SalmanMohammadi SalmanMohammadi changed the title [RFC] Refactoring KV-cache setup, adding overrides for max cache seq length [RFC] Adding overrides for max cache seq length Sep 4, 2024
@SalmanMohammadi (Collaborator, Author) commented on Sep 6, 2024

What's up with the eleuther eval tests? They're passing locally for me.

@ebsmothers (Contributor) left a comment

Overall it looks reasonable to me, will leave it to @pbontrager or @joecummings for the final sign-off

Outdated review comments (resolved) on:
  • torchtune/modules/model_fusion/_fusion.py (×2)
  • torchtune/modules/transformer.py (×2)
  • torchtune/utils/_generation.py (×2)
@SalmanMohammadi (Collaborator, Author) commented:
Thanks so much for the reviews @ebsmothers. Will address once #1424 lands and things are merged.

@joecummings (Contributor) left a comment

No concerns, but pls address @ebsmothers comments.

@SalmanMohammadi SalmanMohammadi merged commit 1e9dc42 into pytorch:main Sep 16, 2024
17 checks passed
@SalmanMohammadi SalmanMohammadi deleted the setup_cache_refactor branch September 17, 2024 09:58