Transformer building blocks tutorial #3075
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3075
Note: Links to docs will display an error until the docs builds have been completed. ❗ There is 1 currently active SEV; if your PR is affected, please view it below. ✅ No failures as of commit f666842 with merge base 24c42d2. This comment was automatically generated by Dr. CI and updates every 15 minutes.
(discussed offline) more thorough review coming when the rendered docs are available. Sadly, we require nightly for the NJT stuff to work.
# is actually the same as an ``nn.TransformerEncoderLayer`` with ``is_causal=True``.
#
# We demonstrate examples of implementing the rest of the nn layers
# `here <https://github.com/mikaylagawarecki/temp>`_ but omit that from this
I assume the name / hosting for this repo will change before publishing the tutorial?
Name, yes; hosting, perhaps we can make that a follow-up -- I don't think we want to host this under pytorch/ yet, as that lends a lot of "official"-ness to this repo which I don't want it to have just yet.
Looking really good! Love how it reads, and I think you did a good job introducing the primitives and covering the background.
I left a bunch of silly editorial nits but nothing too major.
I guess we still need some flex + NJT demonstration once #136792 lands?
# of the attention layer would be NaN. See `issue <https://github.com/pytorch/pytorch/issues/41508>`_.
# This is because the softmax operation would divide by zero.
#
# Thanks to `this PR <https://github.com/pytorch/pytorch/pull/133882>`_
nit: I'm not sure it's within the scope of a tutorial to mention specific PRs, but I think it is valuable to say that rolling a custom MHA doesn't run into the same NaN issues as the old nn.MHA, because we're not employing a fused kernel with this problem (i.e. the fastpath case, which I think still exhibits the NaN behavior even after @drisspg's fix).
If you wanted, you could also mention that NJT's ability to model raggedness appropriately makes it possible to distinguish when there is an empty sequence.
Hm, I think Driss' PR description is pretty great, so I kind of wanted to link to it. If you feel strongly that tutorials should not link to PRs, I'm happy to remove this though.
Also lmk whether the rewording in this section sounds reasonable.
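(Illustration only, not part of the PR diff: a minimal sketch of the empty-sequence behavior under discussion. The shapes are made up, and constructing a jagged tensor with a zero-length component is assumed to need a recent nightly.)

import torch
import torch.nn.functional as F

# With a padded batch plus a padding mask, an "empty" sequence is a row where
# every attention score is masked to -inf; softmax then normalizes by a zero
# sum and produces NaN.
scores = torch.full((1, 4), float("-inf"))
print(F.softmax(scores, dim=-1))  # tensor([[nan, nan, nan, nan]])

# A jagged (NJT) layout instead models the empty sequence as a component of
# length zero, so there is nothing to normalize and no NaN to propagate.
nt = torch.nested.nested_tensor(
    [torch.randn(0, 8), torch.randn(3, 8)], layout=torch.jagged
)
print([t.shape for t in nt.unbind()])  # [torch.Size([0, 8]), torch.Size([3, 8])]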
Looking really nice, thanks!
value = (
    value.unflatten(-1, [n_heads, D]).transpose(1, 2).detach().requires_grad_()
)
out_flex2 = flex_attention(query, key, value, score_mod=alibi_score_mod)
nice example here! I think it's also worth showing a block_mask example since it's a little different (there's a new create_nested_block_mask() helper).
Here's one from testing: https://github.com/pytorch/pytorch/blob/b09eb6ed6a22476746d8b7d5f6e464e34f89747a/test/test_nestedtensor.py#L7043-L7051
lmk if you need help with this
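(A rough sketch of what that could look like, adapted from the linked test. The n_heads/head_dim values and sequence lengths are made up, the create_nested_block_mask call follows the linked test, and a recent nightly (and possibly CUDA/torch.compile) is assumed.)

import torch
from torch.nn.attention.flex_attention import (
    create_nested_block_mask,
    flex_attention,
)

# Build a jagged (NJT) batch laid out as (B, n_heads, seq*, head_dim),
# mirroring the query/key/value reshaping shown in the diff above.
n_heads, head_dim = 2, 8
sentences = [torch.randn(L, n_heads * head_dim) for L in (3, 5, 2)]
nt = torch.nested.nested_tensor(sentences, layout=torch.jagged)
query = key = value = nt.unflatten(-1, [n_heads, head_dim]).transpose(1, 2)

# mask_mod is written as if over a single sequence; create_nested_block_mask
# turns it into a block mask that respects each component's length.
def causal_mask(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

block_mask = create_nested_block_mask(causal_mask, 1, 1, query)
out_njt = flex_attention(query, key, value, block_mask=block_mask)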
Thanks for the hard work on this; I think it looks great!
Just got one more suggestion on the Flex + NJT component, but otherwise LGTM :)
Looks great, @mikaylagawarecki! Just a few editorial suggestions - let me know if you have any questions.
This tutorial looks good to me - should we wait for the 2.6 RC and test against it?
Could we merge first, and I'll update as necessary to make it runnable when the 2.6 RC is available? (I have run it locally and verified that it runs.)
Merging with the plan to remove the top note from the tutorial and add the tutorial back to the build.
Description
This adds the tutorial for transformer building blocks following the outline discussed in nn/optim triage on Friday (9/27/24) here https://docs.google.com/document/d/1TMrd0bDiM9-lcFHi079edkMRP1Ux5MTxt4lI1diiAKI/edit
This tutorial also links to a repo, https://github.com/mikaylagawarecki/temp, which implements the nn.Transformer-related layers in PyTorch in an NJT-friendly manner (basically no more *_padding_mask arguments). To run this tutorial with correctness, we likely need torch 2.6.
In the future we can add the following: index_put_ + support in torch.compile for mutation of non-contiguous subclass instances (KV caching section); see "Add support for index_put_ in NT", pytorch#135722.