Delete torchtune's inplace copy definition for NF4 #1294
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1294
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 9ebfb42 with merge base f9f75bb.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -6,7 +6,6 @@
 import torch
 from torchao.dtypes.nf4tensor import implements as nf4_tensor_impl, to_nf4
-from torchtune.modules.low_precision._utils import _get_torchao_version
```
Strictly speaking this util is now no longer used anywhere in our library, but I am inclined to keep it in for possible (likely?) future usage
Nope - get rid of it until we need it again. It's easy enough to find in git history.
@weifengpy seems we may need to add an override for `aten.to.dtype_layout`.
Yes, the error msg looks relevant, but I do not fully understand why torchtune's implementation dispatches `aten.to.dtype_layout` while torchao's implementation does not need it. FSDP2 needs to support around 9 tensor ops, but `aten.to.dtype_layout` is totally new (first time seeing this).
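For context, registering such an override would follow the same pattern as torchtune's other NF4 dispatch handlers. Below is a minimal sketch, assuming torchao's `implements` decorator and its `handler(func, args, kwargs)` calling convention; the handler name and body are placeholders, not the change that actually landed in this PR:

```python
import torch
from torchao.dtypes.nf4tensor import implements as nf4_tensor_impl


@nf4_tensor_impl([torch.ops.aten.to.dtype_layout])
def to_dtype_layout(func, *args, **kwargs):
    # torchao's NF4 dispatch table calls handlers as handler(func, args, kwargs),
    # so args[0] is the original op's argument tuple and args[1] its kwargs.
    nf4_tensor = args[0][0]
    # Placeholder behavior: if dtype/layout/device are unchanged, returning the
    # tensor as-is may be enough for the state dict path; a real handler would
    # also need to cover actual conversions.
    return nf4_tensor
```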
I see the PR is accepted. Curious, how did we resolve the local_tensor=nan issue?
very nice! glad the version check code is also gone 💀
Thanks to @msaroufim for root causing this tricky bug. This addresses pytorch/ao#642 on our end as well as #1246.
Update: Instead of re-enabling our inplace copy, @gau-nernst has found the upstream fix in torchao (described here). With this fix, we can fully delete the inplace copy from torchtune. Until we're on the ao version with the fix, our QLoRA will continue to be slow, but after that things should be back to normal.
### Stuff below this line is outdated but kept as background
We were defining our own override of the aten copy_ op in torchtune prior to torchao version 0.2, when torchao added its own. After that we version-gated our override to ao < 0.2. But it turns out that the version we have is faster. Tbh I don't know why that is yet, but it should be safe to re-enable this to get us back to our previous perf on QLoRA state dict load.
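For readers unfamiliar with the mechanism, here is a minimal sketch of how such an override is registered through torchao's `implements` decorator (imported in the diff above as `nf4_tensor_impl`). The handler body is illustrative only; the exact set of NF4Tensor fields copied by torchtune's (now deleted) implementation may differ:

```python
import torch
from torchao.dtypes.nf4tensor import implements as nf4_tensor_impl, to_nf4


@nf4_tensor_impl([torch.ops.aten.copy_.default])
def inplace_copy(func, *args, **kwargs):
    # torchao's NF4 dispatch table invokes handlers as handler(func, args, kwargs),
    # so args[0] is the original op's argument tuple.
    dest_tensor = args[0][0]  # NF4Tensor we are copying into
    src_tensor = args[0][1]   # incoming higher-precision tensor (e.g. from a state dict)
    # Quantize the source to NF4, then move the quantized fields onto the
    # destination so the copy is "in place" from the caller's perspective.
    quantized = to_nf4(src_tensor.to(dest_tensor.device))
    for attr in ("block_size", "scaler_block_size", "quantized_data",
                 "quantized_scalers", "quantization_factor", "scaler_mean", "nf4"):
        # Assumption: these are the NF4Tensor fields that need to move over;
        # the authoritative field list lives in torchao's NF4Tensor definition.
        setattr(dest_tensor, attr, getattr(quantized, attr))
    return dest_tensor
```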
Tested by logging time from recipe startup to the beginning of the train loop.
On main: init time: 101.48360734200105
On this PR: init time: 15.445241551031359
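For reference, the measurement can be reproduced with something as simple as the sketch below; `setup_recipe` is a placeholder for the actual recipe setup, not a torchtune API:

```python
import time


def setup_recipe():
    # Placeholder for the actual recipe setup: config parsing, model init,
    # QLoRA state dict load, etc.
    pass


start = time.perf_counter()
setup_recipe()
init_time = time.perf_counter() - start
print(f"init time: {init_time}")
# the train loop would begin here
```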