Move non-NF4 tensor to device prior to quantization on copy #737

ebsmothers · 2024-08-23T05:04:15Z

Addresses #642, following @gau-nernst's suggestion. Tested in torchtune that (a) init time and memory are similar to those with torchtune's in-place copy and (b) FSDP2 recipes still succeed (they fail with torchtune's copy enabled).
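To illustrate the ordering the title describes, here is a toy sketch, not the actual torchao code. `FakeTensor`, `quantize_nf4`, and `copy_into_nf4` are hypothetical stand-ins invented for this example. They only record which device the quantization step runs on when a non-NF4 source is copied into an NF4 destination.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: tracks only device and format."""
    device: str
    fmt: str = "bf16"

    def to(self, device):
        # Mimics a device transfer by returning a copy on the new device.
        return replace(self, device=device)


def quantize_nf4(t, log):
    # Hypothetical quantizer: records where the (expensive) work happens.
    log.append(f"quantize on {t.device}")
    return replace(t, fmt="nf4")


def copy_into_nf4(src, dest_device, log):
    """Copy src into an NF4 destination that lives on dest_device."""
    if src.fmt != "nf4":
        # Move the non-NF4 source first so quantization runs on the
        # destination device rather than on the source device.
        if src.device != dest_device:
            log.append(f"move to {dest_device}")
            src = src.to(dest_device)
        return quantize_nf4(src, log)
    # Already NF4: only a device transfer is needed.
    return src.to(dest_device)


log = []
out = copy_into_nf4(FakeTensor("cpu"), "cuda", log)
print(out, log)
```

With a CPU source and a CUDA destination, the log shows the move happening before the quantization step, which is the behavior this PR is after.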

pytorch-bot · 2024-08-23T05:04:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/737

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5622179 with merge base 68e4643:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

msaroufim

Stamping so we can merge fast to unblock. One thing that might help is a test, ideally here since we'd rather catch issues early, or at the very least in tune. Right now, if another load-time regression happens, we won't catch it either.

ebsmothers · 2024-08-23T05:10:23Z

> Stamping so we can merge fast to unblock. One thing that might help is a test, ideally here since we'd rather catch issues early, or at the very least in tune. Right now, if another load-time regression happens, we won't catch it either.

Yeah, good point. We don't currently have any perf-related testing, but it shouldn't be too hard to add something like that on our end. Honestly, just general sanity checks for QLoRA around load time, peak memory, etc. would do us a lot of good on this front.
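A minimal sketch of what such a sanity check could look like, using only the standard library. The helper names `measure` and `check_budget` and the budget values are assumptions made for illustration, not an existing torchtune or torchao utility. `tracemalloc` only sees Python-heap allocations, so a real QLoRA check on GPU would need a device-side counter such as `torch.cuda.max_memory_allocated` instead.

```python
import time
import tracemalloc


def measure(fn):
    """Run fn() and return (result, elapsed_seconds, peak_python_bytes)."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        result = fn()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def check_budget(fn, max_seconds, max_peak_bytes):
    """Fail loudly if fn() exceeds the given time or memory budget."""
    result, elapsed, peak = measure(fn)
    assert elapsed <= max_seconds, f"took {elapsed:.3f}s, budget {max_seconds}s"
    assert peak <= max_peak_bytes, f"peak {peak} bytes, budget {max_peak_bytes}"
    return result


# Hypothetical usage: the lambda stands in for a model-loading function.
weights = check_budget(lambda: [0.0] * 100_000, max_seconds=5.0, max_peak_bytes=50_000_000)
print(len(weights))
```

Budgets like these would have to be set with some headroom, since load time varies across machines; the aim is to catch large regressions rather than small drifts.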

ebsmothers added 2 commits August 22, 2024 22:00

move non-nf4 to device prior to quantization on copy

3cab1bf

typo

5622179

facebook-github-bot added the CLA Signed label Aug 23, 2024

ebsmothers changed the title ~~Nf4 to device~~ Move non-NF4 tensor to device prior to quantization on copy Aug 23, 2024

msaroufim self-requested a review August 23, 2024 05:05

ebsmothers mentioned this pull request Aug 23, 2024

NF4 quantization slower on 0.3 vs 0.1 #642

Closed

msaroufim approved these changes Aug 23, 2024


ebsmothers mentioned this pull request Aug 23, 2024

Delete torchtune's inplace copy definition for NF4 pytorch/torchtune#1294

Merged

msaroufim merged commit 0ed3090 into pytorch:main Aug 23, 2024
16 checks passed

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

create sections to make readme CI debug easier (pytorch#737)

aa7267e
