
[AOTI] Remove the original model weights in Python deployment #1337


Merged
merged 6 commits into from
Nov 6, 2024

Conversation

desertfire
Contributor

Summary: Fixes #1302. Because an AOTI-compiled model contains its own copy of the model weights, keeping the eager model weights around as well wastes GPU memory and can even trigger OOMs. This PR releases the corresponding eager model weights in the AOTI Python deployment path.
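The idea behind the fix can be sketched without any framework code. This is a minimal, torch-free illustration (all names here are hypothetical, not torchchat's actual API): once the AOTI-compiled artifact owns its own copy of the weights, the eager copy is redundant and can be dropped so the allocator can reclaim that memory.

```python
# Hedged sketch of the pattern this PR applies. In the real code path the
# compiled artifact is a .so produced by AOTInductor that embeds the
# weights; here a plain dict stands in for it.

class EagerModel:
    """Stand-in for an eager PyTorch module holding large GPU tensors."""

    def __init__(self):
        self.state_dict = {"layer0.weight": [0.0] * 1024}


def load_aoti_and_release(model: EagerModel):
    # In torchchat this step would load the AOTI-compiled shared library,
    # which carries its own weight copy (placeholder below).
    compiled = {"so_path": "model.so"}
    # The eager weights are now duplicated, so release them.
    model.state_dict.clear()
    return compiled


m = EagerModel()
runner = load_aoti_and_release(m)
assert m.state_dict == {}  # eager copy released; only the runner's copy remains
```

In the real PyTorch path, dropping the last references to the parameters (and, on CUDA, calling `torch.cuda.empty_cache()`) is what actually returns the memory to the device.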

pytorch-bot bot commented Nov 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1337

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 978baa3 with merge base 9480258 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 1, 2024
@desertfire
Contributor Author

desertfire commented Nov 1, 2024

@Jack-Khuu , is there any known problem with CI? Many of the failures here seem unrelated to my change.

@Jack-Khuu
Contributor

Yeah, we're checking with the DevInfra folks

@Jack-Khuu
Contributor

Heads up, CI is broken in pt/pt and at a higher level:

https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.17.0

@byjlw
Contributor

byjlw commented Nov 2, 2024

@desertfire it should be fixed now. We just need to churn through the tests

@mikekgfb
Contributor

mikekgfb commented Nov 2, 2024

Summary: Fixes #1302. Because an AOTI-compiled model contains its own copy of the model weights, keeping the eager model weights around as well wastes GPU memory and can even trigger OOMs. This PR releases the corresponding eager model weights in the AOTI Python deployment path.

In the long term we might ask whether there's any point in going to all the expense of building the model if we just need the config. The model build process had a config_only bool argument with the intent of allowing suppression of the actual model build (though that was never implemented).
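The config_only idea could look something like the following hypothetical sketch (the function and field names are illustrative, not torchchat's actual builder API): when the caller only needs the model configuration, e.g. for AOTI deployment where the .so already holds the weights, weight materialization is skipped entirely.

```python
# Hedged sketch of a "config_only" build path (assumed names throughout).
from dataclasses import dataclass


@dataclass
class ModelConfig:
    n_layers: int = 32
    dim: int = 4096


def build_model(config_only: bool = False):
    """Return (config, weights); skip weight allocation when config_only."""
    config = ModelConfig()
    if config_only:
        # No tensors are ever allocated; the caller gets metadata only.
        return config, None
    # Stand-in for the expensive weight materialization step.
    weights = {"tok_embeddings.weight": [0.0] * config.dim}
    return config, weights


cfg, weights = build_model(config_only=True)
assert weights is None  # nothing expensive was built
```

The design point is that the expensive step (loading checkpoints, or parsing GGUF) is gated behind the flag rather than always paid and then discarded.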

This is particularly noteworthy (i.e., expensive, because GGUF requires much more than mmap'ing the weights) when building a model from GGUF, as @metascroy pointed out a long time ago when he implemented the GGUF reader.

@desertfire desertfire requested a review from Jack-Khuu November 6, 2024 02:43
@Jack-Khuu Jack-Khuu merged commit 4a7dab8 into main Nov 6, 2024
52 checks passed
Successfully merging this pull request may close these issues.

Out of memory AOTI using llama 3.1 8b on RTX 4090