
[AOTI] Remove the original model weights in Python deployment #1337


Merged
merged 6 commits into from
Nov 6, 2024

Conversation

desertfire
Contributor

Summary: Fixes #1302. Because an AOTI-compiled model contains its own copy of the model weights, keeping the eager model weights around as well wastes GPU memory and can even trigger OOMs. This PR releases the corresponding eager model weights in the AOTI Python deployment path.
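The idea behind the fix can be sketched without any framework code. This is a minimal, torch-free illustration (all names here are hypothetical, not torchchat's actual API): once the AOTI-compiled artifact owns its own copy of the weights, the eager copy is redundant and can be dropped so the allocator can reclaim that memory.

```python
# Hedged sketch of the pattern this PR applies. In the real code path the
# compiled artifact is a .so produced by AOTInductor that embeds the
# weights; here a plain dict stands in for it.

class EagerModel:
    """Stand-in for an eager PyTorch module holding large GPU tensors."""

    def __init__(self):
        self.state_dict = {"layer0.weight": [0.0] * 1024}


def load_aoti_and_release(model: EagerModel):
    # In torchchat this step would load the AOTI-compiled shared library,
    # which carries its own weight copy (placeholder below).
    compiled = {"so_path": "model.so"}
    # The eager weights are now duplicated, so release them.
    model.state_dict.clear()
    return compiled


m = EagerModel()
runner = load_aoti_and_release(m)
assert m.state_dict == {}  # eager copy released; only the runner's copy remains
```

In the real PyTorch path, dropping the last references to the parameters (and, on CUDA, calling `torch.cuda.empty_cache()`) is what actually returns the memory to the device.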

pytorch-bot bot commented Nov 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1337

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 978baa3 with merge base 9480258 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 1, 2024
@desertfire
Contributor Author

desertfire commented Nov 1, 2024

@Jack-Khuu , is there any known problem with CI? Many of the failures here seem unrelated to my change.

@Jack-Khuu
Contributor

Yeah, we're checking with the DevInfra folks

@Jack-Khuu
Contributor

Heads up, CI is broken in pt/pt and at a higher level:

https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.17.0

@byjlw
Contributor

byjlw commented Nov 2, 2024

@desertfire it should be fixed now. We just need to churn through the tests

@mikekgfb
Contributor

mikekgfb commented Nov 2, 2024

Summary: Fixes #1302. Because an AOTI-compiled model contains its own copy of the model weights, keeping the eager model weights around as well wastes GPU memory and can even trigger OOMs. This PR releases the corresponding eager model weights in the AOTI Python deployment path.

In the long term we might ask whether there's any point in going to all the expense of building the model if we just need the config. The model build process had a config_only bool argument with the intent of allowing suppression of the actual model build (though that was never implemented).
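The config_only idea could look something like the following hypothetical sketch (the function and field names are illustrative, not torchchat's actual builder API): when the caller only needs the model configuration, e.g. for AOTI deployment where the .so already holds the weights, weight materialization is skipped entirely.

```python
# Hedged sketch of a "config_only" build path (assumed names throughout).
from dataclasses import dataclass


@dataclass
class ModelConfig:
    n_layers: int = 32
    dim: int = 4096


def build_model(config_only: bool = False):
    """Return (config, weights); skip weight allocation when config_only."""
    config = ModelConfig()
    if config_only:
        # No tensors are ever allocated; the caller gets metadata only.
        return config, None
    # Stand-in for the expensive weight materialization step.
    weights = {"tok_embeddings.weight": [0.0] * config.dim}
    return config, weights


cfg, weights = build_model(config_only=True)
assert weights is None  # nothing expensive was built
```

The design point is that the expensive step (loading checkpoints, or parsing GGUF) is gated behind the flag rather than always paid and then discarded.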

This is particularly noteworthy (i.e., expensive, because GGUF requires much more than mmap'ing the weights) when building a model from GGUF, as @metascroy pointed out a long time ago when he implemented the GGUF reader.

@desertfire desertfire requested a review from Jack-Khuu November 6, 2024 02:43
@Jack-Khuu Jack-Khuu merged commit 4a7dab8 into main Nov 6, 2024
52 checks passed
Successfully merging this pull request may close these issues.

Out of memory AOTI using llama 3.1 8b on RTX 4090