[AOTI] Remove the original model weights in Python deployment #1337
Summary: Fixes #1302. Because an AOTI-compiled model contains a copy of the model weights, we need to release the corresponding eager model weights in the Python deployment path.
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1337
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 978baa3 with merge base 9480258. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
@Jack-Khuu, is there any known problem with CI? Many of the failures here seem unrelated to my change.
Yeah, we're checking with the DevInfra folks.
Heads up: CI is broken in pt/pt and at a higher level: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.17.0
@desertfire it should be fixed now. We just need to churn through the tests.
In the long term, we might ask whether there's any point in going to the expense of building the model if we just need the config. The model build process had a config_only bool argument, intended to allow suppressing the actual model build (though that was never implemented). This is particularly noteworthy (i.e., expensive, because GGUF requires much more than weight mmaps) when building a model from GGUF, as @metascroy pointed out a long time ago when he implemented the GGUF reader.
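To make the config_only idea concrete, here is a minimal stdlib-only sketch of what honoring such a flag could look like. All names here (`ModelConfig`, `BuiltModel`, `build_model`) are hypothetical and are not torchchat's actual builder API; the nested lists merely stand in for the expensive weight-loading step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelConfig:
    # Hypothetical config fields, for illustration only.
    dim: int
    n_layers: int


@dataclass
class BuiltModel:
    config: ModelConfig
    # None means the weights were never materialized.
    weights: Optional[List[List[float]]] = None


def build_model(config: ModelConfig, config_only: bool = False) -> BuiltModel:
    """Build a model, or only its config when config_only is True."""
    if config_only:
        # Skip the expensive part entirely: no weights are allocated or read.
        return BuiltModel(config=config)
    # Stand-in for the real (costly) weight construction or loading.
    weights = [[0.0] * config.dim for _ in range(config.n_layers)]
    return BuiltModel(config=config, weights=weights)
```

A caller that only needs the config would then use `build_model(cfg, config_only=True)` and never pay for the weights.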
Summary: Fixes #1302. Because an AOTI-compiled model contains a copy of the model weights, also keeping the eager model weights wastes GPU memory and can even trigger OOMs. This PR releases the corresponding eager model weights in the AOTI-Python deployment path.
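For readers unfamiliar with the pattern, the following is a rough sketch of one way to release eager weights once a compiled artifact owns its own copy. It is not the diff from this PR: the helper name `release_eager_weights` is hypothetical, and moving the module to the `meta` device is just one possible approach, assumed here for illustration.

```python
import gc

import torch
import torch.nn as nn


def release_eager_weights(model: nn.Module) -> nn.Module:
    """Drop the real storage behind an eager module's parameters and buffers.

    Hypothetical helper: assumes the compiled model no longer needs the
    eager tensors, and that no other references to them are kept elsewhere.
    """
    # Meta tensors keep shape and dtype but hold no data, so the module
    # structure (and anything that reads shapes or config) still works.
    model.to(device="meta")
    # Collect any now-unreferenced tensors, then hand cached CUDA blocks
    # back to the driver so the freed memory is actually available.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return model
```

Calling this on the eager model after the AOTI artifact has been loaded leaves a shape-only shell; running the eager `forward` afterwards would no longer produce real outputs, so it only makes sense when all inference goes through the compiled path.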