[bugfix] remove unused parameters to reduce unnecessary vram usage #26789
Signed-off-by: Reinforce-II <fate@eastal.com>
Code Review
This pull request introduces a fix to reduce VRAM usage by deleting unused model parameters after they have been processed. Specifically, in CompressedTensorsW4A4MoeMethod, the w13_weight_packed and w2_weight_packed parameters are deleted after their data has been used to initialize w13_weight and w2_weight respectively. This is a correct and important optimization, as it allows the memory for the original packed tensors to be reclaimed, especially when the weights are subsequently reordered or repacked, which creates new tensors. The provided test results clearly demonstrate a significant reduction in VRAM consumption, confirming the effectiveness of this change. The implementation is clean and directly addresses the issue of unnecessary memory retention.
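The mechanism described in this review can be sketched without a GPU. The following is a minimal illustration, not the vLLM implementation: `FakeTensor`, `FakeLayer`, and `packed_buffer_survives` are hypothetical stand-ins, and the function body only mimics the idea that repacking creates a new buffer while the layer still holds a reference to the old packed one. It relies on CPython reference counting, with an explicit `gc.collect()` for good measure.

```python
import gc
import weakref


class FakeTensor:
    """Hypothetical stand-in for a device tensor; owns a byte buffer."""

    def __init__(self, nbytes: int):
        self.data = bytearray(nbytes)


class FakeLayer:
    """Hypothetical stand-in for a layer that stores weights as attributes."""


def process_weights_after_loading(layer: FakeLayer, free_packed: bool) -> None:
    # Repacking allocates a NEW buffer; the packed original is not reused.
    layer.w13_weight = FakeTensor(len(layer.w13_weight_packed.data))
    if free_packed:
        # Drop the layer's reference so the old buffer can be reclaimed.
        del layer.w13_weight_packed


def packed_buffer_survives(free_packed: bool) -> bool:
    layer = FakeLayer()
    layer.w13_weight_packed = FakeTensor(1024)
    # A weak reference observes the buffer without keeping it alive.
    probe = weakref.ref(layer.w13_weight_packed)
    process_weights_after_loading(layer, free_packed)
    gc.collect()
    return probe() is not None
```

Calling `packed_buffer_survives(False)` models the behavior before the fix, where the packed buffer stays alive alongside the repacked copy; `packed_buffer_survives(True)` models the behavior after it. With real GPU tensors, the freed memory typically returns to the framework's caching allocator rather than directly to the driver.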
yewentao256
left a comment
LGTM, do you know why we could have accuracy improvement?
This is a random occurrence; multiple evaluations show fluctuations of about 0.01.
yewentao256
left a comment
LGTM, thanks for the work!
This solves the issue! Thanks!
…llm-project#26789) Signed-off-by: Reinforce-II <fate@eastal.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
…llm-project#26789) Signed-off-by: Reinforce-II <fate@eastal.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
…llm-project#26789) Signed-off-by: Reinforce-II <fate@eastal.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
Solve #26788
Remove unused parameters to reduce unnecessary VRAM usage
Test Plan
Test Result
Before
After