[DO NOT MERGE] Hf quantizer refactor #28703
Do we need to make sure it's `HfQuantizer` vs `Quantizer`?

I would structure this in 4 sections, one for each of the most important functions!
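
To make that suggestion concrete, here is a hedged outline of what those four sections could map to. The choice of hooks and the signatures below are my reading of the refactor, not the PR's definitive API:

```python
from abc import ABC, abstractmethod

class HfQuantizer(ABC):
    """Sketch of the base class, organized around four key hooks."""

    @abstractmethod
    def validate_environment(self, *args, **kwargs):
        """1. Fail fast if the backend library or required hardware is unavailable."""

    @abstractmethod
    def _process_model_before_weight_loading(self, model, **kwargs):
        """2. Prepare the empty model, e.g. swap nn.Linear for quantized modules."""

    def create_quantized_param(self, model, param_value, param_name, target_device, state_dict):
        """3. Optionally quantize a parameter on the fly during loading (bnb-style)."""

    @abstractmethod
    def _process_model_after_weight_loading(self, model, **kwargs):
        """4. Finalize the model once all weights are loaded."""
```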

I suggest emphasizing that the AWQ / GPTQ approach is preferred to that of bnb. It is nice to see that after `_process_model_before_weight_loading()` all required quantization params fall into proper buffers, and there is no need for manipulations in `_load_state_dict_into_meta_model()`. And whatever magic happens in `create_quantized_param()` would happen in the postprocessing phase. Ideally, bnb quantizers could be rewritten with the same interface as AWQ.
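
Here is a minimal sketch of the buffer-first pattern being praised here. The `QuantLinear` layout (`qweight` / `qzeros` / `scales`) is an assumption modeled on AWQ/GPTQ-style kernels, and the helper is illustrative rather than the PR's actual code:

```python
import torch
import torch.nn as nn

class QuantLinear(nn.Module):
    """Stand-in for an AWQ/GPTQ-style layer: every tensor the checkpoint
    provides already exists as a registered buffer before loading starts."""

    def __init__(self, in_features: int, out_features: int, bits: int = 4):
        super().__init__()
        # Packed int32 weights plus dequantization metadata; shapes are
        # simplified compared to real packed kernels.
        self.register_buffer("qweight", torch.zeros(in_features // (32 // bits), out_features, dtype=torch.int32))
        self.register_buffer("qzeros", torch.zeros(1, out_features, dtype=torch.int32))
        self.register_buffer("scales", torch.ones(1, out_features, dtype=torch.float16))

def process_model_before_weight_loading(model: nn.Module) -> nn.Module:
    """Swap nn.Linear for QuantLinear up front, so the generic state-dict
    loader just copies checkpoint tensors into pre-registered buffers and
    needs no quantizer-specific branches (the bnb-style alternative instead
    intercepts each tensor in create_quantized_param() while loading)."""
    for name, child in list(model.named_children()):
        if isinstance(child, nn.Linear):
            setattr(model, name, QuantLinear(child.in_features, child.out_features))
        else:
            process_model_before_weight_loading(child)
    return model
```

With hooks shaped like this, any remaining dequantization bookkeeping can live in `_process_model_after_weight_loading()`, which is the postprocessing phase the comment above refers to.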