[ Examples ] E2E Examples #5
Conversation
There are 143 files changed here, could you break out the removal of the copyright into a separate PR?
```python
oneshot(
    model=model, dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)
```
Note that this will save an uncompressed copy of the model; if that isn't desired, we should set `output_dir=SAVE_DIR` here, and then lines 103-105 will overwrite the uncompressed model.
Is there a way to not save via `output_dir`?
You can set `output_dir` to `None` explicitly; by default it saves to `"./output"`.
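For reference, a minimal sketch of how that could look in the example script, assuming the `llmcompressor` import path shown below and reusing the variables from the snippet above (`model`, `ds`, `recipe`, and the constants are as defined there):

```python
# Sketch only: the same oneshot() call as above, with output_dir passed explicitly.
# The import path is an assumption; model/ds/recipe/constants come from the example script.
from llmcompressor.transformers import oneshot

oneshot(
    model=model, dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    # Default is "./output", which leaves an uncompressed copy of the model on disk;
    # point this at SAVE_DIR (or pass None) to avoid that extra copy.
    output_dir=SAVE_DIR,
)
```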
I'm not seeing the 143 files?
Never mind, when I reviewed it was showing removal of all the copyright headers as changes; just checked again and it's fixed.
We first select the quantization algorithm.

In our case, we will apply the default GPTQ recipe for `int4` (which uses static group size 128 scales) to all linear layers.

> See the `Recipes` documentation for more information on making complex recipes
Can we add a link here or does this documentation not exist yet?
Does not exist yet.
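As a minimal sketch of the `int4` GPTQ recipe described in the excerpt above; the `W4A16` scheme name and the import path are assumptions based on the library's preset schemes, not something added in this PR:

```python
# Sketch: default GPTQ int4 recipe (group size 128 weight scales) applied to all
# Linear layers, skipping lm_head. Scheme name and import path are assumptions.
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
```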
With the dataset ready, we will now apply quantization.

We first select the quantization algorithm. In our case, we will apply the default recipe for `fp8` (which uses static-per-tensor weights and static-per-tensor activations) to all linear layers.

> See the `Recipes` documentation for more information on making complex recipes
Same here
Does not exist yet.
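For reference, the `fp8` recipe described in that excerpt corresponds to the modifier syntax quoted later in this thread; a minimal sketch (the import path is an assumption):

```python
# Sketch: FP8 recipe (static per-tensor weights and activations) on all Linear
# layers, skipping lm_head. Import path is an assumption.
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])
```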
* Quantize the weights to 8 bits with channelwise scales using GPTQ
* Quantize the activations with dynamic per token strategy

> See the `Recipes` documentation for more information on recipes
and here
Does not exist yet
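As a sketch of the recipe described in that excerpt (8-bit channelwise weights via GPTQ plus dynamic per-token activations); the `W8A8` preset name and the import path are assumptions:

```python
# Sketch: int8 channelwise weights (GPTQ) + dynamic per-token int8 activations
# on all Linear layers, skipping lm_head. Preset name and import path are assumptions.
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"])
```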
LGTM, will we need to re-run the evals after the preset schemes land?
The evals take 10s, so they're easy to re-run. The whole flow end to end is 10 min on an H100.
@Satrat how do I run the linting?
LGTM! Just not a fan of the current modifier scheme syntax, i.e. `recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])`; I would like to import and use Python objects directly for the schemes for easy lookup in source.
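Purely as an illustration of that preference (the `presets` module and `FP8` object below are hypothetical and do not exist in this PR):

```python
# Hypothetical sketch of the object-based syntax being suggested; the presets
# module and FP8 object shown here are invented for illustration only.
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.presets import FP8  # hypothetical import

recipe = QuantizationModifier(targets="Linear", scheme=FP8, ignore=["lm_head"])
```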
Co-authored-by: Michael Goin <michael@neuralmagic.com>
* draft
* add memoryless
* run bin.quant
* before tests, correctness verified
* specify sparsezoo version
* remove sparsezoo

Co-authored-by: Benjamin Fineran <benjaminfineran@gmail.com>
SUMMARY:
FOLLOW UP PRs: