When using round-to-nearest or static scaling for quantization formats such as FP8 weights (i.e. applying QuantizationModifier to weights), there is no need for calibration data. Ideally, no forward pass should be required at all.
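To illustrate why no calibration data is needed: with round-to-nearest (RTN) and a static per-tensor scale, everything is derived from the weight tensor itself. The sketch below is illustrative only (it simulates the scaling and clipping step, not actual FP8 E4M3 bit-level rounding, and the function names are not from compressed-tensors):

```python
# Data-free RTN weight quantization with a static per-tensor scale,
# as would be computed for an FP8 (E4M3) format. The scale comes from
# the weight tensor alone -- no calibration data, no forward pass.
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def rtn_quantize_weight(w: np.ndarray):
    """Return (quantized_weight, scale) using a static per-tensor scale."""
    scale = np.abs(w).max() / FP8_E4M3_MAX
    # Scale into the FP8 range and clip; a real FP8 kernel would also
    # round each value to the nearest representable E4M3 number.
    q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

w = np.random.default_rng(0).normal(size=(4, 8)).astype(np.float32)
q, scale = rtn_quantize_weight(w)
w_hat = dequantize(q, scale)
```

Per-token dynamic schemes (mentioned below) likewise need no calibration, but there the activation scales are computed at inference time rather than ahead of time.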
Hey @mgoin, we do have a PR up for this in compressed-tensors: neuralmagic/compressed-tensors#108, where the weight scale gets initialized when quantization is applied. There's still more work to do here, as it doesn't yet account for the per-token use case, which also doesn't require calibration.
Very nice, thanks @Satrat. Could you share an example of what the UX would be like? I'm not sure how quantization is applied outside of oneshot, or if it will run without the dataset argument.
> In the case of using round-to-nearest or static scaling for quantization formats like FP8 weights i.e. using `QuantizationModifier` on weights, there is no need for calibration data. Ideally, there should be no forward pass required at all.

Proposed UX: