-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor quantization tensor data representation #2479
Conversation
…ms unknown at compile-time) Instead, the qparams are stored in the TensorData bytes so we can pack/unpack them based on the scheme
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2479 +/- ##
==========================================
+ Coverage 82.91% 82.95% +0.03%
==========================================
Files 810 811 +1
Lines 104904 105057 +153
==========================================
+ Hits 86984 87151 +167
+ Misses 17920 17906 -14 ☔ View full report in Codecov by Sentry. |
qparams: JitQuantizationParameters::new( | ||
q.scale.elem(), | ||
Some(q.offset.elem()), | ||
qparams.scale, | ||
qparams.offset, | ||
device, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need to have the scale and the offset in the representation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! With these changes we could go a step further and have the QJitTensor
keep the packed values w/ the appended qparams 🤔 we would need a new CubeType
to achieve the same thing.
The kernel could probably use a refactor in a follow-up PR, there have been a lot of improvements in cubecl since 😅
Checklist
run-checks all
script has been executed.Changes
DType::QFloat
no longer contains theQuantizationStrategy
with quantization parameters as it is not known at compile-time. Instead, use the existingQuantizationScheme
enum which only indicates the method and dtype.Also changed the representation to pack the quantized values (as was previously done in
burn-jit
exclusively) and include the quantization parameters as the bytes.Testing
Added unit tests for packed/unpacked quantization parameters.