Fix ScalarQuantizer to use full bucket range #4074
base: main
Conversation
Summary: `TestScalarQuantizer.test_4variants_ivf` shouldn't mix IVF probing misses into its evaluation. As is, it probes only 4 of 64 centroids, so FP16 reaches only 73% recall. This change keeps residual encoding (and the resulting distributions) exercised, but scans the index exhaustively. Differential Revision: D66909687
Summary: Scalar quantizers have an off-by-one bug: even when the quantizer is trained to cover the full training data range, the maximum code is only produced for exact matches on the maximum value, rather than the range's upper bound being the upper bound of the last bucket. A 4-bit quantizer therefore typically uses only 15 of 16 possible values, effectively 3.9 bits. Existing tests show significant movement in 4-bit quantization results: +9% recall in one unit test. The fix is to use `2^n - eps` everywhere `2^n - 1` is currently used. This would break existing stored indices, though, so a backwards-compatibility fix is included: ranges are adjusted at deserialization time to simulate the old behavior. Differential Revision: D66909688
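The off-by-one can be illustrated with a minimal numpy sketch of the bucket-assignment step (this is not the actual FAISS C++ code; the function name `quantize`, the `fixed` flag, and the epsilon value are illustrative):

```python
import numpy as np

def quantize(x, vmin, vmax, nbits, fixed=True):
    # Map x in [vmin, vmax] to integer codes in [0, 2^nbits - 1].
    # With scale = 2^n - 1, floor() only produces the top code when
    # x == vmax exactly, so one of the 2^n codes is almost never used.
    # Scaling by 2^n - eps instead splits the range into 2^n equal buckets.
    k = 1 << nbits
    scale = (k - 1e-5) if fixed else (k - 1)
    codes = np.floor((x - vmin) / (vmax - vmin) * scale)
    return np.clip(codes, 0, k - 1).astype(int)
```

For 4 bits and inputs sampled across [0, 1), the old scaling produces only 15 distinct codes, while the fixed scaling uses all 16.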
This pull request was exported from Phabricator. Differential Revision: D66909688
@ddrcoder this causes multiple backward compatibility problems. Add a new quantization type?
I agree that the current bucket allocation is suboptimal. The bucket training uses parameters […]. Could you define a new […]?
That's addressed; the ranges are adapted to produce the same values as before. I do still need to prove that with a unit test, though.
Please address my comment above.
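The described deserialization-time adaptation can be sketched in numpy as follows (the names `encode` and `adjust_legacy_range` and the epsilon value are hypothetical, not the FAISS API): a legacy trained range is stretched by `(2^n - eps) / (2^n - 1)` so that the new scaling lands every input in the same bucket the old scaling chose.

```python
import numpy as np

EPS = 1e-5  # illustrative; the real epsilon is an implementation detail

def encode(x, vmin, vdiff, scale, k):
    # Shared bucket-assignment step: scale into [0, scale], floor, clip.
    return np.clip(np.floor((x - vmin) / vdiff * scale), 0, k - 1).astype(int)

def adjust_legacy_range(vmin, vdiff, nbits):
    # Hypothetical deserialization shim: widen vdiff so the new
    # (2^n - eps) scaling reproduces the codes of the old (2^n - 1) scaling.
    k = 1 << nbits
    return vmin, vdiff * (k - EPS) / (k - 1)
```

An index stored by the old code as `encode(x, vmin, vdiff, k - 1, k)` then decodes identically after loading via `encode(x, vmin, adjusted_vdiff, k - EPS, k)`.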