I don't think there is any real blocker to supporting them apart from a bit of time and coding.
That being said, from what I understood, at 6-bit and 8-bit, simpler quantization algorithms (such as integer quantization) perform pretty well, and the need for a codebook is relatively lower. That was the reason they weren't supported initially. Happy to hear your thoughts, though!
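For intuition, here's a minimal NumPy sketch (not FLUTE code, just an illustration) of symmetric integer quantization, showing how reconstruction error shrinks as the bitwidth grows — by 6–8 bits a plain uniform grid is already quite accurate, which is the point above:

```python
import numpy as np

def int_quantize(w, bits):
    """Symmetric per-tensor integer quantization (illustrative sketch)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    # Round to the nearest integer level, then map back to floats.
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = np.random.default_rng(0).normal(size=4096).astype(np.float32)
for bits in (3, 6, 8):
    err = np.abs(int_quantize(w, bits) - w).mean()
    print(f"{bits}-bit mean abs error: {err:.5f}")
```

The error drops roughly by half per extra bit, so the accuracy gap between a uniform grid and a learned codebook narrows quickly at higher bitwidths.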
In short, we're conducting research on dynamic bitwidth quantization (per-layer bitwidth and such) and we're trying to come up with some sort of unified theory for a certain class of codebook quantization.
We need higher bitwidth support because some layers really need ~6bpw even when the average is around 3bpw.
And having a unified entry point for efficient inference of said grids could be extremely handy.
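As a toy illustration of that bit-budget arithmetic (the layer names and sizes below are made up, not from our actual setup), a model can keep a ~3bpw average even when one layer is held at 6 bits:

```python
# Hypothetical per-layer bitwidth assignment: (param_count, bits) per layer.
layers = {
    "attn":     (4 * 4096 * 4096, 2),   # most layers tolerate low bitwidths
    "mlp.up":   (4096 * 11008, 3),
    "mlp.gate": (4096 * 11008, 3),
    "mlp.down": (11008 * 4096, 6),      # sensitive layer kept at ~6bpw
}
total_bits = sum(n * b for n, b in layers.values())
total_params = sum(n for n, _ in layers.values())
print(f"average bpw: {total_bits / total_params:.2f}")  # ~3.3 despite the 6-bit layer
```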
On a related note, Dan and I have been discussing codebook/vector quantization. FLUTE doesn't support this out of the box, but we have a vectorized LUT operation loosely similar to what you might be looking into. Let me know if you need clarification on this as well!
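For concreteness, dequantization with a vectorized LUT boils down to a gather over a table of d-dimensional codewords. A hedged NumPy sketch (shapes and names are illustrative, not FLUTE's API):

```python
import numpy as np

def vq_dequant(indices, codebook):
    """Dequantize vector-quantized weights via table lookup.

    indices:  (num_groups,) int array of codebook entry ids
    codebook: (2**bits, d) float array; each row is a d-dim codeword
    returns:  flattened weights of shape (num_groups * d,)
    """
    return codebook[indices].reshape(-1)

bits, d = 8, 2  # an 8-bit codebook of 2-dim codewords (256 x 2 table)
codebook = np.random.default_rng(1).normal(size=(2**bits, d)).astype(np.float32)
idx = np.random.default_rng(2).integers(0, 2**bits, size=16)
w = vq_dequant(idx, codebook)
print(w.shape)  # (32,)
```

Note the table size grows as `2**bits * d`, which is why 6-bit and 8-bit codebooks stress shared memory / LUT storage more than the lower-bit cases.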
Hi!
Would it be possible to support 6-bit and 8-bit codebooks, or are there hard limitations on codebook size that would prevent it?