Problem
At low target bpw (around 2.1 to 3.2), the measured accuracy on some layers can still be very high, 0.995+, especially for larger models (see the data below). If we could store those layers at 1.0 or 1.5 bpw instead, the freed-up bit budget could go to other, more important layers, potentially giving a better overall result.
Solution
Introduce something like "0.05:2b_64g/0.95:1b_64g s4" and "0.5:2b_64g/0.5:1b_64g s4". (I'm not sure whether it can work this way.)
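For reference, a minimal sketch of how I read those spec strings, assuming each "fraction:Nb_Gg" term covers that share of a layer's weights at N bits with group size G, and the trailing "s4" is scale-related. The function name nominal_bpw is just for illustration, and it ignores per-group scale and other overhead that the real quantizer would also count:

```python
def nominal_bpw(spec: str) -> float:
    """E.g. '0.05:2b_64g/0.95:1b_64g s4' -> 0.05*2 + 0.95*1 = 1.05 bpw."""
    body = spec.split()[0]                # drop the trailing 's4' token
    total = 0.0
    for term in body.split("/"):
        frac, fmt = term.split(":")       # e.g. '0.95', '1b_64g'
        bits = int(fmt.split("b")[0])     # bit width per weight
        total += float(frac) * bits
    return total

print(nominal_bpw("0.05:2b_64g/0.95:1b_64g s4"))  # 1.05
print(nominal_bpw("0.5:2b_64g/0.5:1b_64g s4"))    # 1.5
```

So the two proposed strategies would land at roughly 1.05 and 1.5 bpw for those layers, well below the current 2 bpw floor.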
Explanation
(Green is ~14B, red is ~30B.)
Data points 1-5 are from quantization logs of ~1B to ~70B models on v0.2.4. You can see that for 70B+ models (purple), layer accuracy can get very high, 0.99-0.999, even at low bpw.
Acknowledgements
I have looked for similar requests before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will make my requests politely.
It's difficult to go below 2 bits per weight, simply because that's the minimum amount of bits required to represent a value that can be either positive, negative or zero. Technically you can do it in ~1.58 bits, but this requires a grouped encoding (e.g. 20 weights in a 32 bit field), and that complicates the kernels a lot.
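As a rough illustration of that grouped encoding (a minimal sketch, not how a real kernel would do it): since 3^20 = 3,486,784,401 fits in 32 bits, 20 ternary weights can be packed base-3 into one 32-bit field, i.e. 1.6 bpw, close to the log2(3) ≈ 1.585 lower bound.

```python
def pack_ternary(group):
    """Pack 20 ternary weights (-1/0/+1) into one 32-bit code, base-3."""
    assert len(group) == 20
    code = 0
    for w in reversed(group):
        code = code * 3 + (w + 1)      # map {-1, 0, +1} -> {0, 1, 2}
    assert code < 2**32                # 3**20 = 3,486,784,401 < 2**32
    return code

def unpack_ternary(code):
    """Recover the 20 ternary weights from one packed code."""
    return [(code // 3**i) % 3 - 1 for i in range(20)]
```

Decoding a single weight then takes an integer divide and modulo rather than a simple bit-field extraction, which is part of why this complicates the kernels.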
At 1 bit per weight you've only got positive and negative weights within each group, and in the experiments I've done that's the point at which things completely break down. Keep in mind those trend lines will have to diverge to -inf somewhere between 0 bpw and 2.13 bpw.
I do still plan to revisit quantization at some point, and I'm considering some options that might achieve less than 2 bpw on average without completely breaking the model. But they will need a different encoding scheme, I think, and currently I'm stuck on vision models, it seems. 🤷