
[REQUEST] Can we have 1.0/1.5 bpw internally? #675

Open · 3 tasks done

Originalimoc opened this issue Nov 17, 2024 · 1 comment

Comments

Originalimoc commented Nov 17, 2024

Problem

At low target bpw, say 2.1 to 3.2, the measured accuracy on some layers can be very high, 0.995+, especially for larger models (see the data below). If we could quantize those layers at 1.0/1.5 bpw, the remaining bit budget could go to other, more important layers, potentially giving a better overall result, as sketched below.
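As a rough illustration of the allocation idea (my own sketch; the layer names, accuracy numbers, and budget are made up, not taken from the quantizer), a greedy allocator would leave an already-saturated layer at the lowest width and spend the freed bits where accuracy is still climbing:

```python
# Hypothetical illustration: reallocate a fixed average-bpw budget across
# layers, leaving near-saturated layers at the smallest width and spending
# the freed bits where accuracy still improves. All numbers are made up.

# Measured accuracy per layer at each candidate bpw (illustrative only).
acc = {
    "layer_0": {1.0: 0.996, 1.5: 0.998, 2.1: 0.999},  # already saturated
    "layer_1": {1.0: 0.900, 1.5: 0.950, 2.1: 0.980},
    "layer_2": {1.0: 0.850, 1.5: 0.930, 2.1: 0.975},
}
budget_bpw = 1.8 * len(acc)  # total budget: 1.8 bpw average over 3 layers

# Start every layer at the lowest width, then greedily upgrade the layer
# with the largest accuracy gain per extra bit until the budget is spent.
choice = {name: 1.0 for name in acc}
while True:
    best = None
    for name, curve in acc.items():
        higher = [b for b in curve if b > choice[name]]
        if not higher:
            continue
        nxt = min(higher)
        gain = (curve[nxt] - curve[choice[name]]) / (nxt - choice[name])
        if best is None or gain > best[0]:
            best = (gain, name, nxt)
    if best is None:
        break
    _, name, nxt = best
    if sum(choice.values()) - choice[name] + nxt > budget_bpw:
        break  # greedy stop: the best remaining upgrade doesn't fit
    choice[name] = nxt

print(choice)  # the saturated layer_0 stays at 1.0 bpw, the others climb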

Solution

Introduce something like "0.05:2b_64g/0.95:1b_64g s4" and "0.5:2b_64g/0.5:1b_64g s4". (I'm not sure whether it can work this way; see the sketch below for how I read the notation.)
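For clarity, here is my reading of those spec strings as a hypothetical parser (an assumption about the notation, not the actual exllamav2 grammar): a fraction of weights at each width, "b" for bits, "g" for group size, and a trailing "s4" for 4-bit scales.

```python
# Assumed grammar (my guess, not the real exllamav2 parser):
# "fraction:<bits>b_<groupsize>g" variants joined by "/", plus an
# optional trailing "s<n>" for n-bit scales.
import re

def parse_spec(spec: str):
    parts = spec.split()
    variants = []
    for chunk in parts[0].split("/"):
        frac, fmt = chunk.split(":")
        bits, group = re.fullmatch(r"([\d.]+)b_(\d+)g", fmt).groups()
        variants.append({
            "fraction": float(frac),    # share of weights at this width
            "bits": float(bits),        # bits per weight
            "group_size": int(group),   # weights per quantization group
        })
    scale_bits = int(parts[1][1:]) if len(parts) > 1 else None
    return variants, scale_bits

variants, scale_bits = parse_spec("0.05:2b_64g/0.95:1b_64g s4")
avg_bpw = sum(v["fraction"] * v["bits"] for v in variants)
print(avg_bpw, scale_bits)  # 1.05 average bits per weight, 4-bit scales
```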

Explanation

[Figure: 1B_to_72B_multiple_datasets_bpw_acc_measurement — per-layer accuracy vs. bpw; green is ~14B, red is ~30B]
The data series 1-5 come from quantization logs of ~1B to ~70B models on v0.2.4. You can see that the accuracy of 70B+ models (purple) can get very high, 0.99-0.999, even at low bpw.

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
turboderp (Owner) commented

It's difficult to go below 2 bits per weight, simply because that's the minimum number of bits required to represent a value that can be positive, negative, or zero. Technically you can do it in ~1.58 bits, but that requires a grouped encoding (e.g. 20 weights in a 32-bit field), and that complicates the kernels a lot.
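To make the grouped encoding concrete, here is a minimal sketch (my illustration, not the actual kernel code): each weight is ternary, and since 3^20 < 2^32, twenty such weights pack into one 32-bit word as a base-3 number, about 1.585 bits per weight.

```python
# Minimal sketch of the grouped ternary encoding described above:
# each weight is in {-1, 0, +1}, and 3**20 < 2**32, so 20 such weights
# pack into one 32-bit word as a base-3 number (~1.585 bits per weight).

def pack_ternary(weights):
    assert len(weights) == 20 and all(w in (-1, 0, 1) for w in weights)
    word = 0
    for w in reversed(weights):
        word = word * 3 + (w + 1)  # map {-1, 0, +1} -> base-3 digit {0, 1, 2}
    assert word < 2**32            # 3**20 = 3,486,784,401 fits in 32 bits
    return word

def unpack_ternary(word):
    weights = []
    for _ in range(20):
        word, digit = divmod(word, 3)
        weights.append(digit - 1)  # map {0, 1, 2} -> {-1, 0, +1}
    return weights

w = [1, -1, 0, 1, 0, 0, -1, 1, 1, 0, -1, -1, 0, 1, -1, 0, 0, 1, -1, 0]
assert unpack_ternary(pack_ternary(w)) == w  # round-trips exactly
```

The catch is visible in the unpacking loop: recovering each weight takes a divmod by 3 rather than a shift and mask, which is part of why this complicates the kernels compared to power-of-two widths.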

At 1 bit per weight you've only got positive and negative weights within each group, and in the experiments I've done that's the point at which things completely break down. Keep in mind those trend lines will have to diverge to -inf somewhere between 0 bpw and 2.13 bpw.
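For reference, a sketch of what 1 bpw within a group means (my construction, assuming a mean-absolute-value scale): each weight keeps only its sign plus one shared scale per group, so there is no code for zero and small weights get rounded to ±scale.

```python
import numpy as np

# Sketch of 1 bpw within a group: each weight keeps only its sign, plus
# one shared scale per group (here the mean absolute value, an assumed
# choice). With no code for "zero", near-zero weights round to +/-scale.

def quantize_group_1bit(w):              # w: one group of weights, e.g. 64
    scale = np.mean(np.abs(w))
    signs = np.where(w >= 0, 1.0, -1.0)  # the single bit per weight
    return signs, scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, 64)
signs, scale = quantize_group_1bit(w)
w_hat = signs * scale
err = np.sqrt(np.mean((w - w_hat) ** 2)) / np.std(w)
print(f"relative RMS error: {err:.2f}")  # ~0.6: the small weights dominate
```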

I do still plan to revisit quantization at some point, and I'm considering some options that might achieve less than 2 bpw on average without completely breaking the model. But they will need a different encoding scheme, I think, and currently I'm stuck on vision models, it seems. 🤷
