How to support activation 4bit quantization? #346

Ther-nullptr · 2024-11-04T09:59:21Z

As mentioned in title.

dacorvo · 2024-11-04T10:10:16Z

Why are 4-bit activations not supported in quanto?

Activations are quantized dynamically, using scales recorded during calibration (unlike weights, which are quantized statically), and this adds an extra cost at inference.
To make that cost worthwhile, we need to benefit from an accelerated matmul with the quantized weights.
Unfortunately, at this stage the only accelerated operations available use scalar quantization scales, which give terrible results with 4-bit weights (you need group-wise scales to preserve accuracy).
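
To illustrate why a single scalar scale hurts at 4 bits, here is a minimal sketch in plain Python. It is not quanto code: the helper names, the symmetric [-7, 7] integer range, the group size of 32, and the synthetic data with one outlier are all assumptions chosen for illustration.

```python
import random

QMAX = 7  # assumed symmetric signed 4-bit range: integers in [-7, 7]


def quantize_dequantize(values, scale):
    """Round each value to a 4-bit integer using `scale`, then map back to float."""
    out = []
    for v in values:
        q = max(-QMAX, min(QMAX, round(v / scale)))
        out.append(q * scale)
    return out


def per_tensor(values):
    """One scalar scale for the whole tensor."""
    scale = max(abs(v) for v in values) / QMAX or 1.0
    return quantize_dequantize(values, scale)


def group_wise(values, group_size):
    """One scale per contiguous group of `group_size` values."""
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        scale = max(abs(v) for v in group) / QMAX or 1.0
        out.extend(quantize_dequantize(group, scale))
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
values = [random.gauss(0.0, 0.1) for _ in range(256)]
values[3] = 8.0  # a single outlier stretches the scalar scale

err_tensor = mean_abs_error(values, per_tensor(values))
err_group = mean_abs_error(values, group_wise(values, 32))
print(f"per-tensor error: {err_tensor:.4f}, group-wise error: {err_group:.4f}")
```

With a scalar scale, the outlier forces every small value to round to zero; with group-wise scales, only the group containing the outlier pays that price.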

How could you still use 4-bit activations?

You would need to modify the code in a few places:

add support for QBitsTensor in activation tensors,
add support for group-wise input and output scales in quantized modules,
add support for calibrating group-wise scales.
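
As a rough idea of what the last item could involve, the sketch below records one scale per group of activation features during calibration, then reuses those scales to quantize a new activation. It is plain Python and does not use or reflect quanto's internals: `GroupWiseCalibrator`, `quantize_activation`, the symmetric [-7, 7] range, and the max-abs calibration rule are all hypothetical choices made for illustration.

```python
QMAX = 7  # assumed symmetric signed 4-bit range: integers in [-7, 7]


class GroupWiseCalibrator:
    """Hypothetical helper: tracks the max absolute value seen per feature group."""

    def __init__(self, num_features, group_size):
        if num_features % group_size != 0:
            raise ValueError("num_features must be a multiple of group_size")
        self.group_size = group_size
        self.max_abs = [0.0] * (num_features // group_size)

    def observe(self, activation):
        """Update the per-group statistics with one calibration sample."""
        gs = self.group_size
        for g in range(len(self.max_abs)):
            group = activation[g * gs:(g + 1) * gs]
            self.max_abs[g] = max(self.max_abs[g], max(abs(v) for v in group))

    def scales(self):
        """One scale per group; fall back to 1.0 for groups that only saw zeros."""
        return [m / QMAX if m > 0 else 1.0 for m in self.max_abs]


def quantize_activation(activation, scales, group_size):
    """Quantize at inference time with the scales recorded during calibration."""
    quantized = []
    for i, v in enumerate(activation):
        scale = scales[i // group_size]
        quantized.append(max(-QMAX, min(QMAX, round(v / scale))))
    return quantized


# Calibration pass over two samples with 4 features, split into groups of 2.
calibrator = GroupWiseCalibrator(num_features=4, group_size=2)
calibrator.observe([3.5, -1.0, 14.0, 7.0])
calibrator.observe([0.5, 2.0, -3.0, 1.0])
scales = calibrator.scales()

# Inference: values beyond the calibrated range are clamped.
quantized = quantize_activation([1.0, -3.5, 6.0, -20.0], scales, group_size=2)
print(scales, quantized)
```

Values outside the calibrated range are clamped here, which is one reason the choice of calibration data matters.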

github-actions · 2024-12-05T02:09:28Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2024-12-10T02:10:29Z

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot added the Stale label Dec 5, 2024

github-actions bot closed this as not planned Dec 10, 2024
