
Add FP8 KV Cache quant example #113

Merged: mgoin merged 2 commits from kv-cache-fp8-example into main on Aug 27, 2024
Conversation

@mgoin (Member) commented on Aug 26, 2024

FIX #111

mgoin merged commit ac673b5 into main on Aug 27, 2024
4 of 7 checks passed
mgoin deleted the kv-cache-fp8-example branch on August 27, 2024 at 23:55
kylesayrs pushed a commit that referenced this pull request Aug 28, 2024
* Add example for quantization kv cache to fp8

* Add eval
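For context, the example added by this PR quantizes the KV cache to FP8 via llm-compressor's `oneshot` entry point with a `kv_cache_scheme` in the recipe. Below is a minimal sketch of that pattern; the model ID, calibration dataset, sample counts, and save path are illustrative assumptions rather than the exact contents of the merged example, which may also quantize weights and activations:

```python
# Minimal sketch of FP8 KV cache quantization with llm-compressor's
# oneshot flow. The model ID, calibration dataset, sample counts, and
# output path are illustrative assumptions, not necessarily what the
# merged example uses.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed model choice

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Recipe quantizing only the KV cache to 8-bit float with static,
# symmetric, per-tensor scales calibrated from sample data.
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor
                dynamic: false
                symmetric: true
"""

oneshot(
    model=model,
    dataset="ultrachat-200k",        # assumed calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-KV"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```

The follow-up commit, "Add eval", suggests the example also sanity-checks the quantized model, for instance with a short generation or an lm-eval run.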
markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024
* compute zp, scale if weight exists in module
* WIP, gets through 1 forward pass
* fix for zeroed out scales
* fix model load
* style
* offload helper fns
* pass tests
* add test to check that observers are used to populate zp and scale in initialization
* fix no calibration case
* clean up for PR
* fix test
* update dependencies
* fix forward bug
* don't calibrate on weights
* dont calib weight in forward
* fix zp load
* check calibration

Co-authored-by: George Ohashi <george@neuralmagic.com>
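The backported commit above concerns how quantization parameters are initialized: observers compute a zero point and scale from a module's weight when one exists, with a fix to avoid zeroed-out scales. As a rough illustration of that idea, here is generic min-max observer logic; this is a sketch of the technique, not llm-compressor's or compressed-tensors' actual code:

```python
# Hedged illustration of "use observers to populate zp and scale in
# initialization": derive a scale and zero point from a module's existing
# weight rather than leaving them zeroed out. Generic min-max logic only.
import torch

def minmax_scale_zp(weight: torch.Tensor, num_bits: int = 8, symmetric: bool = True):
    """Compute a per-tensor scale and zero point from the observed weight range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    if symmetric:
        # Clamp avoids a zeroed-out scale when the weight is all zeros.
        max_abs = weight.abs().max().clamp(min=1e-8)
        scale = max_abs / qmax
        zero_point = torch.zeros_like(scale, dtype=torch.int64)
    else:
        w_min, w_max = weight.min(), weight.max()
        scale = ((w_max - w_min) / (qmax - qmin)).clamp(min=1e-8)
        zero_point = torch.round(qmin - w_min / scale).to(torch.int64)
    return scale, zero_point

# Example: initialize quantization params for any module that has a weight.
linear = torch.nn.Linear(16, 16)
scale, zp = minmax_scale_zp(linear.weight.detach())
```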
Development

Successfully merging this pull request may close these issues:

[Usage] How to do KV cache quantization? (#111)