Allow bf16 kv-cache #69

ikawrakow · 2024-09-29T06:02:42Z

On the CPU I get exactly the same PPL (perplexity) with and without FA (flash attention) when using bf16 for the kv-cache. On CUDA, however, the bf16 kv-cache PPL is about the same as the fp16 kv-cache PPL on the CPU, so I'm missing a conversion somewhere. Either way, we can now run with a bf16 kv-cache on all platforms supported here.
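bf16 and fp16 round the cached K/V values differently. bf16 keeps fp32's 8-bit exponent but only 7 explicit mantissa bits, while fp16 has a 5-bit exponent and 10 mantissa bits. That is why a bf16 kv-cache and an fp16 kv-cache are generally expected to give slightly different PPL. The sketch below is a minimal illustration, not the code from this PR. It assumes bf16 is stored as the upper 16 bits of an fp32 value with round-to-nearest-even, and `fp32_to_bf16` / `bf16_to_fp32` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fp32 -> bf16 with round-to-nearest-even (assumed rounding mode). */
static uint16_t fp32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);                 /* reinterpret the float's bits */
    if ((u & 0x7fffffffu) > 0x7f800000u) {    /* NaN: truncate and force a quiet NaN */
        return (uint16_t)((u >> 16) | 0x0040u);
    }
    u += 0x7fffu + ((u >> 16) & 1u);          /* add rounding bias; ties go to even */
    return (uint16_t)(u >> 16);               /* keep sign, 8-bit exponent, 7 mantissa bits */
}

/* Hypothetical helper: bf16 -> fp32 is exact, just shift the bits back up. */
static float bf16_to_fp32(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, `1.00390625f` (1 + 2^-8) lies exactly halfway between two bf16 values and rounds to `1.0f`. `100000.0f`, which overflows fp16 (maximum finite value 65504), becomes `99840.0f` in bf16 because of its coarser mantissa.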

Allow bf16 kv-cache

d12d0e9

ikawrakow merged commit fd20638 into main Sep 29, 2024