ggml : use F16C conversion when available #1506

ggerganov · 2023-05-17T17:08:09Z

Somehow this has been replaced by a lookup table, even when F16C intrinsics are available

slaren · 2023-05-17T17:19:13Z

I tried this a while ago, but IIRC converting a single value from FP16 to FP32 with a lookup table is faster. If we used SIMD to convert 8 values at a time, it is likely that F16C would be faster that way.
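For readers following along, here is a minimal sketch of the two approaches being compared. It is not the ggml implementation: every name below (`fp16_to_fp32_soft`, `fp16_table`, `fp16_table_init`, `fp16_to_fp32_lookup`, `fp16_to_fp32_f16c`, `fp16x8_to_fp32_f16c`) is hypothetical, and the table is filled with a portable software conversion so the example builds without special compiler flags. The F16C paths are compiled only when the compiler defines `__F16C__` and `__AVX__` (for example with `-mf16c` on GCC or Clang).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__F16C__) && defined(__AVX__)
#include <immintrin.h>
#endif

/* Portable software FP16 -> FP32 conversion, used only to fill the table. */
static float fp16_to_fp32_soft(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                           /* signed zero */
        } else {
            /* subnormal half: shift until the implicit leading bit appears */
            int e = -1;
            do { mant <<= 1; e++; } while ((mant & 0x400u) == 0);
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Approach 1: one table entry per possible 16-bit pattern (256 KiB). */
static float fp16_table[1 << 16];

static void fp16_table_init(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        fp16_table[i] = fp16_to_fp32_soft((uint16_t)i);
    }
    assert(fp16_table[0x3C00] == 1.0f);            /* sanity check: 0x3C00 is 1.0 */
}

static inline float fp16_to_fp32_lookup(uint16_t h) {
    return fp16_table[h];                          /* a single indexed load */
}

#if defined(__F16C__) && defined(__AVX__)
/* Approach 2a: F16C, one value at a time (the variant this PR tried). */
static inline float fp16_to_fp32_f16c(uint16_t h) {
    return _mm_cvtss_f32(_mm_cvtph_ps(_mm_cvtsi32_si128(h)));
}

/* Approach 2b: F16C, eight values per instruction. */
static inline void fp16x8_to_fp32_f16c(const uint16_t *src, float *dst) {
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm256_storeu_ps(dst, _mm256_cvtph_ps(v));
}
#endif
```

The scalar F16C path has to move the value into a vector register, convert, and extract the result, while the table path is one load that tends to hit cache in hot loops. That may explain the single-value result described above, but it is an assumption and should be checked with a benchmark on the target CPU.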

sw · 2023-05-18T10:26:37Z

Agree with @slaren that the table look-up is faster.

This PR makes Q5_1 inference about 7% slower for me.

The upcoming AVX-NE-CONVERT instructions may change that, though (namely _mm256_bcstnesh_ps).

ggerganov · 2023-05-20T07:40:36Z

I did some tests as well and I agree - it does not help

ggml : use F16C conversion when available

40ec488

ggerganov added the "performance" label (Speed related topics) May 17, 2023

ggerganov added the "demo" label (Demonstrate some concept or idea, not intended to be merged) May 20, 2023

ggerganov closed this May 20, 2023

slaren mentioned this pull request Apr 13, 2024

CPU F16->F32 conversion speed improvement #6648

Closed

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open