
ggml : use F16C conversion when available #1506


Closed

Conversation

ggerganov
Member

Somehow this has been replaced with a lookup table, even when F16C intrinsics are available.

Member

slaren commented May 17, 2023

I tried this a while ago, but IIRC converting a single value from FP16 to FP32 with a lookup table is faster. If we used SIMD to convert 8 values at a time, it is likely that F16C would be faster that way.
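To make the two approaches concrete, here is a minimal sketch and not ggml's actual code. All names (`fp16_to_fp32_soft`, `fp16_table`, `fp16_table_init`, `fp16_lookup`, `fp16_to_fp32_row`) are hypothetical. It assumes FP16 values are stored as raw `uint16_t` bit patterns, fills a 65536-entry table once with a portable bit-level conversion, and uses F16C only in the batched path, 8 values per `_mm256_cvtph_ps` call, when the compiler defines `__F16C__`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__F16C__)
#include <immintrin.h>
#endif

/* Portable bit-level FP16 -> FP32 conversion, used only to fill the table. */
static float fp16_to_fp32_soft(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                        /* signed zero */
        } else {
            /* Subnormal: shift the mantissa until the implicit bit appears. */
            uint32_t shift = 0;
            while ((mant & 0x400u) == 0) { mant <<= 1; shift++; }
            mant &= 0x3FFu;
            bits = sign | ((127u - 15u - shift + 1u) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);   /* infinity or NaN */
    } else {
        bits = sign | ((exp + 127u - 15u) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* One FP32 entry per possible FP16 bit pattern (256 KiB). */
static float fp16_table[1 << 16];

static void fp16_table_init(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        fp16_table[i] = fp16_to_fp32_soft((uint16_t)i);
    }
}

/* Scalar path: a single indexed load per value. */
static inline float fp16_lookup(uint16_t h) {
    return fp16_table[h];
}

/* Batched path: F16C converts 8 values per instruction when available;
 * the tail (or everything, without F16C) goes through the table. */
static void fp16_to_fp32_row(const uint16_t *src, float *dst, size_t n) {
    size_t i = 0;
#if defined(__F16C__)
    for (; i + 8 <= n; i += 8) {
        __m128i h = _mm_loadu_si128((const __m128i *)(src + i));
        _mm256_storeu_ps(dst + i, _mm256_cvtph_ps(h));
    }
#endif
    for (; i < n; i++) {
        dst[i] = fp16_lookup(src[i]);
    }
}
```

`fp16_table_init()` has to run once before any lookup. Which path is faster depends on the CPU and on how the converted values are consumed, so this only illustrates the trade-off and says nothing about the measured result.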

ggerganov added the "performance" (Speed related topics) label on May 17, 2023
Contributor

sw commented May 18, 2023

Agree with @slaren that the table look-up is faster.

This PR makes Q5_1 inference about 7% slower for me.

The upcoming AVX-NE-CONVERT instructions may change that, though (namely _mm256_bcstnesh_ps).

ggerganov added the "demo" (Demonstrate some concept or idea, not intended to be merged) label on May 20, 2023
ggerganov
Member Author

I did some tests as well and I agree: it does not help.
