
Add AVX2 implementation of dequantize_row_q4_1 #505


Merged: 1 commit merged into ggml-org:master on Mar 25, 2023

Conversation

@slaren (Member) commented on Mar 25, 2023

Initial tests are promising, similar gains with BLAS as with q4_0.
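For context (not part of the original PR text): below is a minimal sketch of what an AVX2 dequantizer for q4_1 could look like, assuming the block layout of the time (a per-block scale `d`, min `m`, and 32 packed 4-bit quants, dequantized as y = d*q + m). The function and struct names are illustrative, and the actual kernel in this PR may be structured differently.

```c
#include <immintrin.h>
#include <stdint.h>

#define QK 32

// Assumed q4_1 block layout: scale, min, and 16 bytes holding 32 4-bit quants.
typedef struct {
    float   d;            // delta (scale)
    float   m;            // min
    uint8_t qs[QK / 2];   // 4-bit quants, two per byte
} block_q4_1;

// Sketch of an AVX2 dequantizer: y[i] = d * q[i] + m for each 4-bit quant q.
// FMA is assumed to be available alongside AVX2.
static void dequantize_row_q4_1_avx2(const block_q4_1 * x, float * y, int k) {
    const int nb = k / QK;

    for (int i = 0; i < nb; i++) {
        const __m256 d = _mm256_set1_ps(x[i].d);
        const __m256 m = _mm256_set1_ps(x[i].m);

        // Load 16 bytes = 32 nibbles.
        const __m128i bytes = _mm_loadu_si128((const __m128i *) x[i].qs);

        // Split into low and high nibbles (the low nibble is the even-indexed quant).
        const __m128i lo = _mm_and_si128(bytes, _mm_set1_epi8(0x0F));
        const __m128i hi = _mm_and_si128(_mm_srli_epi16(bytes, 4), _mm_set1_epi8(0x0F));

        // Interleave so the quants come out in their original order: q0, q1, q2, ...
        const __m128i q01 = _mm_unpacklo_epi8(lo, hi);
        const __m128i q23 = _mm_unpackhi_epi8(lo, hi);

        // Widen 8 quants at a time to 32-bit ints, convert to float, then y = d*q + m.
        const __m128i qs[4] = { q01, _mm_srli_si128(q01, 8), q23, _mm_srli_si128(q23, 8) };
        for (int j = 0; j < 4; j++) {
            const __m256i vi = _mm256_cvtepu8_epi32(qs[j]);
            const __m256  vf = _mm256_cvtepi32_ps(vi);
            _mm256_storeu_ps(y + i*QK + j*8, _mm256_fmadd_ps(d, vf, m));
        }
    }
}
```

The scalar reference path simply loops over the nibbles and applies the same affine transform; the SIMD version processes a full 32-element block per iteration, which is where the speedup in the numbers below comes from.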

@slaren (Member, Author) commented on Mar 25, 2023

Base (no BLAS):
    60.97 seconds per pass - ETA 11.09 hours
    [1]4.4948,[2]4.9721,[3]5.8697,

BLAS:
    34.60 seconds per pass - ETA 6.29 hours
    [1]4.4305,[2]4.8844,[3]5.7737,

BLAS + AVX:
    32.06 seconds per pass - ETA 5.83 hours
    [1]4.4305,[2]4.8844,[3]5.7737,

Most of the improvement comes from BLAS, but it is still a gain.

Not sure if the lower perplexity from using BLAS is just a fluke in the first chunks, but interesting regardless.

@ggerganov (Member) commented on Mar 25, 2023

Not sure if the lower perplexity from using BLAS is just a fluke in the first chunks, but interesting regardless.

I also observed it, and it is a bit worrying because it means the non-BLAS SIMD matrix multiplication is significantly less accurate than BLAS. The problem is that during normal inference for text generation, after processing the prompt, we switch from BLAS to non-BLAS, because using BLAS for single-token inference is terribly slow.

So I think there is a risk that we will measure a perplexity that looks too good thanks to BLAS, while in reality it will be worse due to the SIMD implementation. But I don't see a better solution than disabling BLAS for perplexity computations.

@ggerganov merged commit 459e93c into ggml-org:master on Mar 25, 2023
@slaren deleted the avx-dequantize-q4_1 branch on March 25, 2023 at 18:43
@gjmulder added the enhancement (New feature or request) and performance (Speed related topics) labels on Mar 26, 2023
@slaren restored the avx-dequantize-q4_1 branch on March 26, 2023 at 18:50
@slaren deleted the avx-dequantize-q4_1 branch on March 26, 2023 at 18:53