AVX implementations for remove-vzip #1370

Merged: 1 commit, May 8, 2023

@sw (Contributor) commented May 8, 2023

This adds AVX/AVX2 optimizations for PR #1305. It also includes some improvements to the scalar implementations, plus updates to the README and SHA256SUMS (incomplete).

Since x86 doesn't seem to gain anything from this change, which breaks file compatibility, I think quite a few people will not be very happy with it. But maybe someone can find an improvement to my AVX code.

The address sanitizer builds are failing because of some problem with the CI machines; I don't think it's caused by our code.
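For readers without the context of PR #1305, here is a minimal scalar sketch of what "removing bit shuffling" for a 4-bit format could look like. It rests on an assumption that is not stated in this thread: that the old layout packs adjacent elements into one byte, while the new layout packs element `j` with element `j + QK/2`. The names `QK`, `pack_interleaved`, `pack_split`, `unpack_interleaved` and `unpack_split` are hypothetical and are not taken from ggml.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK 32 /* elements per quantization block (assumed for this sketch) */

/* Interleaved layout (assumed "old" format):
 * byte j holds element 2j in the low nibble and element 2j+1 in the high nibble. */
void pack_interleaved(const uint8_t q[QK], uint8_t qs[QK / 2]) {
    for (size_t j = 0; j < QK / 2; ++j) {
        assert(q[2 * j] < 16 && q[2 * j + 1] < 16); /* inputs must fit in 4 bits */
        qs[j] = (uint8_t)(q[2 * j] | (q[2 * j + 1] << 4));
    }
}

/* Split layout (assumed "new" format):
 * byte j holds element j in the low nibble and element j + QK/2 in the high nibble. */
void pack_split(const uint8_t q[QK], uint8_t qs[QK / 2]) {
    for (size_t j = 0; j < QK / 2; ++j) {
        assert(q[j] < 16 && q[j + QK / 2] < 16);
        qs[j] = (uint8_t)(q[j] | (q[j + QK / 2] << 4));
    }
}

/* Unpacking the interleaved layout: the low and high nibbles of each byte
 * must be written to alternating output positions (a zip/shuffle in SIMD). */
void unpack_interleaved(const uint8_t qs[QK / 2], uint8_t q[QK]) {
    for (size_t j = 0; j < QK / 2; ++j) {
        q[2 * j]     = qs[j] & 0x0F;
        q[2 * j + 1] = qs[j] >> 4;
    }
}

/* Unpacking the split layout: all low nibbles form the first half of the
 * block and all high nibbles form the second half, so no interleave is needed. */
void unpack_split(const uint8_t qs[QK / 2], uint8_t q[QK]) {
    for (size_t j = 0; j < QK / 2; ++j) {
        q[j]          = qs[j] & 0x0F;
        q[j + QK / 2] = qs[j] >> 4;
    }
}
```

Under this assumption, a vectorized unpack of the split layout needs only a mask and a shift per register, which would be consistent with the bigger gains reported for ARM NEON (where the `vzip` step goes away) than for x86. Because the two layouts store the same values in different byte positions, files written with one cannot be read with the other.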

@sw sw requested a review from ggerganov May 8, 2023 19:07
@ggerganov ggerganov merged commit 948d124 into ggerganov:remove-vzip May 8, 2023
@sw sw deleted the shuffle-avx branch May 9, 2023 17:31
ggerganov added a commit that referenced this pull request May 11, 2023
ggerganov pushed a commit that referenced this pull request May 11, 2023
* ggml : remove Q4_0 bit shuffling (ARM NEON)

* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)

* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)

* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)

* ggml : remove Q5_0 bit shuffling (ARM NEON)

* ggml : 2x faster scalar implementations

* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)

* ggml : simplify scalar dot

* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit

* ggml : fix Q4_1 quantization

* ggml : update cuBLAS + normalize variable names

* ggml : remove Q4_2 mode

* ggml : minor formatting

* ggml : fix Q5_0 quantization

* scripts : add script for measuring the time per token

* AVX implementations (#1370)

* ggml : uniform 5th bit extraction

* llama : produce error upon loading old model files

* llama : fix model magic/version write

* ggml : speed-up Q5_0 + Q5_1 at 4 threads

* ggml : preserve old Q4 and Q5 formats

* ggml : simplify Q8_1 - no need for low / high sums anymore

* ggml : fix Q8_0 and Q8_1 rounding

* Revert "AVX implementations (#1370)"

This reverts commit 948d124.

* ggml : fix AVX2 implementation

* sha : update hashes for 7B and 13B

* readme : update timings + remove warning banner

* llama : update v2 PR number to 1405

* ggml : fix WASM comments

* ggml : back to original bit order

* readme : add note that Q4 and Q5 have been changed

* llama : fix return for unknown version

---------

Co-authored-by: Stephan Walter <stephan@walter.name>
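
Several items in the commit message above deal with file compatibility ("produce error upon loading old model files", "fix model magic/version write", "fix return for unknown version"). As a rough illustration only, a loader could reject incompatible files with a header check like the one below. The constants `FILE_MAGIC` and `FILE_VERSION`, the 8-byte header layout, and the names `load_status` and `check_header` are all hypothetical; they are not the values or functions used by llama.cpp.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILE_MAGIC   0x46494C45u /* hypothetical magic number, not llama.cpp's */
#define FILE_VERSION 2u          /* hypothetical current format version */

enum load_status {
    LOAD_OK,
    LOAD_TRUNCATED,       /* header shorter than 8 bytes */
    LOAD_BAD_MAGIC,       /* not a model file at all */
    LOAD_OLD_VERSION,     /* written before the format change */
    LOAD_UNKNOWN_VERSION  /* written by a newer, unknown format */
};

/* Inspect an 8-byte header: 4 bytes of magic followed by 4 bytes of version,
 * both stored in the host's native byte order (an assumption of this sketch). */
enum load_status check_header(const uint8_t *buf, size_t len) {
    uint32_t magic;
    uint32_t version;

    if (len < 8) {
        return LOAD_TRUNCATED;
    }
    memcpy(&magic, buf, sizeof magic);
    memcpy(&version, buf + 4, sizeof version);

    if (magic != FILE_MAGIC) {
        return LOAD_BAD_MAGIC;
    }
    if (version < FILE_VERSION) {
        return LOAD_OLD_VERSION;     /* caller should ask the user to re-convert */
    }
    if (version > FILE_VERSION) {
        return LOAD_UNKNOWN_VERSION; /* explicit return instead of falling through */
    }
    return LOAD_OK;
}
```

Returning a distinct status for old and unknown versions lets the caller print a specific message (for example, asking the user to regenerate the model file) instead of silently reading data in the wrong layout.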