
This issue was moved to a discussion.

Need help to understand q4_0, q4_1, q4_2, q4_3 quantization #1114

Closed
santapo opened this issue Apr 22, 2023 · 1 comment

@santapo

santapo commented Apr 22, 2023

Is there any source that describes the q4_0, q4_1, q4_2, and q4_3 methods in detail? I tried to read the C++ code, but it's hard for me to understand how they work and how they differ from each other.

@Folko-Ven
Contributor

Hi. You can find more about the different quantization types in #406. In short: q4_0 has worse accuracy but higher speed, while q4_1 is more accurate but slower. q4_2 and q4_3 are a newer generation of q4_0 and q4_1, respectively: q4_2 should be more accurate than q4_0 and just as fast, and q4_3 should be similarly more accurate than q4_1.
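For anyone who, like the original poster, finds the C code hard to follow, here is a minimal Python sketch of the core idea, under stated assumptions. It assumes blocks of 32 weights. q4_0 stores one scale `d` per block and maps weights symmetrically around zero (d = max|x| / 7, stored as round(x/d) + 8). q4_1 stores a scale `d` and a minimum `m` per block, i.e. an affine mapping over [min, max]. The extra `m` lets q4_1 fit lopsided blocks better. Its speed cost is presumably because each block carries more data and dequantization needs an extra add. These formulas are my reading of ggml.c around April 2023 and may not match the current code. The function names are hypothetical, and the sketch skips the bit packing (two 4-bit values per byte) and fp16 storage that the real implementation uses. My understanding is that q4_2 and q4_3 applied the q4_0 and q4_1 schemes to smaller 16-weight blocks with fp16 scales. The `block_size` argument of `roundtrip` can mimic that, but treat it as an assumption too.

```python
import math

# Sketch only: formulas follow my reading of ggml.c (April 2023);
# no bit packing, no fp16 scales. Function names are hypothetical.
# Assumption: 32 weights per block, as in the original q4_0/q4_1 formats.
BLOCK_SIZE = 32


def quantize_q4_0(block):
    """Symmetric 4-bit quantization: one scale d per block."""
    amax = max(abs(x) for x in block)
    d = amax / 7.0                      # maps [-amax, amax] onto [-7, 7]
    inv = 1.0 / d if d else 0.0
    # Shift by 8 so every value fits in an unsigned nibble (0..15).
    # Note: Python's round() rounds halves to even, unlike C's roundf().
    q = [min(15, max(0, round(x * inv) + 8)) for x in block]
    return d, q


def dequantize_q4_0(d, q):
    return [(v - 8) * d for v in q]


def quantize_q4_1(block):
    """Affine 4-bit quantization: a scale d and a minimum m per block."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15.0                # 16 levels spread over [min, max]
    inv = 1.0 / d if d else 0.0
    q = [min(15, max(0, round((x - lo) * inv))) for x in block]
    return d, lo, q


def dequantize_q4_1(d, m, q):
    return [v * d + m for v in q]


def roundtrip(row, block_size, quantize, dequantize):
    """Quantize a row block by block, then reconstruct it."""
    out = []
    for start in range(0, len(row), block_size):
        *params, q = quantize(row[start:start + block_size])
        out.extend(dequantize(*params, q))
    return out


if __name__ == "__main__":
    # Hypothetical test data: 64 deterministic "weights".
    row = [math.sin(i * 0.37) * (1 + i % 5) for i in range(64)]
    for name, qf, df in [("q4_0", quantize_q4_0, dequantize_q4_0),
                         ("q4_1", quantize_q4_1, dequantize_q4_1)]:
        for bs in (BLOCK_SIZE, 16):
            rec = roundtrip(row, bs, qf, df)
            err = max(abs(a - b) for a, b in zip(row, rec))
            print(f"{name} block={bs}: max abs error {err:.4f}")
```

In both schemes each reconstructed weight is off by at most d/2. For q4_0, d depends on the largest magnitude in the block. For q4_1, it depends only on the block's range, so a block whose values are all positive wastes no levels. Comparing `roundtrip(row, 16, ...)` with `roundtrip(row, 32, ...)` illustrates the idea behind smaller blocks: each scale covers fewer values, so d tends to shrink, at the cost of storing more scales per weight.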

@ggml-org ggml-org locked and limited conversation to collaborators Apr 22, 2023
@sw sw converted this issue into discussion #1121 Apr 22, 2023