Add q-cache 6 and 8 support for Exllamav2 #6280

Open
wants to merge 1 commit into
base: dev


randoentity
Contributor

Checklist:

@GodEmperor785
Contributor

@oobabooga could this be merged to main? It would be useful for models that can become unstable with Q4 cache quantization (Qwen and Mistral Nemo, as some people have reported). Also, the current 8-bit cache seems to be outdated, and the author of exllamav2 says that Q8 is better (even Q4 can be better while taking less space).

@ZedOud

ZedOud commented Oct 25, 2024

Some of the names and references have changed, but otherwise this works.

@randoentity
Contributor Author

@ZedOud thanks for testing. Could you propose changes for the names and references so I can remain lazy?

3 participants