GPTQ vs EXL2 vs AWQ vs Q4_K_M model sizes : r/Oobabooga #304
Labels:
- llm: Large Language Models
- llm-benchmarks: testing and benchmarking large language models
- llm-experiments: experiments with large language models
- llm-inference-engines: Software to run inference on large language models
- llm-quantization: All about Quantized LLM models and serving
GPTQ vs EXL2 vs AWQ vs Q4_K_M model sizes
| Size (MB) | Model |
|----------:|-------|
| 16560 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.000b |
| 17053 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.125b |
| 17463 | Phind-CodeLlama-34B-v2-AWQ-4bit-128g |
| 17480 | Phind-CodeLlama-34B-v2-GPTQ-4bit-128g-actorder |
| 17548 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.250b |
| 18143 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.400b |
| 19133 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.650b |
| 19284 | phind-codellama-34b-v2.Q4_K_M.gguf |
| 19320 | Phind-CodeLlama-34B-v2-AWQ-4bit-32g |
| 19337 | Phind-CodeLlama-34B-v2-GPTQ-4bit-32g-actorder |
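For a rough sense of what these sizes mean, the on-disk size can be converted into effective bits per weight. A minimal sketch, assuming roughly 34B parameters and that "MB" here means mebibytes (the result runs slightly high because embeddings and quantization metadata are counted as weight data):

```python
# Rough bits-per-weight estimate from checkpoint size on disk.
# Assumptions: ~34e9 parameters, MB = 2**20 bytes, entire file is weight data.

PARAMS = 34e9  # approximate CodeLlama-34B parameter count

def bits_per_weight(size_mb: float, params: float = PARAMS) -> float:
    """Convert a checkpoint size in MB to effective bits per weight."""
    return size_mb * 2**20 * 8 / params

for name, size_mb in [
    ("EXL2-4.000b", 16560),
    ("GPTQ-4bit-128g", 17480),
    ("Q4_K_M.gguf", 19284),
    ("GPTQ-4bit-32g", 19337),
]:
    print(f"{name:>16}: {bits_per_weight(size_mb):.2f} bpw")
```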
I created all these EXL2 quants to compare them to GPTQ and AWQ. The preliminary result is that EXL2 4.400b seems to outperform GPTQ-4bit-32g, and EXL2 4.125b seems to outperform GPTQ-4bit-128g, while using less VRAM in both cases.
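The usual way to make "seems to outperform" concrete is a perplexity comparison on held-out text. A minimal sliding-window sketch using transformers; the checkpoint name and dataset are placeholders, and each format (EXL2, GGUF, AWQ) ultimately needs its own loader backend rather than this one call:

```python
# Perplexity evaluation sketch for comparing quantized checkpoints.
# Assumptions: the checkpoint loads via transformers (GPTQ/AWQ can, with the
# right backend installed); EXL2 and GGUF models need their own loaders.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Phind-CodeLlama-34B-v2-GPTQ"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto").eval()

# Example evaluation corpus; code models are often scored on code instead.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

window = 2048
nlls, n_tokens = [], 0
for start in range(0, ids.size(1), window):
    chunk = ids[:, start : start + window].to(model.device)
    if chunk.size(1) < 2:
        break
    with torch.no_grad():
        # labels == inputs: the model shifts them internally for next-token loss
        out = model(chunk, labels=chunk)
    nlls.append(out.loss.float() * (chunk.size(1) - 1))  # mean NLL -> summed NLL
    n_tokens += chunk.size(1) - 1

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"perplexity: {ppl.item():.3f}")
```

Running the same loop over each quant of the same base model gives directly comparable numbers, since tokenization and evaluation text are held fixed.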
I couldn't test AWQ yet because my quantization ended up broken, possibly because this particular model uses NTK scaling, so I'll probably have to go through the fun of burning my GPU for another 16 hours to quantize and evaluate a different model before a conclusion can be reached.
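For context on the NTK-scaling remark: NTK-aware RoPE scaling extends context by raising the rotary base rather than compressing position indices, so a calibration set built at the wrong base or context length can throw off quantization. A hedged sketch of the usual base adjustment; the alpha and head_dim values are illustrative, not taken from this model's config:

```python
# NTK-aware RoPE scaling: stretch low rotary frequencies by raising the base.
# Example values only; not the actual Phind-CodeLlama configuration.

def ntk_scaled_base(base: float = 10000.0, alpha: float = 4.0, head_dim: int = 128) -> float:
    """Return the adjusted rotary base for an alpha-times longer context."""
    return base * alpha ** (head_dim / (head_dim - 2))

print(ntk_scaled_base())  # ~10000 * 4**(128/126) ≈ 40,900
```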
Also no idea if Phind-CodeLlama is actually good. WizardCoder-Python might be better.
Suggested labels
"LLM-Quantization"