GPTQ Activation Ordering #94
Conversation
Changes look good with respect to the Hessian memory management. I'd still like to see an e2e test for activation ordering that checks perplexity and reloading. See tests/llmcompressor/transformers/compression/test_quantization.py for an example of this. I believe it should just be a matter of adding a new recipe and config; let me know if you need help with that.
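For context, the kind of reload-and-perplexity check being requested could look roughly like the sketch below. This is an illustrative, hypothetical snippet, not the test added in this PR: the checkpoint path, sample text, and perplexity threshold are placeholders, and loading the compressed checkpoint is assumed to go through llm-compressor's SparseAutoModelForCausalLM wrapper.

```python
# Hypothetical reload-and-perplexity check; the path, sample text, and
# threshold below are placeholders, not values from the repository's tests.
import torch
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM


def perplexity(model, tokenizer, text: str) -> float:
    """Compute perplexity of `text` under a causal LM."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()


# Reload a previously saved activation-ordered checkpoint (placeholder path).
checkpoint = "./model-actorder-w4a16"
model = SparseAutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

ppl = perplexity(model, tokenizer, "The quick brown fox jumps over the lazy dog.")
assert ppl < 50.0, f"perplexity regression after reload: {ppl}"
```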
Performed tests and got the same accuracy and latency results.
Using the compressed_tensors main branch, I confirmed that …
Summary
Add support for compressed-tensors models which have been quantized using activation ordering (group-wise quantization performed in decreasing order of activation magnitude).
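As a rough illustration of the technique (a conceptual sketch, not the implementation in this PR): during GPTQ, weight columns are permuted so that the columns with the largest activation statistics, taken from the diagonal of the Hessian, are quantized first, and the permutation is kept so the weights can be mapped back to their original layout when the model is reloaded.

```python
# Conceptual sketch of activation ordering; names here are illustrative only.
import torch


def activation_order_permutation(hessian_diag: torch.Tensor) -> torch.Tensor:
    """Permutation that sorts weight columns by decreasing activation
    statistics (diagonal of the GPTQ Hessian)."""
    return torch.argsort(hessian_diag, descending=True)


# Toy example: 8 input columns.
hessian_diag = torch.rand(8)
perm = activation_order_permutation(hessian_diag)

weight = torch.randn(16, 8)        # [out_features, in_features]
weight_permuted = weight[:, perm]  # groups are quantized in this order

# Inverse permutation (g_idx-style bookkeeping) restores the original layout.
inv_perm = torch.argsort(perm)
assert torch.equal(weight_permuted[:, inv_perm], weight)
```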
Usage Script
compress_actorder.py
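The script itself is not reproduced here; below is a minimal sketch of what compress_actorder.py might look like, assuming the GPTQModifier recipe accepts an actorder flag in its weight quantization args (that flag, the model ID, dataset, and calibration settings are placeholders modeled on typical llm-compressor examples, not copied from this PR).

```python
# Hypothetical sketch of compress_actorder.py; the actorder flag, model ID,
# and calibration settings are assumptions, not taken from the PR.
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model

recipe = """
quant_stage:
    quant_modifiers:
        GPTQModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    targets: ["Linear"]
                    weights:
                        num_bits: 4
                        type: "int"
                        symmetric: true
                        strategy: "group"
                        group_size: 128
                        actorder: true   # assumed flag enabling activation ordering
"""

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

oneshot(
    model=model,
    dataset="open_platypus",      # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=512,
    output_dir="./model-actorder-w4a16",
)
```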
Evaluation
Accuracy
Full Precision
Group Quantization Only
Group Quantization Only on main (regression test)
Activation Ordering
Latency Regression
Group Quantization Only
Activation Ordering
PR Dependencies
Activation Ordering Support (neuralmagic/compressed-tensors#97)