Quantization on a HuggingFace model #13775
Unanswered · Impasse52 asked this question in Help: Coding & Implementations · 0 replies
Is using quantization in pipelines possible at all? I'm trying to use a 7B LLaMA-based model on a TITAN V (12GB VRAM), but I can't load it without running out of memory. I've been looking for a solution for a while but can't find any hints on how to approach this, except for re-implementing the pipeline logic myself, which is far from ideal.