Quantization on a HuggingFace model #13775
Unanswered · Impasse52 asked this question in Help: Coding & Implementations · 0 replies
Is using quantization in pipelines possible at all? I'm trying to use a 7B LLaMA-based model on a TITAN V (12GB VRAM), but I can't load it without running out of memory. I've been looking for a solution for a while but can't find any hints on how to approach this, except for re-implementing the pipeline logic myself, which is far from ideal.