Closed
Labels: bug (Something isn't working)
Description
I am using the master build of llmcompressor, built from a git clone.
When quantizing with AWQ 4-bit on CPU only, it works fine, if slowly.
When device_map is set to "auto", it crashes with OOM errors.
I am trying to use the following:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto",
    max_memory={0: "23GIB", 1: "23GIB", "cpu": "90GIB"},
)
I'm sure the max_memory values are wrong; I copied them from AutoAWQ. I only added that argument because of the crashes.
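For reference, here is a minimal sketch of the CPU-only run that succeeds for me. The recipe, dataset, and calibration settings are illustrative placeholders based on the llmcompressor AWQ example, not my exact configuration, and the oneshot/AWQModifier import paths may differ between builds:

from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "..."  # same model as above

# CPU-only load: slow, but does not hit the OOM seen with device_map="auto"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="cpu", torch_dtype="auto"
)

# Illustrative AWQ 4-bit recipe; scheme and ignore list follow the
# llmcompressor AWQ example and are assumptions, not my exact settings
recipe = [AWQModifier(targets=["Linear"], scheme="W4A16_ASYM", ignore=["lm_head"])]

# Dataset, sequence length, and sample count are placeholders
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=128,
)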
Environment:
Python 3.11
Debian 12 (64-bit)
2x RTX 3090
AMD Ryzen 9 7950X