Master Build - AWQ - OOM when using GPU #1509

@frenzybiscuit

Description

I am using the master build of llmcompressor, installed from a git clone.

When quantizing with AWQ 4-bit on CPU only, it works fine, if slowly.
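
For reference, the working CPU-only run just uses the default placement (a sketch; MODEL_ID is the same placeholder as in the snippet below):

```python
from transformers import AutoModelForCausalLM

# Loading without a device_map keeps all weights on the CPU,
# so quantization never touches GPU memory.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
```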

When device_map is set to "auto", it crashes with OOM errors.

I'm trying to use the following:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto", max_memory={0: "23GiB", 1: "23GiB", "cpu": "90GiB"}
)
```

I'm sure the max_memory is wrong; I'm copying it from AutoAWQ. I only added it because of the crashes.
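
For context, my quantization call roughly follows the upstream llm-compressor AWQ example. A minimal sketch of that flow is below; the dataset, scheme, and calibration sizes are taken from that example and are not necessarily my exact values:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# Load on CPU; the oneshot pipeline moves layers to GPU as it works,
# which avoids holding the whole unquantized model in VRAM at once.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Small calibration set (dataset choice is an assumption from the example).
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:256]")
ds = ds.map(lambda x: {"text": tokenizer.apply_chat_template(x["messages"], tokenize=False)})

recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"])]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```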

Environment:

- Python 3.11
- Debian 12 (64-bit)
- 2x RTX 3090
- AMD Ryzen 9 7950X

Metadata

Labels: bug (Something isn't working)
