Master Build - AWQ - OOM when using GPU #1509

@frenzybiscuit

Description

I am using the master build of llmcompressor, installed from a git clone.

When quantizing with AWQ 4-bit on CPU only, it works fine, if slowly.
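
For reference, the working CPU-only run just uses the default placement (a sketch; MODEL_ID is the same placeholder as in the snippet below):

```python
from transformers import AutoModelForCausalLM

# Loading without a device_map keeps all weights on the CPU,
# so quantization never touches GPU memory.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
```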

When device_map is set to "auto", it crashes with OOM errors.

I'm trying to use the following:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto", max_memory={0: "23GiB", 1: "23GiB", "cpu": "90GiB"}
)
```

I'm sure the max_memory is wrong; I'm copying it from AutoAWQ. I only added it because of the crashes.
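
For context, my quantization call roughly follows the upstream llm-compressor AWQ example. A minimal sketch of that flow is below; the dataset, scheme, and calibration sizes are taken from that example and are not necessarily my exact values:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# Load on CPU; the oneshot pipeline moves layers to GPU as it works,
# which avoids holding the whole unquantized model in VRAM at once.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Small calibration set (dataset choice is an assumption from the example).
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:256]")
ds = ds.map(lambda x: {"text": tokenizer.apply_chat_template(x["messages"], tokenize=False)})

recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"])]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
```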

Environment:

- Python 3.11
- Debian 12 (64-bit)
- 2x RTX 3090
- AMD Ryzen 9 7950X

Metadata

Labels: bug (Something isn't working)
