[BigModeling] Final fix for dispatch int8 and fp4 models #1660
What does this PR do?
Fixes a silent bug affecting buffers that are not in the model's state dict when using quantized models. Currently, the following snippet fails:
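The original snippet did not survive extraction; below is a minimal sketch of the failure mode, assuming a bitsandbytes int8 model loaded through transformers (the model name and inputs are illustrative, not the exact reproduction from the PR):

```python
import torch
from transformers import AutoModelForCausalLM

# Loading with `load_in_8bit=True` and a device map goes through
# accelerate's `dispatch_model` under the hood.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    load_in_8bit=True,
    device_map="auto",
)

# Buffers that are not part of the state dict are silently left on CPU,
# so a forward pass with GPU inputs hits a device-mismatch error.
input_ids = torch.tensor([[1, 2, 3]], device=0)
output = model(input_ids)
```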
With the following error:
This happens because in #1652 we introduced a check that skips dispatch entirely when the model is quantized, which leaves buffers that are not in the state dict on CPU.
Since the `.to` operation is not supported for quantized models, the fix is to always force-dispatch the model when it is quantized.

cc @sgugger @SunMarc
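For reference, here is a sketch of the logic behind the fix, assuming transformers' quantization flags (`is_loaded_in_8bit`, `is_quantized`); it illustrates the approach rather than reproducing the exact diff:

```python
def needs_forced_dispatch(model, device_map) -> bool:
    """Return True when hook-based dispatch must be used even though the
    device map only targets a single device."""
    # transformers marks bitsandbytes-quantized models with these attributes.
    is_quantized = getattr(model, "is_quantized", False) or getattr(
        model, "is_loaded_in_8bit", False
    )
    single_device = len(set(device_map.values())) == 1
    # A single-device map would normally be handled with a plain
    # `model.to(device)`, but `.to` raises for int8/fp4 models. Forcing the
    # hook-based dispatch path instead also moves buffers that are absent
    # from the state dict onto the execution device.
    return single_device and is_quantized
```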
The corresponding test is currently implemented in huggingface/transformers#24543, as it relies on the `main` branch of transformers.