
[BigModeling] Final fix for dispatch int8 and fp4 models #1660

Merged
merged 2 commits into main from fix-dispatch-int8-buffers on Jun 28, 2023

Conversation

younesbelkada
Copy link
Contributor

What does this PR do?

Fixes a silent bug with buffers that are not in the model's state dict when using quantized models. Currently, the following snippet fails:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "ybelkada/gpt2-xl-8bit"

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_path)

input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_length=10)
print(tokenizer.decode(outputs[0]))

It fails with the following error:

  File "/home/younes_huggingface_co/code/transformers/src/transformers/models/gpt2/modeling_gpt2.py", line 203, in _attn
    attn_weights = torch.where(causal_mask, attn_weights.to(attn_weights.dtype), mask_value)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This happens because #1652 introduced a check that skips dispatch when the model is quantized, which leaves buffers that are not in the state dict on CPU.
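For illustration, here is a minimal sketch of how a buffer ends up outside the state dict; the module and buffer names are hypothetical, not GPT-2's actual code:

```python
import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers registered with persistent=False are excluded from the
        # state dict, so loading a checkpoint never touches them: they stay
        # wherever the module currently lives (CPU by default).
        self.register_buffer(
            "bias", torch.tril(torch.ones(4, 4)), persistent=False
        )

m = TinyAttention()
print("bias" in m.state_dict())  # False: nothing will relocate this buffer
print(m.bias.device.type)        # cpu
```

Since dispatch is the only step that moves such buffers, skipping it leaves them on CPU while the quantized weights sit on GPU, producing the device mismatch above.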

Since the .to() operation is not supported for quantized models, the fix is to always force-dispatch the model when it is quantized.
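The dispatch decision described above could be sketched as follows; the function and parameter names are illustrative, not accelerate's actual API:

```python
def should_dispatch(model_is_quantized: bool, has_device_map: bool) -> bool:
    # Quantized models cannot be moved with .to(), so they must always be
    # dispatched; dispatching also places buffers that are absent from the
    # state dict on the correct device.
    if model_is_quantized:
        return True
    # Non-quantized models are dispatched only when a device map was given.
    return has_device_map

print(should_dispatch(True, False))   # True: quantized, force-dispatch
print(should_dispatch(False, False))  # False: plain model, no device map
```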

cc @sgugger @SunMarc

Currently the corresponding test is implemented in huggingface/transformers#24543 as it relies on the main branch of transformers.

@younesbelkada younesbelkada requested review from sgugger and SunMarc June 28, 2023 14:50
Copy link
Collaborator

@sgugger sgugger left a comment


Thanks for the fix!

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 28, 2023

The documentation is not available anymore as the PR was closed or merged.

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Copy link
Member

@SunMarc SunMarc left a comment


LGTM!

@SunMarc SunMarc merged commit bc234c0 into main Jun 28, 2023
@SunMarc SunMarc deleted the fix-dispatch-int8-buffers branch June 28, 2023 15:16