fix cpu bnb path #34647
Conversation
Looks good, but do you have an example of code that failed before and is fixed by this PR?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I ran this case on a CPU-only device:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Felladrin/Llama-68M-Chat-v1"
text = ["I am happy because", "This is"]

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer(text, return_tensors="pt", padding=True)

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)
model.generation_config.cache_implementation = "static"
model.generate(**input_ids)
```

Error:
Hi @Rocketknight1. The script reproduces this error easily on both AWQ and BNB. Besides, it's not safe to index into the device list without checking its length, as sketched below.
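(A minimal sketch of the failure mode and the kind of guard described here; this is not the literal diff in `src/transformers/generation/utils.py`, and the variable names are assumptions for illustration.)

```python
# What a CPU-only bnb/AWQ load can produce as a device map (illustrative value).
hf_device_map = {"": "cpu"}

# Unsafe pattern: indexing the first non-CPU entry raises IndexError when the
# filtered list is empty, i.e. on a CPU-only machine.
# main_device = [d for d in hf_device_map.values() if d not in ("cpu", "disk")][0]

# Guarded pattern: check the length first and fall back to "cpu".
accelerator_devices = [d for d in hf_device_map.values() if d not in ("cpu", "disk")]
main_device = accelerator_devices[0] if len(accelerator_devices) > 0 else "cpu"
print(main_device)  # -> "cpu" on a CPU-only machine
```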
Hi @Rocketknight1 @ArthurZucker @SunMarc @gante @zucchini-nlp, do you mind reviewing this change? Thanks!
Hi @Titus-von-Koeller. This change is needed for the bitsandbytes CPU path; can you help review it? It's also needed for the AWQ CPU path, which is already enabled here: #33460
The following script can reproduce the AWQ error:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, AwqConfig

model_id = "PrunaAI/JackFram-llama-68m-AWQ-4bit-smashed"
text = ["I am happy because", "This is"]

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer(text, return_tensors="pt", padding=True)

# quantization_config = BitsAndBytesConfig(load_in_8bit=True)
quantization_config = AwqConfig(version="ipex")
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)
model.generation_config.cache_implementation = "static"
model.generate(**input_ids)
```
Hi @aymeric-roucher @LysandreJik, do you have time to review this bug fix? Thanks!
@jiqing-feng sorry for the late reply, the
Thanks for the fix! I've suggested a similar fix that follows more closely what we have in accelerate. LMK if this works for you!
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Hi @SunMarc, I have applied your changes, thanks!
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Nice, thanks!
LGTM 🤗
* fix cpu bnb path

* Update src/transformers/generation/utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix awq quantizer env check

* fix awq quantizer device check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Hi @Rocketknight1 @ArthurZucker @SunMarc @gante,
When a bnb model is loaded on CPU, hf_device_map contains "cpu". A bnb model on the CPU device should be accepted by transformers because the CPU backend has been enabled in BNB. Please take a look, thanks!