[core] Fix quantization issues with transformers==4.36.0 #249
What does this PR do?
The recent transformers release that included a cache refactor (huggingface/transformers#26681) broke some internal assumptions in autoawq about the shapes of attention masks and input embeddings.
This PR fixes the issue by simply updating `layer_kwargs` through the `prepare_inputs_for_generation` method, which automatically puts the inputs and attention masks in the correct format (it should also be compatible with previous transformers versions).

cc @casper-hansen @TheBloke