AttributeError: 'Catcher' object has no attribute 'self_attn' #29352 #29783
Comments
Hi @andinus, thanks for raising an issue! Could you share a minimal reproducer and the full traceback?
cc @ArthurZucker as it seems like a possible regression
It's not really a regression, as I mentioned on the other PR.
Hello, I'm very sorry, I won't be able to provide these immediately. Here is an OCR of the traceback:

```
Exception: 'Catcher' object has no attribute 'self_attn'
Traceback (most recent call last):
  File "/root/qex/framework/run.py", line 318, in child_process
    Generator(input_queue, output_queue).run()
  File "/root/qex/framework/run.py", line 284, in run
    self.quantize()
  File "/root/qex/framework/run.py", line 189, in quantize
    self.finetuningmodel_engine.quantize()
  File "/root/qex/framework/engine_vilm.py", line 129, in quantize
    model.quantize(tokenizer, quant_config=quant_config)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/awq/models/base.py", line 161, in quantize
    self.quantizer = AwqQuantizer(
  File "/usr/local/lib/python3.10/dist-packages/awq/quantize/quantizer.py", line 59, in __init__
    self.modules, self.module_kwargs, self.inps = self.init_quant()
  File "/usr/local/lib/python3.10/dist-packages/awq/quantize/quantizer.py", line 478, in init_quant
    self.model(samples.to(next(self.model.parameters()).device))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 998, in forward
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1067, in _update_causal_mask
    if hasattr(self.layers[0].self_attn, "past_key_value"):  # static cache
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Catcher' object has no attribute 'self_attn'
```
cc @casper-hansen is this what you mentioned in your tweet about a breaking change?
Hi @ArthurZucker, yes, this is one of the issues. I have released 0.2.4, which pins transformers<=4.38.2 as a temporary fix for quantization and inference. On the inference issue, I am not sure how to patch it without replacing the whole LlamaForCausalLM, which is a big task. This pattern of accessing modules directly will break most (if not all) packages that use transformers and patch/optimize certain parts of the model. I would recommend creating abstractions that avoid such direct access to modules.
Reference (casper-hansen/AutoAWQ#407 (comment)): I fixed the quantization issue, but there was another issue with inference following quantization that I did not have time to resolve.
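To illustrate the failure mode described above: AutoAWQ temporarily replaces the first decoder layer with a `Catcher` module to capture calibration inputs, so transformers code that reaches into `model.layers[0].self_attn` during that window hits the wrapper instead of a real decoder layer. A minimal sketch of the pattern, not AutoAWQ's exact code, with hypothetical names:

```python
import torch.nn as nn


class Catcher(nn.Module):
    """Wraps the first decoder layer so calibration inputs can be captured."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module   # the real decoder layer is kept here
        self.captured = []

    def forward(self, hidden_states, **kwargs):
        # Record the inputs the layer would have seen, then abort the pass.
        self.captured.append((hidden_states, kwargs))
        raise ValueError("early exit: calibration inputs captured")


# AutoAWQ-style setup, schematically:
#     model.model.layers[0] = Catcher(model.model.layers[0])
#
# transformers 4.39 then evaluates, inside LlamaModel._update_causal_mask:
#     hasattr(self.layers[0].self_attn, "past_key_value")
# The sub-expression `self.layers[0].self_attn` is evaluated *before* hasattr
# runs, and the Catcher has no `self_attn`, so nn.Module.__getattr__ raises
# AttributeError before hasattr gets a chance to swallow it.
```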
I'll have a look. We can make another patch to fix both issues; given the huge user base of AWQ, it makes sense!
Thanks @ArthurZucker, I appreciate the collaboration here to make the best of quantized models. At present, I will not be able to provide support for quantizing newer models (e.g. QWen2MoE) due to these breaking changes. Do you have an idea of when a fix could be implemented?
In around 12h I'll do a fix + a patch with #29895
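For reference while the patch lands: the failing check can be made tolerant of wrapped layers. This is a hypothetical helper sketching one defensive option, not the actual fix in #29895:

```python
from typing import Sequence

import torch.nn as nn


def first_layer_uses_static_cache(layers: Sequence[nn.Module]) -> bool:
    """Check for a static-cache attention module without assuming layer structure.

    Using getattr with a default means a wrapper module (like AutoAWQ's
    Catcher) that lacks `self_attn` yields False instead of raising
    AttributeError.
    """
    self_attn = getattr(layers[0], "self_attn", None)
    return self_attn is not None and hasattr(self_attn, "past_key_value")
```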
Hi! I also hit the same issue when using AWQ to quantize the Gemma model. Please let me know when you release a fixed version! Thanks for your help.
This issue still seems to be unresolved.
System Info
- transformers version: 4.39 (downgrading to 4.38.2 fixes this)
- Platform: Linux-5.4.0-163-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.21.4
- Safetensors version: 0.4.2
- Accelerate version: 0.28.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: parallel
Related: #29352
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Same as #29352
Expected behavior
Same as #29352 (downgrading to 4.38.2 fixes this)