Is T5 model supported? #150
Hi! This is a transformers question since you are using HQQ via the transformers lib, so I don't know exactly what's going on.

Yes - the error shows up in both cases: when loading an already quantized model and when quantizing it on-the-fly. But you are right that it may be more of an issue in the transformers lib. There was a similar problem a while back with a different model - huggingface/transformers#30727.

That fix should have fixed this issue too, since it's actually independent of the model.

I'm not sure I understand the question :) Do you mean T5 or the Flux pipeline?

Are you fine with loading the whole T5 model on CPU first, then quantizing it to run on the GPU later?

Yes, I'm fine with that.

Then you can load it on CPU, quantize the linear layers, and dispatch them to the GPU.
Something like this:

```python
import torch
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

def quantize_model(model, quant_config, compute_dtype, device='cuda:0'):
    # Patch: recursively replace every nn.Linear with an HQQLinear
    def _patch_linear(model):
        for name, layer in model.named_children():
            if isinstance(layer, torch.nn.Linear):
                layer = HQQLinear(layer, quant_config=quant_config, compute_dtype=compute_dtype, device=device)
                setattr(model, name, layer)
            else:
                _patch_linear(layer)

    _patch_linear(model)

    # Move the rest of the model to the right device and dtype
    model = model.to(device=device, dtype=compute_dtype)

    # Autoname: tag each module with its fully qualified name
    for name, module in model.named_modules():
        module.name = name

# Example usage: model is the T5 encoder loaded on CPU
quantize_model(model, BaseQuantizeConfig(nbits=4, group_size=64), torch.bfloat16)
```
I've created and saved the quantized version like this:
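(The original snippet wasn't captured in the thread; below is a minimal sketch of what this step might look like, assuming the FLUX.1-dev repo layout, where the T5 encoder lives in the `text_encoder_2` subfolder, and the `quantize_model` helper from the comment above.)

```python
import torch
from transformers import T5EncoderModel
from hqq.core.quantize import BaseQuantizeConfig

# Load the T5 encoder on CPU first, as discussed above
# (assumption: the FLUX.1-dev checkpoint and subfolder name)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
)

# Quantize the linear layers and move everything to the GPU
quantize_model(text_encoder_2, BaseQuantizeConfig(nbits=4, group_size=64), torch.bfloat16)

# Persist the quantized encoder (whether save_pretrained round-trips
# HQQLinear layers depends on the transformers version)
text_encoder_2.save_pretrained("t5_hqq_4bit")
```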
During inference I create the Flux pipeline:
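(This snippet was also elided; presumably something along these lines, where the model id and the `text_encoder_2` argument are assumptions based on the diffusers `FluxPipeline` API.)

```python
import torch
from diffusers import FluxPipeline

# Build the pipeline with the quantized T5 encoder plugged in
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
)

# CPU offload would explain the accelerate hooks in the traceback below
pipe.enable_model_cpu_offload()
```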
But when I actually start inference, I always get this error:
File "/home/szwagros/anaconda3/envs/image/lib/python3.11/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/szwagros/anaconda3/envs/image/lib/python3.11/site-packages/transformers/models/t5/modeling_t5.py", line 339, in forward
forwarded_states = self.DenseReluDense(forwarded_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/szwagros/anaconda3/envs/image/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/szwagros/anaconda3/envs/image/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/szwagros/anaconda3/envs/image/lib/python3.11/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/szwagros/anaconda3/envs/image/lib/python3.11/site-packages/transformers/models/t5/modeling_t5.py", line 316, in forward
isinstance(self.wo.weight, torch.Tensor)
^^^^^^^^^^^^^^
File "/home/szwagros/anaconda3/envs/image/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1931, in getattr
raise AttributeError(
AttributeError: 'HQQLinear' object has no attribute 'weight'
Is it because T5 is not supported, or am I doing something wrong?
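For context: the failing check is in T5's feed-forward block (`T5DenseActDense.forward` in `modeling_t5.py`), which accesses `self.wo.weight` to decide whether to cast the hidden states, while `HQQLinear` stores its quantized weights internally instead of exposing a `weight` attribute, hence the `AttributeError`. One possible workaround sketch (an assumption, not an official HQQ or transformers fix) is to give each `HQQLinear` a zero-element placeholder `weight` so the dtype check passes:

```python
import torch

def add_placeholder_weight(model):
    # Hypothetical helper: T5 only inspects self.wo.weight.dtype, so a
    # zero-element tensor in the compute dtype is enough to satisfy it.
    # Assumes HQQLinear exposes compute_dtype and device attributes.
    for module in model.modules():
        if module.__class__.__name__ == "HQQLinear" and not hasattr(module, "weight"):
            module.weight = torch.empty(0, dtype=module.compute_dtype, device=module.device)

add_placeholder_weight(text_encoder_2)
```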