transformers 4.27 compatibility #227

Open
gunesevitan opened this issue Mar 30, 2023 · 7 comments

Comments

@gunesevitan

gunesevitan commented Mar 30, 2023

I have to use transformers 4.27 because the latest version of clip-interrogator requires that specific version. After upgrading transformers from 4.26 to 4.27, I ran into this issue.


╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/src/image_captioning/blip_.py:168   │
│ in <module>                                                                                      │
│                                                                                                  │
│   165 │   for step, inputs in enumerate(progress_bar):                                           │
│   166 │   │                                                                                      │
│   167 │   │   inputs = inputs.to(device)                                                         │
│ ❱ 168 │   │   batch_predictions = predict_blip(                                                  │
│   169 │   │   │   inputs=inputs,                                                                 │
│   170 │   │   │   model=blip_model,                                                              │
│   171 │   │   │   nucleus_sampling=False,                                                        │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/src/image_captioning/blip_.py:92 in │
│ predict_blip                                                                                     │
│                                                                                                  │
│    89 │   """                                                                                    │
│    90 │                                                                                          │
│    91 │   with torch.no_grad(), torch.autocast(device_type=device.type, dtype=torch.float16):    │
│ ❱  92 │   │   outputs = model.generate(                                                          │
│    93 │   │   │   samples={'image': inputs},                                                     │
│    94 │   │   │   use_nucleus_sampling=nucleus_sampling,                                         │
│    95 │   │   │   num_beams=num_beams,                                                           │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/blip_models/blip_caption.py:188 in generate                               │
│                                                                                                  │
│   185 │   │   prompt.input_ids = prompt.input_ids[:, :-1]                                        │
│   186 │   │                                                                                      │
│   187 │   │   # get decoded text                                                                 │
│ ❱ 188 │   │   decoder_out = self.text_decoder.generate_from_encoder(                             │
│   189 │   │   │   tokenized_prompt=prompt,                                                       │
│   190 │   │   │   visual_embeds=image_embeds,                                                    │
│   191 │   │   │   sep_token_id=self.tokenizer.sep_token_id,                                      │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:1363 in generate_from_encoder                                      │
│                                                                                                  │
│   1360 │   │   │   )                                                                             │
│   1361 │   │   else:                                                                             │
│   1362 │   │   │   # beam search                                                                 │
│ ❱ 1363 │   │   │   outputs = self.generate(                                                      │
│   1364 │   │   │   │   input_ids=tokenized_prompt.input_ids,                                     │
│   1365 │   │   │   │   max_length=max_length,                                                    │
│   1366 │   │   │   │   min_length=min_length,                                                    │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/autograd/grad_mode.py:27 in decorate_context                                     │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/transformers/generation/utils.py:1490 in generate                                      │
│                                                                                                  │
│   1487 │   │   │   │   **model_kwargs,                                                           │
│   1488 │   │   │   )                                                                             │
│   1489 │   │   │   # 13. run beam search                                                         │
│ ❱ 1490 │   │   │   return self.beam_search(                                                      │
│   1491 │   │   │   │   input_ids,                                                                │
│   1492 │   │   │   │   beam_scorer,                                                              │
│   1493 │   │   │   │   logits_processor=logits_processor,                                        │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/transformers/generation/utils.py:2749 in beam_search                                   │
│                                                                                                  │
│   2746 │   │   │                                                                                 │
│   2747 │   │   │   model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)  │
│   2748 │   │   │                                                                                 │
│ ❱ 2749 │   │   │   outputs = self(                                                               │
│   2750 │   │   │   │   **model_inputs,                                                           │
│   2751 │   │   │   │   return_dict=True,                                                         │
│   2752 │   │   │   │   output_attentions=output_attentions,                                      │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:1213 in forward                                                    │
│                                                                                                  │
│   1210 │   │   if labels is not None:                                                            │
│   1211 │   │   │   use_cache = False                                                             │
│   1212 │   │                                                                                     │
│ ❱ 1213 │   │   outputs = self.bert(                                                              │
│   1214 │   │   │   input_ids,                                                                    │
│   1215 │   │   │   attention_mask=attention_mask,                                                │
│   1216 │   │   │   position_ids=position_ids,                                                    │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:977 in forward                                                     │
│                                                                                                  │
│    974 │   │   else:                                                                             │
│    975 │   │   │   embedding_output = encoder_embeds                                             │
│    976 │   │                                                                                     │
│ ❱  977 │   │   encoder_outputs = self.encoder(                                                   │
│    978 │   │   │   embedding_output,                                                             │
│    979 │   │   │   attention_mask=extended_attention_mask,                                       │
│    980 │   │   │   head_mask=head_mask,                                                          │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:595 in forward                                                     │
│                                                                                                  │
│    592 │   │   │   │   │   mode=mode,                                                            │
│    593 │   │   │   │   )                                                                         │
│    594 │   │   │   else:                                                                         │
│ ❱  595 │   │   │   │   layer_outputs = layer_module(                                             │
│    596 │   │   │   │   │   hidden_states,                                                        │
│    597 │   │   │   │   │   attention_mask,                                                       │
│    598 │   │   │   │   │   layer_head_mask,                                                      │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:478 in forward                                                     │
│                                                                                                  │
│    475 │   │   │   │   outputs = outputs + cross_attention_outputs[1:-1]                         │
│    476 │   │   │                                                                                 │
│    477 │   │   │   else:                                                                         │
│ ❱  478 │   │   │   │   cross_attention_outputs = self.crossattention(                            │
│    479 │   │   │   │   │   attention_output,                                                     │
│    480 │   │   │   │   │   attention_mask,                                                       │
│    481 │   │   │   │   │   head_mask,                                                            │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:349 in forward                                                     │
│                                                                                                  │
│    346 │   │   past_key_value=None,                                                              │
│    347 │   │   output_attentions=False,                                                          │
│    348 │   ):                                                                                    │
│ ❱  349 │   │   self_outputs = self.self(                                                         │
│    350 │   │   │   hidden_states,                                                                │
│    351 │   │   │   attention_mask,                                                               │
│    352 │   │   │   head_mask,                                                                    │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:222 in forward                                                     │
│                                                                                                  │
│    219 │   │   print('query', query_layer.shape)                                                 │
│    220 │   │   print('key', key_layer.shape)                                                     │
│    221 │   │   print('key t', key_layer.transpose(-1, -2).shape)                                 │
│ ❱  222 │   │   attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))         │
│    223 │   │                                                                                     │
│    224 │   │   if (                                                                              │
│    225 │   │   │   self.position_embedding_type == "relative_key"                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (48) must match the size of tensor b (144) at non-singleton dimension 0

I'm not sure the first dimension of 144 is correct here. What changed in transformers 4.27 to cause this?
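
For what it's worth, 144 is exactly 3 × 48, so my guess (only an assumption, I haven't traced the 4.27 internals) is that the visual embeddings get expanded for beam search twice: once by LAVIS before generate_from_encoder, and once again inside the newer transformers generate. A minimal sketch with made-up shapes that reproduces the same matmul error:

import torch

num_beams = 3        # hypothetical: 144 / 48 = 3
batch = 48           # query batch, already expanded once for beam search

# Illustrative stand-ins for query_layer / key_layer in lavis/models/med.py.
query_layer = torch.randn(batch, 12, 1, 64)
key_layer = torch.randn(batch * num_beams, 12, 197, 64)  # expanded again -> 144

# Raises: RuntimeError: The size of tensor a (48) must match the size of
# tensor b (144) at non-singleton dimension 0
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))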

@HuangChiEn

Yes, I asked the same question yesterday; we need to downgrade the transformers version.
You can see that requirements.txt constrains the transformers package to transformers>=4.25.0,<4.27,
so it should be less than 4.27!

At least 4.25 works (that's the version I took).
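
If it helps, a small runtime guard like this (just a sketch; it assumes the packaging package is available, which transformers itself depends on) makes the version problem fail fast instead of deep inside beam search:

import transformers
from packaging import version

# Check the installed transformers version against LAVIS's pinned range.
v = version.parse(transformers.__version__)
if not (version.parse("4.25.0") <= v < version.parse("4.27")):
    raise RuntimeError(
        f"transformers {transformers.__version__} is outside the pinned range "
        "(>=4.25.0,<4.27); reinstall with: "
        "pip install 'transformers>=4.25.0,<4.27'"
    )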

@gunesevitan
Author

│ ❱  478 │   │   │   │   cross_attention_outputs = self.crossattention(                            │
│    479 │   │   │   │   │   attention_output,                                                     │
│    480 │   │   │   │   │   attention_mask,                                                       │
│    481 │   │   │   │   │   head_mask,                                                            │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:349 in forward                                                     │
│                                                                                                  │
│    346 │   │   past_key_value=None,                                                              │
│    347 │   │   output_attentions=False,                                                          │
│    348 │   ):                                                                                    │
│ ❱  349 │   │   self_outputs = self.self(                                                         │
│    350 │   │   │   hidden_states,                                                                │
│    351 │   │   │   attention_mask,                                                               │
│    352 │   │   │   head_mask,                                                                    │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:222 in forward                                                     │
│                                                                                                  │
│    219 │   │   print('query', query_layer.shape)                                                 │
│    220 │   │   print('key', key_layer.shape)                                                     │
│    221 │   │   print('key t', key_layer.transpose(-1, -2).shape)                                 │
│ ❱  222 │   │   attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))         │
│    223 │   │                                                                                     │
│    224 │   │   if (                                                                              │
│    225 │   │   │   self.position_embedding_type == "relative_key"                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (48) must match the size of tensor b (144) at non-singleton dimension 0

I'm not sure the first dimension 144 is correct here. What change in transformers 4.27 is causing this?
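For what it's worth, 144 is exactly 48 × 3, which matches BLIP's default num_beams=3 for captioning. Here is a minimal sketch of the failing matmul under that assumption; the concrete shapes below are my guesses for illustration, not values taken from the traceback:

import torch

# Hypothetical shapes, assuming num_beams=3: the text-side query batch was
# expanded once for beam search (16 * 3 = 48), while the image-side key batch
# was expanded twice (16 * 3 * 3 = 144).
heads, q_len, k_len, head_dim = 12, 1, 577, 64
query_layer = torch.randn(48, heads, q_len, head_dim)
key_layer = torch.randn(144, heads, k_len, head_dim)

# Raises: RuntimeError: The size of tensor a (48) must match the size of
# tensor b (144) at non-singleton dimension 0
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))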

Yes, I asked the same question yesterday; we need to downgrade the transformers version. You can see that requirements.txt constrains the transformers package to transformers>=4.25.0,<4.27, so it must be less than 4.27!

At least 4.25 works (that's the version I'm using).
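If downgrading is an option, pinning to the range already in requirements.txt is enough:

pip install "transformers>=4.25.0,<4.27"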

Yeah, I figured that out, but I have to use transformers 4.27 :/

@LiJunnan1992
Contributor

We have made an update to the BLIP-2 OPT models so that they work with the latest transformers (version>=4.27).

@gunesevitan
Author

> We have made an update to the BLIP-2 OPT models so that they work with the latest transformers (version>=4.27).

Does the BLIP model work with transformers>=4.27 too?

@LiJunnan1992
Contributor

The BLIP model does not work with transformers>=4.27.

@Alchemistyui

> The BLIP model does not work with transformers>=4.27.

May I ask why BLIP doesn't work with transformers>=4.27? I have to use transformers>4.27; is it possible for me to modify transformers>4.27 locally to make it work with the BLIP model? Thank you in advance.

@LiJunnan1992
Contributor

LiJunnan1992 commented Jul 4, 2023

@Alchemistyui You may refer to this change and this change, which affect the BLIP model's generate function.
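For anyone who, like @Alchemistyui, must stay on transformers>=4.27: I haven't verified which commits those two links point to, but the 48-vs-144 mismatch above is consistent with the image embeddings being expanded by num_beams twice, once manually in LAVIS's blip_caption.py generate and once again inside transformers' own generate(). A rough, untested local workaround sketch under that assumption (image_embeds and num_beams are the names used in LAVIS's generate):

import transformers
from packaging import version

# Untested sketch for lavis/models/blip_models/blip_caption.py, assuming the
# double expansion described above is the actual cause. Older transformers
# releases did not expand extra tensor kwargs for beam search, so LAVIS did
# it manually; if a newer release does it inside generate(), skip the manual
# repeat so image_embeds is only expanded once.
if version.parse(transformers.__version__) < version.parse("4.27.0"):
    image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)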
