transformers 4.27 compatibility #227

Open
gunesevitan opened this issue Mar 30, 2023 · 7 comments

Comments

@gunesevitan

gunesevitan commented Mar 30, 2023

I have to use transformers 4.27 because the latest version of clip-interrogator requires that specific version. After upgrading transformers from 4.26 to 4.27, I ran into this issue.


╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/src/image_captioning/blip_.py:168   │
│ in <module>                                                                                      │
│                                                                                                  │
│   165 │   for step, inputs in enumerate(progress_bar):                                           │
│   166 │   │                                                                                      │
│   167 │   │   inputs = inputs.to(device)                                                         │
│ ❱ 168 │   │   batch_predictions = predict_blip(                                                  │
│   169 │   │   │   inputs=inputs,                                                                 │
│   170 │   │   │   model=blip_model,                                                              │
│   171 │   │   │   nucleus_sampling=False,                                                        │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/src/image_captioning/blip_.py:92 in │
│ predict_blip                                                                                     │
│                                                                                                  │
│    89 │   """                                                                                    │
│    90 │                                                                                          │
│    91 │   with torch.no_grad(), torch.autocast(device_type=device.type, dtype=torch.float16):    │
│ ❱  92 │   │   outputs = model.generate(                                                          │
│    93 │   │   │   samples={'image': inputs},                                                     │
│    94 │   │   │   use_nucleus_sampling=nucleus_sampling,                                         │
│    95 │   │   │   num_beams=num_beams,                                                           │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/blip_models/blip_caption.py:188 in generate                               │
│                                                                                                  │
│   185 │   │   prompt.input_ids = prompt.input_ids[:, :-1]                                        │
│   186 │   │                                                                                      │
│   187 │   │   # get decoded text                                                                 │
│ ❱ 188 │   │   decoder_out = self.text_decoder.generate_from_encoder(                             │
│   189 │   │   │   tokenized_prompt=prompt,                                                       │
│   190 │   │   │   visual_embeds=image_embeds,                                                    │
│   191 │   │   │   sep_token_id=self.tokenizer.sep_token_id,                                      │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:1363 in generate_from_encoder                                      │
│                                                                                                  │
│   1360 │   │   │   )                                                                             │
│   1361 │   │   else:                                                                             │
│   1362 │   │   │   # beam search                                                                 │
│ ❱ 1363 │   │   │   outputs = self.generate(                                                      │
│   1364 │   │   │   │   input_ids=tokenized_prompt.input_ids,                                     │
│   1365 │   │   │   │   max_length=max_length,                                                    │
│   1366 │   │   │   │   min_length=min_length,                                                    │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/autograd/grad_mode.py:27 in decorate_context                                     │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/transformers/generation/utils.py:1490 in generate                                      │
│                                                                                                  │
│   1487 │   │   │   │   **model_kwargs,                                                           │
│   1488 │   │   │   )                                                                             │
│   1489 │   │   │   # 13. run beam search                                                         │
│ ❱ 1490 │   │   │   return self.beam_search(                                                      │
│   1491 │   │   │   │   input_ids,                                                                │
│   1492 │   │   │   │   beam_scorer,                                                              │
│   1493 │   │   │   │   logits_processor=logits_processor,                                        │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/transformers/generation/utils.py:2749 in beam_search                                   │
│                                                                                                  │
│   2746 │   │   │                                                                                 │
│   2747 │   │   │   model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)  │
│   2748 │   │   │                                                                                 │
│ ❱ 2749 │   │   │   outputs = self(                                                               │
│   2750 │   │   │   │   **model_inputs,                                                           │
│   2751 │   │   │   │   return_dict=True,                                                         │
│   2752 │   │   │   │   output_attentions=output_attentions,                                      │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:1213 in forward                                                    │
│                                                                                                  │
│   1210 │   │   if labels is not None:                                                            │
│   1211 │   │   │   use_cache = False                                                             │
│   1212 │   │                                                                                     │
│ ❱ 1213 │   │   outputs = self.bert(                                                              │
│   1214 │   │   │   input_ids,                                                                    │
│   1215 │   │   │   attention_mask=attention_mask,                                                │
│   1216 │   │   │   position_ids=position_ids,                                                    │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:977 in forward                                                     │
│                                                                                                  │
│    974 │   │   else:                                                                             │
│    975 │   │   │   embedding_output = encoder_embeds                                             │
│    976 │   │                                                                                     │
│ ❱  977 │   │   encoder_outputs = self.encoder(                                                   │
│    978 │   │   │   embedding_output,                                                             │
│    979 │   │   │   attention_mask=extended_attention_mask,                                       │
│    980 │   │   │   head_mask=head_mask,                                                          │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:595 in forward                                                     │
│                                                                                                  │
│    592 │   │   │   │   │   mode=mode,                                                            │
│    593 │   │   │   │   )                                                                         │
│    594 │   │   │   else:                                                                         │
│ ❱  595 │   │   │   │   layer_outputs = layer_module(                                             │
│    596 │   │   │   │   │   hidden_states,                                                        │
│    597 │   │   │   │   │   attention_mask,                                                       │
│    598 │   │   │   │   │   layer_head_mask,                                                      │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:478 in forward                                                     │
│                                                                                                  │
│    475 │   │   │   │   outputs = outputs + cross_attention_outputs[1:-1]                         │
│    476 │   │   │                                                                                 │
│    477 │   │   │   else:                                                                         │
│ ❱  478 │   │   │   │   cross_attention_outputs = self.crossattention(                            │
│    479 │   │   │   │   │   attention_output,                                                     │
│    480 │   │   │   │   │   attention_mask,                                                       │
│    481 │   │   │   │   │   head_mask,                                                            │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:349 in forward                                                     │
│                                                                                                  │
│    346 │   │   past_key_value=None,                                                              │
│    347 │   │   output_attentions=False,                                                          │
│    348 │   ):                                                                                    │
│ ❱  349 │   │   self_outputs = self.self(                                                         │
│    350 │   │   │   hidden_states,                                                                │
│    351 │   │   │   attention_mask,                                                               │
│    352 │   │   │   head_mask,                                                                    │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:222 in forward                                                     │
│                                                                                                  │
│    219 │   │   print('query', query_layer.shape)                                                 │
│    220 │   │   print('key', key_layer.shape)                                                     │
│    221 │   │   print('key t', key_layer.transpose(-1, -2).shape)                                 │
│ ❱  222 │   │   attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))         │
│    223 │   │                                                                                     │
│    224 │   │   if (                                                                              │
│    225 │   │   │   self.position_embedding_type == "relative_key"                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (48) must match the size of tensor b (144) at non-singleton dimension 0

I'm not sure the first dimension of 144 is correct here. What changed in transformers 4.27 to cause this?
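
For what it's worth, 144 is exactly 3 × 48, so my guess (only an assumption, I haven't traced the 4.27 internals) is that the visual embeddings get expanded for beam search twice: once by LAVIS before generate_from_encoder, and once again inside the newer transformers generate. A minimal sketch with made-up shapes that reproduces the same matmul error:

import torch

num_beams = 3        # hypothetical: 144 / 48 = 3
batch = 48           # query batch, already expanded once for beam search

# Illustrative stand-ins for query_layer / key_layer in lavis/models/med.py.
query_layer = torch.randn(batch, 12, 1, 64)
key_layer = torch.randn(batch * num_beams, 12, 197, 64)  # expanded again -> 144

# Raises: RuntimeError: The size of tensor a (48) must match the size of
# tensor b (144) at non-singleton dimension 0
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))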

@HuangChiEn

Yes, I asked the same question yesterday; we need to downgrade the transformers version.
You can see that requirements.txt constrains the transformers package to transformers>=4.25.0,<4.27,
so it should be less than 4.27!

At least 4.25 works (that's the version I took).
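
If it helps, a small runtime guard like this (just a sketch; it assumes the packaging package is available, which transformers itself depends on) makes the version problem fail fast instead of deep inside beam search:

import transformers
from packaging import version

# Check the installed transformers version against LAVIS's pinned range.
v = version.parse(transformers.__version__)
if not (version.parse("4.25.0") <= v < version.parse("4.27")):
    raise RuntimeError(
        f"transformers {transformers.__version__} is outside the pinned range "
        "(>=4.25.0,<4.27); reinstall with: "
        "pip install 'transformers>=4.25.0,<4.27'"
    )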

@gunesevitan
Author

│ ❱  478 │   │   │   │   cross_attention_outputs = self.crossattention(                            │
│    479 │   │   │   │   │   attention_output,                                                     │
│    480 │   │   │   │   │   attention_mask,                                                       │
│    481 │   │   │   │   │   head_mask,                                                            │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:349 in forward                                                     │
│                                                                                                  │
│    346 │   │   past_key_value=None,                                                              │
│    347 │   │   output_attentions=False,                                                          │
│    348 │   ):                                                                                    │
│ ❱  349 │   │   self_outputs = self.self(                                                         │
│    350 │   │   │   hidden_states,                                                                │
│    351 │   │   │   attention_mask,                                                               │
│    352 │   │   │   head_mask,                                                                    │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/torch/nn/modules/module.py:1194 in _call_impl                                          │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site │
│ -packages/lavis/models/med.py:222 in forward                                                     │
│                                                                                                  │
│    219 │   │   print('query', query_layer.shape)                                                 │
│    220 │   │   print('key', key_layer.shape)                                                     │
│    221 │   │   print('key t', key_layer.transpose(-1, -2).shape)                                 │
│ ❱  222 │   │   attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))         │
│    223 │   │                                                                                     │
│    224 │   │   if (                                                                              │
│    225 │   │   │   self.position_embedding_type == "relative_key"                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (48) must match the size of tensor b (144) at non-singleton dimension 0

I'm not sure the first dimension 144 is correct here. What change in transformers 4.27 is causing this?
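For what it's worth, 144 is exactly 48 × 3, which matches BLIP's default num_beams=3 for captioning. Here is a minimal sketch of the failing matmul under that assumption; the concrete shapes below are my guesses for illustration, not values taken from the traceback:

import torch

# Hypothetical shapes, assuming num_beams=3: the text-side query batch was
# expanded once for beam search (16 * 3 = 48), while the image-side key batch
# was expanded twice (16 * 3 * 3 = 144).
heads, q_len, k_len, head_dim = 12, 1, 577, 64
query_layer = torch.randn(48, heads, q_len, head_dim)
key_layer = torch.randn(144, heads, k_len, head_dim)

# Raises: RuntimeError: The size of tensor a (48) must match the size of
# tensor b (144) at non-singleton dimension 0
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))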

Yes, I asked the same question yesterday; we need to downgrade the transformers version. You can see that requirements.txt constrains the transformers package to transformers>=4.25.0,<4.27, so it must be less than 4.27!

At least 4.25 works (that's the version I'm using).
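If downgrading is an option, pinning to the range already in requirements.txt is enough:

pip install "transformers>=4.25.0,<4.27"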

Yeah, I figured that out, but I have to use transformers 4.27 :/

@LiJunnan1992
Contributor

We have made an update to the BLIP-2 OPT models so that they work with the latest transformers (version>=4.27).

@gunesevitan
Author

> We have made an update to the BLIP-2 OPT models so that they work with the latest transformers (version>=4.27).

Does the BLIP model work with transformers>=4.27 too?

@LiJunnan1992
Contributor

The BLIP model does not work with transformers>=4.27.

@Alchemistyui

> The BLIP model does not work with transformers>=4.27.

May I ask why BLIP doesn't work with transformers>=4.27? I have to use transformers>4.27; is it possible for me to modify transformers>4.27 locally to make it work with the BLIP model? Thank you in advance.

@LiJunnan1992
Contributor

LiJunnan1992 commented Jul 4, 2023

@Alchemistyui You may refer to this change and this change, which affect the BLIP model's generate function.
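For anyone who, like @Alchemistyui, must stay on transformers>=4.27: I haven't verified which commits those two links point to, but the 48-vs-144 mismatch above is consistent with the image embeddings being expanded by num_beams twice, once manually in LAVIS's blip_caption.py generate and once again inside transformers' own generate(). A rough, untested local workaround sketch under that assumption (image_embeds and num_beams are the names used in LAVIS's generate):

import transformers
from packaging import version

# Untested sketch for lavis/models/blip_models/blip_caption.py, assuming the
# double expansion described above is the actual cause. Older transformers
# releases did not expand extra tensor kwargs for beam search, so LAVIS did
# it manually; if a newer release does it inside generate(), skip the manual
# repeat so image_embeds is only expanded once.
if version.parse(transformers.__version__) < version.parse("4.27.0"):
    image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)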
