
Legacy processing is triggered even when only pure text is input in llava without img_token #35526

Closed
wants to merge 3 commits

Conversation

jp1924 (Contributor) commented on Jan 6, 2025

What does this PR do?

When training LLaVA, vision instruction data is sometimes mixed with text-only instruction data. However, an error occurs when a batch contains only text instruction data (no image tokens):

Traceback (most recent call last):
  File "/root/llava.py", line 13, in <module>
    model(**outputs)
  File "/root/transformers/src/transformers/models/vipllava/modeling_vipllava.py", line 483, in forward
    inputs_embeds, attention_mask, labels, position_ids = self._merge_input_ids_with_image_features(
  File "/root/transformers/src/transformers/models/vipllava/modeling_vipllava.py", line 299, in _merge_input_ids_with_image_features
    num_images, num_image_patches, embed_dim = image_features.shape
AttributeError: 'NoneType' object has no attribute 'shape'

This happens because legacy_processing evaluates to True even for text-only inputs, so the model attempts to merge image features that are None. If we add a check on image_features at that point, the model handles text-only batches smoothly, as sketched below.
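To illustrate the idea, here is a minimal, self-contained sketch of that kind of guard. It is not the actual transformers diff: the function names echo `_merge_input_ids_with_image_features` from the traceback above, but the signatures and bodies are simplified stand-ins I made up for this example.

```python
import numpy as np

# Simplified stand-in for the merge step from the traceback above:
# it assumes image_features is an array, so it crashes when it is None.
def merge_input_ids_with_image_features(image_features, inputs_embeds):
    num_images, num_image_patches, embed_dim = image_features.shape
    return inputs_embeds  # real merging logic omitted

# The proposed idea: only take the legacy merge path when image features exist.
def forward(inputs_embeds, image_features=None, legacy_processing=True):
    if legacy_processing and image_features is not None:
        inputs_embeds = merge_input_ids_with_image_features(image_features, inputs_embeds)
    return inputs_embeds

text_embeds = np.zeros((2, 16, 8))            # a text-only batch
forward(text_embeds)                          # no crash: merge is skipped
forward(text_embeds, np.zeros((1, 576, 8)))   # with images: merge runs as before
```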

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zucchini-nlp

jp1924 (Contributor, Author) commented on Jan 6, 2025

Reproduction code:

from transformers import AutoModelForImageTextToText, AutoProcessor


name = "llava-hf/vip-llava-7b-hf"
model, processor = AutoModelForImageTextToText.from_pretrained(name), AutoProcessor.from_pretrained(name)

# Text-only prompts: neither contains an image token, so no pixel_values are produced.
prompts = [
    "USER: What are the things I should be cautious about when I visit this place? What should I bring with me? ASSISTANT:",
    "USER: What is this? ASSISTANT:",
]

outputs = processor(prompts, return_tensors="pt", padding=True, truncation=True)
model(**outputs)  # raises AttributeError: 'NoneType' object has no attribute 'shape'

Environment

  • huggingface_hub version: 0.26.2
  • Platform: Linux-5.15.0-124-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Running in iPython ?: No
  • Running in notebook ?: No
  • Running in Google Colab ?: No
  • Running in Google Colab Enterprise ?: No
  • Token path ?: /root/.cache/huggingface/token
  • Has saved token ?: True
  • Who am I ?: jp1924
  • Configured git credential helpers:
  • FastAI: N/A
  • Tensorflow: 2.15.1
  • Torch: 2.5.1+cu121
  • Jinja2: 3.1.4
  • Graphviz: N/A
  • keras: 2.15.0
  • Pydot: N/A
  • Pillow: 11.0.0
  • hf_transfer: N/A
  • gradio: N/A
  • tensorboard: 2.6.2.2
  • numpy: 1.26.4
  • pydantic: 2.9.2
  • aiohttp: 3.11.6
  • ENDPOINT: https://huggingface.co
  • HF_HUB_CACHE: /root/.cache/huggingface/hub
  • HF_ASSETS_CACHE: /root/.cache/huggingface/assets
  • HF_TOKEN_PATH: /root/.cache/huggingface/token
  • HF_STORED_TOKENS_PATH: /root/.cache/huggingface/stored_tokens
  • HF_HUB_OFFLINE: False
  • HF_HUB_DISABLE_TELEMETRY: False
  • HF_HUB_DISABLE_PROGRESS_BARS: None
  • HF_HUB_DISABLE_SYMLINKS_WARNING: False
  • HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
  • HF_HUB_DISABLE_IMPLICIT_TOKEN: False
  • HF_HUB_ENABLE_HF_TRANSFER: False
  • HF_HUB_ETAG_TIMEOUT: 10
  • HF_HUB_DOWNLOAD_TIMEOUT: 10

zucchini-nlp (Member) commented:

This will be fixed soon by #34502. The issue was caused by incorrect indentation when processing image features; that block should be nested one indent level deeper.
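To make the indentation point concrete, here is a toy, self-contained illustration of the control flow. This is not the real modeling_vipllava.py code; `forward_buggy` and `forward_fixed` are hypothetical names, and the list arithmetic is just a stand-in for the vision tower and merge step.

```python
def forward_buggy(pixel_values, text_embeds):
    image_features = None
    if pixel_values is not None:
        image_features = [v + 1 for v in pixel_values]  # stand-in for the vision tower
    # merge sits outside the branch, so it also runs for text-only batches
    return text_embeds + image_features                  # fails when image_features is None

def forward_fixed(pixel_values, text_embeds):
    image_features = None
    if pixel_values is not None:
        image_features = [v + 1 for v in pixel_values]
        # one more indent: merging only happens when pixel_values were passed
        return text_embeds + image_features
    return text_embeds

forward_fixed(None, [0.1, 0.2])     # text-only input: embeddings returned unchanged
forward_fixed([1, 2], [0.1, 0.2])   # image input: features still merged
```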

jp1924 (Contributor, Author) commented on Jan 7, 2025

Ah, you were already working on it. I'll modify the code and keep working on it until it's resolved. Thanks!
