
Add zero-shot classification task for BLIP-2 #25300

Open
youssefadr opened this issue Aug 3, 2023 · 10 comments · May be fixed by #25474
Labels
Good Second Issue Issues that are more difficult to do than "Good First" issues - give it a try if you want!

Comments

@youssefadr
Contributor

Feature request

I would like to add support for the zero-shot classification task using BLIP-2, computing text-image similarities with the normalized embeddings obtained from the BLIP-2 feature extractor.

The idea is to enable calling the zero-shot classification pipeline with BLIP-2 by implementing the get_image_features and get_text_features methods.

I would love more guidance, if possible, on the criteria for accepting the PR.
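To make the request concrete, the core of zero-shot classification with such features is just a cosine-similarity-plus-softmax step over normalized embeddings. A minimal sketch, using random stand-in tensors where the real image and text features (from the proposed get_image_features / get_text_features) would go:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_embed: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """Score one image embedding against N candidate-label text embeddings.

    image_embed: (d,)   -- e.g. a pooled, projected image feature
    text_embeds: (n, d) -- one projected text feature per candidate label
    Returns a (n,) probability distribution over the labels.
    """
    image_embed = F.normalize(image_embed, dim=-1)  # unit-norm image vector
    text_embeds = F.normalize(text_embeds, dim=-1)  # unit-norm text vectors
    logits = text_embeds @ image_embed              # cosine similarities, shape (n,)
    return logits.softmax(dim=-1)                   # probabilities over the labels

# Toy demo with random stand-in embeddings; in practice these would come
# from the model's feature-extraction methods.
torch.manual_seed(0)
probs = zero_shot_classify(torch.randn(16), torch.randn(3, 16))
```

This is only the scoring step; the actual pipeline would also run the tokenizer and image processor to produce the embeddings.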

Motivation

This is related to the discussion on this issue on the Hub, and the comment left by @NielsRogge here: https://huggingface.co/Salesforce/blip2-opt-2.7b/discussions/3#64cbe5e487ec96aa473a1f54

Your contribution

I would like to submit a PR to contribute this feature.

@NielsRogge
Contributor

NielsRogge commented Aug 4, 2023

Yes, ideally you can add get_image_feature and get_text_feature to the Blip2ForConditionalGeneration class. For that, you can refer to the original implementation.
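Very roughly, the two methods could follow the shape of the ITC feature path in the original LAVIS code, where Q-Former outputs go through a linear projection and are L2-normalized. The sketch below uses toy stand-in modules (the sizes and the vision_proj / text_proj names are modeled on LAVIS, not the actual transformers submodules):

```python
import torch
import torch.nn.functional as F
from torch import nn

class ToyBlip2Features(nn.Module):
    """Toy stand-in mirroring the shape of BLIP-2's ITC feature path:
    Q-Former outputs -> linear projection -> L2 normalization."""

    def __init__(self, hidden: int = 32, embed: int = 16):
        super().__init__()
        self.vision_proj = nn.Linear(hidden, embed)  # stand-in for vision_proj
        self.text_proj = nn.Linear(hidden, embed)    # stand-in for text_proj

    def get_image_features(self, qformer_out: torch.Tensor) -> torch.Tensor:
        # qformer_out: (batch, num_queries, hidden) query-token states
        return F.normalize(self.vision_proj(qformer_out), dim=-1)

    def get_text_features(self, text_out: torch.Tensor) -> torch.Tensor:
        # text_out: (batch, seq, hidden); use the first ([CLS]-like) token
        return F.normalize(self.text_proj(text_out[:, 0, :]), dim=-1)

# Toy usage: image features keep one vector per query token, so image-text
# similarity in LAVIS takes the max over the query dimension.
m = ToyBlip2Features()
img = m.get_image_features(torch.randn(2, 4, 32))   # (2, 4, 16)
txt = m.get_text_features(torch.randn(2, 7, 32))    # (2, 16)
sim = (img @ txt[0]).max(dim=-1).values             # (2,)
```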

@ayushtues
Contributor

@youssefadr let me know if you need any help with this PR; I am also in need of multimodal feature extraction from the Blip2Qformer.

@youssefadr
Contributor Author

Hello, thanks for your message, I will tackle it this week 👍

@github-actions

github-actions bot commented Sep 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@youssefadr
Contributor Author

Sorry, I have been caught up with work. Will finalize the PR today!


@JhonDan1999

JhonDan1999 commented Jan 2, 2024

Yes, ideally you can add get_image_feature and get_text_feature to the Blip2ForConditionalGeneration class. For that, you can refer to the original implementation.

Hi, I want to know if this has been done, because I am trying to use get_image_feature but I am getting this error: AttributeError: 'Blip2ForConditionalGeneration' object has no attribute 'get_image_feature'

And I cannot use Blip2Model because I have to use load_in_8bit, which comes with Blip2ForConditionalGeneration.

@NielsRogge
Contributor

Hi, no this feature hasn't been added yet.

@NielsRogge reopened this Jan 2, 2024
@NielsRogge added the Good Second Issue label Jan 2, 2024
@JhonDan1999

Hi, no this feature hasn't been added yet.

Thank you for your prompt response. I have the following questions and would appreciate your input:
Q1: Is there any way to extract the features of an image using BLIP-2 from Hugging Face checkpoints with load_in_8bit?
Q2: Does the feature extraction in this notebook https://github.com/salesforce/LAVIS/blob/main/examples/blip2_feature_extraction.ipynb work in the same way as get_image_feature?
Q3: If I want to extract or convert an image into a vector so it can be used by another model, do you have any recommendation for the best way to do this, other than the CLIP model, which did not give me good results?
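On Q3, the downstream use of such vectors is usually similarity search, which works the same regardless of which encoder (BLIP-2, CLIP, ...) produced them. A minimal sketch with a hypothetical helper, using random tensors in place of real features:

```python
import torch
import torch.nn.functional as F

def top_k_similar(query: torch.Tensor, database: torch.Tensor, k: int = 3):
    """Rank database vectors by cosine similarity to a query vector.

    query:    (d,)   -- e.g. an image feature from any encoder
    database: (n, d) -- precomputed feature vectors
    Returns (indices, scores) of the k most similar entries.
    """
    query = F.normalize(query, dim=-1)
    database = F.normalize(database, dim=-1)
    scores = database @ query        # cosine similarities, shape (n,)
    top = torch.topk(scores, k)
    return top.indices, top.values

# Toy demo: a vector is maximally cosine-similar to itself,
# so querying with db[4] should rank entry 4 first.
torch.manual_seed(0)
db = torch.randn(10, 8)
idx, scores = top_k_similar(db[4], db, k=3)
```

Any encoder whose features are trained with a contrastive (image-text) objective tends to give more useful vectors for this than one trained only for generation.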

@kirillsemenov1314

@youssefadr hi, let me know if help is needed here; I would love to give it a try and push things forward. It would actually be my first contribution, but I'm quite familiar with the BLIP-2 model.
