
Add zero-shot classification task for BLIP-2 #25300

Open
youssefadr opened this issue Aug 3, 2023 · 10 comments · May be fixed by #25474
Labels
Good Second Issue Issues that are more difficult to do than "Good First" issues - give it a try if you want!

Comments

@youssefadr
Contributor

Feature request

I would like to add support for the zero-shot classification task using BLIP-2, computing text-image similarities with the normalized embeddings obtained from the BLIP-2 feature extractor.

The idea is to enable calling the zero-shot classification pipeline with BLIP-2 by implementing the get_image_features and get_text_features methods.

I would love more guidance, if possible, on the criteria for accepting the PR.
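To make the request concrete, the core of zero-shot classification with such features is just a cosine-similarity-plus-softmax step over normalized embeddings. A minimal sketch, using random stand-in tensors where the real image and text features (from the proposed get_image_features / get_text_features) would go:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_embed: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """Score one image embedding against N candidate-label text embeddings.

    image_embed: (d,)   -- e.g. a pooled, projected image feature
    text_embeds: (n, d) -- one projected text feature per candidate label
    Returns a (n,) probability distribution over the labels.
    """
    image_embed = F.normalize(image_embed, dim=-1)  # unit-norm image vector
    text_embeds = F.normalize(text_embeds, dim=-1)  # unit-norm text vectors
    logits = text_embeds @ image_embed              # cosine similarities, shape (n,)
    return logits.softmax(dim=-1)                   # probabilities over the labels

# Toy demo with random stand-in embeddings; in practice these would come
# from the model's feature-extraction methods.
torch.manual_seed(0)
probs = zero_shot_classify(torch.randn(16), torch.randn(3, 16))
```

This is only the scoring step; the actual pipeline would also run the tokenizer and image processor to produce the embeddings.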

Motivation

This is related to the discussion on this issue on the Hub, and the comment left by @NielsRogge here: https://huggingface.co/Salesforce/blip2-opt-2.7b/discussions/3#64cbe5e487ec96aa473a1f54

Your contribution

I would like to submit a PR to contribute this feature.

@NielsRogge
Contributor

NielsRogge commented Aug 4, 2023

Yes, ideally you can add get_image_feature and get_text_feature to the Blip2ForConditionalGeneration class. For that, you can refer to the original implementation.
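Very roughly, the two methods could follow the shape of the ITC feature path in the original LAVIS code, where Q-Former outputs go through a linear projection and are L2-normalized. The sketch below uses toy stand-in modules (the sizes and the vision_proj / text_proj names are modeled on LAVIS, not the actual transformers submodules):

```python
import torch
import torch.nn.functional as F
from torch import nn

class ToyBlip2Features(nn.Module):
    """Toy stand-in mirroring the shape of BLIP-2's ITC feature path:
    Q-Former outputs -> linear projection -> L2 normalization."""

    def __init__(self, hidden: int = 32, embed: int = 16):
        super().__init__()
        self.vision_proj = nn.Linear(hidden, embed)  # stand-in for vision_proj
        self.text_proj = nn.Linear(hidden, embed)    # stand-in for text_proj

    def get_image_features(self, qformer_out: torch.Tensor) -> torch.Tensor:
        # qformer_out: (batch, num_queries, hidden) query-token states
        return F.normalize(self.vision_proj(qformer_out), dim=-1)

    def get_text_features(self, text_out: torch.Tensor) -> torch.Tensor:
        # text_out: (batch, seq, hidden); use the first ([CLS]-like) token
        return F.normalize(self.text_proj(text_out[:, 0, :]), dim=-1)

# Toy usage: image features keep one vector per query token, so image-text
# similarity in LAVIS takes the max over the query dimension.
m = ToyBlip2Features()
img = m.get_image_features(torch.randn(2, 4, 32))   # (2, 4, 16)
txt = m.get_text_features(torch.randn(2, 7, 32))    # (2, 16)
sim = (img @ txt[0]).max(dim=-1).values             # (2,)
```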

@ayushtues
Contributor

@youssefadr let me know if you need any help with this PR; I am also in need of multimodal feature extraction from the Blip2Qformer.

@youssefadr
Contributor Author

Hello, thanks for your message, I will tackle it this week 👍

@github-actions

github-actions bot commented Sep 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@youssefadr
Contributor Author

Sorry, I have been caught up with work. Will finalize the PR today!


@JhonDan1999

JhonDan1999 commented Jan 2, 2024

Yes, ideally you can add get_image_feature and get_text_feature to the Blip2ForConditionalGeneration class. For that, you can refer to the original implementation.

Hi, I want to know if this has been done, because I am trying to use get_image_feature but I am getting this error: AttributeError: 'Blip2ForConditionalGeneration' object has no attribute 'get_image_feature'

And I cannot use Blip2Model because I have to use load_in_8bit, which comes with Blip2ForConditionalGeneration.

@NielsRogge
Contributor

Hi, no this feature hasn't been added yet.

@NielsRogge reopened this Jan 2, 2024
@NielsRogge added the Good Second Issue label Jan 2, 2024
@JhonDan1999

Hi, no this feature hasn't been added yet.

Thank you for your prompt response. I have the following questions and would appreciate your input:
Q1: Is there any way to extract the features of an image using BLIP-2 from Hugging Face checkpoints with load_in_8bit?
Q2: Does the feature extraction in this notebook https://github.com/salesforce/LAVIS/blob/main/examples/blip2_feature_extraction.ipynb work in the same way as get_image_feature?
Q3: If I want to extract or convert an image into a vector so it can be used by another model, do you have any recommendation for the best way to do this, other than the CLIP model, which did not give me good results?
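On Q3, the downstream use of such vectors is usually similarity search, which works the same regardless of which encoder (BLIP-2, CLIP, ...) produced them. A minimal sketch with a hypothetical helper, using random tensors in place of real features:

```python
import torch
import torch.nn.functional as F

def top_k_similar(query: torch.Tensor, database: torch.Tensor, k: int = 3):
    """Rank database vectors by cosine similarity to a query vector.

    query:    (d,)   -- e.g. an image feature from any encoder
    database: (n, d) -- precomputed feature vectors
    Returns (indices, scores) of the k most similar entries.
    """
    query = F.normalize(query, dim=-1)
    database = F.normalize(database, dim=-1)
    scores = database @ query        # cosine similarities, shape (n,)
    top = torch.topk(scores, k)
    return top.indices, top.values

# Toy demo: a vector is maximally cosine-similar to itself,
# so querying with db[4] should rank entry 4 first.
torch.manual_seed(0)
db = torch.randn(10, 8)
idx, scores = top_k_similar(db[4], db, k=3)
```

Any encoder whose features are trained with a contrastive (image-text) objective tends to give more useful vectors for this than one trained only for generation.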

@kirillsemenov1314

@youssefadr hi, let me know if help is needed here; I would love to give it a try and push things forward. It would actually be my first contribution, but I'm quite familiar with the BLIP-2 model.
