Fixing class embedding selection in owl-vit #23157
Conversation
The documentation is not available anymore as the PR was closed or merged.
@orrzohar thank you for opening the PR! I'll double-check the original code and the forward pass shortly.
Thank you for catching this and opening the PR! I double checked the corresponding part in the original repo and the change indeed fixes the issue (as documented in issue #21206).
Adding @amyeroberts for the final approval.
After this fix I have a problem with predictions. I use the Colab demo for OWL-ViT.
Nice to meet you! Yeah, I understand this. But the Google Colab demo installs a fresh transformers from source, so the demo picks up these changes, which is why the predictions differ.
I found when evaluating on COCO that mAP@0.5 increases from 6 to 37. This is still below the expected 44+, but closer to the reported performance. I am still trying to figure out why.
fixing class embedding selection in owl-vit
What does this PR do?
Fixes # For OWL-ViT image-guided object detection, there is a mistake in selecting the best embedding (the most distinct one with a high IoU). Specifically:
selected_inds is a [num_inds, 1] tensor whose entries index the queries that had a high IoU with the target object bbox. Because selected_inds[0] was used, only the first of the matching queries was ever selected: instead of selected_embeddings being a [num_inds, D] tensor, it was a [1, D] tensor. As a result, the first query was always chosen, not the most distinct one as required.
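The indexing bug described above can be sketched as follows. This is a simplified, self-contained illustration with made-up shapes and a plain dot-product similarity, not the actual OWL-ViT code; `class_embeds`, `selected_inds`, and the similarity computation here are stand-ins for the corresponding variables in `modeling_owlvit.py`.

```python
import torch

num_queries, embed_dim = 8, 4
class_embeds = torch.randn(num_queries, embed_dim)
# Indices of queries with high IoU against the target box: shape [num_inds, 1]
selected_inds = torch.tensor([[2], [5], [7]])

# Buggy: selected_inds[0] keeps only the first row, so a single embedding
# of shape [1, D] is gathered and argmin over its self-similarity is always 0.
buggy = class_embeds[selected_inds[0]]  # shape [1, 4]

# Fixed: gather all selected embeddings ([num_inds, D]), then pick the most
# distinct one, i.e. the one with the lowest mean similarity to the others.
selected_embeddings = class_embeds[selected_inds[:, 0]]  # shape [3, 4]
mean_sim = torch.einsum("nd,md->nm", selected_embeddings, selected_embeddings).mean(dim=1)
best = torch.argmin(mean_sim)  # can now be any of the selected queries, not just 0
```

With the buggy indexing, `torch.argmin(mean_sim)` is computed over a single embedding and is therefore always 0, which matches the diagnostic described below.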
An error is not raised. To see that this is the case, just add a print statement of torch.argmin(mean_sim) at src/transformers/models/owlvit/modeling_owlvit.py, line 1505 (at commit 01734db), and you will see it is always 0.
transformers version: 4.28.1
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sgugger
@NielsRogge
@alaradirik
@amyeroberts