Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing class embedding selection in owl-vit #23157

Merged
merged 1 commit into from
May 8, 2023
Merged

Conversation

orrzohar
Copy link
Contributor

@orrzohar orrzohar commented May 5, 2023

What does this PR do?

Fixes # For OWL-ViT image-guided object detection, there is a mistake in selecting the best embedding (the most distinct one with high IoU). Specifically;
selected_inds is a [num_inds, 1] dimensional tensor, where the indexes indicate which queries had a high IoU with target object bbox. But, as selected_inds[0] was selected, only the first of all the possible queries is selected. Specifically, instead of selected_embeddings being a [num_inds, D] dimensional tensor, it is a [1, D] dimensional tensor. This led ultimately to the first query always being selected, not the most unique one as required.

An error is not raised. To see this is the case, just add a print statement of 'torch.argmin(mean_sim)' here:

best_box_ind = selected_inds[torch.argmin(mean_sim)]

& you will see it is always 0.

  • transformers version: 4.28.1
  • Platform: Linux-5.4.0-109-generic-x86_64-with-glibc2.31
  • Python version: 3.11.3
  • Huggingface_hub version: 0.14.1
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.0.0+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger
@NielsRogge
@alaradirik
@amyeroberts

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented May 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@alaradirik
Copy link
Contributor

@orrzohar thank you for opening the PR! I'll take double check the original code and the forward pass shortly.

@alaradirik alaradirik self-requested a review May 8, 2023 11:21
Copy link
Contributor

@alaradirik alaradirik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this and opening the PR! I double checked the corresponding part in the original repo and the change indeed fixes the issue (as documented in issue #21206).

Adding @amyeroberts for the final approval.

@alaradirik alaradirik requested a review from amyeroberts May 8, 2023 11:26
@sgugger sgugger merged commit 843fdf2 into huggingface:main May 8, 2023
@MaslikovEgor
Copy link

MaslikovEgor commented May 9, 2023

After this fix I have problem with predictions.

I Use Colab demo for owl-vit
And on example photo with cats I have only one prediction. Looks suspicious and not the same as original repo

Screenshot 2023-05-09 at 16 13 13

@alaradirik

@MaslikovEgor
Copy link

Hi @MaslikovEgor, Hub demos are deployed once and not updated unless triggered. I'm rebooting the demo to reflect the changes.

Nice to meet you!

Yeah, I understand this. But in google colab demo we install fresh transformers from source:

!pip install git+https://github.com/huggingface/transformers.git

So this is the problem with this changes

@alaradirik

@orrzohar
Copy link
Contributor Author

orrzohar commented May 9, 2023

I found when evaluating COCO that mAP@0.5 increases from 6 to 37. This is still below the expected 44+, but closer to the reported/expected performance.

I am still trying to figure out why.

gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
fixing class embedding selection in owl-vit
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
fixing class embedding selection in owl-vit
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants