Fixing class embedding selection in owl-vit #23157
Conversation
The documentation is not available anymore as the PR was closed or merged.
@orrzohar thank you for opening the PR! I'll double-check the original code and the forward pass shortly.
Thank you for catching this and opening the PR! I double checked the corresponding part in the original repo and the change indeed fixes the issue (as documented in issue #21206).
Adding @amyeroberts for the final approval.
After this fix I have a problem with predictions. I use the Colab demo for OWL-ViT.
Nice to meet you! Yeah, I understand this. But the Google Colab demo installs a fresh transformers from source, so the demo picks up these changes, which is why the predictions differ.
I found when evaluating on COCO that mAP@0.5 increases from 6 to 37. This is still below the expected 44+, but closer to the reported performance. I am still trying to figure out why.
fixing class embedding selection in owl-vit
What does this PR do?
Fixes # For OWL-ViT image-guided object detection, there is a mistake in selecting the best embedding (the most distinct one with a high IoU). Specifically:
selected_inds is a [num_inds, 1] tensor whose entries index the queries that had a high IoU with the target object bbox. Because selected_inds[0] was used, only the first of the matching queries was ever selected: instead of selected_embeddings being a [num_inds, D] tensor, it was a [1, D] tensor. As a result, the first query was always chosen, not the most distinct one as required.
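The indexing bug described above can be sketched as follows. This is a simplified, self-contained illustration with made-up shapes and a plain dot-product similarity, not the actual OWL-ViT code; `class_embeds`, `selected_inds`, and the similarity computation here are stand-ins for the corresponding variables in `modeling_owlvit.py`.

```python
import torch

num_queries, embed_dim = 8, 4
class_embeds = torch.randn(num_queries, embed_dim)
# Indices of queries with high IoU against the target box: shape [num_inds, 1]
selected_inds = torch.tensor([[2], [5], [7]])

# Buggy: selected_inds[0] keeps only the first row, so a single embedding
# of shape [1, D] is gathered and argmin over its self-similarity is always 0.
buggy = class_embeds[selected_inds[0]]  # shape [1, 4]

# Fixed: gather all selected embeddings ([num_inds, D]), then pick the most
# distinct one, i.e. the one with the lowest mean similarity to the others.
selected_embeddings = class_embeds[selected_inds[:, 0]]  # shape [3, 4]
mean_sim = torch.einsum("nd,md->nm", selected_embeddings, selected_embeddings).mean(dim=1)
best = torch.argmin(mean_sim)  # can now be any of the selected queries, not just 0
```

With the buggy indexing, `torch.argmin(mean_sim)` is computed over a single embedding and is therefore always 0, which matches the diagnostic described below.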
An error is not raised. To see that this is the case, just add a print statement of torch.argmin(mean_sim) at src/transformers/models/owlvit/modeling_owlvit.py, line 1505 (at commit 01734db), and you will see it is always 0.
transformers version: 4.28.1
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sgugger
@NielsRogge
@alaradirik
@amyeroberts