OwlVit gives different results compared to original colab version #21206
Comments
Yes, we had a hard time making the Space output the same bounding boxes as in Colab (eventually it worked on the cats image). It had to do with the Pillow version, so I'm guessing there might be a difference in Pillow versions here as well. Cc @alaradirik |
@darwinharianto thanks for bringing the issue up, I'm looking into it! |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
Kindly bumping |
Kindly reminder |
cc @alaradirik and @amyeroberts |
I got the same issue, and this is the Hugging Face demo.
With the lvis-api, the performance is not reproduced (mAP = 0.095). |
It seems the problem still exists; I mentioned the problem here. Maybe the best way is to cover model predictions with end-to-end tests on a batch of images. This approach would help us be sure about changes. |
@MaslikovEgor I agree with you. I have an end-to-end test with the lvis-api (covering both Hugging Face OwlViT and google/scenic OWL-ViT), but the Hugging Face OwlViT results are not reproduced (mAP = 0.095). |
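For readers who want to run a similar check, scoring a dump of detections with the lvis-api looks roughly like the sketch below. This is not the script used in the thread; the file paths are placeholders, and it assumes the predictions have already been exported to a COCO/LVIS-style results JSON.

```python
# Rough sketch of scoring exported OwlViT detections with the lvis-api.
# Assumes predictions were already dumped to a COCO/LVIS-style results JSON
# (a list of {"image_id", "category_id", "bbox": [x, y, w, h], "score"} dicts).
# Both file paths are placeholders.
from lvis import LVIS, LVISEval, LVISResults

ANNOTATIONS = "lvis_v1_val.json"       # ground-truth annotations (placeholder path)
DETECTIONS = "owlvit_detections.json"  # model predictions to score (placeholder path)

lvis_gt = LVIS(ANNOTATIONS)
lvis_dt = LVISResults(lvis_gt, DETECTIONS)

lvis_eval = LVISEval(lvis_gt, lvis_dt, iou_type="bbox")
lvis_eval.run()
lvis_eval.print_results()  # prints AP metrics, including overall bbox mAP
```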
I want to fix this problem, but it would be more efficient if I knew where to start. Can you give me a suggestion? @alaradirik |
Hi @MaslikovEgor, The demo didn't work before this fix either (see #20136). Try running COCO evaluation with image conditioning before/after this fix: mAP@0.5 increases from 6 to 37. This is still below the expected 44, but closer to the reported/expected performance. I am still trying to figure out why. |
@RRoundTable, the issues you are reporting seem to have to do with the text-conditioned evaluation. This means that the issues probably stem from the forward pass/post-processing. In your LVIS eval, did you make sure to implement a new post-processor that incorporates all the changes needed for eval? If helpful, I can add my function to the processor or something; please note there are a few changes compared with normal inference. |
@orrzohar, yes, I tested with text-conditioned evaluation. In my LVIS eval, I just used Hugging Face's preprocessor and postprocessor. It would be helpful if you contributed those functions. |
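For context, the stock text-conditioned path being referred to here looks roughly like the sketch below. The checkpoint name, query texts, and threshold are illustrative only, and this is the plain inference path, not the special evaluation protocol mentioned above.

```python
# Minimal sketch of text-conditioned OwlViT inference using the stock
# Hugging Face pre- and post-processing (not the special eval protocol).
# Checkpoint, queries, and threshold are illustrative only.
import requests
import torch
from PIL import Image
from transformers import OwlViTForObjectDetection, OwlViTProcessor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
texts = [["a photo of a cat", "a photo of a remote control"]]

inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map raw logits/boxes back to the original image size.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.1, target_sizes=target_sizes
)
for score, label, box in zip(
    results[0]["scores"], results[0]["labels"], results[0]["boxes"]
):
    print(f"{texts[0][label]}: {score:.2f} at {[round(c, 1) for c in box.tolist()]}")
```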
Hi @RRoundTable, I added a PR with the appropriate evaluation protocol. Best, |
Hi! @alaradirik, |
Hi folks, I've investigated the difference; it will be solved in the PR below. TL;DR: image preprocessing is done differently in the original Colab (it involves padding the image to a square), whereas the HF implementation used center cropping. The model itself is fine: the logits are exactly the same as the original implementation on the same inputs. |
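To make the preprocessing gap concrete: padding to a square before resizing, rather than center-cropping, looks roughly like the helper below. This is an illustrative sketch (including the grey fill value), not the code from the linked PR.

```python
# Illustrative helper (not the PR's code): pad an image to a square with grey
# pixels, anchoring the original content at the top-left, before resizing.
# The original Colab pads like this instead of center-cropping.
from PIL import Image


def pad_to_square(image: Image.Image, fill=(128, 128, 128)) -> Image.Image:
    """Pad the shorter side so the image becomes square, keeping content at the top-left."""
    width, height = image.size
    side = max(width, height)
    padded = Image.new("RGB", (side, side), fill)
    padded.paste(image, (0, 0))
    return padded


# Example: pad first, then let the processor only resize (no crop) so the box
# coordinates line up with the original implementation.
# padded_image = pad_to_square(Image.open("cats.jpg").convert("RGB"))
```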
Hi folks, since OWLv2 has now been added in #26668, you will see that results match one-to-one with the original Google Colab notebook provided by the authors. If you also want to get one-to-one matching results for OWLv1, then you will need to use |
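For reference, swapping in the OWLv2 classes looks roughly like the snippet below (the checkpoint name is an example); the rest of the inference code has the same shape as the OwlViT sketch earlier in this thread.

```python
# Swapping in OWLv2 (example checkpoint; the surrounding inference code has the
# same shape as the OwlViT sketch earlier in this thread).
from transformers import Owlv2ForObjectDetection, Owlv2Processor

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")
```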
@RRoundTable I have been trying to reproduce the results (AP values) on the LVIS dataset using the example script that you provided. Did you manage to reproduce the results? |
@NielsRogge I am using the Owlv2 processor, but I am still not able to get the same results. |
@rishabh-akridata please provide a script that reproduces your issue |
@NielsRogge Please find the script below. |
@NielsRogge I also tried to use the processor below, but I am facing the same issue. |
@NielsRogge When I reduce the conf_threshold to 0.1, I get some detections, but with very low confidence, and the boxes are not the same as in the official Colab notebook. |
@NielsRogge Please ignore this one, I was looking at the results of a different model variant. I am able to get the same results as mentioned in the Colab notebook. Sorry for the inconvenience caused. Thanks. |
@RRoundTable May I ask if you succeeded in replicating it in the end? And may I ask you for the code? It would be of great help to me. Thank you! |
Hi @iMayuqi to reproduce the results I would recommend using |
System Info
Using the Hugging Face Space and Google Colab
Who can help?
@adirik
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
cat picture from http://images.cocodataset.org/val2017/000000039769.jpg
remote control image from https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSRUGcH7a3DO5Iz1sknxU5oauEq9T_q4hyU3nuTFHiO0NMSg37x
Expected behavior
Being excited about the results of OwlViT, I tried to input some random images to see the results.
Having no experience with JAX, my first option was to try the Hugging Face Space.
Given a remote control as the query and a cat picture, I wanted to get boxes around the remote controls.
https://huggingface.co/spaces/adirik/image-guided-owlvit
The results were not really what I expected (no boxes on the remotes).
Then I checked the results on the Colab version to see if it behaves the same way.
https://colab.research.google.com/github/google-research/scenic/blob/main/scenic/projects/owl_vit/notebooks/OWL_ViT_inference_playground.ipynb#scrollTo=AQGAM16fReow
It correctly draws boxes on the remotes.
I am not sure what is happening; which part should I look at to determine what causes this difference?
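A local reproduction of this image-guided query with transformers would look roughly like the sketch below; the checkpoint name and thresholds are illustrative, not necessarily what the Space or Colab uses.

```python
# Sketch of the image-guided (one-shot) query from this issue: use the remote
# control photo as the query image and search for matches in the cats picture.
# Checkpoint and thresholds are illustrative, not the Space's exact settings.
import requests
import torch
from PIL import Image
from transformers import OwlViTForObjectDetection, OwlViTProcessor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

target_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
query_url = (
    "https://encrypted-tbn0.gstatic.com/images"
    "?q=tbn:ANd9GcSRUGcH7a3DO5Iz1sknxU5oauEq9T_q4hyU3nuTFHiO0NMSg37x"
)
target_image = Image.open(requests.get(target_url, stream=True).raw).convert("RGB")
query_image = Image.open(requests.get(query_url, stream=True).raw).convert("RGB")

inputs = processor(images=target_image, query_images=query_image, return_tensors="pt")
with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)

# Convert to box coordinates in the target image and apply score/NMS thresholds.
target_sizes = torch.tensor([target_image.size[::-1]])  # (height, width)
results = processor.post_process_image_guided_detection(
    outputs=outputs, threshold=0.6, nms_threshold=0.3, target_sizes=target_sizes
)
print(results[0]["scores"], results[0]["boxes"])
```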