Hi, thanks for your great work!
I'd like to know whether the model can be trained without regions. In other words, I only have images and captions, without any bounding-box information. How can I make the model work?
Thank you so much!
@Decalogue You will need to adjust the image input (e.g., the feature vectors and positional encodings) to match the image representation you use instead of regions. If you intend to take CNN activations as the input, you might want to refer to our recent work ClipBERT: https://github.com/jayleicn/ClipBERT
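One possible way to adapt the image input, sketched under assumptions: if the model consumes a set of visual feature vectors plus one box-style position vector per feature, you can treat each cell of a CNN feature map as a pseudo-region whose "box" is that grid cell. The 7-dim position layout `[x1, y1, x2, y2, w, h, w*h]` with coordinates normalized to [0, 1] is an assumption for illustration, and the function names (`grid_positions`, `flatten_feature_map`, `grid_as_regions`) are hypothetical, not part of any repository.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def grid_positions(height: int, width: int) -> Matrix:
    """Hypothetical helper: one pseudo-box per grid cell, in row-major order.

    Each box is [x1, y1, x2, y2, w, h, w*h] in normalized [0, 1] coordinates,
    mimicking a region-position vector (assumed format, not a confirmed API).
    """
    boxes = []
    for i in range(height):          # row index -> vertical position
        for j in range(width):       # column index -> horizontal position
            x1, y1 = j / width, i / height
            x2, y2 = (j + 1) / width, (i + 1) / height
            w, h = x2 - x1, y2 - y1
            boxes.append([x1, y1, x2, y2, w, h, w * h])
    return boxes


def flatten_feature_map(fmap: List[Matrix]) -> Matrix:
    """Turn a C x H x W feature map into H*W feature vectors of length C (row-major)."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def grid_as_regions(fmap: List[Matrix]) -> Tuple[Matrix, Matrix]:
    """Hypothetical: return (features, positions) aligned cell by cell."""
    feats = flatten_feature_map(fmap)
    pos = grid_positions(len(fmap[0]), len(fmap[0][0]))
    return feats, pos


# Usage: a toy 2-channel, 2 x 3 feature map standing in for CNN activations.
fmap = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
feats, pos = grid_as_regions(fmap)
```

With real data you would do the same reshape on tensors (e.g. flattening the spatial dimensions of a PyTorch feature map); plain lists just keep the sketch dependency-free. Whether a model pretrained on detector regions works well with such grid features without further training is not addressed here.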