Scaling Open Vocabulary Object Detection, Owlv2 finetuning #160

Open
1 of 2 tasks
rbavery opened this issue Feb 13, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

rbavery commented Feb 13, 2025

Search before asking

  • I have searched the Multimodal Maestro issues and found no similar feature requests.

Description

https://huggingface.co/docs/transformers/en/model_doc/owlv2

Be able to fine-tune OWLv2 for grounded object detection using JSONL that references 3-channel imagery. N-channel imagery would be extra dope. Ideally this would include high bit depth TIFF support, since my imagery comes in .tif. I see Pillow in the requirements, so high bit depth TIFF support might not be possible today without more work to change how imagery is loaded.
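As a rough illustration of the extra loading work this could involve, here is a minimal sketch of squeezing an N-channel, high bit depth array into the 8-bit, 3-channel form that a Pillow-based pipeline typically expects. It assumes the TIFF has already been read into an (H, W, C) NumPy array by some other library (for example tifffile or rasterio). `to_uint8_rgb`, its parameters, and the percentile stretch are my own hypothetical choices for illustration, not anything Maestro or transformers provides.

```python
import numpy as np


def to_uint8_rgb(image, bands=(0, 1, 2), low_pct=2.0, high_pct=98.0):
    """Pick three bands from an (H, W, C) array and rescale each to uint8.

    Hypothetical helper for illustration; not part of Maestro or transformers.
    """
    image = np.asarray(image)
    if image.ndim != 3:
        raise ValueError("expected an (H, W, C) array")
    out = np.empty(image.shape[:2] + (len(bands),), dtype=np.uint8)
    for i, band in enumerate(bands):
        channel = image[..., band].astype(np.float64)
        # Per-band percentile stretch so a few very bright pixels do not
        # compress the rest of the dynamic range.
        lo, hi = np.percentile(channel, [low_pct, high_pct])
        if hi <= lo:
            # Flat band: avoid dividing by zero and emit zeros.
            out[..., i] = 0
            continue
        scaled = (channel - lo) / (hi - lo)
        out[..., i] = np.round(np.clip(scaled, 0.0, 1.0) * 255.0).astype(np.uint8)
    return out


# Synthetic 4-band, 16-bit array standing in for a TIFF read with another library.
rng = np.random.default_rng(0)
fake_tiff = rng.integers(0, 4096, size=(64, 64, 4), dtype=np.uint16)
rgb = to_uint8_rgb(fake_tiff, bands=(2, 1, 0))
print(rgb.shape, rgb.dtype)
```

This throws away most of the bit depth and all but three bands, so it is only a workaround. Proper N-channel or 16-bit support would presumably need changes to the image loading path and to the model's preprocessing.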

Use case

I've played around with OWLv2 a bit and compared it to GroundingDINO and Qwen 2.5. It seems to do a better job of producing bounding boxes on hard images with small objects (satellite images), where the other models produce nothing. This makes me think it is potentially a better candidate for fine-tuning. But I'm definitely not certain and have more testing to do.

Additional

In the geospatial computer vision domain, we are in the very earliest days of applying VLMs to solve actual problems on massive imagery corpora. There have been some cool experiments recently that have inspired me to try fine-tuning VLMs to test their limits on remotely sensed imagery, using modest-sized datasets for fine-tuning.

Can't commit to a PR right now (but might be able to in the future).

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@rbavery rbavery added the enhancement New feature or request label Feb 13, 2025
@rbavery rbavery changed the title from "Scaling Open Vocabulary Object Detection, Owlv2" to "Scaling Open Vocabulary Object Detection, Owlv2 finetuning" Feb 13, 2025