Search before asking
Description
https://huggingface.co/docs/transformers/en/model_doc/owlv2
Be able to fine-tune OWLv2 for grounded object detection using JSONL annotations referencing 3-channel imagery. N-channel imagery would be extra dope. Ideally with high bit depth TIFF support, since my imagery comes in .tif. I see Pillow in the requirements, so high bit depth TIFF support might not be possible today without more work to change how imagery is loaded.
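For what it's worth, here's a rough sketch of what the loading path could look like if it went through tifffile instead of Pillow. None of this reflects the repo's actual loader: tifffile and numpy as dependencies, the JSONL keys, and the 2/98 percentile stretch are all my assumptions.

```python
# Sketch only, not the repo's loader. Assumes tifffile + numpy are
# installed and that each JSONL line carries hypothetical "image",
# "label", and "bbox" keys.
import json

import numpy as np
import tifffile


def load_tiff_as_rgb8(path: str) -> np.ndarray:
    """Read a high bit depth (e.g. uint16) TIFF and rescale to 8-bit RGB."""
    arr = tifffile.imread(path)  # (H, W) or (H, W, C), any dtype
    if arr.ndim == 2:
        arr = np.stack([arr] * 3, axis=-1)  # grayscale -> 3 channels
    arr = arr[..., :3].astype(np.float32)  # keep first 3 of N channels
    # 2/98 percentile stretch: a common remote-sensing display convention,
    # chosen arbitrarily here, not something this repo prescribes.
    lo, hi = np.percentile(arr, (2, 98))
    arr = np.clip((arr - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return (arr * 255).astype(np.uint8)


def read_jsonl(path: str):
    """Yield one annotation record per line from a JSONL file."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

A true N-channel path would skip the `[..., :3]` slice and the 8-bit squeeze, but that also means the model's patch-embedding layer would need to be re-initialized for the extra channels, which is a bigger change than just swapping the image reader.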
Use case
I've played around with OWLv2 a bit and compared it to GroundingDINO and Qwen 2.5, and it seems to do a better job of producing bounding boxes on hard images with small objects (satellite images), whereas the other models produce nothing. This makes me think it is potentially a better candidate for fine-tuning. But I'm definitely not certain and have more testing to do.
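For anyone who wants to reproduce that comparison, the zero-shot OWLv2 path from the linked Hugging Face docs is enough; the snippet below follows the docs example. The checkpoint name and 0.1 threshold come from the docs, while the image path and text prompts are placeholders for your own data (a high bit depth .tif would first need a custom loader like the sketch above).

```python
# Zero-shot OWLv2 detection, following the Hugging Face docs example.
import torch
from PIL import Image
from transformers import Owlv2ForObjectDetection, Owlv2Processor

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

image = Image.open("satellite_chip.png").convert("RGB")  # placeholder path
texts = [["a small boat", "an airplane"]]  # one prompt list per image

inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale normalized boxes back to the original (height, width).
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.1, target_sizes=target_sizes
)
for box, score, label in zip(
    results[0]["boxes"], results[0]["scores"], results[0]["labels"]
):
    print(texts[0][label], round(score.item(), 3), box.tolist())
```

This is inference only, of course; the ask above is for a fine-tuning path, which, as far as I know, transformers does not ship out of the box for OWLv2.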
Additional
In the geospatial computer vision domain we are in the very earliest days of applying VLMs to solve actual problems on massive imagery corpora. There have been some cool experiments recently that have inspired me to try fine-tuning VLMs to test their limits on remotely sensed imagery, using modest-sized datasets for fine-tuning.
I can't commit to a PR right now (but might be able to in the future).
Are you willing to submit a PR?
Yes I'd like to help by submitting a PR!