[Feature Request] OmniParser visual prompt #31

dandansamax · 2024-09-05T13:31:15Z

OmniParser is a visual prompting method that combines a fine-tuned interactable-icon detection model, a fine-tuned icon description model, and an OCR module. It should be more accurate than Grounding DINO.

https://arxiv.org/abs/2408.00203

dandansamax mentioned this issue Sep 5, 2024 in [Roadmap] Research papers and tools integration #6 (open, 5 tasks)

camel-ai deleted a comment Sep 5, 2024

dandansamax changed the title ~~OmniParser for Pure Vision Based GUI Agent: https://arxiv.org/abs/2408.00203~~ [Feature Request] OmniParser visual prompt Sep 5, 2024

dandansamax added the enhancement (New feature or request) and visual prompt labels Sep 5, 2024
