Required prerequisites
I have searched the Issue Tracker to confirm that this hasn't already been reported. (+1 or comment there if it has.)
Motivation
The current object-detection visual prompt (GroundingDino) only locates the bounding box of each icon. We want to obtain a semantic description of each icon to help the agent understand the UI.
Solution
As a first step, we could run object detection and then use a VLLM to generate a description for each detected icon.
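A minimal sketch of this two-stage pipeline, assuming the detector returns pixel boxes as `(x0, y0, x1, y1)` tuples and the VLLM is wrapped in a function that takes an image crop and returns a string. The names `describe_icons`, `crop`, `IconAnnotation`, `detect`, and `describe` are hypothetical; the real GroundingDino and VLLM calls would be passed in as the two callables. Images are modeled as nested lists only to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1/y1 exclusive.
Box = Tuple[int, int, int, int]
# Assumption: an image is a list of pixel rows (stand-in for a real image type).
Image = List[List[int]]


@dataclass
class IconAnnotation:
    box: Box
    description: str


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def describe_icons(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    describe: Callable[[Image], str],
) -> List[IconAnnotation]:
    """Detect icon boxes, then ask a vision-language model to describe each crop.

    `detect` stands in for the GroundingDino step and `describe` for the VLLM
    call; both are hypothetical callables supplied by the caller.
    """
    annotations = []
    for box in detect(image):
        # Only the cropped icon region is sent to the describer.
        description = describe(crop(image, box))
        annotations.append(IconAnnotation(box=box, description=description))
    return annotations


if __name__ == "__main__":
    # Toy usage with stub detector/describer in place of real models.
    screen = [[0] * 8 for _ in range(4)]
    fake_detect = lambda img: [(0, 0, 2, 2), (4, 1, 8, 4)]
    fake_describe = lambda c: f"icon {len(c[0])}x{len(c)}"
    for ann in describe_icons(screen, fake_detect, fake_describe):
        print(ann.box, ann.description)
```

Keeping detection and description as separate callables means either model can be swapped without touching the pipeline, and the resulting box-plus-description pairs can be fed to the agent alongside the existing visual prompt.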
Additional context
No response