some question about the detection ability #47

Open
1benwu1 opened this issue Nov 20, 2024 · 0 comments
1benwu1 commented Nov 20, 2024

I want to pass in a screenshot of a webpage or app and have the model detect UI components in it, such as icons and textboxes, but it fails almost completely. The only attempt that worked was prompt="icon", which detects a certain number of icons/images. With any other prompt, almost nothing is detected. Is this because the model was not trained specifically on this kind of data, or because my prompts are not written well enough? There is also a special case: with prompt="icon. textbox", that is, the working prompt plus one more word (textbox), almost nothing is detected either.
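
To make the comparison above easier to reproduce, here is a minimal sketch of how the prompts could be checked side by side. It assumes a hypothetical `detect(prompt)` callable that wraps the model for one fixed screenshot and returns a list of boxes; that wrapper, `build_prompt`, `compare_prompts`, and `fake_detect` are illustrative names, not part of this project's API. `fake_detect` is only a stand-in so the snippet runs on its own, and its counts are not real model results.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]
# Hypothetical signature: prompt -> detected boxes for one fixed screenshot.
Detector = Callable[[str], List[Box]]


def build_prompt(labels: Sequence[str]) -> str:
    """Join labels into a period-separated prompt such as "icon. textbox"."""
    return ". ".join(label.strip() for label in labels if label.strip())


def compare_prompts(detect: Detector, labels: Sequence[str]) -> Dict[str, int]:
    """Count detections for each label on its own and for the combined prompt."""
    counts = {label: len(detect(label)) for label in labels}
    combined = build_prompt(labels)
    counts[combined] = len(detect(combined))
    return counts


def fake_detect(prompt: str) -> List[Box]:
    # Stand-in for the real model call (an assumption, not real output):
    # it only mimics the behaviour described in this issue.
    if prompt == "icon":
        return [(0.0, 0.0, 10.0, 10.0)] * 3
    return []


if __name__ == "__main__":
    print(compare_prompts(fake_detect, ["icon", "textbox"]))
```

Replacing `fake_detect` with a real wrapper around the model would show whether each label fails on its own or only when combined with another label.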