tool: add image reader tool for local vision inputs #1306
Wangmerlyn wants to merge 4 commits into OpenHands:main
[Automatic Post]: I have assigned @jpshackelford as a reviewer based on git blame information. Thanks in advance for the help!
enyst left a comment
Thank you for this. I think this raises an interesting question.
If the file_editor tool supports images already, do we need a separate image reader tool? WDYT?
I'm not sure. A quick thought is just: maybe? To note, one detail here is that we are looking to maybe try other tools, potentially replacing file editor, for GPT-5 and Gemini 3, and I'm not sure if they work for images.
On the other hand, to my knowledge there is data showing that agents don't work well with too many tools, so adding duplicates may not be ideal.
Oh yes, thank you for looking into this! The background is that I wanted my agent to look at an image (e.g., a repo diagram), but it kept scanning the whole repo or large files instead. For debugging, I temporarily disabled the [...].
Later, thanks to @xingyaoww, I realized that [...].
I’m happy to close this PR or adjust it, depending on which direction you think makes the most sense.
[Automatic Post]: It has been a while since there was any activity on this PR. @Wangmerlyn, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.
To whom it may concern: the built-in file_editor tool can provide the agent with image input. A big shoutout to @xingyaoww for pointing it out.
[Image: file editor tool]
[Image: image input test]
So the functionality of this tool is completely covered by file_editor.
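For anyone landing here later, a minimal sketch of what "reading a local image into a vision input" can look like in general. This is not how file_editor is implemented; the helper name image_to_data_url is hypothetical, and the sketch assumes the model accepts images as base64 data URLs.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL.

    Hypothetical helper for illustration only; not part of any existing tool.
    """
    # Guess the MIME type from the file extension (e.g. ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")

    # Read the raw bytes and base64-encode them as ASCII text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could then be attached to a message as image content, in whatever shape the target model API expects.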