Implement gemini-style file editing tools #1199
- Created separate tools (read_file, write_file, edit, list_directory) instead of one command-based tool
- read_file: Supports pagination with offset/limit parameters
- write_file: Full file overwrites with automatic directory creation
- edit: Find/replace functionality with expected_replacements validation
- list_directory: Lists directory contents with metadata (size, modified time, recursive option)
- Added comprehensive tests covering success and error cases
- All tools follow gemini-cli's design philosophy

Fixes #1194

Co-authored-by: openhands <openhands@all-hands.dev>
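
The offset/limit pagination described for read_file can be pictured with a small standalone sketch. This is not the PR's implementation: the function name read_file_page, the 0-based line offset, and the has_more flag are assumptions made purely for illustration.

```python
from itertools import islice
from pathlib import Path
from typing import List, Optional, Tuple


def read_file_page(
    path: str, offset: int = 0, limit: Optional[int] = None
) -> Tuple[List[str], bool]:
    """Return up to `limit` lines starting at 0-based line `offset`.

    Hypothetical helper: the real read_file tool may count offsets
    differently or return a richer observation object.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if limit is not None and limit <= 0:
        raise ValueError("limit must be positive")

    with Path(path).open("r", encoding="utf-8") as handle:
        # Skip `offset` lines, then read `limit` lines plus one extra so the
        # caller can be told whether another page exists.
        stop = None if limit is None else offset + limit + 1
        window = [line.rstrip("\n") for line in islice(handle, offset, stop)]

    has_more = limit is not None and len(window) > limit
    page = window if limit is None else window[:limit]
    return page, has_more
```

Reading one extra line is a cheap way to signal truncation without scanning the rest of a large file; a caller would request the next page with offset + limit.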
[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

[Automatic Post]: I have assigned @jpshackelford as a reviewer based on git blame information. Thanks in advance for the help!
openhands-tools/openhands/tools/gemini_file_editor/list_directory.py
[Automatic Post]: This PR seems to be currently waiting for review. @jpshackelford, @enyst, could you please take a look when you have a chance?

[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

I'd really like to get this working, will try to make some time.

@OpenHands reflect enyst's comments. Once that is done, I'd like you to run the integration tests with these new tools with the gemini-3-pro language model. I will provide you an API key.

I'm on it! neubig can track my progress at all-hands.dev
Address review comments from enyst:
- Split gemini_file_editor into separate tool directories (read_file, write_file, edit, list_directory)
- Each tool now follows the same structure as other tools in the repo
- Removed rich.console usage from list_directory, using plain text formatting instead
- Updated tests to use new import paths and converted to synchronous tests

Co-authored-by: openhands <openhands@all-hands.dev>
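
As a rough illustration of what "plain text formatting instead of rich.console" can look like, a directory listing with size and modified-time metadata can be built with the standard library alone. The function name list_directory_text and the column layout are assumptions, not the PR's actual output format.

```python
from datetime import datetime, timezone
from pathlib import Path


def list_directory_text(root: str, recursive: bool = False) -> str:
    """Render a directory listing as plain text, one entry per line.

    Hypothetical sketch: the real list_directory tool may use different
    columns, ordering, or error handling.
    """
    base = Path(root)
    if not base.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")

    # rglob walks the whole tree; iterdir lists only direct children.
    entries = base.rglob("*") if recursive else base.iterdir()
    lines = []
    for entry in sorted(entries):
        info = entry.stat()
        kind = "dir " if entry.is_dir() else "file"
        size = "-" if entry.is_dir() else str(info.st_size)
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        relative = entry.relative_to(base).as_posix()
        lines.append(f"{kind}  {size:>10}  {modified:%Y-%m-%d %H:%M}  {relative}")
    return "\n".join(lines)
```

Returning a plain string keeps the observation easy to pass to a language model and removes the dependency on a terminal rendering library.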
Summary
I've addressed enyst's review comments on PR #1199 by restructuring the gemini-style file editing tools. The changes have been committed and pushed to the
Changes Made
1. Restructured tools into separate directories (addressing enyst's comment #1)
2. Removed rich.console usage (addressing enyst's comment #2)
3. Updated tests
4. All checks pass
Regarding Integration Tests
The integration tests require
Note: The current integration tests use the existing
PR Link: #1199
Restructure the gemini-style file editing tools to be under a 'gemini' parent directory for better organization and to clearly indicate these are gemini-style tools.

Structure:
- openhands/tools/gemini/
  - read_file/
  - write_file/
  - edit/
  - list_directory/

Co-authored-by: openhands <openhands@all-hands.dev>
Add a GEMINI_FILE_TOOLS constant that makes it easy to swap the
claude-style FileEditorTool for gemini-style tools with a one-line change:
# Before (claude-style)
tools=[Tool(name=FileEditorTool.name)]
# After (gemini-style)
tools=[*GEMINI_FILE_TOOLS]
Also add example 30_gemini_file_tools.py demonstrating the usage.
Co-authored-by: openhands <openhands@all-hands.dev>

This needs eval.

@OpenHands download and understand the OpenHands/benchmarks repo. Think carefully about how you can run this PR with the swe-bench benchmark, paying particular attention to what you need to do to make sure that the tools introduced in this benchmark are used instead of the default editing tools. I'll provide you with the API keys you need to perform the evaluation. Run on 50 examples first.

I'm on it! neubig can track my progress at all-hands.dev
Add gemini preset with register_gemini_tools function and register gemini tools in the agent server's tool_router.py so they are available when using remote workspace. Co-authored-by: openhands <openhands@all-hands.dev>
Summary: SWE-bench Evaluation with Gemini-Style File Editing Tools
I successfully ran the SWE-bench benchmark evaluation using the gemini-style file editing tools from PR #1199 (
Evaluation Results
Changes Made to Enable Gemini Tools Evaluation
Output Location
The evaluation results are stored at:
Key files:
Notes
To run the full SWE-bench evaluation or with different parameters, use:
uv run python benchmarks/swebench/run_infer.py \
    .llm_config/gemini-3-pro.json \
    --dataset princeton-nlp/SWE-bench_Verified \
    --split test \
    --n-limit 500 \
    --workspace remote \
    --tools gemini \
    --num-workers 8 \
    --max-iterations 100
SWE-bench Evaluation Results with Gemini-Style File Editing Tools
I ran the SWE-bench evaluation on 50 instances from
Configuration
Results Summary
Resolution Rate: 66.7% (30/45 completed instances)
Resolved Instances (30)
Notes
Output Location
Full results are available at:
IMHO this is cool, I really like what it did! @neubig For comparison, here is the relevant situation for GPT-5.x: So we already have
I see OH just posted a result here, I don't know the baseline for these 50 instances for Gemini though, so I'm not sure if it's low or high. I'd like to suggest that if it did work correctly (if logs show they ran normally, and the agent is not hallucinating) IMHO we could merge this. Because merging this PR means we will offer everyone else the Gemini tools, they can decide to enable them for the agent, to see how they work, to see if they think of some improvement or run into issues. Their feedback will come back to us and enable us to move forward. I mean, they're not on by default, even for Gemini, and I think maybe that could be okay until we see performance improvements, WDYT?

Thanks @enyst, I'm OK with this! I'm also going to try to run the baseline as well though.

@OpenHands Help me with reviewing this PR. It adds an example to examples/: I need you to run this example script in your environment and see if it works, and if it worked with gemini tools indeed, not with default tools. First, add another message from the user to the script, to tell the agent to delete that file it created. Then try to run it with the gemini key in your env (and no base url); if that doesn't work for some reason, tweak it to use the litellm proxy with the key you have for that (and base url https://llm-proxy.eval.all-hands.dev); you can find out the correct model name prefix on the proxy (maybe gemini/, maybe litellm_proxy/, maybe no prefix, just the model name?). Make sure telemetry is enabled and you save logs (see llm.py); then verify the logs. Show me snippets from the logs that show exactly what tool calls the agent made during the run. Post results.

I'm on it! enyst can track my progress at all-hands.dev
…ni file tools example Co-authored-by: openhands <openhands@all-hands.dev>

Here’s a concise summary of what I did and verified.
Changes made (and pushed)
Execution and verification
Telemetry and tool usage verification
Checklist against your instructions
If you want me to change the default example to use the proxy by default (instead of relying on plain Gemini settings), let me know and I’ll update the example to read proxy settings from env vars.

HUMAN: how did it loop?
OpenHands-GPT-5: It got stuck repeatedly calling the same tool with the same arguments, without progressing to the write/delete steps.
What happened
Evidence from the run
In short, it looped by repeatedly selecting the list_directory tool with identical inputs, leading the stuck detector to halt the run.
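
The SDK's actual stuck detector is not shown in this thread. The sketch below only illustrates the general idea of flagging a run of identical tool calls; the function name is_stuck, the (name, arguments) tuple shape, and the threshold of 3 are all assumptions.

```python
import json
from typing import Any, Dict, List, Tuple

# A tool call is modelled here as (tool name, arguments); this shape is an
# assumption for the sketch, not the SDK's event type.
ToolCall = Tuple[str, Dict[str, Any]]


def is_stuck(history: List[ToolCall], threshold: int = 3) -> bool:
    """Return True if the last `threshold` tool calls are identical."""
    if threshold < 2 or len(history) < threshold:
        return False
    # Serialize arguments with sorted keys so dict ordering does not matter.
    recent = [
        (name, json.dumps(args, sort_keys=True))
        for name, args in history[-threshold:]
    ]
    return all(call == recent[0] for call in recent)
```

Under these assumptions, three consecutive list_directory calls with the same arguments would trip the check, while a single different call in between would reset it.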
…ate then delete) Co-authored-by: openhands <openhands@all-hands.dev>

HUMAN: 😅 OK, I debugged this for a while, and it's very weird. Three times in a row, it got stuck, doing nothing. But then it just worked. Nothing in the logs, nothing happened, it just returned the same tool call at first (getting stuck), and then it didn't. I think it has nothing to do with the PR. The new tools are active and executed normally.
Notes for later:

enyst left a comment
Thank you so much for this, it will be awesome to have at our disposal, and everyone else's!
I'd love to know if @xingyaoww has concerns here, this is typically of interest to him. Also I'm not fully sure he'll like the preset with model name (get_gemini_agent), but maybe we could name it differently? I feel like maybe it's nice from client code perspective, to have an obvious function to just call... idk.
Summary
This PR implements gemini-style file editing tools as an alternative to the existing claude-style file_editor tool.
Fixes #1194
Changes
New Tools Implemented
Based on the analysis of gemini-cli, I implemented four separate tools instead of one command-based tool:
read_file - Read file content with pagination support
  - offset and limit parameters for reading large files in chunks
write_file - Full file overwrite operations
edit - Find and replace with validation
  - expected_replacements parameter to validate the number of changes
  - old_string is empty
list_directory - Directory listing with metadata
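
The expected_replacements check on the edit tool can be sketched as a pure string operation. This is a minimal illustration rather than the PR's code: the function name apply_edit and the default of 1 are assumptions, and the empty old_string case mentioned above is simply rejected because its behavior is not spelled out here.

```python
def apply_edit(
    content: str,
    old_string: str,
    new_string: str,
    expected_replacements: int = 1,
) -> str:
    """Replace old_string with new_string only if the match count is as expected.

    Hypothetical sketch of the validation idea; the real edit tool works on
    files and may treat an empty old_string as a special case.
    """
    if not old_string:
        # Assumption: this sketch does not model the empty old_string case.
        raise ValueError("old_string must be non-empty in this sketch")

    found = content.count(old_string)
    if found != expected_replacements:
        # Refuse to edit when the text matches more or fewer times than the
        # caller anticipated, so an ambiguous edit never lands silently.
        raise ValueError(
            f"expected {expected_replacements} occurrence(s) of old_string, "
            f"found {found}"
        )
    return content.replace(old_string, new_string)
```

Validating the count before writing means a too-generic old_string fails loudly instead of rewriting unrelated parts of a file.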
Key Differences from Claude-Style Tools
edit tool includes expected_replacements for validation
read_file tool supports offset/limit for handling large files
Code Organization
Testing
All 29 test cases pass, covering:
Quality Checks
Usage Example
Future Work
This implementation can be evaluated against the existing claude-style file_editor to determine which approach works better for different use cases. Both tools can coexist, allowing users to choose based on their preferences and needs.
Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdk
nikolaik/python-nodejs:python3.12-nodejs22
golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:2acad74-python
Run
All tags pushed for this build
About Multi-Architecture Support
2acad74-python) is a multi-arch manifest supporting both amd64 and arm64
2acad74-python-amd64) are also available if needed