Implement gemini-style file editing tools by neubig · Pull Request #1199 · OpenHands/software-agent-sdk

neubig · 2025-11-18T23:06:31Z

Summary

This PR implements gemini-style file editing tools as an alternative to the existing claude-style file_editor tool.

Fixes #1194

Changes

New Tools Implemented

Based on the analysis of gemini-cli, I implemented four separate tools instead of one command-based tool:

read_file - Read file content with pagination support
- Supports offset and limit parameters for reading large files in chunks
- Automatically numbers lines for better context
- Truncates large files with clear indicators
write_file - Full file overwrite operations
- Creates new files or overwrites existing ones
- Automatically creates parent directories
- Shows diff visualization between old and new content
edit - Find and replace with validation
- Performs exact string matching and replacement
- Optional expected_replacements parameter to validate the number of changes
- Creates new files when old_string is empty
- Shows diff visualization of changes
list_directory - Directory listing with metadata
- Lists files and directories with size and modification time
- Supports recursive listing (up to 2 levels deep)
- Handles both relative and absolute paths
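
The edit rules listed above (exact matching, an expected_replacements check, and file creation when old_string is empty) can be sketched in a few lines. This is a minimal illustration and not the PR's executor: the name apply_edit, the default of one expected replacement, and the refusal to overwrite an existing file when old_string is empty are all assumptions.

```python
from typing import Optional


def apply_edit(
    content: Optional[str],
    old_string: str,
    new_string: str,
    expected_replacements: int = 1,  # assumed default, not confirmed by the PR
) -> str:
    """Return the new file content; `content` is None if the file does not exist."""
    if old_string == "":
        # Empty old_string means "create a new file" (assumed: never clobber).
        if content is not None:
            raise ValueError("old_string is empty but the file already exists")
        return new_string
    if content is None:
        raise FileNotFoundError("cannot edit a file that does not exist")
    found = content.count(old_string)  # exact, non-overlapping matches
    if found == 0:
        raise ValueError("old_string not found")
    if found != expected_replacements:
        raise ValueError(
            f"expected {expected_replacements} replacement(s), found {found}"
        )
    return content.replace(old_string, new_string)
```

Counting matches before replacing means a mismatch is reported without touching the content, which is the point of the validation parameter.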

Key Differences from Claude-Style Tools

Separate tools vs. commands: Each operation is a distinct tool rather than commands within a single tool
More explicit: Tool names directly indicate their purpose
Validation built-in: The edit tool includes expected_replacements for validation
Pagination: The read_file tool supports offset/limit for handling large files
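
To make the pagination point concrete, here is a small sketch of offset/limit reading with line numbering and a truncation marker. It is an illustration under assumptions, not the read_file implementation: the name read_page, the 0-based offset, the 1-based line numbers, and the wording of the truncation notice are hypothetical.

```python
from typing import Optional


def read_page(text: str, offset: int = 0, limit: Optional[int] = None) -> str:
    """Return numbered lines starting at `offset` (0-based), at most `limit` lines."""
    lines = text.splitlines()
    total = len(lines)
    end = total if limit is None else min(offset + limit, total)
    page = lines[offset:end]
    # Number lines by their position in the whole file, not in the page.
    numbered = [f"{offset + i + 1:>6}\t{line}" for i, line in enumerate(page)]
    if end < total:
        numbered.append(
            f"[truncated: showing lines {offset + 1}-{end} of {total}; "
            f"continue with offset={end}]"
        )
    return "\n".join(numbered)
```

A caller can keep passing the suggested offset back in until no truncation notice appears, which reads a large file in chunks.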

Code Organization

openhands-tools/openhands/tools/gemini_file_editor/
├── __init__.py           # Module exports
├── read_file.py          # Read file tool definition
├── write_file.py         # Write file tool definition
├── edit.py               # Edit tool definition
├── list_directory.py     # List directory tool definition
└── executor.py           # All executor implementations

tests/tools/gemini_file_editor/
├── test_read_file.py     # 7 test cases
├── test_write_file.py    # 6 test cases
├── test_edit.py          # 9 test cases
└── test_list_directory.py # 7 test cases

Testing

All 29 test cases pass, covering:

✅ Basic functionality for each tool
✅ Error handling (file not found, invalid paths, etc.)
✅ Edge cases (empty files, large files, special characters)
✅ Pagination and truncation
✅ Recursive directory listing
✅ Find/replace validation

Quality Checks

✅ All tests passing
✅ Type checking with pyright
✅ Linting with ruff
✅ Code formatting with ruff format
✅ PEP8 style compliance

Usage Example

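from openhands.tools.gemini_file_editor import (
    EditAction,
    EditTool,
    ListDirectoryTool,
    ReadFileAction,
    ReadFileTool,
    WriteFileTool,
)

# conv_state is an existing conversation state created elsewhere.
# The action classes are assumed to be exported alongside the tools.
# Executors are called synchronously, matching the ToolExecutor interface.

# Read a file
read_tool = ReadFileTool.create(conv_state)
action = ReadFileAction(file_path="example.py")
obs = read_tool.executor(action)

# Edit a file; expected_replacements=1 asks the tool to validate
# that exactly one replacement is made
edit_tool = EditTool.create(conv_state)
action = EditAction(
    file_path="example.py",
    old_string="def old_function():",
    new_string="def new_function():",
    expected_replacements=1,
)
obs = edit_tool.executor(action)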

Future Work

This implementation can be evaluated against the existing claude-style file_editor to determine which approach works better for different use cases. Both tools can coexist, allowing users to choose based on their preferences and needs.

Checklist

Implementation complete for all four tools
Comprehensive tests added (29 test cases)
All tests passing
Pre-commit checks passing (ruff, pyright, pycodestyle)
Code follows repository conventions
Documentation in docstrings and tool descriptions

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:2acad74-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-2acad74-python \
  ghcr.io/openhands/agent-server:2acad74-python

All tags pushed for this build

ghcr.io/openhands/agent-server:2acad74-golang-amd64
ghcr.io/openhands/agent-server:2acad74-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:2acad74-golang-arm64
ghcr.io/openhands/agent-server:2acad74-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:2acad74-java-amd64
ghcr.io/openhands/agent-server:2acad74-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:2acad74-java-arm64
ghcr.io/openhands/agent-server:2acad74-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:2acad74-python-amd64
ghcr.io/openhands/agent-server:2acad74-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:2acad74-python-arm64
ghcr.io/openhands/agent-server:2acad74-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:2acad74-golang
ghcr.io/openhands/agent-server:2acad74-java
ghcr.io/openhands/agent-server:2acad74-python

About Multi-Architecture Support

Each variant tag (e.g., 2acad74-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 2acad74-python-amd64) are also available if needed

- Created separate tools (read_file, write_file, edit, list_directory) instead of one command-based tool
  - read_file: Supports pagination with offset/limit parameters
  - write_file: Full file overwrites with automatic directory creation
  - edit: Find/replace functionality with expected_replacements validation
  - list_directory: Lists directory contents with metadata (size, modified time, recursive option)
- Added comprehensive tests covering success and error cases
- All tools follow gemini-cli's design philosophy

Fixes #1194

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2025-11-18T23:13:49Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-agent-server/openhands/agent_server
tool_router.py	13	2	84%	21–22
openhands-tools/openhands/tools/gemini/edit
definition.py	46	22	52%	68, 70–73, 75–78, 80–81, 89–90, 92–93, 100–101, 145, 147, 149–150, 156
impl.py	54	46	14%	24, 41–44, 47–48, 50, 53–55, 64, 66, 69–70, 72, 81–82, 86–87, 93–94, 102–103, 108, 110–111, 114–115, 124, 126–127, 137–138, 141, 151, 154–155, 165–166, 168–169, 178–179, 183–184
openhands-tools/openhands/tools/gemini/list_directory
definition.py	63	34	46%	64, 66–69, 71–73, 75–76, 79–81, 83–85, 88, 90, 93, 97, 99–100, 105, 109–114, 153, 155, 159–160, 166
impl.py	63	54	14%	32, 49–50, 53–54, 56, 59–60, 66–67, 72–73, 75, 77–82, 85–89, 98–99, 102–106, 115–116, 118–119, 122–125, 135–138, 140–141, 143–146, 148, 156–157, 161–162
openhands-tools/openhands/tools/gemini/read_file
definition.py	40	17	57%	66, 68–71, 73–74, 76–78, 86–87, 120, 122, 124–125, 131
impl.py	48	40	16%	28, 45–47, 50–51, 53, 56–57, 65–66, 73, 75–76, 78, 81–82, 93–95, 98, 101, 104–107, 110–111, 113–115, 118–119, 124, 133–134, 140–141, 147–148
openhands-tools/openhands/tools/gemini/write_file
definition.py	43	22	48%	50, 52–55, 57–60, 62–63, 65–66, 68–69, 76–77, 112, 114, 116–117, 123
impl.py	34	26	23%	27, 44–45, 48–49, 51, 54–55, 61–68, 70, 72, 75–76, 78–79, 87–88, 92–93
openhands-tools/openhands/tools/preset
gemini.py	37	14	62%	55, 57, 63–64, 66, 74–75, 77–78, 83–84, 92, 95, 103
TOTAL	13859	6484	53%

blacksmith-sh · 2025-11-25T13:01:14Z

[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

blacksmith-sh · 2025-11-26T13:00:55Z

[Automatic Post]: I have assigned @jpshackelford as a reviewer based on git blame information. Thanks in advance for the help!

blacksmith-sh · 2025-12-01T13:01:13Z

[Automatic Post]: This PR seems to be currently waiting for review. @jpshackelford, @enyst, could you please take a look when you have a chance?

blacksmith-sh · 2025-12-15T13:07:13Z

[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

neubig · 2025-12-15T15:40:25Z

I'd really like to get this working, will try to make some time.

neubig · 2025-12-19T16:42:42Z

@OpenHands reflect enyst's comments.

Once that is done, I'd like you to run the integration tests with these new tools with the gemini-3-pro language model. I will provide you an API key.

openhands-ai · 2025-12-19T16:42:52Z

I'm on it! neubig can track my progress at all-hands.dev

Address review comments from enyst:
- Split gemini_file_editor into separate tool directories (read_file, write_file, edit, list_directory)
- Each tool now follows the same structure as other tools in the repo
- Removed rich.console usage from list_directory, using plain text formatting instead
- Updated tests to use new import paths and converted to synchronous tests

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-12-19T16:55:31Z

Summary

I've addressed enyst's review comments on PR #1199 by restructuring the gemini-style file editing tools. The changes have been committed and pushed to the openhands/implement-gemini-file-editor branch.

Changes Made

1. Restructured tools into separate directories (addressing enyst's comment #1)

Split the monolithic gemini_file_editor/ directory into four separate tool directories:
- read_file/ - with __init__.py, definition.py, impl.py
- write_file/ - with __init__.py, definition.py, impl.py
- edit/ - with __init__.py, definition.py, impl.py
- list_directory/ - with __init__.py, definition.py, impl.py
Each tool now follows the same structure as other tools in the repository (e.g., glob/, file_editor/)

2. Removed rich.console usage (addressing enyst's comment #2)

Simplified the list_directory visualization to use plain text formatting instead of rich.console
The visualization now uses simple string formatting for the table display
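
As an illustration of the plain-text approach, the sketch below lists a directory with size and modification time, recursing at most two levels, and formats the result with ordinary string formatting. It is not the PR's list_directory code: the function signature, column layout, and timestamp format are assumptions.

```python
import os
from datetime import datetime


def list_directory(dir_path: str, recursive: bool = False, max_depth: int = 2) -> str:
    """Return a plain-text table of entries under dir_path."""
    rows = []

    def walk(path: str, depth: int) -> None:
        with os.scandir(path) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            info = entry.stat()
            kind = "dir" if entry.is_dir() else "file"
            modified = datetime.fromtimestamp(info.st_mtime).strftime("%Y-%m-%d %H:%M")
            rows.append((kind, info.st_size, modified, os.path.relpath(entry.path, dir_path)))
            # Depth 1 is dir_path itself; stop descending once max_depth is reached.
            if recursive and entry.is_dir() and depth < max_depth:
                walk(entry.path, depth + 1)

    walk(dir_path, 1)
    header = f"{'TYPE':<5} {'SIZE':>10}  {'MODIFIED':<16}  NAME"
    body = [f"{k:<5} {s:>10}  {m:<16}  {n}" for k, s, m, n in rows]
    return "\n".join([header, *body])
```

Fixed-width columns keep the output readable in logs and in an LLM observation without any terminal rendering dependency.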

3. Updated tests

Moved tests to corresponding directories (tests/tools/read_file/, etc.)
Updated import paths to use new module structure
Converted async tests to synchronous tests (matching the synchronous ToolExecutor interface)

4. All checks pass

✅ 29 unit tests passing
✅ Pre-commit hooks (ruff format, ruff lint, pycodestyle, pyright) all pass

Regarding Integration Tests

The integration tests require LLM_API_KEY and LLM_BASE_URL environment variables. To run integration tests with gemini-3-pro, you would need to:

Set the API key: export LLM_API_KEY=<your-api-key>
Set the base URL: export LLM_BASE_URL=<gemini-api-base-url>
Run: uv run python tests/integration/run_infer.py --llm-config '{"model": "gemini-3-pro"}'
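
The three steps above can be wrapped in a small helper that checks the environment before building the command. This is only a sketch: build_integration_command is a hypothetical name, and the command line is copied from the steps above rather than verified against the repository.

```python
import json
import os
from typing import List, Mapping, Optional


def build_integration_command(
    model: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the argv for the integration run, failing early on missing variables."""
    env = os.environ if env is None else env
    missing = [name for name in ("LLM_API_KEY", "LLM_BASE_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return [
        "uv", "run", "python", "tests/integration/run_infer.py",
        "--llm-config", json.dumps({"model": model}),
    ]
```

The resulting list can be passed to subprocess.run without shell quoting concerns around the JSON argument.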

Note: The current integration tests use the existing FileEditorTool and TerminalTool. To test the new gemini-style tools specifically, a new integration test would need to be created that uses ReadFileTool, WriteFileTool, EditTool, and ListDirectoryTool.

PR Link: #1199

Restructure the gemini-style file editing tools to be under a 'gemini' parent directory for better organization and to clearly indicate these are gemini-style tools.

Structure:
- openhands/tools/gemini/
  - read_file/
  - write_file/
  - edit/
  - list_directory/

Co-authored-by: openhands <openhands@all-hands.dev>

Add a GEMINI_FILE_TOOLS constant that makes it easy to swap the claude-style FileEditorTool for gemini-style tools with a one-line change:

# Before (claude-style)
tools=[Tool(name=FileEditorTool.name)]

# After (gemini-style)
tools=[*GEMINI_FILE_TOOLS]

Also add example 30_gemini_file_tools.py demonstrating the usage.

Co-authored-by: openhands <openhands@all-hands.dev>

neubig · 2025-12-19T20:02:52Z

This needs eval.

neubig · 2025-12-19T20:48:44Z

@OpenHands download and understand the OpenHands/benchmarks repo. Think carefully about how you can run this PR with the swe-bench benchmark, paying particular attention to what you need to do to make sure that the tools introduced in this PR are used instead of the default editing tools. I'll provide you with the API keys you need to perform the evaluation. Run on 50 examples first.

openhands-ai · 2025-12-19T20:48:53Z

I'm on it! neubig can track my progress at all-hands.dev

Add gemini preset with register_gemini_tools function and register gemini tools in the agent server's tool_router.py so they are available when using remote workspace. Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-12-19T22:16:01Z

Summary: SWE-bench Evaluation with Gemini-Style File Editing Tools

I successfully ran the SWE-bench benchmark evaluation using the gemini-style file editing tools from PR #1199 (openhands/implement-gemini-file-editor branch). Here's what was accomplished:

Evaluation Results

Metric	Value
Total Instances	50 (requested)
Instances with Patches	51
Instances with Errors	4 (due to transient Docker build failures)
Model	`litellm_proxy/gemini-3-pro-preview`
Tools Preset	`gemini` (ReadFileTool, WriteFileTool, EditTool, ListDirectoryTool)
Max Iterations	100
Workers	8

Changes Made to Enable Gemini Tools Evaluation

Created gemini tools preset (openhands-tools/openhands/tools/preset/gemini.py):
- register_gemini_tools() - Registers gemini tools with the tool router
- get_gemini_tools() - Returns list of gemini tools for the agent
- get_gemini_condenser() - Returns condenser configured for gemini tools
- get_gemini_agent() - Returns agent configured with gemini tools
Modified tool_router.py to register gemini tools in the agent server:
- Added import for register_gemini_tools
- Called register_gemini_tools(enable_browser=True) during initialization
Added --tools argument to run_infer.py:
- Choices: default (claude-style) or gemini (gemini-style)
- Added tools_preset field to EvalMetadata model
- Added _get_tools() method to select tools based on preset
Created LLM config (.llm_config/gemini-3-pro.json) for the evaluation
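
The --tools switch described above amounts to a command-line choice that maps to a tool list. The sketch below shows that shape with argparse; it is not the benchmarks repo's run_infer.py. The tool names come from this thread, but the contents of the default preset and the helper names are assumptions.

```python
import argparse
from typing import List

# Assumed preset contents; the real default preset may include more tools.
TOOL_PRESETS = {
    "default": ["terminal", "file_editor"],
    "gemini": ["terminal", "read_file", "write_file", "edit", "list_directory"],
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a --tools preset switch")
    parser.add_argument(
        "--tools",
        choices=sorted(TOOL_PRESETS),
        default="default",
        help="default (claude-style) or gemini (gemini-style)",
    )
    return parser


def get_tools(preset: str) -> List[str]:
    """Return a fresh list of tool names for the chosen preset."""
    return list(TOOL_PRESETS[preset])
```

Restricting the argument with choices means an unknown preset is rejected at parse time rather than partway through an evaluation run.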

Output Location

The evaluation results are stored at:

/workspace/project/benchmarks/outputs/gemini-tools-eval/princeton-nlp__SWE-bench_Verified-test/litellm_proxy/gemini-3-pro-preview_sdk_c3ade03_maxiter_100_N_initial/

Key files:

output.critic_attempt_1.jsonl - Full evaluation results (55 instances)
output_errors.jsonl - 4 instances that failed due to missing images
conversations/ - Conversation archives for each instance
logs/ - Detailed logs for each instance

Notes

The 4 errors were due to transient Docker buildx failures during image building (not code issues)
The gemini tools (ReadFileTool, WriteFileTool, EditTool, ListDirectoryTool) were successfully used instead of the default claude-style FileEditorTool
The changes have been committed and pushed to the openhands/implement-gemini-file-editor branch (commit c3ade0323ad5574ea51c55d2fa5955cbdcb2d528)

To run the full SWE-bench evaluation or with different parameters, use:

uv run python benchmarks/swebench/run_infer.py \
  .llm_config/gemini-3-pro.json \
  --dataset princeton-nlp/SWE-bench_Verified \
  --split test \
  --n-limit 500 \
  --workspace remote \
  --tools gemini \
  --num-workers 8 \
  --max-iterations 100

neubig · 2025-12-20T04:15:18Z

SWE-bench Evaluation Results with Gemini-Style File Editing Tools

I ran the SWE-bench evaluation on 50 instances from princeton-nlp/SWE-bench_Verified using the gemini-style file editing tools introduced in this PR.

Configuration

Model: gemini-3-pro-preview (via litellm_proxy)
Tools: Gemini-style (read_file, write_file, edit, list_directory)
Max iterations: 100
SDK commit: c3ade03

Results Summary

Metric	Count
Total instances submitted	50
Instances completed	45
Instances resolved ✅	30
Instances unresolved ❌	15
Empty patches	4
Errors	1

Resolution Rate: 66.7% (30/45 completed instances)
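
For reference, the rate above is computed over completed instances only. A short check of the arithmetic, using the counts from the table, also shows what the figure would be over all submitted instances:

```python
# Counts taken from the Results Summary table.
submitted, completed, resolved = 50, 45, 30
unresolved, empty_patches, errors = 15, 4, 1

# The categories should add up to the totals they belong to.
assert resolved + unresolved == completed
assert completed + empty_patches + errors == submitted

rate_completed = resolved / completed  # denominator used in the comment
rate_submitted = resolved / submitted  # stricter denominator

print(f"{rate_completed:.1%}")  # 66.7%
print(f"{rate_submitted:.1%}")  # 60.0%
```

The gap between the two numbers comes entirely from the five instances lost to infrastructure problems.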

Resolved Instances (30)

astropy__astropy-12907, astropy__astropy-13236, astropy__astropy-14309
django__django-11095, django__django-11133, django__django-11999, django__django-12050, django__django-12155, django__django-12262, django__django-13158, django__django-13279, django__django-13670, django__django-14434, django__django-14999, django__django-15278
pytest-dev__pytest-5262, pytest-dev__pytest-5631, pytest-dev__pytest-5809
scikit-learn__scikit-learn-12973, scikit-learn__scikit-learn-13439, scikit-learn__scikit-learn-14983, scikit-learn__scikit-learn-25232, scikit-learn__scikit-learn-25973
sphinx-doc__sphinx-10466, sphinx-doc__sphinx-7757, sphinx-doc__sphinx-8551, sphinx-doc__sphinx-8621
sympy__sympy-13877, sympy__sympy-15599, sympy__sympy-23534

Notes

4 instances had empty patches due to Docker image build failures during the image preparation step (transient infrastructure issues)
1 instance (matplotlib__matplotlib-25775) had an error during evaluation (missing Docker image)
The gemini-style tools were successfully used by the agent throughout the evaluation

Output Location

Full results are available at:
outputs/gemini-tools-eval/princeton-nlp__SWE-bench_Verified-test/litellm_proxy/gemini-3-pro-preview_sdk_c3ade03_maxiter_100_N_initial/

enyst · 2025-12-20T04:52:10Z

Changes Made to Enable Gemini Tools Evaluation

IMHO this is cool, I really like what it did!

@neubig For comparison, here is the relevant situation for GPT-5.x:

So we already have apply_patch tools, the GPT equivalent of this PR, merged in main. But we don't have a preset for them. I picked up the preset idea from this PR, and I had the agent implement it for GPT-5 (PR 1462):

define and register the preset
it's not default, even for GPTs
the user, such as benchmarks client repo, or CLI, can choose the set of tools to use.

I see OH just posted a result here, I don't know the baseline for these 50 instances for Gemini though, so I'm not sure if it's low or high.

I'd like to suggest that if it did work correctly (if logs show they ran normally, and the agent is not hallucinating) IMHO we could merge this. Because merging this PR means we will offer everyone else the Gemini tools, they can decide to enable them for the agent, to see how they work, to see if they think of some improvement or run into issues. Their feedback will come back to us and enable us to move forward.

I mean, they're not on by default, even for Gemini, and I think maybe that could be okay until we see performance improvements, WDYT?

neubig · 2025-12-20T14:48:50Z

Thanks @enyst, I'm OK with this! I'm also going to try to run the baseline as well though.

examples/01_standalone_sdk/30_gemini_file_tools.py

enyst · 2025-12-21T02:27:56Z

@OpenHands Help me with reviewing this PR. It adds an example to examples/ : I need you to run this example script in your environment and see if it works, and if it worked with gemini tools indeed, not with default tools.

First, add another message from the user to the script, to tell the agent to delete that file it created.

Then try to run it with the gemini key in your env (and no base url); if that doesn't work for some reason, tweak it to use the litellm proxy with the key you have for that (and base url https://llm-proxy.eval.all-hands.dev); you can find out the correct model name prefix on the proxy. (maybe gemini/, maybe litellm_proxy/ , maybe no prefix just the model name?)

Make sure telemetry is enabled and you save logs (see llm.py); then verify the logs. Show me snippets from the logs that show what tool calls exactly did the agent make during the run. Post results.

openhands-ai · 2025-12-21T02:28:05Z

I'm on it! enyst can track my progress at all-hands.dev

openhands-ai · 2025-12-21T02:47:56Z

Here’s a concise summary of what I did and verified.

Changes made (and pushed)

File changed: examples/01_standalone_sdk/30_gemini_file_tools.py
- Added a second user message to delete the created file:
  conversation.send_message("Now delete the FACTS.txt file you just created.")
- Enabled telemetry logging:
  LLM(..., log_completions=True)
Commit on branch openhands/implement-gemini-file-editor:
- examples: add deletion follow-up message and enable telemetry in gemini file tools example
- Co-authored-by: openhands <openhands@all-hands.dev>
Scope: only this file was modified; changes are minimal and directly address the request.

Execution and verification

Run 1: Direct Gemini (GEMINI_API_KEY, no base URL)
- Model: gemini/gemini-3-pro-preview
- Outcome: The agent looped on list_directory and did not complete the write/delete workflow. Telemetry logs saved.
Run 2: LiteLLM proxy (LITELLM_API_KEY + base URL)
- Model and base URL used:
  - LLM_MODEL="litellm_proxy/gemini/gemini-3-pro-preview"
  - LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev"
- Outcome: Success. The agent:
  - Wrote FACTS.txt (with 3 facts about the project)
  - Deleted FACTS.txt via terminal (rm FACTS.txt)
  - Verified deletion (ls FACTS.txt -> not found)
- Confirmed model prefix via the proxy’s /v1/model/info endpoint; litellm_proxy/ prefix is correct.

Telemetry and tool usage verification

Telemetry logging enabled and saved to logs/completions/
Loaded tools at runtime:
- ['terminal', 'read_file', 'write_file', 'edit', 'list_directory']
Tool calls captured in logs for the successful run included:
- ["list_directory"]
- ["read_file"]
- ["write_file"]
- ["terminal"]
- ["finish"]
Example snippet showing explicit gemini tool use (write_file):
- model: litellm_proxy/gemini-3-pro-preview
- tool_calls:
  - name: write_file
  - args:
    {"content": "1. The OpenHands Software Agent SDK is a set of Python and REST APIs for building agents that work with code.\n2. Agents can use the local machine as their workspace or run inside ephemeral workspaces using the Agent Server.\n3. It is the engine behind the OpenHands CLI and OpenHands Cloud.\n",
    "security_risk": "LOW",
    "file_path": "FACTS.txt"}
No usage of the default claude-style file_editor was observed.

Checklist against your instructions

Add a message instructing deletion of the created file: done.
Try with Gemini key (no base URL): done; it didn’t complete the task (looped on list_directory).
If that fails, switch to LiteLLM proxy with key and base URL; discover correct model prefix: done. Used litellm_proxy/gemini/gemini-3-pro-preview and https://llm-proxy.eval.all-hands.dev.
Ensure telemetry is enabled and save logs: done.
Verify logs and show snippets of actual tool calls: done (included above).
Confirm it worked with gemini tools, not default tools: verified via loaded tools and tool_calls in logs.

If you want me to change the default example to use the proxy by default (instead of relying on plain Gemini settings), let me know and I’ll update the example to read proxy settings from env vars.

enyst · 2025-12-21T02:57:59Z

HUMAN: how did it loop?

OpenHands-GPT-5:

It got stuck repeatedly calling the same tool with the same arguments, without progressing to the write/delete steps.

What happened

The model issued multiple consecutive ListDirectoryAction calls with dir_path="." and recursive=False.
Each call returned the same directory listing observation.
The model then asked for list_directory again instead of moving on to write_file or terminal.
After several such action→observation cycles, the stuck detector triggered and the run stopped.

Evidence from the run

Console trace showed repeated “Agent Action: ListDirectoryAction … dir_path: '.' … recursive: False” three times in a row, followed by:
- WARNING openhands.sdk.conversation.stuck_detector: Action, Observation loop detected
- WARNING openhands.sdk.conversation.impl.local_conversation: Stuck pattern detected.
Telemetry logs for the direct Gemini run (no base URL) all recorded the same tool call:
- gemini__gemini-3-pro-preview-1766284619.592-91aa.json → "list_directory"
- gemini__gemini-3-pro-preview-1766284621.939-4cfc.json → "list_directory"
- gemini__gemini-3-pro-preview-1766284624.238-efa3.json → "list_directory"
- gemini__gemini-3-pro-preview-1766284626.609-1f46.json → "list_directory"

In short, it looped by repeatedly selecting the list_directory tool with identical inputs, leading the stuck detector to halt the run.
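
The loop described above, identical action and observation pairs repeated back to back, is simple to detect. The sketch below is not the SDK's stuck_detector: the function name, the event representation, and the threshold of three repetitions are assumptions chosen to match the trace in this comment.

```python
from typing import Any, List, Tuple

Event = Tuple[Any, Any]  # (action, observation), compared by equality


def is_stuck(events: List[Event], threshold: int = 3) -> bool:
    """Return True if the last `threshold` events are all identical."""
    if len(events) < threshold:
        return False
    tail = events[-threshold:]
    return all(event == tail[0] for event in tail)
```

Only the tail of the history is inspected, so a run that repeated itself earlier but has since made progress is not flagged.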

enyst · 2025-12-21T04:04:45Z

HUMAN: 😅

OK, I debugged this for a while, and it's very weird. Three times in a row, it got stuck, doing nothing. But then it just worked. Nothing in the logs, nothing happened, it just returned the same tool call at first (getting stuck), and then it didn't.

I think it has nothing to do with the PR. The new tools are active and executed normally.

Notes for later:

only AIStudio Gemini got stuck; Vertex / proxy works as expected
when it got stuck, the logs showed empty thinking_blocks on every tool call; when it succeeded, the logs showed some empty thinking_blocks, but some tool calls didn't have them at all (?)
Vertex logs show some non-empty thinking_blocks, even if it's just the tool call repeated, the LLM is using those.

enyst

Thank you so much for this, it will be awesome to have at our disposal, and everyone else's!

I'd love to know if @xingyaoww has concerns here, this is typically of interest to him. Also, I'm not fully sure he'll like the preset with the model name (get_gemini_agent), but maybe we could name it differently? I feel like maybe it's nice, from a client code perspective, to have an obvious function to just call... idk.

openhands-ai bot mentioned this pull request Nov 18, 2025

Inspect gemini-cli and implement gemini-style file editing tools #1194

Closed

neubig mentioned this pull request Nov 19, 2025

Investigate Gemini-3 Score Discrepancy OpenHands/benchmarks#106

Open

blacksmith-sh bot requested a review from jpshackelford November 26, 2025 13:00

enyst reviewed Nov 28, 2025

View reviewed changes

openhands-tools/openhands/tools/gemini_file_editor/__init__.py Outdated Show resolved Hide resolved

enyst reviewed Nov 28, 2025

View reviewed changes

openhands-tools/openhands/tools/gemini_file_editor/list_directory.py Outdated Show resolved Hide resolved

neubig marked this pull request as draft December 3, 2025 04:17

enyst mentioned this pull request Dec 7, 2025

tool: add image reader tool for local vision inputs #1306

Closed

jpshackelford removed their request for review December 8, 2025 13:22

neubig and others added 2 commits December 19, 2025 11:46

Merge branch 'main' into openhands/implement-gemini-file-editor

d133d5f

neubig and others added 3 commits December 19, 2025 14:42

Merge branch 'main' into openhands/implement-gemini-file-editor

7b06ebb

Register gemini tools in agent server

c3ade03

Add gemini preset with register_gemini_tools function and register gemini tools in the agent server's tool_router.py so they are available when using remote workspace. Co-authored-by: openhands <openhands@all-hands.dev>

This was referenced Dec 20, 2025

feat(preset): model-aware default tools for GPT-5 (apply_patch over file_editor) #1281

Closed

Add GPT-5 preset using ApplyPatchTool (opt-in) #1462

Merged

Merge branch 'main' into openhands/implement-gemini-file-editor

e9b7272

neubig marked this pull request as ready for review December 20, 2025 14:48

neubig requested a review from enyst December 20, 2025 14:48

neubig enabled auto-merge (squash) December 20, 2025 20:40

enyst reviewed Dec 21, 2025

View reviewed changes

examples/01_standalone_sdk/30_gemini_file_tools.py Outdated Show resolved Hide resolved

Update examples/01_standalone_sdk/30_gemini_file_tools.py

835659a

examples: add deletion follow-up message and enable telemetry in gemi…

385fd20

…ni file tools example Co-authored-by: openhands <openhands@all-hands.dev>

examples: run the two user messages in separate runs for clarity (cre…

b6eb656

…ate then delete) Co-authored-by: openhands <openhands@all-hands.dev>

enyst approved these changes Dec 21, 2025

View reviewed changes

neubig merged commit 1134974 into main Dec 21, 2025
21 of 22 checks passed

neubig deleted the openhands/implement-gemini-file-editor branch December 21, 2025 04:10

OpenHands deleted a comment from openhands-ai bot Dec 21, 2025
