Implement gemini-style file editing tools #1199

Merged: neubig merged 11 commits into main from openhands/implement-gemini-file-editor on Dec 21, 2025
@neubig
Contributor

@neubig neubig commented Nov 18, 2025

Summary

This PR implements gemini-style file editing tools as an alternative to the existing claude-style file_editor tool.

Fixes #1194

Changes

New Tools Implemented

Based on the analysis of gemini-cli, I implemented four separate tools instead of one command-based tool:

  1. read_file - Read file content with pagination support

    • Supports offset and limit parameters for reading large files in chunks
    • Automatically numbers lines for better context
    • Truncates large files with clear indicators
  2. write_file - Full file overwrite operations

    • Creates new files or overwrites existing ones
    • Automatically creates parent directories
    • Shows diff visualization between old and new content
  3. edit - Find and replace with validation

    • Performs exact string matching and replacement
    • Optional expected_replacements parameter to validate the number of changes
    • Creates new files when old_string is empty
    • Shows diff visualization of changes
  4. list_directory - Directory listing with metadata

    • Lists files and directories with size and modification time
    • Supports recursive listing (up to 2 levels deep)
    • Handles both relative and absolute paths
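
As a rough illustration of the pagination behaviour described for read_file, here is a minimal standalone sketch. It is not the PR's implementation: the function name `read_file_page`, the 0-based `offset`, the default `limit`, and the truncation message format are all assumptions made for this example.

```python
from pathlib import Path


def read_file_page(file_path: str, offset: int = 0, limit: int = 100) -> str:
    """Return up to `limit` numbered lines starting at 0-based line `offset`.

    Hypothetical helper: the name, defaults, and output format are assumptions.
    """
    lines = Path(file_path).read_text(encoding="utf-8").splitlines()
    total = len(lines)
    page = lines[offset : offset + limit]

    # Number each line with its 1-based position in the whole file.
    numbered = [f"{offset + i + 1:6d}  {text}" for i, text in enumerate(page)]

    # Tell the caller when more content remains beyond this page.
    end = offset + len(page)
    if end < total:
        numbered.append(f"[truncated: showing lines {offset + 1}-{end} of {total}]")
    return "\n".join(numbered)
```

A caller would request successive pages by increasing `offset` by `limit` until the truncation marker no longer appears.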

Key Differences from Claude-Style Tools

  • Separate tools vs. commands: Each operation is a distinct tool rather than commands within a single tool
  • More explicit: Tool names directly indicate their purpose
  • Validation built-in: The edit tool includes expected_replacements for validation
  • Pagination: The read_file tool supports offset/limit for handling large files
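
To make the expected_replacements validation concrete, the following sketch applies the same idea to an in-memory string. It is an illustration only: the function name `edit_text`, the default of 1, and the error messages are assumptions, and the file-creation case for an empty old_string is left out.

```python
def edit_text(
    content: str,
    old_string: str,
    new_string: str,
    expected_replacements: int = 1,  # assumed default, not confirmed by the PR
) -> str:
    """Replace exact matches of old_string, refusing if the count is unexpected."""
    if not old_string:
        raise ValueError("old_string must be non-empty for an in-place edit")

    # Count exact, non-overlapping matches before changing anything.
    found = content.count(old_string)
    if found == 0:
        raise ValueError("old_string not found")
    if found != expected_replacements:
        raise ValueError(
            f"expected {expected_replacements} occurrence(s), found {found}"
        )
    return content.replace(old_string, new_string)
```

Checking the count before replacing means an ambiguous old_string fails loudly instead of silently rewriting more places than the caller intended.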

Code Organization

openhands-tools/openhands/tools/gemini_file_editor/
├── __init__.py           # Module exports
├── read_file.py          # Read file tool definition
├── write_file.py         # Write file tool definition
├── edit.py               # Edit tool definition
├── list_directory.py     # List directory tool definition
└── executor.py           # All executor implementations

tests/tools/gemini_file_editor/
├── test_read_file.py     # 7 test cases
├── test_write_file.py    # 6 test cases
├── test_edit.py          # 9 test cases
└── test_list_directory.py # 7 test cases

Testing

All 29 test cases pass, covering:

  • ✅ Basic functionality for each tool
  • ✅ Error handling (file not found, invalid paths, etc.)
  • ✅ Edge cases (empty files, large files, special characters)
  • ✅ Pagination and truncation
  • ✅ Recursive directory listing
  • ✅ Find/replace validation

Quality Checks

  • ✅ All tests passing
  • ✅ Type checking with pyright
  • ✅ Linting with ruff
  • ✅ Code formatting with ruff format
  • ✅ PEP8 style compliance

Usage Example

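from openhands.tools.gemini_file_editor import (
    EditAction,  # assumed to be exported alongside the tool classes
    EditTool,
    ListDirectoryTool,
    ReadFileAction,  # assumed to be exported alongside the tool classes
    ReadFileTool,
    WriteFileTool,
)

# conv_state is an existing conversation state object, created elsewhere.
# The awaits below must run inside an async function.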

# Read a file
read_tool = ReadFileTool.create(conv_state)
action = ReadFileAction(file_path="example.py")
obs = await read_tool.executor(action)

# Edit a file
edit_tool = EditTool.create(conv_state)
action = EditAction(
    file_path="example.py",
    old_string="def old_function():",
    new_string="def new_function():",
    expected_replacements=1,
)
obs = await edit_tool.executor(action)

Future Work

This implementation can be evaluated against the existing claude-style file_editor to determine which approach works better for different use cases. Both tools can coexist, allowing users to choose based on their preferences and needs.

Checklist

  • Implementation complete for all four tools
  • Comprehensive tests added (29 test cases)
  • All tests passing
  • Pre-commit checks passing (ruff, pyright, pycodestyle)
  • Code follows repository conventions
  • Documentation in docstrings and tool descriptions



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:2acad74-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-2acad74-python \
  ghcr.io/openhands/agent-server:2acad74-python

All tags pushed for this build

ghcr.io/openhands/agent-server:2acad74-golang-amd64
ghcr.io/openhands/agent-server:2acad74-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:2acad74-golang-arm64
ghcr.io/openhands/agent-server:2acad74-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:2acad74-java-amd64
ghcr.io/openhands/agent-server:2acad74-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:2acad74-java-arm64
ghcr.io/openhands/agent-server:2acad74-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:2acad74-python-amd64
ghcr.io/openhands/agent-server:2acad74-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:2acad74-python-arm64
ghcr.io/openhands/agent-server:2acad74-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:2acad74-golang
ghcr.io/openhands/agent-server:2acad74-java
ghcr.io/openhands/agent-server:2acad74-python

About Multi-Architecture Support

  • Each variant tag (e.g., 2acad74-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 2acad74-python-amd64) are also available if needed

- Created separate tools (read_file, write_file, edit, list_directory) instead of one command-based tool
- read_file: Supports pagination with offset/limit parameters
- write_file: Full file overwrites with automatic directory creation
- edit: Find/replace functionality with expected_replacements validation
- list_directory: Lists directory contents with metadata (size, modified time, recursive option)
- Added comprehensive tests covering success and error cases
- All tools follow gemini-cli's design philosophy

Fixes #1194

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Nov 18, 2025

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-agent-server/openhands/agent_server/tool_router.py | 13 | 2 | 84% | 21–22 |
| openhands-tools/openhands/tools/gemini/edit/definition.py | 46 | 22 | 52% | 68, 70–73, 75–78, 80–81, 89–90, 92–93, 100–101, 145, 147, 149–150, 156 |
| openhands-tools/openhands/tools/gemini/edit/impl.py | 54 | 46 | 14% | 24, 41–44, 47–48, 50, 53–55, 64, 66, 69–70, 72, 81–82, 86–87, 93–94, 102–103, 108, 110–111, 114–115, 124, 126–127, 137–138, 141, 151, 154–155, 165–166, 168–169, 178–179, 183–184 |
| openhands-tools/openhands/tools/gemini/list_directory/definition.py | 63 | 34 | 46% | 64, 66–69, 71–73, 75–76, 79–81, 83–85, 88, 90, 93, 97, 99–100, 105, 109–114, 153, 155, 159–160, 166 |
| openhands-tools/openhands/tools/gemini/list_directory/impl.py | 63 | 54 | 14% | 32, 49–50, 53–54, 56, 59–60, 66–67, 72–73, 75, 77–82, 85–89, 98–99, 102–106, 115–116, 118–119, 122–125, 135–138, 140–141, 143–146, 148, 156–157, 161–162 |
| openhands-tools/openhands/tools/gemini/read_file/definition.py | 40 | 17 | 57% | 66, 68–71, 73–74, 76–78, 86–87, 120, 122, 124–125, 131 |
| openhands-tools/openhands/tools/gemini/read_file/impl.py | 48 | 40 | 16% | 28, 45–47, 50–51, 53, 56–57, 65–66, 73, 75–76, 78, 81–82, 93–95, 98, 101, 104–107, 110–111, 113–115, 118–119, 124, 133–134, 140–141, 147–148 |
| openhands-tools/openhands/tools/gemini/write_file/definition.py | 43 | 22 | 48% | 50, 52–55, 57–60, 62–63, 65–66, 68–69, 76–77, 112, 114, 116–117, 123 |
| openhands-tools/openhands/tools/gemini/write_file/impl.py | 34 | 26 | 23% | 27, 44–45, 48–49, 51, 54–55, 61–68, 70, 72, 75–76, 78–79, 87–88, 92–93 |
| openhands-tools/openhands/tools/preset/gemini.py | 37 | 14 | 62% | 55, 57, 63–64, 66, 74–75, 77–78, 83–84, 92, 95, 103 |
| TOTAL | 13859 | 6484 | 53% | |

@blacksmith-sh
Contributor

blacksmith-sh bot commented Nov 25, 2025

[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

@blacksmith-sh
Contributor

blacksmith-sh bot commented Nov 26, 2025

[Automatic Post]: I have assigned @jpshackelford as a reviewer based on git blame information. Thanks in advance for the help!

@blacksmith-sh
Contributor

blacksmith-sh bot commented Dec 1, 2025

[Automatic Post]: This PR seems to be currently waiting for review. @jpshackelford, @enyst, could you please take a look when you have a chance?

@neubig neubig marked this pull request as draft December 3, 2025 04:17
@jpshackelford jpshackelford removed their request for review December 8, 2025 13:22
@blacksmith-sh
Contributor

blacksmith-sh bot commented Dec 15, 2025

[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

@neubig
Contributor Author

neubig commented Dec 15, 2025

I'd really like to get this working, will try to make some time.

@neubig
Contributor Author

neubig commented Dec 19, 2025

@OpenHands reflect enyst's comments.

Once that is done, I'd like you to run the integration tests with these new tools with the gemini-3-pro language model. I will provide you an API key.

@openhands-ai

openhands-ai bot commented Dec 19, 2025

I'm on it! neubig can track my progress at all-hands.dev

neubig and others added 2 commits December 19, 2025 11:46
Address review comments from enyst:
- Split gemini_file_editor into separate tool directories (read_file, write_file, edit, list_directory)
- Each tool now follows the same structure as other tools in the repo
- Removed rich.console usage from list_directory, using plain text formatting instead
- Updated tests to use new import paths and converted to synchronous tests

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Dec 19, 2025

Summary

I've addressed enyst's review comments on PR #1199 by restructuring the gemini-style file editing tools. The changes have been committed and pushed to the openhands/implement-gemini-file-editor branch.

Changes Made

1. Restructured tools into separate directories (addressing enyst's comment #1)

  • Split the monolithic gemini_file_editor/ directory into four separate tool directories:
    • read_file/ - with __init__.py, definition.py, impl.py
    • write_file/ - with __init__.py, definition.py, impl.py
    • edit/ - with __init__.py, definition.py, impl.py
    • list_directory/ - with __init__.py, definition.py, impl.py
  • Each tool now follows the same structure as other tools in the repository (e.g., glob/, file_editor/)

2. Removed rich.console usage (addressing enyst's comment #2)

  • Simplified the list_directory visualization to use plain text formatting instead of rich.console
  • The visualization now uses simple string formatting for the table display

3. Updated tests

  • Moved tests to corresponding directories (tests/tools/read_file/, etc.)
  • Updated import paths to use new module structure
  • Converted async tests to synchronous tests (matching the synchronous ToolExecutor interface)

4. All checks pass

  • ✅ 29 unit tests passing
  • ✅ Pre-commit hooks (ruff format, ruff lint, pycodestyle, pyright) all pass
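
The plain-text table mentioned in item 2 could be produced with ordinary string padding. The sketch below is a guess at the general shape, not the code from this PR: `format_listing`, the column names, and the timestamp format are assumptions, and it lists a single level only (no recursion).

```python
import os
from datetime import datetime


def format_listing(dir_path: str) -> str:
    """Render a directory as an aligned plain-text table (hypothetical helper)."""
    rows = []
    with os.scandir(dir_path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            info = entry.stat()
            kind = "dir" if entry.is_dir() else "file"
            modified = datetime.fromtimestamp(info.st_mtime).strftime(
                "%Y-%m-%d %H:%M"
            )
            rows.append((entry.name, kind, str(info.st_size), modified))

    # Pad every column to the width of its longest cell; no rich.console needed.
    table = [("Name", "Type", "Size", "Modified"), *rows]
    widths = [max(len(row[i]) for row in table) for i in range(4)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )
```

Plain strings like this render the same way in logs, terminals, and LLM observations, which is presumably why the rich dependency was dropped.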

Regarding Integration Tests

The integration tests require LLM_API_KEY and LLM_BASE_URL environment variables. To run integration tests with gemini-3-pro, you would need to:

  1. Set the API key: export LLM_API_KEY=<your-api-key>
  2. Set the base URL: export LLM_BASE_URL=<gemini-api-base-url>
  3. Run: uv run python tests/integration/run_infer.py --llm-config '{"model": "gemini-3-pro"}'

Note: The current integration tests use the existing FileEditorTool and TerminalTool. To test the new gemini-style tools specifically, a new integration test would need to be created that uses ReadFileTool, WriteFileTool, EditTool, and ListDirectoryTool.

PR Link: #1199


neubig and others added 3 commits December 19, 2025 14:42
Restructure the gemini-style file editing tools to be under a 'gemini'
parent directory for better organization and to clearly indicate these
are gemini-style tools.

Structure:
- openhands/tools/gemini/
  - read_file/
  - write_file/
  - edit/
  - list_directory/

Co-authored-by: openhands <openhands@all-hands.dev>
Add a GEMINI_FILE_TOOLS constant that makes it easy to swap the
claude-style FileEditorTool for gemini-style tools with a one-line change:

    # Before (claude-style)
    tools=[Tool(name=FileEditorTool.name)]

    # After (gemini-style)
    tools=[*GEMINI_FILE_TOOLS]

Also add example 30_gemini_file_tools.py demonstrating the usage.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig
Contributor Author

neubig commented Dec 19, 2025

This needs eval.

@neubig
Contributor Author

neubig commented Dec 19, 2025

@OpenHands download and understand the OpenHands/benchmarks repo. Think carefully about how you can run this PR with the swe-bench benchmark, paying particular attention to what you need to do to make sure that the tools introduced in this PR are used instead of the default editing tools. I'll provide you with the API keys you need to perform the evaluation. Run on 50 examples first.

@openhands-ai

openhands-ai bot commented Dec 19, 2025

I'm on it! neubig can track my progress at all-hands.dev

Add gemini preset with register_gemini_tools function and register
gemini tools in the agent server's tool_router.py so they are available
when using remote workspace.

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Dec 19, 2025

Summary: SWE-bench Evaluation with Gemini-Style File Editing Tools

I successfully ran the SWE-bench benchmark evaluation using the gemini-style file editing tools from PR #1199 (openhands/implement-gemini-file-editor branch). Here's what was accomplished:

Evaluation Results

| Metric | Value |
|---|---|
| Total Instances | 50 (requested) |
| Instances with Patches | 51 |
| Instances with Errors | 4 (due to transient Docker build failures) |
| Model | litellm_proxy/gemini-3-pro-preview |
| Tools Preset | gemini (ReadFileTool, WriteFileTool, EditTool, ListDirectoryTool) |
| Max Iterations | 100 |
| Workers | 8 |

Changes Made to Enable Gemini Tools Evaluation

  1. Created gemini tools preset (openhands-tools/openhands/tools/preset/gemini.py):

    • register_gemini_tools() - Registers gemini tools with the tool router
    • get_gemini_tools() - Returns list of gemini tools for the agent
    • get_gemini_condenser() - Returns condenser configured for gemini tools
    • get_gemini_agent() - Returns agent configured with gemini tools
  2. Modified tool_router.py to register gemini tools in the agent server:

    • Added import for register_gemini_tools
    • Called register_gemini_tools(enable_browser=True) during initialization
  3. Added --tools argument to run_infer.py:

    • Choices: default (claude-style) or gemini (gemini-style)
    • Added tools_preset field to EvalMetadata model
    • Added _get_tools() method to select tools based on preset
  4. Created LLM config (.llm_config/gemini-3-pro.json) for the evaluation
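
The --tools switch from step 3 can be pictured with a small argparse sketch. This is not the code added to run_infer.py: `TOOL_PRESETS`, `build_parser`, and `get_tools` are hypothetical names, and the tool lists are plain strings standing in for the real tool objects.

```python
import argparse

# Tool names per preset. The gemini names match the tools loaded in this PR;
# the contents of the "default" preset are an assumption for illustration.
TOOL_PRESETS = {
    "default": ["terminal", "file_editor"],
    "gemini": ["terminal", "read_file", "write_file", "edit", "list_directory"],
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a --tools preset flag")
    parser.add_argument(
        "--tools",
        choices=sorted(TOOL_PRESETS),
        default="default",
        help="which file-editing tool preset the agent should use",
    )
    return parser


def get_tools(preset: str) -> list[str]:
    """Return a copy of the tool names registered for the chosen preset."""
    return list(TOOL_PRESETS[preset])
```

Restricting the flag with `choices` makes a mistyped preset fail at argument parsing rather than partway through an evaluation run.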

Output Location

The evaluation results are stored at:

/workspace/project/benchmarks/outputs/gemini-tools-eval/princeton-nlp__SWE-bench_Verified-test/litellm_proxy/gemini-3-pro-preview_sdk_c3ade03_maxiter_100_N_initial/

Key files:

  • output.critic_attempt_1.jsonl - Full evaluation results (55 instances)
  • output_errors.jsonl - 4 instances that failed due to missing images
  • conversations/ - Conversation archives for each instance
  • logs/ - Detailed logs for each instance

Notes

  • The 4 errors were due to transient Docker buildx failures during image building (not code issues)
  • The gemini tools (ReadFileTool, WriteFileTool, EditTool, ListDirectoryTool) were successfully used instead of the default claude-style FileEditorTool
  • The changes have been committed and pushed to the openhands/implement-gemini-file-editor branch (commit c3ade0323ad5574ea51c55d2fa5955cbdcb2d528)

To run the full SWE-bench evaluation or with different parameters, use:

uv run python benchmarks/swebench/run_infer.py \
  .llm_config/gemini-3-pro.json \
  --dataset princeton-nlp/SWE-bench_Verified \
  --split test \
  --n-limit 500 \
  --workspace remote \
  --tools gemini \
  --num-workers 8 \
  --max-iterations 100


Contributor Author

neubig commented Dec 20, 2025

SWE-bench Evaluation Results with Gemini-Style File Editing Tools

I ran the SWE-bench evaluation on 50 instances from princeton-nlp/SWE-bench_Verified using the gemini-style file editing tools introduced in this PR.

Configuration

  • Model: gemini-3-pro-preview (via litellm_proxy)
  • Tools: Gemini-style (read_file, write_file, edit, list_directory)
  • Max iterations: 100
  • SDK commit: c3ade03

Results Summary

| Metric | Count |
|---|---|
| Total instances submitted | 50 |
| Instances completed | 45 |
| Instances resolved | 30 |
| Instances unresolved | 15 |
| Empty patches | 4 |
| Errors | 1 |

Resolution Rate: 66.7% (30/45 completed instances)
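
For readers comparing this figure against other runs, the arithmetic behind it can be checked directly from the counts in the table. The snippet below only restates those numbers; the rate over all 50 submitted instances is derived here and was not reported in the comment.

```python
# Counts copied from the results table above.
submitted, completed, resolved = 50, 45, 30
unresolved, empty_patches, errors = 15, 4, 1

# The categories should add up before any rate is trusted.
assert resolved + unresolved == completed
assert completed + empty_patches + errors == submitted

rate_completed = 100 * resolved / completed  # denominator used in the comment
rate_submitted = 100 * resolved / submitted  # stricter denominator
print(f"{rate_completed:.1f}% of completed, {rate_submitted:.1f}% of submitted")
# prints "66.7% of completed, 60.0% of submitted"
```

Which denominator is appropriate depends on whether the 5 instances lost to infrastructure failures are counted against the tools.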

Resolved Instances (30)

  • astropy__astropy-12907, astropy__astropy-13236, astropy__astropy-14309
  • django__django-11095, django__django-11133, django__django-11999, django__django-12050, django__django-12155, django__django-12262, django__django-13158, django__django-13279, django__django-13670, django__django-14434, django__django-14999, django__django-15278
  • pytest-dev__pytest-5262, pytest-dev__pytest-5631, pytest-dev__pytest-5809
  • scikit-learn__scikit-learn-12973, scikit-learn__scikit-learn-13439, scikit-learn__scikit-learn-14983, scikit-learn__scikit-learn-25232, scikit-learn__scikit-learn-25973
  • sphinx-doc__sphinx-10466, sphinx-doc__sphinx-7757, sphinx-doc__sphinx-8551, sphinx-doc__sphinx-8621
  • sympy__sympy-13877, sympy__sympy-15599, sympy__sympy-23534

Notes

  • 4 instances had empty patches due to Docker image build failures during the image preparation step (transient infrastructure issues)
  • 1 instance (matplotlib__matplotlib-25775) had an error during evaluation (missing Docker image)
  • The gemini-style tools were successfully used by the agent throughout the evaluation

Output Location

Full results are available at:
outputs/gemini-tools-eval/princeton-nlp__SWE-bench_Verified-test/litellm_proxy/gemini-3-pro-preview_sdk_c3ade03_maxiter_100_N_initial/

@enyst
Collaborator

enyst commented Dec 20, 2025

Changes Made to Enable Gemini Tools Evaluation

IMHO this is cool, I really like what it did!

@neubig For comparison, here is the relevant situation for GPT-5.x:

So we already have apply_patch tools, the GPT equivalent of this PR, merged in main. But we don't have a preset for them. I picked up the preset idea from this PR, and I had the agent implement it for GPT-5 (PR 1462):

  • define and register the preset
  • it's not default, even for GPTs
  • the user, such as benchmarks client repo, or CLI, can choose the set of tools to use.

I see OH just posted a result here, I don't know the baseline for these 50 instances for Gemini though, so I'm not sure if it's low or high.

I'd like to suggest that if it did work correctly (if logs show they ran normally, and the agent is not hallucinating) IMHO we could merge this. Because merging this PR means we will offer everyone else the Gemini tools, they can decide to enable them for the agent, to see how they work, to see if they think of some improvement or run into issues. Their feedback will come back to us and enable us to move forward.

I mean, they're not on by default, even for Gemini, and I think maybe that could be okay until we see performance improvements, WDYT?

@neubig
Contributor Author

neubig commented Dec 20, 2025

Thanks @enyst, I'm OK with this! I'm also going to try to run the baseline as well though.

@neubig neubig marked this pull request as ready for review December 20, 2025 14:48
@neubig neubig requested a review from enyst December 20, 2025 14:48
@neubig neubig enabled auto-merge (squash) December 20, 2025 20:40
@enyst
Collaborator

enyst commented Dec 21, 2025

@OpenHands Help me with reviewing this PR. It adds an example to examples/ : I need you to run this example script in your environment and see if it works, and if it worked with gemini tools indeed, not with default tools.

First, add another message from the user to the script, to tell the agent to delete that file it created.

Then try to run it with the gemini key in your env (and no base url); if that doesn't work for some reason, tweak it to use the litellm proxy with the key you have for that (and base url https://llm-proxy.eval.all-hands.dev); you can find out the correct model name prefix on the proxy. (maybe gemini/, maybe litellm_proxy/ , maybe no prefix just the model name?)

Make sure telemetry is enabled and you save logs (see llm.py); then verify the logs. Show me snippets from the logs that show what tool calls exactly did the agent make during the run. Post results.

@openhands-ai

openhands-ai bot commented Dec 21, 2025

I'm on it! enyst can track my progress at all-hands.dev

…ni file tools example

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Dec 21, 2025

Here’s a concise summary of what I did and verified.

Changes made (and pushed)

  • File changed: examples/01_standalone_sdk/30_gemini_file_tools.py
    • Added a second user message to delete the created file:
      conversation.send_message("Now delete the FACTS.txt file you just created.")
    • Enabled telemetry logging:
      LLM(..., log_completions=True)
  • Commit on branch openhands/implement-gemini-file-editor:
    • examples: add deletion follow-up message and enable telemetry in gemini file tools example
    • Co-authored-by: openhands <openhands@all-hands.dev>
  • Scope: only this file was modified; changes are minimal and directly address the request.

Execution and verification

  • Run 1: Direct Gemini (GEMINI_API_KEY, no base URL)
    • Model: gemini/gemini-3-pro-preview
    • Outcome: The agent looped on list_directory and did not complete the write/delete workflow. Telemetry logs saved.
  • Run 2: LiteLLM proxy (LITELLM_API_KEY + base URL)
    • Model and base URL used:
    • Outcome: Success. The agent:
      • Wrote FACTS.txt (with 3 facts about the project)
      • Deleted FACTS.txt via terminal (rm FACTS.txt)
      • Verified deletion (ls FACTS.txt -> not found)
    • Confirmed model prefix via the proxy’s /v1/model/info endpoint; litellm_proxy/ prefix is correct.

Telemetry and tool usage verification

  • Telemetry logging enabled and saved to logs/completions/
  • Loaded tools at runtime:
    • ['terminal', 'read_file', 'write_file', 'edit', 'list_directory']
  • Tool calls captured in logs for the successful run included:
    • ["list_directory"]
    • ["read_file"]
    • ["write_file"]
    • ["terminal"]
    • ["finish"]
  • Example snippet showing explicit gemini tool use (write_file):
    • model: litellm_proxy/gemini-3-pro-preview
    • tool_calls:
      • name: write_file
      • args:
        {"content": "1. The OpenHands Software Agent SDK is a set of Python and REST APIs for building agents that work with code.\n2. Agents can use the local machine as their workspace or run inside ephemeral workspaces using the Agent Server.\n3. It is the engine behind the OpenHands CLI and OpenHands Cloud.\n",
        "security_risk": "LOW",
        "file_path": "FACTS.txt"}
  • No usage of the default claude-style file_editor was observed.
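
A check like the one above can be scripted. The sketch below assumes each saved log is a JSON file holding a response in the common OpenAI-style chat-completion shape (choices, then message, then tool_calls), possibly nested under a "response" key. The actual layout of the files in logs/completions/ is not shown in this thread, so the key names and both helper functions are assumptions.

```python
import json
from pathlib import Path


def tool_names(response: dict) -> list[str]:
    """Collect tool-call names from an OpenAI-style response dict (assumed shape)."""
    names = []
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            names.append(call["function"]["name"])
    return names


def summarize_logs(log_dir: str) -> dict[str, list[str]]:
    """Map each *.json log file name to the tool names it recorded."""
    summary = {}
    for path in sorted(Path(log_dir).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: the response may sit under a "response" key; else use the record.
        summary[path.name] = tool_names(record.get("response", record))
    return summary
```

If `file_editor` never appears in the summary while read_file, write_file, edit, or list_directory do, that supports the conclusion that the gemini-style tools were the ones in use.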

Checklist against your instructions

  • Add a message instructing deletion of the created file: done.
  • Try with Gemini key (no base URL): done; it didn’t complete the task (looped on list_directory).
  • If that fails, switch to LiteLLM proxy with key and base URL; discover correct model prefix: done. Used litellm_proxy/gemini/gemini-3-pro-preview and https://llm-proxy.eval.all-hands.dev.
  • Ensure telemetry is enabled and save logs: done.
  • Verify logs and show snippets of actual tool calls: done (included above).
  • Confirm it worked with gemini tools, not default tools: verified via loaded tools and tool_calls in logs.

If you want me to change the default example to use the proxy by default (instead of relying on plain Gemini settings), let me know and I’ll update the example to read proxy settings from env vars.


@enyst
Collaborator

enyst commented Dec 21, 2025

HUMAN: how did it loop?

OpenHands-GPT-5:

It got stuck repeatedly calling the same tool with the same arguments, without progressing to the write/delete steps.

What happened

  • The model issued multiple consecutive ListDirectoryAction calls with dir_path="." and recursive=False.
  • Each call returned the same directory listing observation.
  • The model then asked for list_directory again instead of moving on to write_file or terminal.
  • After several such action→observation cycles, the stuck detector triggered and the run stopped.

Evidence from the run

  • Console trace showed repeated “Agent Action: ListDirectoryAction … dir_path: '.' … recursive: False” three times in a row, followed by:
    • WARNING openhands.sdk.conversation.stuck_detector: Action, Observation loop detected
    • WARNING openhands.sdk.conversation.impl.local_conversation: Stuck pattern detected.
  • Telemetry logs for the direct Gemini run (no base URL) all recorded the same tool call:
    • gemini__gemini-3-pro-preview-1766284619.592-91aa.json → "list_directory"
    • gemini__gemini-3-pro-preview-1766284621.939-4cfc.json → "list_directory"
    • gemini__gemini-3-pro-preview-1766284624.238-efa3.json → "list_directory"
    • gemini__gemini-3-pro-preview-1766284626.609-1f46.json → "list_directory"

In short, it looped by repeatedly selecting the list_directory tool with identical inputs, leading the stuck detector to halt the run.
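
The kind of loop described here is easy to detect in principle. The following is a simplified sketch of the idea, not the SDK's stuck_detector: the function name `is_stuck`, the string representation of actions and observations, and the threshold of three repeats are assumptions for illustration.

```python
def is_stuck(history: list[tuple[str, str]], threshold: int = 3) -> bool:
    """Return True if the last `threshold` (action, observation) pairs are identical.

    Each pair is assumed to be serialised to strings, e.g. the tool name plus
    its arguments and the text of the resulting observation.
    """
    if len(history) < threshold:
        return False
    tail = history[-threshold:]
    return all(pair == tail[0] for pair in tail)


# Example mirroring the run above: the same list_directory call, repeated.
step = ("list_directory(dir_path='.', recursive=False)", "<same listing>")
history = [step, step, step]
print(is_stuck(history))  # prints True
```

Comparing both the action and the observation matters: repeating an action is legitimate when the observation changes, for instance when polling a directory that is being written to.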

…ate then delete)

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst
Collaborator

enyst commented Dec 21, 2025

HUMAN: 😅

OK, I debugged this for a while, and it's very weird. Three times in a row, it got stuck, doing nothing. But then it just worked. Nothing in the logs, nothing happened, it just returned the same tool call at first (getting stuck), and then it didn't.

I think it has nothing to do with the PR. The new tools are active and executed normally.

Notes for later:

  • only AIStudio Gemini got stuck; Vertex / proxy works as expected
  • when it got stuck, the logs showed empty thinking_blocks on every tool call; when it succeeded, the logs showed some empty thinking_blocks, but some tool calls didn't have them at all (?)
  • Vertex logs show some non-empty thinking_blocks, even if it's just the tool call repeated, the LLM is using those.

Collaborator

@enyst enyst left a comment

Thank you so much for this, it will be awesome to have at our disposal, and everyone else's!

I'd love to know if @xingyaoww has concerns here, this is typically of interest to him. Also I'm not fully sure he'll like the preset with the model name (get_gemini_agent), but maybe we could name it differently? I feel like maybe it's nice from a client code perspective to have an obvious function to just call... idk.

@neubig neubig merged commit 1134974 into main Dec 21, 2025
21 of 22 checks passed
@neubig neubig deleted the openhands/implement-gemini-file-editor branch December 21, 2025 04:10
@OpenHands OpenHands deleted a comment from openhands-ai bot Dec 21, 2025

Development

Successfully merging this pull request may close these issues.

Inspect gemini-cli and implement gemini-style file editing tools

3 participants