[Feat] Added FileEditAction to enable edits using diff format. #3777

Open
RajWorking wants to merge 13 commits into main

Conversation

@RajWorking (Contributor) commented Sep 8, 2024

Short description of the problem this fixes or functionality that this introduces. This may be used for the CHANGELOG
We want to switch from our current agentskills style (similar to https://aider.chat/docs/benchmarks.html#diff-func) to Aider's diff block (https://aider.chat/docs/benchmarks.html#diff) and see whether that improves editing performance.


Give a summary of what the PR does, explaining any non-trivial design decisions

This PR adds a new EditAction, with logic in the CodeActParser to parse the model's output in diff format (translating the diff into an EditAction).
The EditAction is then executed inside the runtime (for now, we simply parse the diff format into search and replace blocks and manually call the existing agentskills to perform the edit).

We use the <execute_edit> tags to enclose the diff format. For example,

<execute_edit>
demo.py
<<<<<<< SEARCH
    print("hello")
=======
    print("goodbye")
>>>>>>> REPLACE
</execute_edit>
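
For illustration, here is a minimal sketch of how the runtime might translate such a block into a (path, search, replace) triple before handing it to the existing agentskills. The function and regex are hypothetical, not the PR's actual parser:

```python
import re

# Hypothetical sketch: extract (path, search, replace) from an
# <execute_edit> body in the SEARCH/REPLACE format shown above.
BLOCK_RE = re.compile(
    r'(?P<path>[^\n]+)\n'
    r'<{7} SEARCH\n'
    r'(?P<search>.*?)\n?'
    r'={7}\n'
    r'(?P<replace>.*?)\n?'
    r'>{7} REPLACE',
    re.DOTALL,
)

def parse_edit_block(body: str) -> tuple[str, str, str]:
    """Return (file path, text to search for, replacement text)."""
    m = BLOCK_RE.search(body)
    if m is None:
        raise ValueError('malformed <execute_edit> block')
    return m.group('path').strip(), m.group('search'), m.group('replace')
```

Applied to the example above, this would yield ('demo.py', '    print("hello")', '    print("goodbye")'), which maps directly onto a search/replace call into agentskills.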

Link of any specific issues this addresses
#3650

@tobitege (Collaborator) commented Sep 8, 2024

Hey Raj,
thanks so much for taking this on!
Got a few questions, though:

  • What is the reasoning behind moving this into an action/observation instead of keeping it in the agentskills? There seems to be a lot of overhead for a comparatively small number of added lines that implement the action.
  • The action then still uses the same agentskill methods to do the actual operation. What's the difference here compared to how Aider does its implementation?
  • One problem with having it in the client runtime is that it then also needs to be ported into the remote runtime.

These are just first questions/thoughts. Do you have other things still on a to-do list (besides integration test mocks 😬)?

@RajWorking (Contributor, Author) commented

Valid question @tobitege, I also thought so initially.

The primary motive here is to get the agent to produce the edit in a diff format similar to Aider's (simply because LLMs are trained to do that better). This means we need to modify the prompt and get the agent to output the diff wrapped in an <execute_edit> tag.
From there on, actions/observations are the only way to actuate the edit through the runtime.

@xingyaoww Can probably explain this better?

@tobitege (Collaborator) commented Sep 8, 2024

> Valid question @tobitege, I also thought so initially.
>
> The primary motive here is to get the agent to produce the edit in a diff format similar to Aider's (simply because LLMs are trained to do that better). This means we need to modify the prompt and get the agent to output the diff wrapped in an <execute_edit> tag. From there on, actions/observations are the only way to actuate the edit through the runtime.
>
> @xingyaoww Can probably explain this better?

Oh, no, that's what the "user_prompt.j2" template (in the codeact_agent folder) and the docstrings in the file_ops.py file are for. :)

@tobitege (Collaborator) commented Sep 8, 2024

You might take a peek at the PromptManager (it's a rather recent addition):
https://github.com/All-Hands-AI/OpenHands/blob/5100d12cea2cd35c30a22e25fbac376b72ed0981/openhands/utils/prompt.py

Also, don't worry about the integration tests for now. We can regenerate the mock files for you when the time is right.

@xingyaoww (Contributor) commented

> The action then still uses the same agentskill methods to do the actual operation. What's the difference here compared to how Aider does its implementation?

@tobitege Yeah, the main difference between this PR and our current agentskill implementation is that we expose an editing interface to the agent that does not require explicit string escaping (e.g., you would need to escape strings when calling Python functions from agentskills, which causes weird bugs). Requiring the LLM to return output as JSON has also been shown to be pretty problematic due to the need to escape things (https://aider.chat/2024/08/14/code-in-json.html).

If this diff edit format works (improves performance), we can implement this action directly on the runtime client's side like FileRead and FileWrite. If it doesn't work - I'd say we don't bother.
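
To make the escaping point concrete, here is a hedged comparison. The function name mirrors the agentskills listed in this PR's diff, but the exact parameters are an assumption:

```python
# Function-call style: the edit payload must be a valid, escaped Python
# string literal, which LLMs frequently get wrong (signature assumed):
edit_file_by_replace(
    'demo.py',
    to_replace='print("hello")',
    new_content='print("it\'s done")',  # nested quote already forces an escape
)
```

With the diff block, the same replacement text is written verbatim between the markers, with no quoting or escaping layer at all.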

@tobitege (Collaborator) commented Sep 9, 2024

> @tobitege Yeah, the main difference between this PR and our current agentskill implementation is that we expose an editing interface to the agent that does not require explicit string escaping (e.g., you would need to escape strings when calling Python functions from agentskills, which causes weird bugs). Requiring the LLM to return output as JSON has also been shown to be pretty problematic due to the need to escape things (https://aider.chat/2024/08/14/code-in-json.html).

Alright, sounds like it should help prevent a lot of issues this way. 👍

@xingyaoww (Contributor) left a comment


This implementation overall LGTM! - I'd be interested to see if we can improve the AiderBench score on this. @RajWorking LMK if you need more credits!

Once we can get some observable performance improvement on AiderBench, I can run this on SWE-Bench too :D

'edit_file_by_replace',
'insert_content_at_line',
'append_file',
] ## DISABLED TEMPORARILY.
@xingyaoww (Contributor):

You might also consider disabling things like create_file and including some instructions in the prompt, telling the LLM that it can also create new files using EditAction (if this is true, ofc).
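
If the format follows Aider's convention, creating a new file could be expressed as a block with an empty SEARCH section, e.g. (whether this PR actually supports this is exactly the "if this is true" question above; the file name is made up):

<execute_edit>
new_module.py
<<<<<<< SEARCH
=======
def greet():
    print("hello")
>>>>>>> REPLACE
</execute_edit>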

@RajWorking (Contributor, Author):

That's a good idea, maybe we can do that in a subsequent PR?

@xingyaoww (Contributor):

Yeah - I think you can experiment with this when you are running benchmarks (e.g., try this and see if it increases the score).

@tobitege (Collaborator) commented Sep 9, 2024

@RajWorking if you'd update the branch (either by clicking the button or merging main), I could run the regeneration of the integration files for you, just let me know. :)

@RajWorking (Contributor, Author) commented

Performance on the Aider Bench:
Slight improvement: 84/133 tests passed.


@xingyaoww (Contributor) commented

@RajWorking Can you plot before vs. after? Are you using test cases? Is there a change in the average number of "turns" / average cost?

@xingyaoww (Contributor) commented

Once #3829 and #3836 are merged, I'll start an SWE-Bench Lite eval on this PR 😄

@xingyaoww (Contributor) commented

Probably also a good time to bump the version of CodeAct to 1.10, since there are significant prompt changes - I can push a commit.
https://github.com/All-Hands-AI/OpenHands/pull/3777/files#diff-9e2cd489ef2b4ad5f779049fd4a91abe7adea0e5b038fa2a898d1688451ae0eeL38

@xingyaoww (Contributor) commented

@RajWorking We only got 74/300 on this branch, using the exact same protocol I used in this PR, which got us 89/300. I'm going to re-run the eval on the main branch now and investigate whether this degradation comes from recent changes on main or from this PR's editing approach.

In the meantime, feel free to check this eval output file for weird cases / room for improvement: claude-3-5-sonnet@20240620_maxiter_30_N_v1.10-no-hint.zip


@tobitege (Collaborator) commented

  • I did try to get one instance (57) to a success, but failed after 8 attempts, in spite of trying a couple of tweaks.
  • ENABLE_AUTO_LINT=true seems not to be implemented, which could help in providing a more meaningful error message to the LLM in case of syntax errors.
  • One line I added to the system prompt, for example:
    SEARCH REQUIRES PROPER INDENTATION! If the assistant plans to add the line ' print(x)', it must fully write the line out, with ALL leading spaces before the code!
  • My first attempt was the worst, as basically all observations had this "truncated" in the middle of them, even though I doubt any had more than 10K characters and I had no other limit specified.
  • Wrong indentation seems to be a major problem, almost by design and not just for this PR's approach:
    if the LLM does manage to replace something, it is often not the full method, and it places something in the file that doesn't "fit" with the code following it.
    -> maybe we need to change prompts to only replace full methods or something to prevent this.
  • The regex for search is, for the above reasons, imo way too strict. LLMs often skip inline comments or have other "coding styles", and requiring them to regurgitate the whole exact same code seems counterproductive compared to just using line numbers - this is by design and not because of this PR.
  • maybe we could try to use the same "fuzzy" search and replace from the existing agent skills for the actual operation (a rough sketch of the idea follows below)
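
A rough sketch of what such fuzzy matching could look like (function name, threshold, and fixed-size window are assumptions, not the agentskills implementation):

```python
import difflib

def find_fuzzy_match(file_text: str, search: str, threshold: float = 0.9):
    """Locate the window of lines in file_text most similar to the SEARCH
    block, tolerating small drift (dropped comments, whitespace changes).
    Returns the (start, end) line span of the best match, or None."""
    file_lines = file_text.splitlines()
    search_block = search.strip('\n')
    n = len(search_block.splitlines())
    best_span, best_ratio = None, threshold
    for start in range(len(file_lines) - n + 1):
        window = '\n'.join(file_lines[start:start + n])
        ratio = difflib.SequenceMatcher(None, window, search_block).ratio()
        if ratio > best_ratio:
            best_span, best_ratio = (start, start + n), ratio
    return best_span
```

This only compares windows of exactly the SEARCH block's line count, so it would still miss matches where the LLM added or dropped whole lines; a production version would also need to slide over variable window sizes.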
