[Feat] Added FileEditAction to enable edits using diff format. #3777
base: main
Hey Raj,
These are just first questions/thoughts. Do you have other things still on a to-do list (besides integration test mocks 😬)?
Valid question @tobitege, I also thought so initially. The primary motive here is to get the agent to produce the edit in a diff format similar to Aider's (simply because LLMs are better trained to do that). This means we need to modify the prompt and get the agent to output the diff wrapped in an <execute_edit> tag. @xingyaoww can probably explain this better?
Oh, no, for that the "user_prompt.j2" template (codeact_agent folder) and the docstrings in the file_ops.py file are used. :)
You might take a peek at the PromptManager (it's a rather recent addition): Also, don't worry about the integration tests for now. We can regenerate the mock files for you when the time is right.
@tobitege Yeah, the main difference between this PR and our current If this
Alright, sounds like it should help prevent a lot of issues this way. 👍
This implementation overall LGTM! - I'd be interested to see if we can improve the AiderBench score on this. @RajWorking LMK if you need more credits!
Once we can get some observable performance improvement on AiderBench, I can run this on SWE-Bench too :D
'edit_file_by_replace',
'insert_content_at_line',
'append_file',
] ## DISABLED TEMPORARILY.
You might also consider disabling things like create_file and including some instructions in the prompt telling the LLM that it can also create new files using EditAction (if this is true, ofc).
That's a good idea, maybe we can do that in a subsequent PR?
Yeah - I think you can experiment with this when you are running benchmarks (e.g., try this and see if it increases the score)
@RajWorking if you'd update the branch (either by clicking the button or merging main), I could run the regeneration of the integration files for you, just let me know. :)
@RajWorking Can you plot before vs. after? Are you using test cases? Is there a change in the average number of "turns" / average cost?
Probably also a good time to bump the version of CodeAct to
@RajWorking We only got 74/300 on this branch, using the exact same protocol I did in this PR which got us 89/300. I'm going to re-run eval on the main branch now and investigate whether this degradation comes from the recent In the meantime, feel free to check this eval output file for weird cases / room for improvement: claude-3-5-sonnet@20240620_maxiter_30_N_v1.10-no-hint.zip this
Short description of the problem this fixes or functionality that this introduces. This may be used for the CHANGELOG
We want to switch from our current agentskills style (similar to https://aider.chat/docs/benchmarks.html#diff-func) to Aider's diff block (https://aider.chat/docs/benchmarks.html#diff) and see whether that improves editing performance.
Give a summary of what the PR does, explaining any non-trivial design decisions
This PR adds a new EditAction, with logic in the CodeActParser to parse output in diff format (translating the diff into an EditAction).
The EditAction is then executed inside the runtime (for now, we simply parse the diff format into search and replace blocks and manually call the existing agentskills to perform the edit).
We use the <execute_edit> tags to enclose the diff format. For example,
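As an illustration only (not the code from this PR), the parsing step described above might look roughly like the sketch below. It assumes Aider-style `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` markers inside the <execute_edit> tags; the exact marker syntax used by the PR is an assumption, and `SearchReplaceBlock`, `parse_edit_blocks`, and `apply_edit_blocks` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchReplaceBlock:
    """One search/replace pair extracted from a diff block (hypothetical type)."""
    search: str
    replace: str


# Assumed wrapper tag, as named in the PR description.
EDIT_TAG_RE = re.compile(r"<execute_edit>(.*?)</execute_edit>", re.DOTALL)

# Assumed Aider-style markers; the PR's real format may differ.
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)=======\n(.*?)>>>>>>> REPLACE",
    re.DOTALL,
)


def parse_edit_blocks(llm_output: str) -> list[SearchReplaceBlock]:
    """Collect every search/replace pair found inside <execute_edit> tags."""
    blocks = []
    for tag_body in EDIT_TAG_RE.findall(llm_output):
        for search, replace in BLOCK_RE.findall(tag_body):
            blocks.append(SearchReplaceBlock(search=search, replace=replace))
    return blocks


def apply_edit_blocks(content: str, blocks: list[SearchReplaceBlock]) -> str:
    """Apply each block once, in order; fail loudly if the search text is absent."""
    for block in blocks:
        if block.search not in content:
            raise ValueError(f"search text not found: {block.search!r}")
        content = content.replace(block.search, block.replace, 1)
    return content
```

In this sketch the replacement happens on an in-memory string; per the description above, the PR instead hands the parsed search and replace blocks to the existing agentskills inside the runtime.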
Link of any specific issues this addresses
#3650