Add SWE-agent tools as sandbox plugins #1305
So we actually had started taking a different approach here: #846. The idea was to convert all of SWE-agent's commands into Actions. We already have support for most of them, with a little translation. IMO this will be important for an actual SWE-agent integration into OpenDevin--if we just expose these commands as bash commands, we don't get any structured data about e.g. what files are being edited and how. I suppose one approach would be to merge this PR as-is, but have the agent implementation eventually intercept
ssh_box.init_plugins([JupyterRequirement(), SWEAgentCommandsRequirement()])
logger.info(
    '--- SWE-AGENT COMMAND DOCUMENTATION ---\n'
    f'{SWEAgentCommandsRequirement().documentation}\n'
    '---'
)
Eventually we should consider pulling this main logic into sandbox.py, and making both the sandbox type and the plugin list command-line args!
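A minimal sketch of what that command-line interface could look like, using only the standard library. The flag names, the plugin registry keys, the list of sandbox types, and the placeholder classes are all assumptions for illustration; only the two requirement class names come from this PR.

```python
import argparse


# Placeholder classes standing in for the real plugin requirements in this PR.
class JupyterRequirement:
    pass


class SWEAgentCommandsRequirement:
    pass


# Hypothetical registry mapping CLI names to plugin requirement classes.
PLUGINS = {
    "jupyter": JupyterRequirement,
    "swe_agent_commands": SWEAgentCommandsRequirement,
}

# Hypothetical list of sandbox types; adjust to whatever sandbox.py supports.
SANDBOX_TYPES = ("ssh", "exec", "local")


def parse_args(argv=None):
    """Parse the sandbox type and the list of plugins to initialize."""
    parser = argparse.ArgumentParser(description="Start an interactive sandbox")
    parser.add_argument("--sandbox-type", choices=SANDBOX_TYPES, default="ssh")
    parser.add_argument("--plugins", nargs="*", choices=sorted(PLUGINS), default=[])
    return parser.parse_args(argv)


def build_plugins(names):
    """Instantiate one requirement object per requested plugin name."""
    return [PLUGINS[name]() for name in names]
```

With something like this, the hard-coded list passed to init_plugins could be replaced by build_plugins(args.plugins).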
@rbren, I totally understand your concern about translating these commands into structured data. However, I kinda don't want to constrain ourselves to that list of SWE-agent actions, which may grow and become unmanageable at some point. What we quantitatively verified (across 10+ LLMs) in the CodeAct paper is that:
The assumed benefit of structured Actions for tracking file edits might not be valid. However, at the end of the day, each agent's implementation can (and should) be completely isolated. I actually prefer that we maintain a very small set of "core" Actions well (e.g., BashRun, PythonRun). And if there's really a need for structured Actions, we can start an "auxiliary action library folder" (or
Some good points here! I'm in the process of reading your paper. One of my main goals for OpenDevin is to provide users with a transparent window into what the agent is doing. That way users have a sense of control and certainty. If the agent is just running in a black box, the user will worry that it's doing the wrong thing, wasting time, or even doing something harmful. A black box also makes it harder to measure and debug failures. The best way to provide visibility is to put as much of the agent's behavior as possible into structured data. That doesn't mean everything needs to be structured data though! We definitely need to give the agent the chance to run arbitrary bash/python/etc as it goes about its task. To put it succinctly: actions should be structured as much as possible without constraining the agent.
This is what I was getting at here--we can give the agent access to all the raw SWE-agent bash commands like The Action will have the exact same effect as the bash, so far as the agent can tell. But it lets us provide a lot more feedback to the user as to what the agent was doing.
Totally agree we can't use read/write Actions as a source-of-truth about file changes. Instead, we should only treat read/write Actions as a partial record.
Someone brought up the idea of basically archiving the whole workspace at every step, so you can fast-forward and rewind! Listening to
Completely agree here. I don't think we'll add more than a few beyond what we have today.
I could be convinced otherwise, but I don't even want to maintain auxiliary/contrib actions. Agents can create pseudo-actions inside their implementation, and translate them into core Actions. Maybe if we see the same pseudo-actions getting used over and over we should explore some kind of auxiliary angle.
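To make the pseudo-action idea concrete, here is a rough sketch of an agent keeping its own structured pseudo-actions and lowering them to a single core "run in bash" Action. Everything here is hypothetical: CmdRunAction is a stand-in rather than the real OpenDevin class, and the command strings only assume that the sandbox plugin exposes open and search_dir style commands.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class CmdRunAction:
    """Stand-in for a core 'run this in bash' Action (hypothetical shape)."""
    command: str


@dataclass
class OpenFile:
    """Agent-private pseudo-action: open a file, optionally at a line."""
    path: str
    line: Optional[int] = None


@dataclass
class SearchDir:
    """Agent-private pseudo-action: search a directory for a term."""
    term: str
    directory: str = "."


def to_core_action(pseudo) -> CmdRunAction:
    """Translate a pseudo-action into the one core Action the platform knows."""
    if isinstance(pseudo, OpenFile):
        parts = ["open", pseudo.path]
        if pseudo.line is not None:
            parts.append(str(pseudo.line))
    elif isinstance(pseudo, SearchDir):
        parts = ["search_dir", pseudo.term, pseudo.directory]
    else:
        raise TypeError(f"unknown pseudo-action: {type(pseudo).__name__}")
    # shlex.join quotes each argument so spaces survive the trip through bash.
    return CmdRunAction(command=shlex.join(parts))
```

The platform only ever sees CmdRunAction, so the set of core Actions stays small while each agent is free to define whatever pseudo-actions it needs.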
I like this one! It is just like the one in Devin's demo video, where you can scroll it forward and backward to check the agent's action.
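For reference, a naive version of the rewind idea could copy the whole workspace after each step. This is only a sketch under stated assumptions: WorkspaceHistory is a made-up name, full copies will not scale to large workspaces, and the archive directory has to live outside the workspace so the copy does not recurse into itself.

```python
import shutil
import tempfile
from pathlib import Path


class WorkspaceHistory:
    """Hypothetical helper that keeps one full copy of the workspace per step."""

    def __init__(self, workspace, archive_dir):
        self.workspace = Path(workspace)
        self.archive_dir = Path(archive_dir)  # must be outside the workspace
        self.archive_dir.mkdir(parents=True, exist_ok=True)
        self.steps = 0

    def _step_dir(self, step):
        return self.archive_dir / f"step_{step:05d}"

    def snapshot(self):
        """Copy the current workspace and return the step index of the copy."""
        step = self.steps
        shutil.copytree(self.workspace, self._step_dir(step))
        self.steps += 1
        return step

    def restore(self, step):
        """Replace the workspace contents with the copy taken at `step`."""
        source = self._step_dir(step)
        if not source.is_dir():
            raise ValueError(f"no snapshot for step {step}")
        shutil.rmtree(self.workspace)
        shutil.copytree(source, self.workspace)


def demo():
    """Write a file, snapshot, overwrite, snapshot again, then rewind."""
    with tempfile.TemporaryDirectory() as root:
        workspace = Path(root) / "workspace"
        workspace.mkdir()
        target = workspace / "notes.txt"
        target.write_text("v1")
        history = WorkspaceHistory(workspace, Path(root) / "history")
        first = history.snapshot()
        target.write_text("v2")
        history.snapshot()
        history.restore(first)
        return target.read_text()
```

Calling demo() rewinds to the first snapshot, so the file reads "v1" again.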
Exactly! I like this approach. I only mentioned
If we want the user to feel a sense of control and certainty, we can:
For now, I think the easiest way would be (1) and (2) above. I'm not against getting structured output info for display, as long as that parsing process itself does not constrain or limit the agent in any way (i.e., it is just for the user's information).
So how about we keep this PR as is and let the agent have access to the raw bash commands, and we start a new PR and add a field (e.g., This way, we can get away with only keeping these core Actions without expanding them, yet having ways to get structured info out of them for a sense of control. In the future, it is also possible to add a very fast, cheap smaller LLM specialized in interpreting commands to
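One possible shape for that follow-up, sketched with made-up names: the core Action keeps the raw command untouched, and a best-effort parser fills a display-only field next to it. The display_info field name, the CmdRunAction stand-in, and the command-to-kind table are all assumptions, not part of this PR.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class CmdRunAction:
    """Stand-in for a core bash Action; `display_info` is a hypothetical field."""
    command: str
    display_info: dict = field(default_factory=dict)


# Illustrative mapping from a command name to the kind of file access it implies.
FILE_COMMANDS = {"open": "read", "cat": "read", "create": "write"}


def annotate(action: CmdRunAction) -> CmdRunAction:
    """Attach display-only metadata; the command itself is never modified."""
    try:
        tokens = shlex.split(action.command)
    except ValueError:
        # Unbalanced quotes and similar: leave the action unannotated.
        tokens = []
    if len(tokens) >= 2 and tokens[0] in FILE_COMMANDS:
        action.display_info = {"kind": FILE_COMMANDS[tokens[0]], "path": tokens[1]}
    return action
```

Because parsing failures simply leave display_info empty, the agent is never blocked or constrained by the annotation step; it only affects what the user sees.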
add missing _split_string
I did this in an early agent I wrote! It works OK but kind of pollutes the git log. Probably something we could explore though...git is super powerful.
I'm not sure about this--let me think more before we go down this path
This is a really neat idea!
This definitely makes for a more powerful agent, so let's get it in, and then we can figure out the observability/structure piece later.
Based on the sandbox plugins introduced in #1255, this PR adds the command-line tools from SWE-agent as a plugin that can be initialized and made available to all agents.
This PR also adds a few lightweight dependencies that SWE-agent requires; adding them post hoc is inconvenient and slow (we would need to re-run apt-get update and then install).
Here's a demo from running:
python3 opendevin/sandbox/docker/ssh_box.py
Be sure to run
docker pull ghcr.io/opendevin/sandbox:xw-swe-agent-tool-plugins
and use this image for testing.