Add tool-using NavigatorCoder (/navigator mode), akin to Claude Code #3781
tekacs wants to merge 63 commits into Aider-AI:main from ...
Conversation
Haven't tried yet, but this sounds really cool, thank you for your effort!
I noticed that with ... I wondered if something like a ... Or maybe a new config parameter like ...
Oh, ...
The issue is that ... Prior to reading the source, I was hoping that the ...
Great suggestions @Dima-369, thank you. I've considered generating the prompt from the tools and that'll probably make it in soon.

What I'm currently quite stuck on is figuring out why I seemingly can't get Sonnet to cache prefixes of prompts reliably, despite efforts to get things in the right order, with seemingly solid ... Frankly... I can't seem to get it to work in general, even in ...

This causes relatively runaway token usage, so I'm still investigating. If anyone stumbles across this comment and has any insight, it would be appreciated!
@tekacs thanks for this massive amount of work!!!
Glad to hear it! It's definitely addressable, although the way that I've coded this PR is deliberately light on touching some base code in Aider, to make things more merge-able (although some work is still required for that).

Something that I've introduced in this PR that I think would make it much easier to interact with non-Git files is a tool that allows the LLM to undo changes that it's made. This makes it less risky for the LLM to perform edits outside of the immediate Git repo. But I think that teaching Aider to wrangle multiple Git repos at once would be a pretty hefty change to how the base works; that would be better for a different PR.

Within the confines of this one, if you want to give it a try, I would suggest asking an LLM to add a tool or two under `aider/tools`. Once this stabilizes, I can certainly give it a go, but you might be able to take my comment and feed it to Navigator (I'd use Gemini) or to Claude Code and have it generate a good working prototype for you.
I experimented with ... Regarding your comment about ...

I also briefly tested Navigator and was impressed with the experience! The automated processes add minimal overhead, which was a pleasant surprise, especially coming from a CLI background.

One note: currently, running ...
Hullo! I'm glad that it's felt smooth! If you look at the screenshot below, you'll notice that the ...

On caching -- yeah, the base repo seems to put the repo map up top, which... maybe is worth fiddling with. After some experimentation, I did find that I was able to get some caching to work, but I still haven't fully figured out its dynamics. In particular, I want to make sure that we don't send too many un-cached tokens Sonnet's way, since they rate limit based on the un-cached amount.

Speaking of CLI, with the release of https://github.com/openai/codex, I notice that they use a lot of CLI tools directly in place of some tools that we use here. While I could imagine that causing divergence across platforms and across models, I wonder if that's a viable option to boost some models' performance?
I noticed that you are using Claude from the OpenRouter provider. I remember it uses the OpenAI format. Unlike Anthropic, it only supports caching of the text part and does not support caching of the function call part. |
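For reference, this is roughly what an Anthropic-native prompt-caching breakpoint looks like, as a minimal sketch using the `anthropic` Python SDK directly (illustrative only; aider goes through litellm/OpenRouter, which is where the behaviour differs):

```python
# Minimal illustration of Anthropic-native prompt caching (not aider's code):
# mark the end of the stable prefix (system prompt, repo map, file contents)
# with a cache_control breakpoint so later requests can reuse it.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are NavigatorCoder... <long, stable prompt prefix here>",
            # Everything up to this block is cacheable; Anthropic only caches
            # prefixes above a minimum token count.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Explore the repo and list the coders."}],
)

# The usage block shows whether the prefix was written to, or read from, the cache.
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
```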
Note: there is another open PR which adds MCP support here. Perhaps this PR should use the MCP functionality from that PR? |
Guys, this is the best feature Aider has; why not prioritize it over other things? The only reason I stopped using this project was that this feature was missing.
Just wanted to jump in and say – thanks for taking some initiative here, @dwash96 – and thanks to folks who've been trying this out. Please do let me know any issues that y'all run into – in the background I've been throwing together another full-featured coding agent (as a library – will be at tekacs/libca) in Rust (non-commercially / not for my gain – my actual startup does something totally unrelated). But – I have a bunch of queued work to improve navigator mode and I'm happy to jump in and work on it if there are any particular questions or considerations any of you have run into. |
I'd love to see this merged. Curious why there are 170+ PRs open (given that aider is mostly written by aider, maybe the same could be done for PR reviews)? Would love to see aider continue to be the top dawg, but the industry is moving fast, and if this PR bottleneck doesn't get resolved I'd hate to see aider get left in the dust...




One more PR: this one is simultaneously very, very complete and yet speculative.
In case anyone wants to try this out, I uploaded it to PyPI as `navigator-mode`, until (and of course if!) the PR is accepted. By "I", I mean that it uploaded itself. You can see the session where it did that here: https://asciinema.org/a/9JtT7DKIRrtpylhUts0lr3EfY

Summary
This PR adds a `NavigatorCoder` (started by `--navigator` and `/navigator`) to aider, which is able to:
- with `/granular-editing` turned on, edit files using fine-grained edits through tool use, with feedback diffs sent to the LLM and the ability for the LLM to undo!

Compatibility
I've tried very hard to keep changes isolated to just `navigator_coder.py`, `navigator_prompts.py` and `aider/tools`. There is currently the context management extension, which adjusts `BaseCoder`, but it's very much optional and could be moved down to `NavigatorCoder` if that were preferable.

Quirks
Cost-optimization
I'm relatively confident that switching away from the standard aider-style mode of sending files with every commit would result in meaningful token- and therefore cost-savings, but the models seem to behave with meaningfully more intelligence when we can keep sending files in full.
I wonder if there might be room for a world in which the models only see that which they just asked for at any given time... or reduced chat history, or reduced use of reflected messages, or similar. For now I've gone for compatibility and highest-intelligence over cost savings as the default mode of operation. Perhaps it could be a toggle-able setting?
I had briefly integrated functionality to decay files not-in-use over time, as well as to reduce files exposed to the LLM, but my concern is that this may cause divergence from a task over a long series of steps in some cases.
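For concreteness, here is the kind of decay meant above as a standalone sketch (hypothetical; not the code that was briefly on this branch): each file in context gets a time-to-live that is refreshed whenever the LLM touches it, and files whose TTL runs out are dropped.

```python
# Illustrative sketch of time-based decay for files in context (hypothetical,
# not the implementation that was briefly integrated in this branch).
class ContextDecay:
    def __init__(self, ttl=4):
        self.ttl = ttl          # turns a file survives without being touched
        self.remaining = {}     # file path -> turns left

    def touch(self, path):
        """Reset a file's lifetime when the LLM reads or edits it."""
        self.remaining[path] = self.ttl

    def tick(self):
        """Call once per assistant turn; returns the files to drop from context."""
        expired = []
        for path in list(self.remaining):
            self.remaining[path] -= 1
            if self.remaining[path] <= 0:
                expired.append(path)
                del self.remaining[path]
        return expired
```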
Tools
Tools are stored in `aider/tools` and, at the time of writing, I'm in the process of porting more of them to use `tool_utils.py` (without which they're somewhat duplicative). That's going very well, though.
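Purely to illustrate the shape of a granular editing tool, here is a minimal sketch (the function name, signature and return convention are hypothetical, not the interface defined in `aider/tools`):

```python
# Hypothetical sketch of a granular editing tool in the spirit of aider/tools;
# the name and signature are illustrative only, not this PR's API.
def tool_replace_line(file_path, line_number, new_line):
    """Replace one line of a file and return a short result message for the LLM."""
    with open(file_path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    if not 1 <= line_number <= len(lines):
        return f"Error: {file_path} has only {len(lines)} lines"
    old = lines[line_number - 1]
    lines[line_number - 1] = new_line.rstrip("\n") + "\n"
    with open(file_path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return f"Replaced line {line_number}: {old.strip()!r} -> {new_line.strip()!r}"
```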
This is best used with #3778
... because in some (rare) cases it will add and then remove large numbers of files.
I added /context-management
This is a toggle-able truncation of large files, enabled by default only for Navigator. It sends a subset of the file, allowing the LLM to get a sense of the file contents without being overwhelmed. That way it has the chance to look at files and then remove them from context, for example.
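As a rough sketch of the idea (illustrative only, not this PR's implementation), the truncation could be as simple as sending the head of the file plus a marker for what was elided:

```python
# Illustrative sketch only -- not the /context-management implementation.
def truncate_for_context(content, max_lines=200):
    """Return a trimmed view of a large file so the LLM sees its shape, not all of it."""
    lines = content.splitlines()
    if len(lines) <= max_lines:
        return content
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"... ({omitted} more lines truncated) ..."])
```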
Tool calling format and a (current) lack of output elision
I tried for a while to use XML-style tool calling tags, but they cause `rich`/`mdstream` to emit a lot of blank lines, and the tags are stripped, leaving garbage in the output. Despite multiple attempts to suppress streaming or final output, I was unable (so far) to achieve something like 'hide everything inside a <function_calls> tag' in the output whilst also streaming.

As a result, I went with the other common tool call syntax of `[...]`, using the very explicit `[tool_call(Cmd, keyword_arg="something", other_arg="something")]` to make false positives almost impossible and parsing simple and fast.

Tool calls are implemented very robustly, with a parens-scanning mini-parser and Python's `ast.parse`, so you can, for example, edit lines mentioning tool calls with tool calls, or SEARCH/REPLACE blocks with tool calls, etc.
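To make the parsing approach concrete, here is a minimal sketch (not the PR's actual parser) of pulling the command name and keyword arguments out of a `[tool_call(...)]` span with a parenthesis scan plus `ast.parse`:

```python
# Illustrative sketch (not the PR's parser): locate a [tool_call(...)] span by
# balancing parentheses, then let ast.parse extract the name and keyword args.
import ast

def extract_tool_call(text):
    """Return (command_name, kwargs) for the first [tool_call(...)] in text, else None."""
    start = text.find("[tool_call(")
    if start == -1:
        return None
    depth = 0
    for i in range(start + len("[tool_call"), len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                call_src = text[start + 1 : i + 1]  # "tool_call(...)" without the brackets
                break
    else:
        return None  # unbalanced parentheses -- treat as a false positive
    call = ast.parse(call_src, mode="eval").body
    if not isinstance(call, ast.Call) or not call.args:
        return None
    name = call.args[0].id  # e.g. Cmd
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return name, kwargs

# Using the syntax from the description above:
print(extract_tool_call('[tool_call(Cmd, keyword_arg="something", other_arg="something")]'))
# -> ('Cmd', {'keyword_arg': 'something', 'other_arg': 'something'})
```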
SEARCH/REPLACE editing is stolen from the main editor
Editing seems relatively un-abstracted in the codebase right now, so Navigator uses a duplicate of the editing logic for that portion of things.
As I've started to add a /lot/ of granular tool-based editing, this is also de-emphasized and used only as a fallback.
Tests
I usually add tests to PRs, but I've avoided them here for now, both because it's been changing a great deal and because looped LLM calls probably deserve dedicated attention on how best to test them.
I've been able to very successfully get Navigator to test itself, though! You can see a session here, where it looped through every tool in sequence, running it, checking that the file had been appropriately changed and then restoring changes with a command:
https://asciinema.org/a/eqnvZ57O7nVWkpc15NYO1tMHx
Bottom line
Having used Claude Code a great deal, this... behaves very similarly and feels complete, with the possible exception of #3672. I really appreciate that this is so relatively straightforwardly possible, given how aider is implemented! Thank you again for producing aider as open source!
As mentioned above, this PR is very much speculative -- I'm using and developing it (with itself!) continuously, but I imagine that you may have your own plans or suggestions for this sort of functionality, so I very much defer to you!
Example sessions
Publishing self to PyPI and then later updating:
https://asciinema.org/a/9JtT7DKIRrtpylhUts0lr3EfY
https://gist.github.com/tekacs/b92d508a06b8f802611b00d1529c3907
Testing out all the tools:
https://asciinema.org/a/eqnvZ57O7nVWkpc15NYO1tMHx
Looking for bugs in its own editing tools:
https://gist.github.com/tekacs/c07454aec86b1e312cf03ece4e68e5a9