A few issues while trying to rerun swe-bench-lite with aider #1
Comments
Reading more about the SWE-Bench-docker, I'd recommend adding to the Installation section (possibly with some disclaimers) something like:
OK, I recommend adding
With that, the benchmark is running; will report back after the repro is done.
Run killed partway through (due to OpenAI limit errors), trying to run
After dealing with all of these, report.py completes; will mention that in a separate comment.
I fully agree with @daniel-vainsencher; running this repo is not that smooth. BTW, many thanks for Daniel's running log; it's really helpful.
@RenzeLou I appreciate it. BTW, my report ended up looking very bad, because for many instances the logs are missing. Looking back, there is some issue writing the logs to
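To see how many instances are affected by missing logs before trusting a report, a quick cross-check of the predictions file against the log directory can help. The following is only a sketch under stated assumptions: the predictions are stored as JSON lines with an `instance_id` field (the usual SWE-bench predictions layout), and each log file name begins with the instance id followed by a dot. The function name and both path arguments are hypothetical and not taken from this repo.

```python
import json
from pathlib import Path


def instances_missing_logs(predictions_path, log_dir):
    """Return instance ids from the predictions file that have no log file.

    Assumptions (not confirmed by this repo):
    - predictions_path is a JSON-lines file with an "instance_id" field per line
    - a log file for an instance is named "<instance_id>.<anything>"
    """
    log_names = [p.name for p in Path(log_dir).iterdir() if p.is_file()]
    missing = []
    with open(predictions_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            instance_id = json.loads(line)["instance_id"]
            # Require the trailing dot so "repo-12" does not match "repo-123.log".
            prefix = instance_id + "."
            if not any(name.startswith(prefix) for name in log_names):
                missing.append(instance_id)
    return missing
```

If the returned list is long, the poor report is more likely a logging problem than a reflection of the predictions themselves.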
I have gone through the testing scripts of this repo; it basically uses the However, I didn't find any @paul-gauthier Could you answer these questions? Where the I would very much appreciate it if you could help on this issue.
The workflow for working with SWE Bench in general is 2 steps:
This repo is for running and evaluating aider on SWE Bench. As described in the README, it consists of 2 scripts:
Let me know if that was helpful.
Thanks for your reply @paul-gauthier! I am running the SWE-bench Lite, where I think I have correctly set the dataset ( After several instances had been predicted by Aider (I didn't run the full Lite bench), I ran the Here is the info printed on the terminal:
I don't know what's going on. Could you provide any hints on this issue? Thanks so much.
@RenzeLou I would start from
@RenzeLou another thing: keep in mind that this is a very new repo, and that the task it attempts to do closely coordinates 4 repos (not counting mere libraries):
This is a complicated integration piece, so it starting out imperfect is totally understandable. This is not for the faint of heart; it's still a bit of the wild west. If you are not ready to do serious debugging, I'd recommend waiting and seeing if they stabilize.
@paul-gauthier two things:
For posterity: part of the issue I was encountering with permissions is probably because I was using rootless docker, but I haven't completely resolved them. Switching to rootless podman solved some problems, created others :/ |
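For anyone hitting similar permission problems, it may help to first confirm whether the Docker daemon is running rootless. A minimal sketch, assuming the `docker` CLI is on the PATH and that a rootless daemon lists `rootless` among its security options in `docker info`; the helper names are hypothetical, and this does not cover podman:

```python
import subprocess


def security_options_mention_rootless(security_options):
    """Return True if a `docker info` security-options string mentions rootless."""
    return "rootless" in security_options


def docker_is_rootless():
    """Ask the local Docker daemon for its security options and check them.

    Raises subprocess.CalledProcessError if `docker info` fails, and
    FileNotFoundError if the docker CLI is not installed.
    """
    result = subprocess.run(
        ["docker", "info", "--format", "{{.SecurityOptions}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return security_options_mention_rootless(result.stdout)
```

Calling `docker_is_rootless()` before a benchmark run gives an early hint that file ownership inside mounted log directories may not match the host user.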
pip install requirements.txt, without mention of creating a venv or other kind of environment. People should take care to avoid this, but better to smooth it.
just_devin_570
(which I didn't want to do, since running lite) is turned on and used in two ways: