Automatic benchmarking of gpt-engineer with swe-bench #913

AntonOsika · 2023-12-18T14:42:18Z

Feature description

We have a way to easily add benchmarks:

https://www.loom.com/share/206805143fbb4302b5455a5329eaab17?sid=f689608f-8e49-44f7-b55f-4c81e9dc93e6

This issue is about investigating whether swe-bench is a good benchmark to add and, if so, adding a simple version of it.

ErikBjare · 2024-03-13T10:19:23Z

Tempted to prioritize this higher after the Devin announcement (just as @batwood001 in #1062).

viborc · 2024-03-13T10:24:45Z

Makes sense. Let's figure this out, along with people's availability, this Thursday at our tech planning meeting.

Mohit-Dhawan98 · 2024-03-28T18:53:19Z

@viborc can you assign this to me?

viborc · 2024-03-28T18:54:05Z

> @viborc can you assign this to me?

Done!

viborc · 2024-05-04T09:43:24Z

This is more of a general update to the community than anything else. The work on this issue is ongoing, and @Mohit-Dhawan98 is working on it with @ATheorell's support. We'll likely have SWE-bench support in the near future!

viborc · 2024-07-18T17:52:10Z

Someone from OpenDevin suggested we look into their work here, learn from it, and possibly re-use it if needed. Putting this here for our reference: https://github.com/OpenDevin/OpenDevin/tree/main/evaluation/swe_bench

AntonOsika added the enhancement and triage labels Dec 18, 2023

AntonOsika changed the title ~~Automatic benchmarking of gpt-engineer with [swe-bench](https://github.com/princeton-nlp/SWE-bench)~~ Automatic benchmarking of gpt-engineer with swe-bench Dec 18, 2023

viborc added this to gpt-engineer roadmap Feb 8, 2024

viborc moved this to Todo in gpt-engineer roadmap Feb 8, 2024

ErikBjare mentioned this issue Mar 13, 2024

Run against SWE-Bench #1062

Closed

AntonOsika removed the triage label Mar 14, 2024

viborc moved this from Todo to In Progress in gpt-engineer roadmap Mar 28, 2024

viborc assigned Mohit-Dhawan98 Mar 28, 2024

ATheorell mentioned this issue Apr 21, 2024

Bench config #1126

Merged
