Releases: logic-star-ai/swt-bench

Release 1.2.0 - Reproductions script mode

06 Mar 08:43
5d6e00e

This release adds a "reproduction script" mode for SWT-Bench. In this mode (used by AEGIS, for example), the test does not have to fit into the repository's unit test framework and can instead be a standalone script. Coverage delta is computed as usual; a non-zero exit code of the script counts as failing and a zero exit code counts as passing. In this setting, the script cannot adversely affect other test cases in the framework.
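The exit-code convention described above can be illustrated with a short sketch. This is not the SWT-Bench implementation: the function names `classify_script` and `classify_source`, the timeout value, and the choice to treat a timeout as failing are all assumptions made for illustration, and coverage measurement is left out entirely.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def classify_script(script_path: str, timeout: float = 60.0) -> str:
    """Run a standalone script and map its exit code to a test outcome.

    Hypothetical helper: exit code 0 counts as passing, anything else as failing.
    """
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Assumption for this sketch: a script that hangs is treated as failing.
        return "FAIL"
    return "PASS" if result.returncode == 0 else "FAIL"


def classify_source(source: str) -> str:
    """Write `source` to a temporary file and classify it (demo convenience)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        script = Path(tmp_dir) / "reproduce.py"
        script.write_text(source)
        return classify_script(str(script))


if __name__ == "__main__":
    # An uncaught exception makes the interpreter exit with a non-zero code,
    # so a failing assertion is reported as a failing test.
    print(classify_source("assert 1 + 1 == 3"))
    print(classify_source("assert 1 + 1 == 2"))
```

A reproduction script written this way needs no test runner. It only has to exit non-zero while the bug is present and exit zero once the bug is fixed.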

What's Changed

Full Changelog: 1.1.0...1.2.0

Release 1.1.0 - SWT-Bench Verified

04 Mar 06:21

This release ports a number of further patches that were reported as useful in SWE-Bench and adds support for SWT-Bench Verified, which was obtained with the same quality criteria as SWT-Bench Lite.

We released SWT-Bench Verified and published the three best-performing methods of SWT-Bench Lite as baselines on our website.

What's Changed

Full Changelog: 1.0.1...1.1.0

Release 1.0.1 - Patch instances

16 Nov 10:46
f42d9fe

What's Changed

New Contributors

Full Changelog: 1.0.0...1.0.1

Release 1.0.0 - Initial Release

01 Nov 16:18

This version contains the original code of the paper "SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents", published at NeurIPS.

Full Changelog: https://github.com/logic-star-ai/swt-bench/commits/1.0.0