Releases: logic-star-ai/swt-bench
Release 1.2.0 - Reproduction script mode
This release adds a "reproduction script" mode for SWT-Bench. In this mode (used by, e.g., AEGIS), the test does not need to fit into the repository's unit test framework and can instead be a standalone script. We compute the coverage delta as usual and count a non-zero exit code of the script as failing and a zero exit code as passing. In this setting, the script cannot adversely affect other test cases in the framework.
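The exit-code rule described above can be illustrated with a minimal Python sketch. This is not the SWT-Bench harness itself: the names `classify_script` and `write_script` are hypothetical, the script is assumed to be a Python file run with the current interpreter, and treating a timeout as a failure is an assumption made here for the example. Coverage measurement is left out.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def classify_script(script_path, timeout=60.0):
    """Run a standalone script and map its exit code to "PASS" or "FAIL".

    Hypothetical helper: zero exit code counts as passing, any non-zero
    exit code counts as failing.
    """
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Assumption for this sketch: a script that hangs is counted as failing.
        return "FAIL"
    return "PASS" if result.returncode == 0 else "FAIL"


def write_script(directory, name, body):
    """Write a small script to disk (hypothetical helper for the demo)."""
    path = Path(directory) / name
    path.write_text(body)
    return str(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # An uncaught AssertionError makes the interpreter exit with a non-zero code.
        failing = write_script(tmp, "repro_fail.py", "assert 1 + 1 == 3\n")
        passing = write_script(tmp, "repro_pass.py", "assert 1 + 1 == 2\n")
        print(classify_script(failing))
        print(classify_script(passing))
```

Because the script runs in its own process, its outcome is determined solely by its exit code and it does not interact with the repository's existing test cases.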
What's Changed
- Add reproduction script mode by @nielstron in #20
Full Changelog: 1.1.0...1.2.0
Release 1.1.0 - SWT-Bench Verified
This release ports a number of further patches that have been reported useful in SWE-Bench and adds support for SWT-Bench Verified, which was obtained using the same quality criteria as SWT-Bench Lite.
We released SWT-Bench Verified and published the three best-performing methods of SWT-Bench Lite as baselines on our website.
What's Changed
- Fix sklearn constants by @nielstron in #11
- Reproduce docker image fixes (pinning versions) from SWE-Bench by @nielstron in #12
- Add leaderboard website and submission instructions by @nielstron in #14
- Add SWT-Verified by @nielstron in #18
Full Changelog: 1.0.1...1.1.0
Release 1.0.1 - Patch instances
What's Changed
- Fix building django images by @zyone1991 in #6
- Run install and test on several Python versions by @nielstron in #7
New Contributors
- @zyone1991 made their first contribution in #6
Full Changelog: 1.0.0...1.0.1
Release 1.0.0 - Initial Release
This version is the original code of the NeurIPS-published paper "SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents".
Full Changelog: https://github.com/logic-star-ai/swt-bench/commits/1.0.0