Update README.md #69

Merged: 2 commits, Apr 5, 2024
Changes from all commits
README.md: 4 additions & 4 deletions
@@ -24,7 +24,7 @@ Code and data for our ICLR 2024 paper <a href="http://swe-bench.github.io/paper.
</a>
</p>

-Please refer to our [website](http://swe-bench.github.io) for the public leaderboard and the [change log](https://github.com/princeton-nlp/SWE-bench/blob/master/CHANGELOG.md) for information on the latest updates to the SWE-bench benchmark.
+Please refer to our [website](http://swe-bench.github.io) for the public leaderboard and the [change log](https://github.com/princeton-nlp/SWE-bench/blob/main/CHANGELOG.md) for information on the latest updates to the SWE-bench benchmark.

## 👋 Overview
SWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub.
@@ -44,9 +44,9 @@ You can download the SWE-bench dataset directly ([dev](https://drive.google.com/

To use SWE-Bench, you can (see the sketch after this list):
* Train your own models on our pre-processed datasets
-* Run [inference](https://github.com/princeton-nlp/SWE-bench/blob/master/inference/) on existing models (either models you have on-disk like LLaMA, or models you have access to through an API like GPT-4). The inference step is where you get a repo and an issue and have the model try to generate a fix for it.
-* [Evaluate](https://github.com/princeton-nlp/SWE-bench/blob/master/harness/) models against SWE-bench. This is where you take a SWE-Bench task and a model-proposed solution and evaluate its correctness.
-* Run SWE-bench's [data collection procedure](https://github.com/princeton-nlp/SWE-bench/blob/master/collect/) on your own repositories, to make new SWE-Bench tasks.
+* Run [inference](https://github.com/princeton-nlp/SWE-bench/blob/main/inference/) on existing models (either models you have on-disk like LLaMA, or models you have access to through an API like GPT-4). The inference step is where you get a repo and an issue and have the model try to generate a fix for it.
+* [Evaluate](https://github.com/princeton-nlp/SWE-bench/blob/main/swebench/harness/) models against SWE-bench. This is where you take a SWE-Bench task and a model-proposed solution and evaluate its correctness.
+* Run SWE-bench's [data collection procedure](https://github.com/princeton-nlp/SWE-bench/blob/main/swebench/collect/) on your own repositories, to make new SWE-Bench tasks.
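
For orientation, here is a minimal sketch of the first step (getting task instances into Python). It is not part of the diff or an official snippet: it assumes the Hugging Face `datasets` package, the dataset ID `princeton-nlp/SWE-bench`, and field names taken from the SWE-bench data format; verify against the download links above.

```python
# Minimal sketch (assumptions noted above): load SWE-bench task instances
# and inspect one of them.
from datasets import load_dataset

# Assumed dataset ID; the README also links direct dev/test downloads.
swebench = load_dataset("princeton-nlp/SWE-bench", split="test")

task = swebench[0]
print(task["instance_id"])              # unique ID for this repo/issue pair
print(task["repo"])                     # source repository, formatted as owner/name
print(task["problem_statement"][:300])  # the GitHub issue text given to the model
print(task["patch"][:300])              # the reference (gold) patch used in evaluation
```

The inference and evaluation entry points linked in the bullets above then consume these task instances together with model-proposed patches.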

## ⬇️ Downloads
| Datasets | Models |
swebench/collect/README.md: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ We include a comprehensive [tutorial](https://github.com/princeton-nlp/SWE-bench

> SWE-bench's collection pipeline is currently designed to target PyPI packages. We hope to expand SWE-bench to more repositories and languages in the future.

<img src="../assets/collection.png">
<img src="../../assets/collection.png">

## Collection Procedure
To run collection on your own repositories, run the `run_get_tasks_pipeline.sh` script. Given a repository or list of repositories (formatted as `owner/name`), for each repository this command will generate...
swebench/harness/README.md: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ The `engine_evaluation.py` and `run_evaluation.py` code is used for evaluating m

The evaluation script generally performs the following steps:

-![evaluation](../assets/evaluation.png)
+![evaluation](../../assets/evaluation.png)

The `run_evaluation.py` script is invoked using the `./run_evaluation.sh` script with the following arguments:
```
@@ -35,7 +35,7 @@ In the context of the collection pipeline, you should use this script after

The validation script generally performs the following steps:

-![validation](../assets/validation.png)
+![validation](../../assets/validation.png)

The `engine_validation.py` script is invoked using the `./run_validation.sh` script with the following arguments:
```
tutorials/validation.ipynb: 2 additions & 2 deletions
@@ -9,13 +9,13 @@
"source": [
"import glob, json, os, sys\n",
"\n",
"sys.path.append('/path/to/metrics/') # TODO: Replace with path to `SWE-bench/metrics` folder\n",
"sys.path.append('/path/to/metrics/') # TODO: Replace with path to `SWE-bench/swebench/metrics` folder\n",
"from conversion import convert_log_to_ground_truth\n",
"from getters import get_logs_gold\n",
"from monitor import monitor_validation, monitor_logs_same_diff\n",
"sys.path = sys.path[:-1]\n",
"\n",
"sys.path.append('/path/to/harness/') # TODO: Replace with path to `SWE-bench/harness` folder\n",
"sys.path.append('/path/to/harness/') # TODO: Replace with path to `SWE-bench/swebench/harness` folder\n",
"from utils import has_attribute_or_import_error\n",
"sys.path = sys.path[:-1]"
]