R2E: Turn any GitHub repository into a programming agent environment

R2E (Repository to Environment) is a framework that turns any GitHub repository into an executable environment for evaluating static code generation models and programming agents at scale. It extracts functions and methods from the repository, uses LLMs to generate equivalence tests for them, executes those tests, and creates an interactive execution environment. These environments can be used to evaluate the quality of LLM-generated code.

Installation

Install uv to set up R2E.

curl -LsSf https://astral.sh/uv/install.sh | sh

Create a venv, clone, and install.

git clone https://github.com/r2e-project/r2e.git
cd r2e && uv venv && source .venv/bin/activate
uv sync

Key Concept: Equivalence Test Harnesses

R2E introduces Equivalence Tests: tests that check whether two pieces of code are equivalent. In the context of R2E, an Equivalence Test checks whether the output of a function/method in a repository matches the output of the generated code for the same function/method. Here's an overview of the key principles of R2E:

Principle	Description
Equivalence Tests	R2E generates tests that use the ground truth implementation to check equivalence, avoiding predicting test outputs.
Complete Harness	Generates harnesses with complete setup info (e.g., files, DB connections) instead of simple I/O examples.
Sliced Context	Leverages dependency slicing to provide minimal, relevant context for test generation.
Coverage	Filters incorrect tests and assesses test quality using branch coverage.
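
The Equivalence Tests principle can be illustrated with a minimal sketch. This is not R2E's generated harness format: the two slugify functions and the input list are hypothetical stand-ins for a repository function and an LLM-generated candidate. The point is only that the expected value comes from calling the reference implementation rather than from a predicted output.

```python
import unittest


def reference_slugify(text: str) -> str:
    """Hypothetical stand-in for the ground-truth function from a repository."""
    return "-".join(text.lower().split())


def candidate_slugify(text: str) -> str:
    """Hypothetical stand-in for generated code for the same function."""
    words = [w for w in text.lower().split(" ") if w]
    return "-".join(words)


class EquivalenceTest(unittest.TestCase):
    """Expected values are computed by the reference, not written by hand."""

    INPUTS = ["Hello World", "  spaced   out  ", "", "MiXeD Case"]

    def test_same_output(self):
        for text in self.INPUTS:
            with self.subTest(text=text):
                self.assertEqual(candidate_slugify(text), reference_slugify(text))


def run_equivalence_test() -> bool:
    """Run the test case and report whether the candidate matched the reference."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EquivalenceTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Because the test never hard-codes an expected string, the test generator only has to produce good inputs and setup, not correct outputs.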

Find more details and examples at r2e.dev and in our paper.

Usage

R2E provides a convenient CLI to work with. The usual steps are as follows: (1) set up and extract functions from repositories, (2) build and install repositories, and (3) generate and execute Equivalence Tests.

Find the complete CLI documentation at CLI.md. Below is a quickstart guide:

1. Setup and Extract

First, choose a unique experiment id (e.g., quickstart) that you can reuse for the entire workflow. Then set up repositories and extract functions from them:

r2e setup -r https://github.com/google-research/python-graphs
r2e extract -e quickstart --overwrite_extracted

Output

Cloning repository https://github.com/google-research/python-graphs
Repo Location: /home/user/buckets/local_repoeval_bucket/repos/
Setup completed successfully.

Result: /home/user/buckets/local_repoeval_bucket/repos

Extracting..: 100%|███████████████████████| 2/2 [00:00<00:00,  8.89it/s]
Extracted 18 functions and 53 methods
Extraction completed successfully.

Result: /home/user/buckets/r2e_bucket/extracted_data/quickstart_extracted.json

Note

We also support copying from a local path, or processing a list of URLs/local paths from a json file (cli docs).

During extraction, all repos cloned into REPOS_DIR are processed. The extracted functions and methods are written to a JSON file. Use --overwrite_extracted to overwrite any existing results.
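
To take a first look at the extracted JSON file, a small helper like the one below may be enough. It is a sketch that deliberately assumes nothing about R2E's schema: summarize_json is a hypothetical helper, not part of the R2E CLI, and it only reports the top-level type, the number of entries, and (for objects) the first few keys.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return the top-level type, entry count, and (for dicts) up to 10 keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {"type": type(data).__name__}
    if isinstance(data, (dict, list)):
        summary["entries"] = len(data)
    if isinstance(data, dict):
        # JSON object keys are always strings, so sorting them is safe.
        summary["keys"] = sorted(data)[:10]
    return summary
```

For the quickstart run, the path to pass would be the one printed after "Result:" in the extraction output above.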

2. Build and Install

Docker Mode: By default, all repos in REPOS_DIR are installed in a Docker image for sandboxed execution. Find the generated dockerfile in REPOS_DIR. Useful reference: install docker

Local Mode: Use --local, which suggests the steps you need to take to install the repos manually.

r2e build -e quickstart

Output

Found 1 repositories in the repos directory.
Running in Docker mode.
Creating a dockerfile...
Dockerfile generated at:  /local_repoeval_bucket/repos/r2e_final_dockerfile.dockerfile
...

[+] Building 553.2s (16/16) FINISHED                                         docker:default
 => [internal] load build definition from r2e_final_dockerfile.dockerfile              0.0s
 => => transferring dockerfile: 2.52kB                                                 0.0s
 ...
 => exporting to image                                                                31.0s 
 => => exporting layers                                                               30.9s 
 => => writing image sha256:28d6f5751dfac6de9ccd883f0830cf8ac5c88e46df8bd7             0.0s 
 => => naming to docker.io/library/r2e:quickstart                                      0.0s

$ docker image ls
REPOSITORY   TAG          IMAGE ID       CREATED         SIZE
r2e          quickstart   28d6f5751dfa   4 minutes ago   10.1GB

3. Generate and Execute Tests

R2E provides a single command that runs a series of k generate-execute rounds with feedback. The loop continues until a min_valid fraction of functions reach at least min_cov branch coverage, or until k rounds have run. Defaults: k=3, min_valid=0.8, and min_cov=0.8.

r2e genexec -e quickstart --save_chat

Output

Generating contexts: 100%|███████████████████████| 10/10 [00:03<00:00, 20.56it/s]

Starting round 1/3
100%|███████████████████████| 10/10 [00:13<00:00,  1.36s/it]
Loaded 10 functions under test
100%|███████████████████████| 10/10 [00:01<00:00,  5.74it/s]
Round 1 completed. Status: 0.20 good FUTs.

Starting round 2/3
100%|███████████████████████| 8/8 [00:20<00:00,  2.58s/it]
Loaded 8 functions under test
100%|███████████████████████| 8/8 [00:01<00:00,  4.66it/s]
Round 2 completed. Status: 0.60 good FUTs.

Starting round 3/3
100%|███████████████████████| 4/4 [00:13<00:00,  3.39s/it]
Loaded 4 functions under test
100%|███████████████████████| 4/4 [00:01<00:00,  2.41it/s]
Reached max rounds. Stopping at round 3

Result: /home/user/buckets/r2e_bucket/execution/quickstart_out.json

Note

You can also run r2e generate and r2e execute separately (cli docs).

The generated tests are executed in the Docker container. Use --local to execute locally.
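
The stopping rule of the generate-execute loop can be sketched as follows. This is an illustration of the k, min_valid, and min_cov criteria described in this section, not R2E's actual implementation: genexec_loop and the run_round callback are hypothetical names, and retrying only the functions still below the threshold is an assumption based on the shrinking function counts in the sample output.

```python
from typing import Callable, Dict, List


def genexec_loop(
    functions: List[str],
    run_round: Callable[[List[str]], Dict[str, float]],
    k: int = 3,
    min_valid: float = 0.8,
    min_cov: float = 0.8,
) -> Dict[str, float]:
    """Run up to k rounds; stop early once enough functions are well covered.

    run_round is a hypothetical callback that generates and executes tests for
    the given functions and returns the branch coverage reached for each one.
    """
    if not functions:
        return {}
    coverage = {name: 0.0 for name in functions}
    for round_no in range(1, k + 1):
        # Assumption: only functions below the coverage threshold are retried.
        pending = [name for name in functions if coverage[name] < min_cov]
        for name, cov in run_round(pending).items():
            coverage[name] = max(coverage[name], cov)
        # Fraction of functions under test (FUTs) that meet the threshold.
        good = sum(cov >= min_cov for cov in coverage.values()) / len(functions)
        print(f"Round {round_no} completed. Status: {good:.2f} good FUTs.")
        if good >= min_valid:
            break
    return coverage
```

With the defaults, a run ends either when 80% of the functions have at least 80% branch coverage or after the third round, whichever comes first.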

Additional Resources

Improving Specifications: Find details about improving the specifications of extracted functions to build benchmarks using LLMs @ Spec Refinement.
PAT: Program Analysis Tools: We developed an in-house program and static analysis toolbox that powers R2E along with LLMs. These tools can be used independently too. Learn more about them @ PAT.

Citation

If you use R2E in your research, please cite the following paper:

@inproceedings{
    jain2024r2e,
    title={R2E: Turning any Github Repository into a Programming Agent Environment},
    author={Naman Jain and Manish Shetty and Tianjun Zhang and King Han and Koushik Sen and Ion Stoica},
    booktitle={ICML},
    year={2024},
}
