This README contains documentation for the main inference script run.sh
along with some miscellaneous scripts that may be helpful.
Warning
These scripts have been written to be invoked from the root of this codebase (i.e. ./scripts/run.sh).
The ./run.sh script has been provided as an example of how to invoke run.py.
A single run.py call will generate a trajectory/<username>/<experiment name> folder containing the trajectories and predictions generated by a <model_name> model run on every instance in the <data_path> dataset.
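To locate the output of a run from another script, the folder layout described above can be rebuilt with pathlib. This is a minimal sketch: the helper name trajectory_dir is hypothetical, the experiment name is a placeholder, and treating <username> as the current OS user is an assumption rather than something this README states.

```python
import getpass
from pathlib import Path
from typing import Optional


def trajectory_dir(experiment_name: str, username: Optional[str] = None) -> Path:
    """Return trajectory/<username>/<experiment name> as a relative path."""
    # Assumption: <username> is the current OS user unless passed explicitly.
    user = username if username is not None else getpass.getuser()
    return Path("trajectory") / user / experiment_name


# "my-experiment" is a placeholder, not a name that run.py is known to produce.
print(trajectory_dir("my-experiment", username="alice"))
```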
The following is a comprehensive guide to using the provided run.py script, detailing available command-line arguments, their purposes, and default values. Flags that you might find helpful have been marked with a 💡.
The code for configuration-based workflows, along with an explanation of how it is implemented, can be found in agent/.
Tip
Run python run.py --help to view the most up-to-date documentation of the arguments.
-h, --help: Show the help message and exit.
These arguments configure the script's behavior:
--instance_filter <str> 💡: Run instances that match this regex pattern. Default is .*.
--noskip_existing, --skip_existing: [Do not] skip instances that have been completed before.
--suffix <str>: Appends a suffix to the name of the folder containing the trajectories for an experiment run.
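To preview which instances a pattern would select before starting a run, the filter can be approximated with Python's re module. This is a sketch only: it assumes the pattern is applied with re.match semantics (anchored at the start of the instance ID), which this README does not confirm, and the IDs below are illustrative.

```python
import re


def filter_instances(instance_ids, pattern=".*"):
    """Keep the IDs that the pattern matches from the start of the string."""
    regex = re.compile(pattern)
    # Assumption: matching is anchored at the start of the ID, as re.match does.
    return [instance_id for instance_id in instance_ids if regex.match(instance_id)]


# Illustrative instance IDs in the SWE-bench naming style.
ids = ["django__django-11099", "sympy__sympy-20590", "django__django-12345"]
print(filter_instances(ids, r"django__.*"))
# prints ['django__django-11099', 'django__django-12345']
```

With the default pattern .* every instance is kept, which matches the documented default behavior.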
These arguments are related to the environment configuration:
--data_path <str> 💡: Path to the data file, a Hugging Face dataset, or a GitHub issue URL.
--base_commit <str>: The base commit SHA to check out. This is determined automatically for instances in SWE-bench.
--image_name <str>: Name of the Docker image to use. Default is swe-agent.
--noinstall_environment, --install_environment: [Do not] install the environment. Default is True.
--noverbose, --verbose: [Do not] enable verbose output. Default is False.
--timeout <int>: Timeout in seconds. Default is 35.
--container_name <str> 💡: Name of the Docker container if you would like to create a persistent container. Optional.
Warning
If you specify a container name, do not run multiple instances of run.py with the same container name!
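One way to avoid reusing a name when several runs are launched in parallel is to derive a distinct name per launch. The helper below is a hypothetical sketch, not part of run.py; the prefix and the name format are assumptions.

```python
import os
import uuid


def unique_container_name(prefix="swe-agent"):
    """Build a container name that differs between launches."""
    # The process ID separates concurrent launchers; the random suffix
    # separates repeated launches from the same process.
    return f"{prefix}-{os.getpid()}-{uuid.uuid4().hex[:8]}"


print(unique_container_name())
```

The resulting string can then be passed as the value of --container_name.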
Configure agent behavior:
--config_file <Path> 💡: Path to the configuration YAML file. Default is config/default.yaml.
Configure model parameters:
--model_name <str> 💡: Name of the model. Default is gpt4.
--per_instance_cost_limit <float> 💡: Per-instance cost limit (interactive loop will automatically terminate when cost limit is hit). Default is 3.0.
--temperature <float> 💡: Model temperature. Default is 0.0.
--top_p <float> 💡: Top p filtering. Default is 0.95.
--total_cost_limit <float>: Total cost limit. Default is 0.0 (unlimited).
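The two cost limits can be read as a stopping rule. The function below is a sketch of that reading, not the code run.py uses: the name should_stop is hypothetical, and stopping once spend reaches the limit (rather than only after exceeding it) is an assumption. The only behavior taken from the list above is that a total limit of 0.0 means unlimited.

```python
def should_stop(instance_cost, total_cost,
                per_instance_cost_limit=3.0, total_cost_limit=0.0):
    """Return True when either cost limit has been reached."""
    # The per-instance limit ends the loop for the current instance.
    if instance_cost >= per_instance_cost_limit:
        return True
    # A total limit of 0.0 is documented as unlimited, so it is skipped.
    if total_cost_limit > 0.0 and total_cost >= total_cost_limit:
        return True
    return False


print(should_stop(instance_cost=1.25, total_cost=40.0))  # prints False
print(should_stop(instance_cost=3.0, total_cost=40.0))   # prints True
```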
Run with custom data path and verbose mode:
python run.py --data_path /path/to/data.json --verbose
Specify a model and adjust the temperature and top_p parameters:
python run.py --model_name gpt4 --temperature 0.2 --top_p 0.9
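When runs are launched from another Python program, command lines like the ones above can be assembled from keyword arguments. This is a minimal sketch: build_run_command is a hypothetical helper, and it only builds the argument list without executing anything. The --no prefix for false booleans follows the flag pairs listed in this README (for example --noverbose, --verbose).

```python
def build_run_command(**flags):
    """Turn keyword arguments into an argument list for run.py."""
    cmd = ["python", "run.py"]
    for name, value in flags.items():
        if value is True:
            cmd.append(f"--{name}")        # e.g. --verbose
        elif value is False:
            cmd.append(f"--no{name}")      # e.g. --noverbose
        else:
            cmd.extend([f"--{name}", str(value)])
    return cmd


cmd = build_run_command(model_name="gpt4", temperature=0.2, top_p=0.9)
print(" ".join(cmd))
# prints python run.py --model_name gpt4 --temperature 0.2 --top_p 0.9
```

The list form can be handed to subprocess.run(cmd, check=True) from the root of the codebase.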
remove_all_containers.sh: Forcibly removes all Docker containers currently present on the system.
run_and_eval.sh: Runs SWE-agent inference and evaluation on a specified dataset N times. You can specify the dataset_path, num_runs, template, and suffix arguments.
run_jsonl.sh: Run SWE-agent inference from a .jsonl file that contains a SWE-bench style task instance.
run_replay.sh: Run SWE-agent inference from a .traj file. This is useful for automatically creating a new demonstration for a new config from an existing sequence of actions.