Scripts

This README contains documentation for the main inference script run.sh along with some miscellaneous scripts that may be helpful.

Warning

These scripts have been written to be invoked from the root of this codebase (i.e. ./scripts/run.sh).

🏃 Inference Script

The ./run.sh script has been provided as an example of how to invoke run.py.

A single run.py call will generate a trajectory/<username>/<experiment name> folder containing the trajectories and predictions generated by a <model_name> model run on every instance in the <data_path> dataset.

The following is a comprehensive guide to using the provided run.py script, detailing available command-line arguments, their purposes, and default values. Flags that you might find helpful have been marked with a 💡.

The code for configuration-based workflows, along with an explanation of how it is implemented, can be found in agent/.

Tip

Run python run.py --help to view the most up-to-date documentation of the arguments.

Optional Arguments

  • -h, --help: Show the help message and exit.

Script Arguments

These arguments configure the script's behavior:

  • --instance_filter <str> 💡: Run instances that match this regex pattern. Default is .*.
  • --noskip_existing, --skip_existing: [Do not] skip instances that have been completed before.
  • --suffix <str>: Appends a suffix to the name of the folder containing the trajectories for an experiment run.
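
As a rough illustration of how --instance_filter narrows a run, the sketch below applies a regex to a list of instance IDs. The filter_instances helper and the sample IDs are hypothetical, and matching from the start of the ID with re.match is an assumption; run.py may apply the pattern differently.

```python
import re


def filter_instances(instance_ids, pattern=".*"):
    """Return the IDs that the pattern matches (hypothetical helper, not part of run.py)."""
    compiled = re.compile(pattern)
    # Assumption: the pattern is matched from the start of each ID.
    return [iid for iid in instance_ids if compiled.match(iid)]


# Made-up instance IDs, for illustration only.
ids = ["django__django-0001", "sympy__sympy-0002", "django__django-0003"]

print(filter_instances(ids))                 # the default .* keeps every instance
print(filter_instances(ids, r"django__.*"))  # keeps only the two django IDs
```

Quote the pattern on the command line so the shell does not expand characters such as * before run.py sees them.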

Environment Arguments

These arguments are related to the environment configuration:

  • --data_path <str> 💡: Path to the data file -or- a Hugging Face dataset -or- a GitHub issue URL.
  • --base_commit <str>: Base commit SHA to check out. This is determined automatically for instances in SWE-bench.
  • --image_name <str>: Name of the Docker image to use. Default is swe-agent.
  • --noinstall_environment, --install_environment: [Do not] install the environment. Default is True (the environment is installed).
  • --noverbose, --verbose: [Do not] enable verbose output. Default is False.
  • --timeout <int>: Timeout in seconds. Default is 35.
  • --container_name <str> 💡: Name of the Docker container if you would like to create a persistent container. Optional.

Warning

If you specify a container name, do not run multiple instances of run.py with the same container name!

Agent Arguments

Configure agent behavior:

  • --config_file <Path> 💡: Path to the configuration YAML file. Default is config/default.yaml.

Model Arguments

Configure model parameters:

  • --model_name <str> 💡: Name of the model. Default is gpt4.
  • --per_instance_cost_limit <float> 💡: Per-instance cost limit. The interactive loop terminates automatically when this limit is hit. Default is 3.0.
  • --temperature <float> 💡: Model temperature. Default is 0.0.
  • --top_p <float> 💡: Top p filtering. Default is 0.95.
  • --total_cost_limit <float>: Total cost limit. Default is 0.0 (unlimited).
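
The two cost limits can be thought of as independent checks. The sketch below is one way to picture that, not the actual logic in run.py: the limit_reached helper is hypothetical, and using >= for the comparison is an assumption. The only behavior taken from the list above is the defaults and that a total limit of 0.0 means unlimited.

```python
def limit_reached(instance_cost, total_cost,
                  per_instance_cost_limit=3.0, total_cost_limit=0.0):
    """Return True when either cost limit has been hit (hypothetical helper)."""
    # A total limit of 0.0 means "unlimited", so it is only checked when positive.
    if total_cost_limit > 0 and total_cost >= total_cost_limit:
        return True
    # Assumption: the per-instance limit is hit once the cost reaches it.
    return instance_cost >= per_instance_cost_limit


print(limit_reached(1.25, 40.0))                         # False
print(limit_reached(3.0, 40.0))                          # True
print(limit_reached(1.25, 40.0, total_cost_limit=25.0))  # True
```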

📙 Example Usage

Run with custom data path and verbose mode:

python run.py --data_path /path/to/data.json --verbose

Specify a model and adjust the temperature and top_p parameters:

python run.py --model_name gpt4 --temperature 0.2 --top_p 0.9
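
When launching several runs from a driver script, it can help to assemble the command from a set of flag values. The sketch below shows one possible approach using only flags documented above. The build_command helper and the sample values are hypothetical; the helper only builds the command string and does not run it.

```python
import shlex


def build_command(**flags):
    """Assemble a run.py command line from flag/value pairs (hypothetical helper)."""
    args = ["python", "run.py"]
    for name, value in flags.items():
        if value is True:
            # Boolean switch such as --verbose takes no value.
            args.append(f"--{name}")
        else:
            args.extend([f"--{name}", str(value)])
    # shlex.join quotes any argument the shell would otherwise interpret.
    return shlex.join(args)


cmd = build_command(
    data_path="/path/to/data.json",
    model_name="gpt4",
    instance_filter="django__.*",
    per_instance_cost_limit=2.0,
    verbose=True,
)
print(cmd)
```

Because shlex.join handles the quoting, a regex passed to --instance_filter reaches run.py unchanged.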

🛠️ Miscellaneous Scripts

  • remove_all_containers.sh: Forcibly removes all Docker containers currently present on the system.
  • run_and_eval.sh: Runs SWE-agent inference and evaluation on a specified dataset N times. You can specify the dataset_path, num_runs, template, and suffix arguments.
  • run_jsonl.sh: Runs SWE-agent inference from a .jsonl file that contains a SWE-bench style task instance.
  • run_replay.sh: Runs SWE-agent inference from a .traj file. This is useful for automatically creating a new demonstration for a new config from an existing sequence of actions.