# Running on AWS

Let's see how easy it is to run `fseval` jobs on AWS. We are going to do this using the [Ray launcher](https://hydra.cc/docs/1.2/plugins/ray_launcher/) from Hydra: we take the quick start example and run it on an AWS cluster, specifically on EC2.

## Prerequisites

We are going to use more or less the same configuration as in the [Quick start](../../quick-start) example. Again, start by downloading the example project: [running-on-aws-using-ray.zip](pathname:///fseval/zipped-examples/running-on-aws-using-ray.zip).

### Installing the required packages

Now, let's install the required packages:

```shell
pip install hydra-ray-launcher --upgrade
# Quotes stop shells like zsh from interpreting the square brackets
pip install "ray[default]==1.13.0"
```

:::note

Ray 1.13 or later is required, because it contains a `protobuf`-related fix that our setup needs.

:::
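
To double-check that your installed Ray version meets this minimum, you can compare version strings with `sort -V`. This is a minimal sketch: the `version_ge` helper is hypothetical (not part of Ray or `fseval`), and it assumes a `sort` that supports the `-V` (version sort) flag, as GNU coreutils does.

```shell
# Hypothetical helper: succeeds (exit status 0) if version $1 >= version $2.
# Relies on `sort -V`, which orders version strings numerically (1.9.0 < 1.13.0).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read the installed Ray version; stays empty if Ray cannot be imported.
ray_version="$(python -c 'import ray; print(ray.__version__)' 2>/dev/null)"

if [ -n "$ray_version" ] && version_ge "$ray_version" "1.13.0"; then
  echo "Ray $ray_version is recent enough."
else
  echo "Ray ${ray_version:-not found}: please install 1.13.0 or later." >&2
fi
```

A plain string comparison would get this wrong (`"1.9.0" > "1.13.0"` lexically), which is why the sketch uses version sorting.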

### Authenticating to AWS

Make sure you are authenticated to AWS. Ray reads your credentials either from [environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvar) or from an AWS profile stored in `~/.aws` (run `aws configure` to create one). Make sure you have [AWS CLI version 2](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) installed.

You can test your authentication as follows:

```shell
aws sts get-caller-identity
```

✓ If you are authenticated, this prints your AWS account ID, user ID, and ARN.

✕ If this does not work yet, see AWS's elaborate guide on configuring the CLI: [AWS CLI Configuration basics](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html).
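
Both credential sources can be present at the same time. In boto3 (the AWS SDK that Ray's AWS node provider uses), credentials set in environment variables take precedence over the shared credentials file. The hypothetical `aws_credential_source` helper below is a simplified sketch of that lookup order; it ignores other sources such as SSO or instance roles, so `aws sts get-caller-identity` remains the authoritative check.

```shell
# Hypothetical helper: reports which credential source is likely picked up first.
# Simplified: boto3 checks environment variables before the shared credentials file,
# but it also supports other sources (SSO, instance roles, ...) that are ignored here.
aws_credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "environment"
  elif [ -f "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}" ]; then
    echo "profile"
  else
    echo "none"
  fi
}

# To authenticate via environment variables instead of a profile, export (placeholders):
#   export AWS_ACCESS_KEY_ID="<your-access-key-id>"
#   export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
echo "Credential source: $(aws_credential_source)"
```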

## Experiment setup

In this example, the main config file looks like this:

```yaml title="conf/my_config.yaml"
defaults:
  - base_pipeline_config
  - _self_
  - override dataset: synthetic
  - override validator: knn
  - override /callbacks:
    - to_sql
  // highlight-start
  - override hydra/launcher: custom_ray_aws
  // highlight-end

n_bootstraps: 1
callbacks:
  to_sql:
    url: sqlite:////home/ubuntu/results/results.sqlite # any well-defined database URL
```

Here, we tell Hydra to use a new launcher called `custom_ray_aws`, which is defined in its own config file:

```yaml title="conf/hydra/launcher/custom_ray_aws.yaml"
defaults:
  - ray_aws

env_setup:
  pip_packages:
    fseval: 3.0.3

ray:
  cluster:
    # Mount our code to the execution directory on both the head- and worker nodes.
    # See: https://docs.ray.io/en/master/cluster/vms/references/ray-cluster-configuration.html#cluster-configuration-file-mounts
    file_mounts:
      /home/ubuntu: benchmark.py

    initialization_commands:
      - mkdir -p /home/ubuntu/results

sync_down:
  source_dir: /home/ubuntu/results
  target_dir: .
```

This launcher config does quite a lot. In short:

1. `fseval` is installed on every EC2 cluster node (`env_setup.pip_packages`).
2. Once a node has started, `benchmark.py` is mounted into the home folder (`/home/ubuntu`). Ray runs all experiments from this folder by default, so mounting it there makes sure that any classes referenced in the config as a `_target_` (such as those defined in `benchmark.py`) can be instantiated correctly.

   :::note
   If you would like to explore a node from a terminal, set `stop_cluster: false` in `custom_ray_aws.yaml`, run an experiment, and then [connect to your EC2 instance with SSH](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html).
   :::

3. An initialization command creates the `/home/ubuntu/results` directory. Our SQLite database is stored inside this folder.
4. Finally, once the experiment has finished, everything inside `/home/ubuntu/results` is downloaded to the current working directory (`.`).
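
Like any other Hydra config value, launcher parameters can also be overridden from the command line, so you don't have to edit `custom_ray_aws.yaml` just to keep the cluster alive. The wrappers below are a hypothetical sketch (the names `preview_launcher_config` and `run_on_aws` are ours, not part of `fseval` or Hydra). They use Hydra's `--cfg hydra --package hydra.launcher` flags to print the composed launcher config without starting any EC2 instances, and the `hydra.launcher.stop_cluster=false` override to keep the nodes running after the jobs finish.

```shell
# Hypothetical helpers around the benchmark command used in this guide.

# Print the launcher config Hydra composes (custom_ray_aws on top of ray_aws),
# without launching any jobs or EC2 instances.
preview_launcher_config() {
  python benchmark.py --cfg hydra --package hydra.launcher
}

# Launch the sweep over all rankers on AWS. Pass --keep-cluster to override
# stop_cluster, so the nodes keep running after the jobs finish.
run_on_aws() {
  if [ "${1:-}" = "--keep-cluster" ]; then
    python benchmark.py --multirun 'ranker=glob(*)' hydra.launcher.stop_cluster=false
  else
    python benchmark.py --multirun 'ranker=glob(*)'
  fi
}
```

For example, run `preview_launcher_config` first to check the settings, then `run_on_aws --keep-cluster` if you want to SSH into a node afterwards. Nodes that are left running keep incurring EC2 charges until you terminate them.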

## Running the experiment

To run the benchmark for every available ranker, start a Hydra multirun that sweeps over the whole `ranker` config group:

```shell
python benchmark.py --multirun ranker='glob(*)'
```

The experiment now starts running on AWS! Ray automatically launches and configures nodes on EC2, ships your code, installs `fseval`, and runs the experiments. Cool!

🙌🏻

The results are downloaded back to your local computer and are available in the `results` folder:

```
$ tree results
results
└── results.sqlite

0 directories, 1 file
```
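
To peek inside the downloaded database, you can use the `sqlite3` command-line shell. Which tables exist depends on what `fseval`'s `to_sql` callback writes, so this sketch lists them with `.tables` and `.schema` instead of assuming any table names. The `inspect_results` helper is hypothetical, and it assumes the `sqlite3` CLI is installed.

```shell
# Hypothetical helper: list the tables and their schemas in the downloaded results.
# Defaults to the path shown above; pass another path as the first argument.
inspect_results() {
  local db="${1:-results/results.sqlite}"
  if [ ! -f "$db" ]; then
    echo "No database found at $db" >&2
    return 1
  fi
  sqlite3 "$db" '.tables'
  sqlite3 "$db" '.schema'
}
```

Once you know the table names, you can query them with regular SQL, e.g. by opening an interactive session with `sqlite3 results/results.sqlite`.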

This way, you can run experiments at a large scale, using Amazon's datacentres to provide the compute.

---

## 🌐 Sources

For more information, see the following sources:

- [Hydra's Ray Launcher plugin docs](https://hydra.cc/docs/1.2/plugins/ray_launcher/)
- [Ray YAML configuration options](https://docs.ray.io/en/master/cluster/vms/references/ray-cluster-configuration.html)
- [EC2 instance types](https://aws.amazon.com/ec2/instance-types/)
- [EC2 custom AMIs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html)
- [Connecting to a live database instead of saving to a local `.sqlite` file](https://dev.to/chrisgreening/connecting-to-a-relational-database-using-sqlalchemy-and-python-1619)