AWS Benchmarks

This repo provides a number of Python scripts to deploy and benchmark the codebase on Amazon Web Services (AWS). They are particularly useful to run benchmarks over the WAN, across multiple data centers.

Step 1. Set up your AWS credentials

Set up your AWS credentials to enable programmatic access to your account from your local machine. These credentials authorize your machine to create, delete, and edit instances on your AWS account. First of all, find your 'access key id' and 'secret access key'. Then, create a file ~/.aws/credentials with the following content:

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

Do not specify any AWS region in that file, as the Python scripts handle multiple regions programmatically.
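
The deployment scripts use boto3 (the AWS SDK for Python) under the hood. If you want a quick sanity check that your credentials file is picked up, a minimal sketch along these lines should do (assuming boto3 is installed on your machine):

# Quick sanity check that ~/.aws/credentials is picked up (sketch only).
import boto3

# STS reports the identity behind the configured credentials; the region is
# passed explicitly since the credentials file deliberately omits it.
sts = boto3.client('sts', region_name='us-east-1')
identity = sts.get_caller_identity()
print(f"Authenticated as {identity['Arn']} (account {identity['Account']})")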

Step 2. Add your SSH public key to your AWS account

You must now add your SSH public key to your AWS account. This operation is manual (AWS exposes few APIs to manipulate keys) and needs to be repeated for each AWS region that you plan to use. Upon importing your key, AWS requires you to choose a 'name' for it; ensure you set the same name in all AWS regions. This SSH key will be used by the Python scripts to execute commands on your AWS instances and to upload/download files. If you don't have an SSH key, you can create one using ssh-keygen:

$ ssh-keygen -f ~/.ssh/aws
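
If you would rather script the import than repeat it in each region's web console, a sketch like the following should work with boto3 (this is not part of the repo's tooling; the key name, key path, and regions below are placeholders that must match your settings.json from step 3):

# Import the same public key, under the same name, into every region.
# (Sketch only; not part of the repo's scripts. Adjust the placeholders.)
import boto3

KEY_NAME = 'aws'                               # must match settings.json
PUBLIC_KEY_PATH = '/absolute/path/to/aws.pub'  # the .pub file produced by ssh-keygen
REGIONS = ['us-east-1', 'eu-north-1', 'ap-southeast-2', 'us-west-1', 'ap-northeast-1']

with open(PUBLIC_KEY_PATH, 'rb') as f:
    public_key = f.read()

for region in REGIONS:
    ec2 = boto3.client('ec2', region_name=region)
    ec2.import_key_pair(KeyName=KEY_NAME, PublicKeyMaterial=public_key)
    print(f"Imported key '{KEY_NAME}' into {region}")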

Step 3. Configure the testbed

The file settings.json (located in hotstuff/benchmark) contains all the configuration parameters of the testbed to deploy. Its content looks as follows:

{
    "testbed": "hotstuff",
    "key": {
        "name": "aws",
        "path": "/absolute/key/path"
    },
    "ports": {
        "consensus": 8000,
        "mempool": 7000,
        "front": 6000
    },
    "repo": {
        "name": "hotstuff",
        "url": "https://github.com/asonnino/hotstuff.git",
        "branch": "main"
    },
    "instances": {
        "type": "m5d.8xlarge",
        "regions": ["us-east-1", "eu-north-1", "ap-southeast-2", "us-west-1", "ap-northeast-1"]
    }
}

The testbed name (testbed) is used to tag the AWS instances for easy reference.
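
For instance, once a testbed is up, you can list the instances carrying that tag with boto3. The snippet below is only a sketch and assumes the testbed name is stored under the standard 'Name' tag, which may not match the repo's exact tagging scheme:

# List running instances tagged with the testbed name (sketch only; assumes
# the testbed name is stored under the standard 'Name' tag).
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
reply = ec2.describe_instances(Filters=[
    {'Name': 'tag:Name', 'Values': ['hotstuff*']},
    {'Name': 'instance-state-name', 'Values': ['running']},
])
for reservation in reply['Reservations']:
    for instance in reservation['Instances']:
        print(instance['InstanceId'], instance.get('PublicIpAddress', 'no public ip'))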

The first block (key) contains information regarding your SSH key:

"key": {
    "name": "aws",
    "path": "/absolute/key/path"
},

Enter the name of your SSH key; this is the name you specified in the AWS web console in step 2. Also, enter the absolute path of your SSH private key (using a relative path won't work).
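
A quick way to double-check both fields before deploying (a small sketch, not part of the repo's scripts):

# Sanity-check the key block of settings.json (sketch only).
import json, os

with open('settings.json') as f:
    key = json.load(f)['key']

assert os.path.isabs(key['path']), 'the key path must be absolute'
assert os.path.isfile(key['path']), 'no private key found at that path'
print(f"Using key '{key['name']}' located at {key['path']}")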

The second block (ports) specifies the TCP ports to use:

"ports": {
    "consensus": 8000,
    "mempool": 7000,
    "front": 6000
},

HotStuff requires 3 ports; the first is used for consensus messages, the second for mempool messages, and the last to receive client transactions. Note that these ports will be open to the WAN on all your AWS instances.
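
The deployment scripts take care of opening these ports; for reference, opening a TCP port to the WAN on an EC2 security group boils down to a call like the one below (an illustrative sketch only, with a placeholder security group id):

# Illustration of opening a TCP port to the WAN on an EC2 security group.
# (Sketch only; the deployment scripts handle this themselves.)
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
ec2.authorize_security_group_ingress(
    GroupId='sg-0123456789abcdef0',  # placeholder security group id
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 8000,            # e.g. the consensus port
        'ToPort': 8000,
        'IpRanges': [{'CidrIp': '0.0.0.0/0'}],  # open to the WAN
    }],
)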

The third block (repo) contains the information regarding the repository's name, the URL of the repo, and the branch containing the code to deploy:

"repo": {
    "name": "hotstuff",
    "url": "https://github.com/asonnino/hotstuff.git",
    "branch": "main"
},

Remember to update the url field to point to your own repo (or fork). Modifying the branch name is particularly useful when testing new functionality without having to check out the code locally.

The last block (instances) specifies the AWS instance type and the AWS regions to use:

"instances": {
    "type": "m5d.8xlarge",
    "regions": ["us-east-1", "eu-north-1", "ap-southeast-2", "us-west-1", "ap-northeast-1"]
}

The instance type selects the hardware on which to deploy the testbed. For example, m5d.8xlarge instances come with 32 vCPUs (16 physical cores), 128 GB of RAM, and guarantee 10 Gbps of bandwidth. The Python scripts will configure each instance with a 300 GB SSD drive. The regions field specifies the data centers to use. If you require more nodes than data centers, the Python scripts will distribute the nodes as equally as possible amongst the data centers. All machines run a fresh install of Ubuntu Server 20.04.
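
As an illustration of the 'as equally as possible' split, the placement boils down to something like the following (a sketch; the repo's own placement logic may differ in its details):

# Rough sketch of splitting N nodes as evenly as possible across regions.
# (Illustrative only; the repo's placement logic may differ in its details.)
def split_nodes(total_nodes, regions):
    base, extra = divmod(total_nodes, len(regions))
    # The first `extra` regions receive one additional node each.
    return {r: base + (1 if i < extra else 0) for i, r in enumerate(regions)}

regions = ['us-east-1', 'eu-north-1', 'ap-southeast-2', 'us-west-1', 'ap-northeast-1']
print(split_nodes(7, regions))
# {'us-east-1': 2, 'eu-north-1': 2, 'ap-southeast-2': 1, 'us-west-1': 1, 'ap-northeast-1': 1}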

Step 4. Create a testbed

The AWS instances are orchestrated with Fabric from the file fabfile.py (located in hotstuff/benchmark); you can list all possible commands as follows:

$ cd hotstuff/benchmark
$ fab --list

The command fab create creates new AWS instances; open fabfile.py and locate the create task:

@task
def create(ctx, nodes=2):
    ...

The parameter nodes determines how many instances to create in each AWS region. That is, if you specified 5 AWS regions as in the example of step 3, setting nodes=2 will create a total of 10 machines:

$ fab create

Creating 10 instances |██████████████████████████████| 100.0% 
Waiting for all instances to boot...
Successfully created 10 new instances

You can then clone the repo and install rust on the remote instances with fab install:

$ fab install

Installing rust and cloning the repo...
Initialized testbed of 10 nodes

This may take a long time as the command will first update all instances. The commands fab stop and fab start respectively stop and start the testbed without destroying it (it is good practice to stop the testbed when not in use, as AWS can be quite expensive), and fab destroy terminates all instances and destroys the testbed. Note that, depending on the instance types, AWS instances may take up to several minutes to fully start or stop. The command fab info displays a nice summary of all available machines and the information needed to manually connect to them (for debugging).

Step 5. Run a benchmark

After setting up the testbed, running a benchmark on AWS is similar to running it locally (see Run Local Benchmarks). Locate the task remote in fabfile.py:

@task
def remote(ctx):
    ...

The benchmark parameters are similar to those of local benchmarks but allow you to specify the number of nodes and the input rate as arrays to automate multiple benchmarks with a single command; the parameter runs specifies the number of times to repeat each benchmark (to later compute the average and stdev of the results):

bench_params = {
    'faults': 0,
    'nodes': [10, 20, 30],
    'rate': [20_000, 30_000, 40_000],
    'tx_size': 512,
    'duration': 300,
    'runs': 2,
}

Similarly to local benchmarks, the scripts deploy as many clients as nodes and divide the input rate equally amongst the clients. Each client shares an AWS instance with a node and only submits transactions to the node with which it shares the machine.
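
As a back-of-the-envelope example of that split (not the repo's code):

# With 10 nodes and an input rate of 20,000 tx/s, each co-located client
# submits 20_000 / 10 = 2_000 tx/s to its node (back-of-the-envelope only).
nodes, rate = 10, 20_000
rate_per_client = rate // nodes
print(f'{rate_per_client} tx/s per client')  # 2000 tx/s per client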

Once you have specified both bench_params and node_params as desired, run:

$ fab remote

This command first updates all machines with the latest commit of the GitHub repo and branch specified in your file settings.json (step 3); this ensures that benchmarks are always run with the latest version of the code. It then generates and uploads the configuration files to each machine, runs the benchmarks with the specified parameters, and downloads the logs. It finally parses the logs and prints the results into a folder called results (which is automatically created if it doesn't already exist). You can run fab remote multiple times without fear of overriding previous results; the command either appends new results to a file containing existing ones or prints them in separate files. If anything goes wrong during a benchmark, you can always stop it by running fab kill.

Step 6. Plot the results

Once you have enough results, you can aggregate and plot them:

$ fab plot

This command creates a latency graph, a throughput graph, and a robustness graph in a folder called plots (which is automatically created if it doesn't already exist). The next section provides insights on how to interpret those graphs. You can adjust the plot parameters to filter which curves to add to the plot:

plot_params = {
    'faults': [0],
    'nodes': [10, 20],
    'tx_size': 512,
    'max_latency': [2_000, 5_000]
}