
If getting less than 1/2 the max bandwidth with my cluster, should I be choosing a different instance type? #286

Open
rsignell opened this issue Jul 11, 2024 · 3 comments

rsignell commented Jul 11, 2024

I ran a workflow that extracts data values from many files in object storage (a time series from a large collection of global simulation NetCDF files on AWS S3).

I have a cluster of 50 workers (200 threads), and I'm getting less than 1/2 the max bandwidth of the cluster.

image
Does this mean I should choose a different instance type and perhaps lower my costs?

Member

fjetter commented Jul 11, 2024

Changing instance types has very little impact on the network. Memory should likely be the primary decision factor for the instance type, followed by CPUs.


Unrelated to the instance type, you may want to increase the number of worker threads, since this is a primarily network-bound problem. The network throughput is likely limited by S3, which throttles at about 50 MiB/s per connection. Your cluster has 50 workers with 4 threads each, so the ceiling is roughly 50 MiB/s * 50 * 4 ≈ 10 GiB/s.
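The back-of-the-envelope ceiling can be written out as a short calculation. This is only a rough sketch: the 50 MiB/s figure is the approximate per-connection throttle quoted in this comment, it assumes one S3 connection per thread, and `s3_throughput_ceiling_gib` is a made-up helper name, not part of Dask or Coiled.

```python
def s3_throughput_ceiling_gib(n_workers, threads_per_worker, per_connection_mib_s=50.0):
    """Rough upper bound on aggregate S3 read throughput, in GiB/s.

    Assumes one connection per thread, each throttled at about
    per_connection_mib_s (the approximate figure from this thread).
    """
    total_mib_s = n_workers * threads_per_worker * per_connection_mib_s
    return total_mib_s / 1024  # MiB/s -> GiB/s


current = s3_throughput_ceiling_gib(50, 4)  # 50 workers x 4 threads each
doubled = s3_throughput_ceiling_gib(50, 8)  # same workers, 8 threads each
print(f"ceiling now: {current:.2f} GiB/s, with 2x threads: {doubled:.2f} GiB/s")
```

Under these assumptions the ceiling scales linearly with the total thread count, which is why adding threads (or workers) is the lever to try first.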

You might get better performance if you doubled the number of threads...

import coiled
cluster = coiled.Cluster(
    worker_vm_types=["m7i.xlarge"],  # pick whatever you like, of course (or use default but check #CPUs)
    worker_options={
        # Keep this aligned with the instance type: 8 is 2x the number of CPUs (4 on m7i.xlarge)
        "nthreads": 8
    },
)
client = cluster.get_client()

Just be careful: every worker now also holds twice as many partitions at once, so memory use could blow up!
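To get a feel for that memory caveat, here is a rough sketch of how much RAM each thread has to work with. It assumes the 16 GiB that AWS lists for m7i.xlarge and ignores whatever the OS and the Dask worker itself reserve; `memory_per_thread_gib` is a hypothetical helper for illustration.

```python
def memory_per_thread_gib(worker_memory_gib, nthreads):
    """Naive per-thread memory budget: worker RAM split evenly across threads."""
    return worker_memory_gib / nthreads


default_budget = memory_per_thread_gib(16, 4)  # one thread per vCPU
oversub_budget = memory_per_thread_gib(16, 8)  # 2x oversubscription
print(f"{default_budget} GiB per thread -> {oversub_budget} GiB per thread")
```

If a single partition plus its intermediate copies does not fit comfortably in the smaller budget, 2x oversubscription is probably too aggressive for that workload.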

Member

ntabris commented Jul 11, 2024

Hi, @rsignell.

Florian and I just had a quick chat and it also probably makes sense to try using a larger number of smaller workers—e.g., 100 m7g.large workers (instead of 50 m7g.xlarge).

Depending on how much tuning you want to do, you could try both smaller workers and some oversubscription of threads (maybe 1.5x or 2x; I wouldn't go higher than that).
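Combining the two suggestions might look something like the sketch below. `worker_settings` is a hypothetical helper, not part of Coiled; it only builds keyword arguments for `coiled.Cluster`, using the same `worker_vm_types` and `worker_options` parameters shown earlier in this thread plus `n_workers`. The 2 vCPUs for m7g.large come from the AWS instance specs, and the 2x cap reflects the advice above.

```python
import math


def worker_settings(n_workers, vm_type, vcpus, oversubscription=1.5):
    """Build coiled.Cluster keyword arguments for a given thread oversubscription."""
    if not 1.0 <= oversubscription <= 2.0:
        raise ValueError("keep thread oversubscription between 1x and 2x")
    return {
        "n_workers": n_workers,
        "worker_vm_types": [vm_type],
        # Round up so that 1.5x on a 2-vCPU instance gives 3 threads.
        "worker_options": {"nthreads": math.ceil(vcpus * oversubscription)},
    }


settings = worker_settings(100, "m7g.large", vcpus=2, oversubscription=1.5)
print(settings)

# To actually launch the cluster (requires a Coiled account):
# import coiled
# cluster = coiled.Cluster(**settings)
# client = cluster.get_client()
```

Keeping the settings in a plain dict makes it easy to time the same workload under a few configurations before settling on one.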

Author

rsignell commented Jul 12, 2024

Bingo @ntabris!

I was a little confused by the initial response because I was already using all 4 threads on the 50 m7g.xlarge instances Coiled picked for me. I did try using all 8 threads on 25 m6g.2xlarge instances, but that took much longer -- over twice as long.

I then noticed, while perusing the characteristics of the AWS ARM instance types, that there is a free trial of t4g.small instances running until Dec 31, 2024:

image

And when I fired off 100 of these 2-vCPU t4g.small machines, I got the same performance as the default m7g.xlarge instances, but for free! (And if I use more than 650 hours per month, it will still be only 25% of the cost of the m7g.xlarge instances.)

Amazing. Goes to show you it really pays to check what instances are appropriate for your type of workflow.
For the same performance with the same workflow, I can pay $4/hour, $1/hour, or FREE (while the promotion lasts).
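For anyone who wants to redo this comparison with their own numbers, here is a rough sketch of the arithmetic. The per-instance prices are simply backed out of the round totals above ($4/hour across 50 instances, $1/hour across 100), so they are assumptions rather than AWS list prices, and the 650-hour allowance is the figure quoted in this comment. Both function names are hypothetical.

```python
def cluster_cost_per_hour(n_instances, price_per_instance_hour):
    """Hourly cost of running the whole cluster."""
    return n_instances * price_per_instance_hour


def billable_instance_hours(n_instances, wall_hours, free_instance_hours=0.0):
    """Instance-hours left to pay for after subtracting a free allowance."""
    return max(0.0, n_instances * wall_hours - free_instance_hours)


# Assumed per-instance prices, implied by the totals quoted in this thread.
xlarge_cluster = cluster_cost_per_hour(50, 0.08)   # 50 x m7g.xlarge
small_cluster = cluster_cost_per_hour(100, 0.01)   # 100 x t4g.small

# 100 instances for 10 wall-clock hours against a 650 instance-hour allowance.
paid_hours = billable_instance_hours(100, 10, free_instance_hours=650)
print(xlarge_cluster, small_cluster, paid_hours)
```

Note that a free allowance counted in instance-hours is used up 100 times faster by a 100-worker cluster than by a single machine.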

And that's only made possible by the Cloud and Coiled! So cool! 😎
