If getting less than 1/2 the max bandwidth with my cluster, should I be choosing a different instance type? #286
Comments
Changing instance types has very little impact on the network. Memory should likely be the primary decision factor for the instance type, followed by CPUs.

Unrelated to the instance type, you may want to increase the number of worker threads, since this is a primarily network-bound problem. The network throughput is likely limited by S3, which throttles at about 50 MiB/s per connection. On your cluster, you have 50 workers with 4 threads each, i.e. 200 threads in total. You might get better performance if you doubled the number of threads...

import coiled

cluster = coiled.Cluster(
    worker_vm_types=["m7i.xlarge"],  # pick whatever you like, of course (or use the default, but check the number of CPUs)
    worker_options={
        # Keep this aligned with the instance type: here it is 2x the number of CPUs
        "nthreads": 8
    },
)
client = cluster.get_client()

Just be careful: every worker now also processes twice as many partitions at once, i.e. it could blow up in memory!
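The reasoning in this comment can be put into rough numbers. The sketch below is only a back-of-the-envelope estimate: it assumes each thread holds exactly one S3 connection, takes the ~50 MiB/s per-connection figure from the comment at face value, and ignores the instance's own network limit. The function names are hypothetical, and the 16 GiB figure is the memory of an `m7i.xlarge`.

```python
def s3_throughput_ceiling_mib_s(n_workers, threads_per_worker, per_connection_mib_s=50.0):
    """Rough upper bound on aggregate S3 throughput.

    Assumes one S3 connection per thread and a fixed per-connection
    throttle; real throughput also depends on the instance network limit.
    """
    return n_workers * threads_per_worker * per_connection_mib_s


def memory_per_thread_gib(worker_memory_gib, threads_per_worker):
    """Memory available to each concurrently running task on a worker."""
    return worker_memory_gib / threads_per_worker


# Cluster from the issue: 50 workers x 4 threads.
current = s3_throughput_ceiling_mib_s(50, 4)
# Same cluster with threads doubled to 8 per worker.
doubled = s3_throughput_ceiling_mib_s(50, 8)

# Doubling threads halves the memory each task can use
# (assumption: 16 GiB per worker, as on an m7i.xlarge).
mem_before = memory_per_thread_gib(16, 4)
mem_after = memory_per_thread_gib(16, 8)

print(f"ceiling: {current:.0f} -> {doubled:.0f} MiB/s")
print(f"memory per thread: {mem_before:.1f} -> {mem_after:.1f} GiB")
```

The trade-off is visible in the two pairs of numbers: the estimated connection ceiling doubles while the memory available per running task is cut in half.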
Hi, @rsignell. Florian and I just had a quick chat, and it probably also makes sense to try a larger number of smaller workers, e.g., 100. Depending on how much tuning you want to do, you could try both smaller workers and some oversubscription of threads (maybe 1.5x or 2x; I wouldn't go higher than that).
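A minimal sketch of how the two suggestions above could be combined. The helper names are hypothetical, and `m7g.large` (2 vCPUs) is only an illustrative choice of a smaller instance, not one named in this comment. The resulting dictionary uses the `n_workers`, `worker_vm_types`, and `worker_options` arguments of `coiled.Cluster`; the actual cluster creation is left commented out because it needs a Coiled account.

```python
import math


def oversubscribed_nthreads(n_cpus, factor=1.5, max_factor=2.0):
    """Thread count for a worker with n_cpus, oversubscribed by `factor`.

    The factor is capped at 2x, following the advice in the thread.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    if not 1.0 <= factor <= max_factor:
        raise ValueError(f"factor must be between 1.0 and {max_factor}")
    return math.ceil(n_cpus * factor)


def cluster_kwargs(n_workers, vm_type, n_cpus, factor):
    """Build keyword arguments for coiled.Cluster (nothing is created here)."""
    return {
        "n_workers": n_workers,
        "worker_vm_types": [vm_type],
        "worker_options": {"nthreads": oversubscribed_nthreads(n_cpus, factor)},
    }


# 100 smaller workers with 1.5x thread oversubscription
# (assumption: m7g.large, which has 2 vCPUs).
kwargs = cluster_kwargs(100, "m7g.large", n_cpus=2, factor=1.5)
print(kwargs)

# To actually start the cluster:
# import coiled
# cluster = coiled.Cluster(**kwargs)
# client = cluster.get_client()
```

Keeping the thread count derived from the CPU count avoids the mismatch the earlier comment warns about when the instance type changes.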
Bingo @ntabris! I was a little confused by the initial response because I was already using all 4 threads on the 50 m7g.xlarge instances Coiled picked for me. I did try using all 8 threads on 25 m6g.2xlarge instances, but that took much longer, over twice as long.

I then noticed, while perusing the characteristics of the AWS ARM instance types, that they have a free trial going on until Dec 31, 2024 on the t4g.small instances. And when I fired off 100 of these 2-CPU t4g.small machines, I got the same performance as the default m7g.xlarge instances, but for free! (And if I use more than 650 hours per month, it will still be only 25% of the cost of the m7g.xlarge instances.)

Amazing. Goes to show you it really pays to check what instances are appropriate for your type of workflow. And that's only made possible by the Cloud and Coiled! So cool! 😎
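As a rough illustration of how far a monthly free allowance stretches across a cluster, here is a small sketch. It assumes the allowance is counted in instance-hours pooled across all running instances, and it uses the 650-hour figure exactly as quoted in the comment above; the function name is hypothetical.

```python
def free_cluster_hours(free_instance_hours, n_workers):
    """Wall-clock hours a cluster can run before using up a pooled allowance.

    Assumes every worker consumes one instance-hour per hour of runtime.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return free_instance_hours / n_workers


# 100 t4g.small workers against the monthly figure quoted in the comment.
hours = free_cluster_hours(650, 100)
print(f"{hours} hours of cluster time per month")
```

Beyond that point the workers would be billed at the normal rate, which is where the cost comparison in the comment comes in.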
I ran a workflow that just extracts data values from a large number of files in object storage (extracting a time series from a large collection of global simulation NetCDF files on AWS S3).
I have a cluster of 50 workers (200 threads), and I'm getting less than 1/2 the max bandwidth of the cluster.
Does this mean I should choose a different instance type and perhaps lower my costs?