
Limit number of parallel s3 transfers #907

Closed
phemmer opened this issue Sep 3, 2014 · 4 comments
Labels
feature-request A feature should be added or improved. s3

Comments

@phemmer

phemmer commented Sep 3, 2014

Can we get a way to limit the number of parallel s3 transfers? As it is, transfer jobs are consuming a lot of system resources (CPU, disk IO, bandwidth) because the aws s3 sync command is launching several parallel transfers.

The simplest way I can think of would be to pull constants from environment variables. This would let you override the MAX_PARTS from constants.py.
With this method, power users could override other constants as well. But for my use case, a --max-parts command line option would suffice.

@kyleknap
Contributor

kyleknap commented Sep 3, 2014

There is no way to limit the number of parallel s3 transfers from the command line itself.

That being said, you do not want to change MAX_PARTS, since that designates the maximum number of parts in a multipart upload. For now, to limit parallelism, you would want to decrease NUM_THREADS. It is currently 10, meaning ten threads transfer files concurrently. Do not set it lower than one; a value of 2 or 3 would be a good way to limit parallelism.
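To illustrate why lowering NUM_THREADS limits parallelism, here is a minimal sketch of a worker-pool model like the one described above: each thread pulls one transfer task at a time, so at most NUM_THREADS transfers run concurrently. The queue, task names, and worker body are illustrative, not the CLI's actual implementation.

```python
import queue
import threading

NUM_THREADS = 2  # lower value -> fewer concurrent transfers

def worker(tasks, results):
    # Each worker drains tasks one at a time until the queue is empty.
    while True:
        try:
            name = tasks.get_nowait()
        except queue.Empty:
            return
        # A real worker would perform the S3 upload/download here.
        results.append(name)

tasks = queue.Queue()
for i in range(10):
    tasks.put(f"file-{i}")

results = []
threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # all 10 tasks processed, but by only 2 threads
```

All ten tasks still complete; the setting only caps how many are in flight at once.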

@danoyoung

Is there a way to increase NUM_THREADS so that more files are downloaded in parallel? I've increased it to 20 and now see 4 files transferring during a sync instead of 2. But how can we get to, say, 6 or 8 files? I changed NUM_THREADS to 30, but it seems to have the same effect as 20.

@kyleknap
Contributor

It also depends on how large the files are. Most of the time, files are uploaded in 5 MB chunks, so increasing the chunk size decreases the number of parts needed to upload a file completely. Since each thread uploads one chunk at a time, fewer parts per file leaves more threads free to work on other files.

Increasing NUM_THREADS increases parallelism, but with diminishing returns. Be aware that at some point, depending on your bandwidth, the threads will throttle each other, because each thread is making separate requests.
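The chunk-size arithmetic above can be sketched quickly. This assumes the roughly 5 MB default chunk size mentioned in the comment; the helper function is purely illustrative.

```python
import math

MB = 1024 * 1024

def num_parts(file_size, chunk_size=5 * MB):
    """Number of multipart chunks needed to upload a file of file_size bytes."""
    return max(1, math.ceil(file_size / chunk_size))

size = 100 * MB
print(num_parts(size))           # 20 parts at the ~5 MB default
print(num_parts(size, 25 * MB))  # 4 parts with a larger chunk size
```

With a larger chunk size, a 100 MB file ties up far fewer threads, so the same thread pool can spread across more files at once.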

@jamesls
Member

jamesls commented Mar 4, 2015

This is now possible via #1122; docs are here: https://github.com/aws/aws-cli/blob/develop/awscli/topics/s3-config.rst
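As a concrete example of the configuration the linked s3-config topic describes, the concurrency cap requested in this issue can be set per profile with `aws configure set` (option names per that doc):

```shell
# Limit the s3 commands to 2 concurrent requests and use 16 MB chunks
aws configure set default.s3.max_concurrent_requests 2
aws configure set default.s3.multipart_chunksize 16MB
```

These land in `~/.aws/config` under the chosen profile and apply to all `aws s3` transfer commands.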
