S3Boto3Storage spawns many threads for storing files #610

Closed
NicolasLM opened this issue Oct 1, 2018 · 3 comments

@NicolasLM

While looking at logs I realized that s3transfer spawns a thread pool each time it needs to store a file. The rationale is that this allows sending multiple files in parallel. In the case of Django, however, the Storage API does not seem to allow bulk uploads, so spawning a thread pool for every save is wasteful.

boto3 has a config option to prevent this problem: https://github.com/boto/boto3/blob/0cc6042615fd44c6822bd5be5a4019d0901e5dd2/boto3/s3/transfer.py#L158

Code snippet:

from django.core.files.base import File
from django.core.files.storage import default_storage

default_storage.save(image_path, File(data))

Settings:

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_S3_REGION_NAME = 'ams3'
AWS_S3_ENDPOINT_URL = 'https://ams3.digitaloceanspaces.com'
AWS_ACCESS_KEY_ID = 'xxx'
AWS_SECRET_ACCESS_KEY = 'xxx'
AWS_STORAGE_BUCKET_NAME = 'xxx'
AWS_DEFAULT_ACL = 'private'
AWS_QUERYSTRING_EXPIRE = 7800

Dependencies:

boto3==1.9.14
botocore==1.12.14
Django==2.1.2
django-storages==1.7.1
s3transfer==0.1.13
@NicolasLM
Author

I tried to pass use_threads=False to boto3 but couldn't figure out how to do that, so I went with a dirty monkey-patch. In case it helps anyone else:

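from boto3.s3 import transfer

def create_transfer_manager(*arg, **kwargs):
    # Build the manager with a non-threaded executor so that transfers
    # run in the calling thread instead of a newly created thread pool.
    return transfer.TransferManager(
        *arg, **kwargs, executor_cls=transfer.NonThreadedExecutor
    )

# Replace boto3's factory function with the patched version.
transfer.create_transfer_manager = create_transfer_manager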

@johnyoonh

Thanks. Python 2.7 doesn't allow a keyword argument after **kwargs in a call, so:

from boto3.s3 import transfer

def create_transfer_manager(*arg, **kwargs):
    kwargs["executor_cls"] = transfer.NonThreadedExecutor
    return transfer.TransferManager(
        *arg, **kwargs
    )

transfer.create_transfer_manager = create_transfer_manager

This seems to reduce memory usage, but it didn't solve the growing memory problem that is inherent to the boto3 library:
boto/boto3#1670

@jschneier
Owner

This is solvable by configuring the TransferConfig.
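
For anyone landing here later, a minimal sketch of what that might look like in a Django settings file. TransferConfig and its use_threads parameter come from boto3 (the option linked in the first comment). The AWS_S3_TRANSFER_CONFIG setting name is an assumption: it is only read by newer django-storages releases, not the 1.7.1 listed above, so check the docs for your installed version before relying on it.

```python
# settings.py (sketch, not a definitive configuration)
from boto3.s3.transfer import TransferConfig

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'

# use_threads=False asks boto3 to run transfers in the calling thread
# rather than spawning a thread pool for each upload.
# Assumption: your django-storages version reads AWS_S3_TRANSFER_CONFIG.
AWS_S3_TRANSFER_CONFIG = TransferConfig(use_threads=False)
```

With this in place the monkey-patch above should no longer be necessary, since the storage backend hands the config to boto3 on each upload.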
