S3 - Accept Range argument for download_file / download_fileobj methods #3339
Hi @vEpiphyte, have you tried using the Range argument with get_object? For example:
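(The inline example did not survive the page export; below is a sketch of what a ranged get_object call looks like. The `byte_range` and `read_range` helpers are hypothetical conveniences, and the bucket/key arguments are placeholders.)

```python
def byte_range(offset, length):
    """Hypothetical helper: build an inclusive RFC 7233 range header.

    byte_range(0, 512) -> "bytes=0-511" -- the range is inclusive at
    both ends, so a read of `length` bytes ends at offset + length - 1.
    """
    return "bytes={}-{}".format(offset, offset + length - 1)


def read_range(bucket, key, offset, length):
    """Fetch only `length` bytes starting at `offset` of s3://bucket/key."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    s3 = boto3.client("s3")
    response = s3.get_object(
        Bucket=bucket,
        Key=key,
        Range=byte_range(offset, length),
    )
    return response["Body"].read()
```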
Hi @tim-finnigan! The reason I'm looking at the two helper APIs is that my main target is a file descriptor backed by a socket, which provides a mechanism for limiting memory consumption (writes to that file descriptor block while waiting for the reader to read chunks from it). Potentially large requests for entire blobs, or ranges thereof, don't end up exhausting memory, since the application relies on the requester to handle those chunks, draining the socket and allowing additional data to be written.

It does look like the get_object response represents the response data as a StreamingBody object (https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html?highlight=streamingbody#botocore.response.StreamingBody) that could be used to feed a file descriptor in my case. I can give that a shot and see how it works.
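The chunked-copy pattern described above can be sketched without any AWS calls. `feed_fd` below is a hypothetical helper that accepts any file-like body (a botocore StreamingBody exposes the same `read()` interface), and the demo uses an `os.pipe()` pair standing in for the socket-backed descriptor:

```python
import io
import os

CHUNK_SIZE = 64 * 1024


def feed_fd(body, fd, chunk_size=CHUNK_SIZE):
    """Copy a file-like response body to a file descriptor in fixed-size
    chunks, so memory use stays bounded even for large (or ranged) objects.

    `body` can be the StreamingBody from get_object()["Body"]. os.write
    blocks when the descriptor is a socket whose reader is slow, which
    provides the back-pressure described in the comment above.
    """
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        written = 0
        while written < len(chunk):
            written += os.write(fd, chunk[written:])


# Demonstration with an in-memory stand-in for the StreamingBody:
r_fd, w_fd = os.pipe()  # reader/writer pair standing in for the socket
feed_fd(io.BytesIO(b"example payload"), w_fd)
os.close(w_fd)
received = os.read(r_fd, 1024)
os.close(r_fd)
```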
Also experiencing issues due to this.

Working Code:

```python
img_data_buf = BytesIO()
S3.download_fileobj(bucket, key, img_data_buf)
img_data_buf.seek(0)
result = get_image_info(img_data_buf)
```

Failing Code:

```python
s3_response_object = S3.get_object(
    Bucket=bucket,
    Key=key,
    # Range="bytes=0-512"  # fetching the full range throws the same error
)
img_data_buf = s3_response_object["Body"].read()
result = get_image_info(img_data_buf)
```

Error message:
It looks like there is a workaround: cast the StreamingBody into a BytesIO object, see this comment
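For reference, the workaround in question is just wrapping the fully-read body in a seekable buffer. A stand-in sketch (`io.BytesIO` plays the role of the StreamingBody here, and the byte string is fake data):

```python
import io

# Stand-in for s3_response_object["Body"]; a real botocore
# StreamingBody exposes the same read() interface.
body = io.BytesIO(b"\x89PNG fake image bytes")

# The workaround: read the whole body (note: this loads the entire
# object into memory) and wrap it in a seekable BytesIO so downstream
# code can seek()/read() it like a file.
img_data_buf = io.BytesIO(body.read())
img_data_buf.seek(0)
header = img_data_buf.read(4)
```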
@pradoz the problem with casting the StreamingBody into a BytesIO is that you have to read the entire response into memory to do so; a large file downloaded from S3 that exceeds available memory could cause a Python MemoryError (and that's not good).
Checking in again - I think this may actually be a duplicate of an older issue: #1215. A corresponding issue was also created here in the s3transfer repository: boto/s3transfer#248. To make issue tracking easier we generally combine overlapping issues. Please let us know if there are any distinctions you'd like to make between the issues.
Greetings! It looks like this issue hasn't been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.
@tim-finnigan I will try to review these today. I missed this over the American holiday week.
Describe the feature
I'd like to be able to use a Range argument with the S3 download_file / download_fileobj methods to download a subset of a file, per the S3 byte-range requests described here:
https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html
https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_RequestSyntax
Use Case
My use case is to seek into blobs at given offsets and read a portion of them, without needing to read the entire blob. I primarily need to read these blobs into a file descriptor.
Proposed Solution
I believe the s3transfer library can be modified to allow this behavior, but I am uncertain whether there are other considerations (such as multi-threaded downloads) that could be complicated by simply allowing Range in the
ALLOWED_DOWNLOAD_ARGS
constant. When I modified that constant, I ran into issues when testing with moto and didn't want to go much further in case there was a better or more correct way to add support for Range here.
Other Information
If this is appropriate to move to the s3transfer project, or there already exists a way to provide a Range header, please let me know and we can close this out accordingly :)
Acknowledgements
SDK version used
1.24.27
Environment details (OS name and version, etc.)
Python 3.8 / Python 3.10; Debian and Ubuntu latest stable releases.