Optimize reading from S3 #322
Conversation
We raise ValueError for backwards compatibility, this should really be IOError in the future.
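A minimal sketch of the backwards-compatibility shim described above: translating a missing-key failure into ValueError so existing callers keep working. The function names here are illustrative, not smart_open's actual internals.

```python
def load_content_length(get_object):
    """Call get_object() and translate a missing-key error into ValueError.

    Historically callers expect ValueError here; a future major release
    could switch this to IOError instead.
    """
    try:
        return get_object()
    except KeyError as e:
        raise ValueError('the requested key does not exist: %s' % e)


def missing():
    # Stand-in for a remote lookup of a key that does not exist.
    raise KeyError('my-key')


try:
    load_content_length(missing)
except ValueError as err:
    print('raised ValueError:', err)
```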
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, so the issue was due to multiple remote HEAD calls for content_length, right?
IMO 30% overhead is acceptable (plus there seems to be some ahead-of-time buffering in smart_open, which may be related too?).
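The fix under discussion amounts to caching the object's content length so repeated reads don't re-issue a HEAD request each time. A minimal sketch of that pattern, with a fake in place of the remote call (the class and method names are illustrative, not smart_open's API):

```python
class CachedS3Reader:
    """Caches the result of a HEAD-like call after the first lookup."""

    def __init__(self, head_object):
        self._head_object = head_object
        self._content_length = None  # populated on first access

    @property
    def content_length(self):
        if self._content_length is None:
            # Only the first access goes over the network.
            self._content_length = self._head_object()
        return self._content_length


calls = []

def fake_head():
    # Stand-in for a remote HEAD request; records each invocation.
    calls.append(1)
    return 1024


reader = CachedS3Reader(fake_head)
print(reader.content_length, reader.content_length)  # HEAD issued only once
print('HEAD calls:', len(calls))
```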
@@ -386,6 +386,8 @@ def test_nonexisting_bucket(self):
         fout.write(expected)

+    def test_read_nonexisting_key(self):
+        create_bucket_and_key()
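A hedged sketch of what the added test exercises, using a plain dict as a stand-in for a moto-backed bucket so the example runs without AWS or moto installed. The function smart_open_read and the bucket layout are illustrative assumptions, not the project's real helpers.

```python
def smart_open_read(bucket, key):
    """Stand-in for smart_open's S3 read path: reading a nonexisting key
    from an existing bucket should raise ValueError (see discussion above)."""
    if key not in bucket:
        raise ValueError('key %r does not exist in bucket' % key)
    return bucket[key]


def test_read_nonexisting_key():
    bucket = {}  # the bucket exists but holds no keys
    try:
        smart_open_read(bucket, 'my-key')
    except ValueError:
        return True
    return False


print('test passed:', test_read_nonexisting_key())
```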
What was happening before this change? Testing for a missing bucket?
Nonexisting key in an existing bucket.
The test started failing once we moved the content_length calls around; it's probably tripping over some moto oddity.
@mpenkov our GitHub repo badge is now red, "Build failing". Some issue with the modified test?
It may be a merge artifact. I'll have a look.
Fixes #317. Looks like content_length wasn't being cached in boto3.
With the updated code:
We're now around 1.3 times slower than using boto3 directly. I suspect this is a start-up cost only, so as we read larger files, it will amortize.
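The amortization argument can be made concrete with a back-of-the-envelope model: if the overhead is a fixed start-up cost (e.g. one extra HEAD round trip), the relative slowdown shrinks as file size grows. The numbers below are illustrative assumptions, not measurements from this PR.

```python
# Assumed fixed cost per open (e.g. an extra HEAD round trip), in seconds.
STARTUP_OVERHEAD_S = 0.3
# Assumed steady-state read throughput, in MB/s.
THROUGHPUT_MB_S = 50.0


def relative_slowdown(size_mb):
    """Ratio of (start-up cost + transfer time) to transfer time alone."""
    base = size_mb / THROUGHPUT_MB_S
    return (base + STARTUP_OVERHEAD_S) / base


for size in (1, 10, 100, 1000):
    print('%5d MB -> %.2fx slowdown' % (size, relative_slowdown(size)))
```

Under this model the relative penalty falls monotonically with file size, which is consistent with the 1.3x figure being dominated by per-open overhead rather than per-byte cost.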