improve buffering efficiency #427
Conversation
LGTM, thanks!
Just curious – since when have we had this bug? It looks pretty serious (poor S3 performance).
Probably for a while now, as long as I can remember, anyway. I'm not sure what the actual impact of this bug is. I'll run some benchmarks later and let you know if anything interesting comes up.
Before:

------------------------------------------- benchmark: 1 tests -------------------------------------------
Name (time in s)        Min      Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
----------------------------------------------------------------------------------------------------------
test                 4.8925  10.1093  5.9906  2.3032  5.0104  1.3963       1;1  0.1669       5           1
----------------------------------------------------------------------------------------------------------

After:

------------------------------------------- benchmark: 1 tests ------------------------------------------
Name (time in s)        Min     Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
---------------------------------------------------------------------------------------------------------
test                 4.9611  9.7707  5.9822  2.1190  5.0280  1.3168       1;1  0.1672       5           1
---------------------------------------------------------------------------------------------------------
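For context, the benchmark boils down to a one-byte-at-a-time read loop timed with pytest-benchmark. A minimal sketch of what such a test could look like (the S3 URL and the helper name are placeholders, not necessarily what the actual "add benchmarks" commit uses):

import smart_open

def read_one_byte_at_a_time(url):
    # Read the object through smart_open, one byte per call, until EOF.
    with smart_open.open(url, 'rb') as fin:
        while fin.read(1):
            pass

def test(benchmark):
    # pytest-benchmark fixture: times repeated runs of the read loop.
    benchmark(read_one_byte_at_a_time, 's3://my-bucket/my-key')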
I haven't been able to measure any significant benefit from this change. I suspect something else outside of our control is performing its own buffering.
Despite the lack of performance improvement, I think we should merge this anyway, as the new way of doing things makes more sense. Let me know if you think otherwise.
What's the performance before/after?
It's in the commit message for the "add benchmarks" commit.
Is boto doing some buffering internally? How else could reading one byte at a time be the same speed? Strange. If so, maybe we shouldn't buffer at all in
Yes, I also suspect boto3 does its own buffering. We could investigate by removing our own buffering and comparing performance, but that's a much larger change than the current PR. I'd rather deal with that separately, when we have more time.
Motivation
This fixes an edge case where the user performs multiple sequential reads of a small number of bytes, e.g. one byte at a time. The previous implementation would fill the buffer one byte at a time, which negates the benefit of using a buffer.
The new implementation fixes the above problem by always reading in chunks that are larger than a sensible threshold (currently equal to io.DEFAULT_BUFFER_SIZE).
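To make the idea concrete, here is a minimal sketch of the chunked-refill approach. This is not the actual patch: the class name ChunkedBufferReader and the min_fill parameter are made up for illustration only.

import io

class ChunkedBufferReader:
    """Sketch only: wrap a raw stream and refill the internal buffer
    in chunks of at least io.DEFAULT_BUFFER_SIZE."""

    def __init__(self, raw, min_fill=io.DEFAULT_BUFFER_SIZE):
        self._raw = raw            # underlying stream, e.g. an S3 object body
        self._buffer = b''
        self._min_fill = min_fill  # never request fewer bytes than this from raw

    def read(self, size=-1):
        if size < 0:
            # Drain the buffer plus whatever remains in the raw stream.
            data, self._buffer = self._buffer + self._raw.read(), b''
            return data
        # Old behaviour: the buffer was topped up with exactly `size` bytes,
        # so read(1) in a loop hit the raw stream once per byte.
        # New behaviour: top up with at least `min_fill` bytes, so small
        # sequential reads are served from memory.
        while len(self._buffer) < size:
            chunk = self._raw.read(max(size - len(self._buffer), self._min_fill))
            if not chunk:          # EOF on the raw stream
                break
            self._buffer += chunk
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

With this, read(1) in a loop touches the raw stream roughly once per io.DEFAULT_BUFFER_SIZE bytes instead of once per byte.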
Checklist
Before you create the PR, please make sure you have: