Add max_concurrency support for azure #642
@omBratteng Could you please enable maintainer edits for this PR? I need to push some changes to get the tests working, but I'm getting permission denied.
@mpenkov I can't see where to enable it after I've created the PR, but I've added you to our fork, so you should be able to edit it there directly.
I'd rather get things working in this PR. I'm still getting permission denied, though. Can you please have a look here: https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork That has instructions for giving maintainer access to PRs.
Sorry @mpenkov, I can't seem to find that option anywhere. If it's just to get the latest commits in from the develop branch, I can do a rebase.
No need to rebase, just merge the develop HEAD.
Done
Looks good to me!
Thank you for your contribution, @omBratteng!
Motivation
`download_blob` in `azure.storage.blob` supports setting a max concurrency for downloading blobs, which can improve download speeds. In a similar implementation to `smart_open`, we've noticed that downloading a 313 GiB file averaged 83 minutes, whilst `smart_open` took 122 minutes.

Some benchmarks (note: these also calculate the sha256 checksum of the blob downloaded)

`buffer_size` is set to `268435456` (256 MiB)
As one can see, setting `max_concurrency` to 4 gives a decrease of 27%, which I think is significant, and is a nice feature to add to the `azure` connector in `smart_open`.
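To illustrate why a concurrency setting can help, here is a minimal sketch of the general idea: split a blob into byte ranges, fetch several ranges in parallel, and reassemble them in order. It is not how the Azure SDK or `smart_open` implements this; `BLOB`, `fetch_range` and `download` are hypothetical names, and the "remote" blob is just an in-memory byte string so the example runs without network access.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a remote blob; a real blob would be fetched over HTTP.
BLOB = bytes(range(256)) * 4096  # 1 MiB of sample data


def fetch_range(offset: int, length: int) -> bytes:
    """Hypothetical helper: return `length` bytes of the blob starting at `offset`."""
    return BLOB[offset:offset + length]


def download(size: int, chunk_size: int, max_concurrency: int = 1) -> bytes:
    """Fetch `size` bytes in `chunk_size` pieces with up to `max_concurrency` workers."""
    ranges = [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() yields results in submission order, so the chunks reassemble correctly
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


sequential = download(len(BLOB), chunk_size=64 * 1024, max_concurrency=1)
parallel = download(len(BLOB), chunk_size=64 * 1024, max_concurrency=4)

# As in the benchmarks, compute the sha256 checksum of what was downloaded
digest = hashlib.sha256(parallel).hexdigest()
```

Against an in-memory byte string this shows no speed-up, since there is no network latency to overlap; it only demonstrates that parallel ranged reads yield the same bytes and checksum as a sequential read. With the SDK itself, the corresponding knob is the `max_concurrency` keyword argument of `download_blob`.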
Tests
Failing tests seem to be related not to `azure` but to `s3`.
Checklist
Before you create the PR, please make sure you have: