# Add support for faster uploads using Azure Storage Data Movement Library #47
HowardWolosky added a commit to hwbackup/StoreBroker that referenced this issue on May 1, 2017:
This update provides about a 10-15x increase in upload performance by switching from the standard Windows Azure Storage APIs for uploading/downloading to the Azure Storage Data Movement API set, which is optimized for faster transfers. That change by itself cut the transfer times in half. This then applies two other suggested best practices for increasing transfer bandwidth:

* Increasing the `DefaultConnectionLimit`

  > By default, the .Net HTTP connection limit is 2. This implies that only two concurrent connections can be maintained. It prevents more parallel connections accessing Azure blob storage from your application.

* Turning off `Expect100Continue`

  > When the property "Expect100Continue" is set to true, client requests that use the PUT and POST methods will add an Expect: 100-continue header to the request and it will expect to receive a 100-Continue response from the server to indicate that the client should send the data to be posted. This mechanism allows clients to avoid sending large amounts of data over the network when the server, based on the request headers, intends to reject the request.
  >
  > However, once the entire payload is received on the server end, other errors may still occur. And if Windows Azure clients have tested the client well enough to ensure that it is not sending any bad requests, clients could turn off 100-continue so that the entire request is sent in one roundtrip. This is especially true when clients send small size storage objects.

Some validation tests were performed with sample sizes of 10:

**Upload of a one-gigabyte file**

_Current method_
* Avg: 205 seconds
* Median: 196 seconds

_Using DataMovement alone_
* Avg: 18 seconds
* Median: 18 seconds

_Using DataMovement with best practices_
* Avg: 18 seconds
* Median: 18 seconds

**Upload of a six-gigabyte file**

_Current method_
* Avg: 1413 seconds
* Median: 1413 seconds

_Note: This took so much longer that I only tried two samples._

_Using DataMovement with best practices_
* Avg: 90.6 seconds
* Median: 91.5 seconds

Because these results were so surprising, I then downloaded the file after uploading it and used `fc.exe /b <orig file> <downloaded file>` to validate that the file was indeed uploaded successfully and completely.

Resolves Issue microsoft#47: Add support for faster uploads using Azure Storage Data Movement Library
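The two best practices quoted in that commit message correspond to two `ServicePointManager` settings that are applied before any transfer starts. A minimal PowerShell sketch of what applying them could look like (the connection-limit multiplier here is illustrative, not the exact value used in the change):

```powershell
# Raise the .NET HTTP connection limit so DML can keep more parallel connections
# open to blob storage. 8 connections per core is a commonly suggested starting
# point; the exact multiplier is an assumption, not taken from this commit.
[System.Net.ServicePointManager]::DefaultConnectionLimit = [System.Environment]::ProcessorCount * 8

# Skip the "Expect: 100-continue" handshake so each PUT/POST completes in a
# single round trip.
[System.Net.ServicePointManager]::Expect100Continue = $false
```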
HowardWolosky added a commit to hwbackup/StoreBroker that referenced this issue on May 2, 2017 (with the same commit message as above).
HowardWolosky added a commit that referenced this issue on May 2, 2017 (with the same commit message as above).
More info on the library can be found on the GitHub page.
We'll need to add a new NuGet dependency (`Microsoft.Azure.Storage.DataMovement`), which builds on top of the pre-existing dependency on `WindowsAzure.Storage`. The NuGet package brings in a number of other dependencies too, and I'm not yet sure whether the rest can be ignored for our scenarios. More investigation will need to happen.
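One quick way to see exactly what the package drags in would be to restore it into a scratch folder and look at what comes down, assuming `nuget.exe` is available on the path (the output directory name is just illustrative):

```powershell
# Restore the DML package (plus its transitive dependencies) into a scratch
# folder, then list the packages that were actually pulled down.
nuget.exe install Microsoft.Azure.Storage.DataMovement -OutputDirectory .\dml-packages
Get-ChildItem -Path .\dml-packages -Directory | Select-Object -ExpandProperty Name
```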
Right now, in `Set-SubmissionPackage`, after we create the `$cloudBlockBlob` we call the member method `UploadFromFile()`. It looks like with Azure Storage DML, we'd instead call `TransferManager.UploadAsync`. Calling async methods from PowerShell is not typical, but initial testing indicates we'd be able to call it via `[Microsoft.WindowsAzure.Storage.DataMovement.TransferManager]::UploadAsync(...).GetAwaiter().GetResult();`, which would turn it into a synchronous call. We'd also probably want to follow some of the DML best practices around increasing the default connection limit and turning off 100-continue.
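A minimal sketch of that blocking call (untested): it assumes the DML assembly has already been loaded, and uses a hypothetical `$filePath` variable for the local package path, since only `$cloudBlockBlob` is named in the current code:

```powershell
# Assumes the DML assembly is already loaded, e.g. (path illustrative):
#   Add-Type -Path "$PSScriptRoot\lib\Microsoft.WindowsAzure.Storage.DataMovement.dll"

# Block on the async DML upload so the existing synchronous flow of
# Set-SubmissionPackage is preserved.  $filePath is a placeholder for whatever
# local path the function already has; $cloudBlockBlob is the blob we create today.
$task = [Microsoft.WindowsAzure.Storage.DataMovement.TransferManager]::UploadAsync(
    $filePath,
    $cloudBlockBlob)
$task.GetAwaiter().GetResult()
```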
If that works, it would retain the existing behavior where StoreBroker just sits there until the upload completes. If we want to do one better, we could look into what it would take to display true upload status (percentage uploaded/remaining, as opposed to just elapsed time) by leveraging `SingleTransferContext` and registering a `ProgressHandler` to report back status as the upload occurs.
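A rough sketch of what that could look like, using the same hypothetical `$filePath`/`$cloudBlockBlob` variables as above; whether `Write-Progress` behaves well when invoked from DML's worker-thread callback is something we'd need to verify:

```powershell
# Hypothetical sketch: surface upload progress via SingleTransferContext.ProgressHandler.
$totalBytes = (Get-Item -Path $filePath).Length

# NOTE: this delegate fires on a worker thread; whether PowerShell cmdlets like
# Write-Progress are safe to call from there needs verification.
$reportProgress = [Action[Microsoft.WindowsAzure.Storage.DataMovement.TransferStatus]] {
    param($status)
    $percent = [math]::Round(($status.BytesTransferred / $totalBytes) * 100, 1)
    Write-Progress -Activity 'Uploading submission package' -Status "$percent% complete" -PercentComplete $percent
}

$context = New-Object Microsoft.WindowsAzure.Storage.DataMovement.SingleTransferContext
$context.ProgressHandler = [System.Progress[Microsoft.WindowsAzure.Storage.DataMovement.TransferStatus]]::new($reportProgress)

# Same blocking pattern as before, now passing the transfer context
# ($null leaves UploadOptions at its defaults).
$task = [Microsoft.WindowsAzure.Storage.DataMovement.TransferManager]::UploadAsync(
    $filePath,
    $cloudBlockBlob,
    $null,
    $context)
$task.GetAwaiter().GetResult()
```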