Add support for faster uploads using Azure Storage Data Movement Library #47

Closed
HowardWolosky opened this issue Apr 28, 2017 · 0 comments
@HowardWolosky (Member)

More info on the library can be found on the GitHub page.

We'll need to add a new NuGet dependency (Microsoft.Azure.Storage.DataMovement), which builds on top of the pre-existing dependency on WindowsAzure.Storage. The NuGet package brings in a number of other dependencies too, and I'm not yet sure whether those can be ignored for our scenarios; more investigation is needed.
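
If we go this route, pulling the package down would presumably be a one-liner with nuget.exe (the package name is taken from the DML GitHub page; the output directory is just an example):

```powershell
# Restore the Data Movement package (and whatever it pulls in transitively)
# next to the existing WindowsAzure.Storage dependency.
nuget.exe install Microsoft.Azure.Storage.DataMovement -OutputDirectory .\packages
```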

Right now, in Set-SubmissionPackage, after we create the $cloudBlockBlob we call its UploadFromFile() member method. It looks like with Azure Storage DML, we'd instead call TransferManager.UploadAsync. Calling async methods from PowerShell is not typical, but initial testing indicates we can invoke [Microsoft.WindowsAzure.Storage.DataMovement.TransferManager]::UploadAsync(...).GetAwaiter().GetResult(), which turns it into a synchronous call.
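
Roughly, the call in Set-SubmissionPackage might end up looking like this sketch ($dllPath, $PackagePath, and $cloudBlockBlob are stand-in names, not the final implementation):

```powershell
# Load the Data Movement assembly added by the new NuGet dependency
# ($dllPath is hypothetical; it would point at wherever the module stages its binaries).
Add-Type -Path (Join-Path -Path $dllPath -ChildPath "Microsoft.WindowsAzure.Storage.DataMovement.dll")

# UploadAsync returns a Task; blocking on it keeps the existing synchronous behavior
# of Set-SubmissionPackage, just like UploadFromFile() does today.
[Microsoft.WindowsAzure.Storage.DataMovement.TransferManager]::UploadAsync($PackagePath, $cloudBlockBlob).GetAwaiter().GetResult()
```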

We'd also probably want to follow some of the DML best practices around increasing the default connection limit and turning off 100-Continue.
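
Both settings live on .NET's ServicePointManager, so PowerShell can apply them process-wide before the upload starts; a minimal sketch (the specific connection-limit value is just an illustrative assumption):

```powershell
# Raise the .NET HTTP connection limit from its default of 2 so DML can keep
# multiple parallel connections open to blob storage (the value here is illustrative).
[System.Net.ServicePointManager]::DefaultConnectionLimit = [Environment]::ProcessorCount * 8

# Skip the Expect: 100-continue handshake so each PUT/POST completes in one round trip.
[System.Net.ServicePointManager]::Expect100Continue = $false
```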

If that works, it would retain the existing behavior where StoreBroker just sits there until the upload completes. If we want to do one better, we could look into what it would take to display true upload status (percentage uploaded/remaining, as opposed to just time elapsed) by leveraging SingleTransferContext and registering a ProgressHandler to report status as the upload occurs.
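
One way to wire that up from PowerShell might be a small IProgress[TransferStatus] recorder compiled with Add-Type and polled while the upload task runs. This is only a sketch under the same hypothetical names as above ($dllPath, $PackagePath, $cloudBlockBlob), not a tested implementation:

```powershell
# Tiny helper type that implements IProgress[TransferStatus] and just records the
# latest byte count, so the main thread can poll it safely.
Add-Type -ReferencedAssemblies (Join-Path -Path $dllPath -ChildPath "Microsoft.WindowsAzure.Storage.DataMovement.dll") -TypeDefinition @"
public class TransferProgressRecorder : System.IProgress<Microsoft.WindowsAzure.Storage.DataMovement.TransferStatus>
{
    public long BytesTransferred;
    public void Report(Microsoft.WindowsAzure.Storage.DataMovement.TransferStatus status)
    {
        BytesTransferred = status.BytesTransferred;
    }
}
"@

$totalBytes = (Get-Item -Path $PackagePath).Length
$recorder = New-Object -TypeName TransferProgressRecorder

# Register the recorder as the ProgressHandler on a SingleTransferContext.
$context = New-Object -TypeName Microsoft.WindowsAzure.Storage.DataMovement.SingleTransferContext
$context.ProgressHandler = $recorder

# Start the upload, then poll the recorder until the Task completes.
$task = [Microsoft.WindowsAzure.Storage.DataMovement.TransferManager]::UploadAsync($PackagePath, $cloudBlockBlob, $null, $context)
while (-not $task.IsCompleted)
{
    $percent = [Math]::Round(($recorder.BytesTransferred / $totalBytes) * 100, 1)
    Write-Progress -Activity "Uploading submission package" -Status "$percent% complete" -PercentComplete $percent
    Start-Sleep -Milliseconds 500
}

# Surface any exception from the upload and complete the progress bar.
$task.GetAwaiter().GetResult()
Write-Progress -Activity "Uploading submission package" -Completed
```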

HowardWolosky self-assigned this Apr 28, 2017
HowardWolosky added a commit to hwbackup/StoreBroker that referenced this issue May 1, 2017
This update provides about a 10-15x increase in upload performance by
switching from using the standard Windows Azure Storage APIs for
uploading/downloading to the Azure Storage Data Movement API set,
which is optimized for faster uploads.

That change by itself cut the transfer times in half.  This then applies
two other suggested best practices for increasing transfer bandwidth:

    * Increasing the `DefaultConnectionLimit`
      > By default, the .Net HTTP connection limit is 2. This implies that
      > only two concurrent connections can be maintained. It prevents more
      > parallel connections accessing Azure blob storage from your application.

    * Turning off `Expect100Continue`
      > When the property "Expect100Continue" is set to true, client requests
      > that use the PUT and POST methods will add an Expect: 100-continue
      > header to the request and it will expect to receive a 100-Continue
      > response from the server to indicate that the client should send the
      > data to be posted. This mechanism allows clients to avoid sending large
      > amounts of data over the network when the server, based on the request
      > headers, intends to reject the request.
      >
      > However, once the entire payload is received on the server end, other
      > errors may still occur. And if Windows Azure clients have tested the
      > client well enough to ensure that it is not sending any bad requests,
      > clients could turn off 100-continue so that the entire request is sent
      > in one roundtrip. This is especially true when clients send small size
      > storage objects.

Some validation tests were performed with sample sizes of 10:

**Upload of a one-gigabyte file**
_Current method_
  * Avg: 205 seconds
  * Median: 196 seconds

_Using DataMovement alone_
  * Avg: 18 seconds
  * Median: 18 seconds

_Using DataMovement with best practices_
  * Avg: 18 seconds
  * Median: 18 seconds

**Upload of a six-gigabyte file**
_Current method_
  * Avg: 1413 seconds
  * Median: 1413 seconds
_Note: This took so much longer that I only tried two samples._

_Using DataMovement with best practices_
  * Avg: 90.6 seconds
  * Median: 91.5 seconds

Because these results were so surprising, I then downloaded the file after uploading it,
and used `fc.exe /b <orig file> <downloaded file>` to validate the file was indeed uploaded
successfully and completely.

Resolves Issue microsoft#47: Add support for faster uploads using Azure Storage Data Movement Library
HowardWolosky added a commit to hwbackup/StoreBroker that referenced this issue May 2, 2017
HowardWolosky added a commit that referenced this issue May 2, 2017