Content-MD5 .... not quite there... #57

Closed
udf2457 opened this issue May 13, 2017 · 6 comments

@udf2457

udf2457 commented May 13, 2017

re: #51

Sorry to flag this one up again, but "-m" does not appear to be working as advertised?

$ blobporter -q -m -c test -f a -n test
BlobPorter 
Copyright (c) Microsoft Corporation. 
Version: 0.5.02
---------------
Transfer Task: file-blockblob
Files to Transfer:
Source: a Size:2 

The process took 124.574686ms to run.
Throughput: 0.00 MB/s (0.00 Mb/s) 
Configuration: R=24, W=36, DataSize=2KiB (2), Blocks=1
Cumulative Writes Duration: Total=28.659118ms, Avg Per Worker=796.086µs
Retries: Avg=0 Total=0

Yields:

<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://my***account.blob.core.windows.net/" ContainerName="test">
<Blobs>
    <Blob>
        <Name>test</Name>
        <Properties>
            <Last-Modified>Sat, 13 May 2017 09:10:33 GMT</Last-Modified>
            <Etag>0x8D499DFE8F46B23</Etag>
            <Content-Length>2</Content-Length>
            <Content-Type>application/octet-stream</Content-Type>
            <Content-Encoding/>
            <Content-Language/>
            <Content-MD5/>
            <Cache-Control/>
            <Content-Disposition/>
            <BlobType>BlockBlob</BlobType>
            <LeaseStatus>unlocked</LeaseStatus>
            <LeaseState>available</LeaseState>
        </Properties>
    </Blob>
</Blobs>
<NextMarker/>
</EnumerationResults>

As you can see, the Content-MD5 element is empty, which would not be the case if the whole-blob MD5 were really being sent.
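
The same check can be scripted against a List Blobs response. This is a minimal sketch, assuming the XML shape shown in the listing above; the function name `blobs_missing_md5` and the trimmed-down sample are made up for illustration and are not part of BlobPorter.

```python
import xml.etree.ElementTree as ET


def blobs_missing_md5(listing_xml: str) -> list[str]:
    """Return the names of blobs whose <Content-MD5> element is empty or absent."""
    # Parse from bytes so the XML declaration's encoding attribute is honoured.
    root = ET.fromstring(listing_xml.encode("utf-8"))
    missing = []
    for blob in root.iter("Blob"):
        name = blob.findtext("Name")
        md5 = blob.findtext("Properties/Content-MD5")
        if not md5:  # None when the element is absent, "" when it is empty
            missing.append(name)
    return missing


# Trimmed-down copy of the listing above (illustrative only).
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="test">
<Blobs>
    <Blob>
        <Name>test</Name>
        <Properties>
            <Content-Length>2</Content-Length>
            <Content-MD5/>
        </Properties>
    </Blob>
</Blobs>
</EnumerationResults>"""

print(blobs_missing_md5(SAMPLE))
```

A blob only drops out of the returned list once its Content-MD5 property holds a value.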

@udf2457
Author

udf2457 commented May 13, 2017

I suspect what might be happening is:

  • You are (hopefully) sending Content-MD5 when sending blocks via Put Block
  • You are (hopefully) sending Content-MD5 when sending Put Block List
  • You are (probably) forgetting x-ms-blob-content-md5 when sending Put Block List
  • The same applies for Put Blob

From the docs (https://docs.microsoft.com/en-us/rest/api/storageservices/put-block-list):

x-ms-blob-content-md5: Optional. An MD5 hash of the blob content. Note that this hash is not validated, as the hashes for the individual blocks were validated when each was uploaded.

The Get Blob operation returns the value of this header in the Content-MD5 response header.

If this property is not specified with the request, then it is cleared for the blob if the request is successful.
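
To make the distinction between the two headers concrete: on Put Block List, Content-MD5 covers the request body (the block list XML), while x-ms-blob-content-md5 carries the hash of the blob content itself. Both are the Base64 encoding of the raw 16-byte MD5 digest. The sketch below only builds the header values and sends nothing; the helper names and the sample block list are assumptions for illustration, not BlobPorter's code.

```python
import base64
import hashlib


def b64_md5(data: bytes) -> str:
    """Base64 of the raw MD5 digest, the encoding used by Content-MD5 headers."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def put_block_list_headers(block_list_xml: bytes, blob_data: bytes) -> dict[str, str]:
    """Build the two MD5-related headers for a Put Block List request."""
    return {
        # Hash of the request body, i.e. the block list XML.
        "Content-MD5": b64_md5(block_list_xml),
        # Hash of the whole blob; stored as the blob's Content-MD5 property.
        "x-ms-blob-content-md5": b64_md5(blob_data),
    }


# Illustrative inputs: a one-block list and a tiny blob.
body = b"<BlockList><Latest>AAAAAA==</Latest></BlockList>"
headers = put_block_list_headers(body, b"hello")
for name, value in headers.items():
    print(f"{name}: {value}")
```

Leaving out the second header is what produces the empty Content-MD5 element in the listing, per the documentation quoted above.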

@giventocode
Contributor

giventocode commented May 13, 2017

Thanks for the follow-up. BlobPorter uses blocks for all transfers; this is what allows the high level of concurrency and maximizes throughput. The current implementation calculates a block-level MD5 (option -m), which addresses the concern of data integrity during transfer, since this value is validated by the storage backend.

As you've pointed out, the content MD5 (whole blob) is not validated by Azure Storage. Considering this, and the fact that computing a blob-wide MD5 would be a resource-intensive pre-processing step (it must be done sequentially and prior to the transfer), it would provide little value while increasing the overall transfer time.
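
The difference in cost can be illustrated with a small sketch. Per-block digests are independent of each other, so they can be computed by concurrent workers in any order, whereas a whole-blob digest has to consume the blocks one after another in blob order. This is an illustration only, with made-up function names and a tiny block size; it is not how BlobPorter is implemented.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_md5s(blocks: list[bytes]) -> list[str]:
    """Hash every block independently; workers can run in any order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda b: hashlib.md5(b).hexdigest(), blocks))


def whole_blob_md5(blocks: list[bytes]) -> str:
    """Hash the whole blob; blocks must be fed sequentially, in blob order."""
    h = hashlib.md5()
    for block in blocks:
        h.update(block)
    return h.hexdigest()


data = b"0123456789abcdef" * 4
blocks = split_blocks(data, block_size=16)
print(block_md5s(blocks))
print(whole_blob_md5(blocks) == hashlib.md5(data).hexdigest())
```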

As an alternative, we are considering an approach where you pre-calculate the MD5 with a tool of your own choosing prior to the transfer and then pass it to BlobPorter. In effect, this treats the value as a metadata item, which is what it technically becomes when it is not validated by the backend.

@udf2457
Author

udf2457 commented May 13, 2017

So basically keep -m and add an additional metadata parameter for the whole-blob version? Sounds fair enough.

@giventocode
Contributor

Correct, where the whole-blob version will be calculated outside BlobPorter, e.g. $ md5sum file.
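
One detail worth keeping in mind when pre-calculating: md5sum prints the digest in hexadecimal, while the Content-MD5 property holds the Base64 encoding of the same 16 raw bytes, so a conversion step is needed somewhere. A minimal sketch of both steps follows; the function names are hypothetical, and how BlobPorter would accept the value is not specified in this thread.

```python
import base64
import hashlib


def file_md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 and return the hex digest, like md5sum does."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def md5sum_to_content_md5(md5sum_output: str) -> str:
    """Convert an md5sum output line ("<hex>  <file>") to the Base64 form."""
    hex_digest = md5sum_output.split()[0]
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


# Example with a digest computed in-process instead of by md5sum.
line = hashlib.md5(b"hello").hexdigest() + "  file"
print(md5sum_to_content_md5(line))
```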

@giventocode
Contributor

Updating this issue to point to a project that addresses the gap of not being able to calculate the MD5 hash for multi-block blobs: https://github.com/giventocode/azure-blob-md5

@ankku

ankku commented Oct 6, 2020

Just to clarify here: I am using a Logic App and have called the "get metadata for blob" action, and I get these properties:

{
"Id": "Jtestakajsdjas==",
"Name": "test.json",
"DisplayName": "test.json",
"Path": "/resistor-v3/test.json",
"LastModified": "test",
"Size": 2480,
"MediaType": "application/octet-stream",
"IsFolder": false,
"ETag": "\"\"",
"FileLocator": "testskjhska",
"LastModifiedBy": null
}

I don't see the Content-MD5 property in here, though I can see it when I go to my blob and right-click to open its properties. Is this the default behavior of Logic Apps? How can I get the ContentMD5 property?
