Store some per-file metadata (rclone multipart upload fails on unexpected ETag) #2486

Closed
arielshaqed opened this issue Sep 22, 2021 · 4 comments

@arielshaqed
Contributor

(Originally reported in #2445).

When uploading a file with rclone using multipart upload, rclone computes an MD5 checksum of the file contents, then runs HeadObject on the assembled object and expects the returned ETag to equal that checksum. The two differ, so rclone reports the file as corrupted on transfer and the upload fails.

Example (with 30 MiB of random data in /tmp/x):

$ rclone copyto --dump headers,auth --s3-upload-cutoff 1M /tmp/x local-lake:/moo/main/2445/xyzzy
[... lots and lots of headers and retries and stuff...]
2021/09/22 15:19:23 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/09/22 15:19:23 DEBUG : HTTP REQUEST (req 0xc0000ffc00)
2021/09/22 15:19:23 DEBUG : HEAD /main/2445/xyzzy HTTP/1.1
Host: moo.s3.local.lakefs.io:8000
User-Agent: rclone/v1.56.0
Authorization: AWS4-HMAC-SHA256 Credential=AKIAIOSFODNN7EXAMPLE/20210922/eu-central-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=47187765273b205a2bac70a19e3660f625e9f63e87fe821db6955bf9ba9fb098
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20210922T121923Z

2021/09/22 15:19:23 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/09/22 15:19:23 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/09/22 15:19:23 DEBUG : HTTP RESPONSE (req 0xc0000ffc00)
2021/09/22 15:19:23 DEBUG : HTTP/1.1 200 OK
Content-Length: 31457280
Accept-Ranges: bytes
Date: Wed, 22 Sep 2021 12:19:23 GMT
Etag: "fc27755c7b393a94b8728fa8a959f837"
Last-Modified: Wed, 22 Sep 2021 12:19:23 GMT
X-Amz-Request-Id: 801dd44f-89c5-467c-8c51-6067f8178b95

2021/09/22 15:19:23 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/09/22 15:19:23 DEBUG : x: md5 = d02d960cdb44ac61855cea101652266b (Local file system at /tmp)
2021/09/22 15:19:23 DEBUG : xyzzy: md5 = fc27755c7b393a94b8728fa8a959f837 (S3 bucket moo path main/2445)
2021/09/22 15:19:23 ERROR : xyzzy: corrupted on transfer: md5 hash differ "d02d960cdb44ac61855cea101652266b" vs "fc27755c7b393a94b8728fa8a959f837"
2021/09/22 15:19:23 INFO  : xyzzy: Removing failed copy
2021/09/22 15:19:23 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/09/22 15:19:23 DEBUG : HTTP REQUEST (req 0xc000460d00)
2021/09/22 15:19:23 DEBUG : DELETE /main/2445/xyzzy HTTP/1.1
Host: moo.s3.local.lakefs.io:8000
User-Agent: rclone/v1.56.0
Authorization: AWS4-HMAC-SHA256 Credential=AKIAIOSFODNN7EXAMPLE/20210922/eu-central-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=86aa71d3655f29649089da72ca1eb29d2ee902723321fc398b22e1cc55cd3034
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20210922T121923Z
Accept-Encoding: gzip

2021/09/22 15:19:23 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/09/22 15:19:23 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/09/22 15:19:23 DEBUG : HTTP RESPONSE (req 0xc000460d00)
2021/09/22 15:19:23 DEBUG : HTTP/1.1 204 No Content
Content-Type: application/xml
Date: Wed, 22 Sep 2021 12:19:23 GMT
X-Amz-Request-Id: 99370428-f998-4f71-8944-2f8b31cea1a3

2021/09/22 15:19:23 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/09/22 15:19:23 ERROR : Attempt 3/3 failed with 1 errors and: corrupted on transfer: md5 hash differ "d02d960cdb44ac61855cea101652266b" vs "fc27755c7b393a94b8728fa8a959f837"
2021/09/22 15:19:23 INFO  : 
Transferred:           90Mi / 90 MiByte, 100%, 898.564 KiByte/s, ETA 0s
Errors:                 1 (retrying may help)
Elapsed time:      2m38.4s

2021/09/22 15:19:23 DEBUG : 12 go routines active
2021/09/22 15:19:23 Failed to copyto: corrupted on transfer: md5 hash differ "d02d960cdb44ac61855cea101652266b" vs "fc27755c7b393a94b8728fa8a959f837"

$ md5sum /tmp/x
d02d960cdb44ac61855cea101652266b  /tmp/x
@arielshaqed
Contributor Author

This expectation on rclone's part is very strange. The AWS documentation explicitly says:

Your complete multipart upload request must include the upload ID and a list of both part numbers and corresponding ETag values. The Amazon S3 response includes an ETag that uniquely identifies the combined object data. **This ETag is not necessarily an MD5 hash of the object data.**

(Bold text mine...)
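
To make the distinction concrete, here is a minimal Python sketch of the multipart ETag scheme that is commonly observed on AWS S3 for unencrypted multipart uploads: the MD5 of the concatenated per-part MD5 digests, followed by a dash and the part count. This is an observed convention, not a documented contract, and other S3-compatible servers may do something else. The ETag in the log above has no part-count suffix, so the server there evidently derives it some other way; the sketch makes no claim about that. The function names and the part size are illustrative assumptions.

```python
import hashlib


def full_md5(data: bytes) -> str:
    """MD5 of the whole object, which is what md5sum prints for a file."""
    return hashlib.md5(data).hexdigest()


def multipart_style_etag(data: bytes, part_size: int) -> str:
    """ETag in the commonly observed S3 multipart style (an assumption,
    not a documented guarantee): md5(md5(part1) + md5(part2) + ...)-N."""
    part_digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


if __name__ == "__main__":
    payload = b"x" * (3 * 1024 * 1024)  # 3 MiB of dummy data
    print("whole-object MD5:", full_md5(payload))
    print("multipart-style :", multipart_style_etag(payload, 1024 * 1024))
```

Either way, a client that compares the whole-object MD5 against such an ETag will see a mismatch even though the data arrived intact.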

@arielshaqed
Contributor Author

arielshaqed commented Sep 22, 2021

Here's how rclone even gets the MD5 value. When I run the same command against an AWS S3 bucket, the POST that initiates the multipart upload (note the ?uploads= query) looks like this:

2021/09/22 16:02:01 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/09/22 16:02:01 DEBUG : HTTP REQUEST (req 0xc000874400)
2021/09/22 16:02:01 DEBUG : POST /moo/main/2445/xyzzy?uploads= HTTP/1.1
Host: treeverse-ariels-test.s3.eu-central-1.amazonaws.com
User-Agent: rclone/v1.56.0
Content-Length: 0
Authorization: AWS4-HMAC-SHA256 Credential=AKIAxyzzyfoobarbazquux/20210922/eu-central-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-meta-md5chksum;x-amz-meta-mtime;x-amz-server-side-encryption, Signature=f00b44f00b44...
Content-Type: application/octet-stream
X-Amz-Acl: bucket-owner-full-control
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20210922T130201Z
X-Amz-Meta-Md5chksum: 0C2WDNtErGGFXOoQFlImaw==
X-Amz-Meta-Mtime: 1632302729.927695526
X-Amz-Server-Side-Encryption: aws:kms
Accept-Encoding: gzip

The relevant field is X-Amz-Meta-Md5chksum: 0C2WDNtErGGFXOoQFlImaw==. This is base64 encoded, so let's decode it...

$ echo 0C2WDNtErGGFXOoQFlImaw== | base64 -d | hexdump -C
00000000  d0 2d 96 0c db 44 ac 61  85 5c ea 10 16 52 26 6b  |.-...D.a.\...R&k|
00000010
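
The same decoding can be sketched in Python, which also makes the comparison with the md5sum output explicit. The helper name md5chksum_to_hex is made up for illustration; the header value and the hex digest are the ones shown above.

```python
import base64


def md5chksum_to_hex(value: str) -> str:
    """Decode a base64 MD5 (as carried in X-Amz-Meta-Md5chksum) to hex."""
    raw = base64.b64decode(value, validate=True)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes for an MD5, got {len(raw)}")
    return raw.hex()


header_value = "0C2WDNtErGGFXOoQFlImaw=="       # from the request above
local_md5 = "d02d960cdb44ac61855cea101652266b"  # md5sum /tmp/x

print(md5chksum_to_hex(header_value) == local_md5)  # prints True
```

So the metadata header carries exactly the whole-file MD5 that rclone later tries to verify.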

There is no documentation for this field, but rclone seems to like it.

Update: this field is added by rclone itself. This bug may be something of a dupe of #2296: fixing it requires storing some per-file metadata taken from the request headers.
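
As a rough illustration of what "storing per-file metadata" could mean, here is a toy Python sketch that keeps the user-defined x-amz-meta-* headers from an upload request and hands them back on a later HEAD. ObjectStore and extract_user_metadata are hypothetical names and the in-memory dict is a stand-in; this is not the actual gateway implementation, and whether a client then prefers the returned checksum over the ETag is up to the client.

```python
from typing import Dict, Mapping

# S3 carries user-defined object metadata in headers with this prefix.
USER_META_PREFIX = "x-amz-meta-"


def extract_user_metadata(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Pick the user-defined metadata headers out of an upload request."""
    return {
        name.lower(): value
        for name, value in request_headers.items()
        if name.lower().startswith(USER_META_PREFIX)
    }


class ObjectStore:
    """Toy in-memory stand-in for per-object records (hypothetical)."""

    def __init__(self) -> None:
        self._metadata: Dict[str, Dict[str, str]] = {}

    def put(self, key: str, request_headers: Mapping[str, str]) -> None:
        # Remember the metadata that arrived with the upload request.
        self._metadata[key] = extract_user_metadata(request_headers)

    def head(self, key: str) -> Dict[str, str]:
        # Return a copy of the stored metadata as response headers.
        return dict(self._metadata[key])


if __name__ == "__main__":
    store = ObjectStore()
    store.put(
        "main/2445/xyzzy",
        {
            "Content-Type": "application/octet-stream",
            "X-Amz-Meta-Md5chksum": "0C2WDNtErGGFXOoQFlImaw==",
            "X-Amz-Meta-Mtime": "1632302729.927695526",
        },
    )
    print(store.head("main/2445/xyzzy"))
```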

@arielshaqed arielshaqed changed the title rclone multipart upload fails on unexpected ETag Store some per-file metadata (rclone multipart upload fails on unexpected ETag) Sep 22, 2021
@arielshaqed
Contributor Author

arielshaqed commented Nov 18, 2021

@nopcoder can we mark this fixed?

@nopcoder
Contributor

guess I missed it. thanks
