Large multipart uploads with aws cli to GCS fail #330
Comments
How large is the object you are uploading and what are the part sizes/number of parts? Also can you share any relevant logs when running S3Proxy with trace-level logging?
Sorry, the uploaded files come from a db dump that takes ~15 minutes to produce and then disappears with the container, so I can't easily get you exact numbers. I can tell you that the managed SQL DB that is the source of the dump has an SSD that is only 31 GB. The upload is done by the aws cli.
I can, however, confirm that s3proxy connected to GCP (and GCP's own S3-compatible endpoint) does work with smaller, non-multipart uploads.
AWS CLI defaults to an 8 MB part size, so a 6 GB object would be roughly 750 parts. GCS natively supports only 32 parts. Can you try changing the value of `multipart_chunksize` to something larger, e.g., 1 GB? This should work around the symptoms. I think S3Proxy could be changed to do something more sophisticated using GCS's ability to recursively combine sets of 32 parts, although this would take some effort.
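For reference, a quick way to try this: the AWS CLI reads `multipart_chunksize` from its S3 transfer configuration, so something like the following should raise the part size to 1 GB. This is only a sketch; the endpoint and bucket names are placeholders, not taken from this issue.

```sh
# Raise the part size so a multi-GB object stays under GCS's 32-part limit
aws configure set default.s3.multipart_chunksize 1GB

# Upload through s3proxy (placeholder endpoint and bucket)
aws --endpoint-url http://localhost:80 s3 cp dump.sql.gz s3://my-bucket/dump.sql.gz
```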
Hmmm, that GCP limitation sounds awful. Why can't everyone just agree on one standardized API for cloud file storage? I tried changing the `multipart_chunksize`, but the upload still failed. I don't know how to debug what chunks aws-cli sent, so I can't verify whether this actually ended up under the limit and indicates a different issue, or whether my test was just faulty.
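One way to see what the CLI actually sent (a sketch, not from the original thread): the AWS CLI's global `--debug` flag logs every request it makes, including the individual UploadPart calls, so the part count can be checked afterwards. The file name, endpoint, and bucket below are placeholders.

```sh
# Run the upload with request-level debug logging (very verbose)
aws --endpoint-url http://localhost:80 s3 cp dump.sql.gz s3://my-bucket/ --debug 2> upload-debug.log

# Rough count of part uploads recorded in the log
grep -c "UploadPart" upload-debug.log
```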
This recursively combines up to 32 sets of 32 parts, allowing 1024 part multipart uploads. Fixes #330.
@dantman Agree that multiple protocols are painful for users. Many of the non-S3 implementations have added either partial or full support for S3, so this situation is improving. While I work for Google, I have no relationship with Google Cloud, so I recommend giving them feedback directly, by Twitter or otherwise. GCS does offer S3-compatible access, but it does not support MPU at all. You might be able to configure your application to not use multipart upload, since GCS S3 supports objects greater than 5 GB. I am confused why changing the chunk size did not work and you might want to debug this further. I spent a few hours looking into recursively combining objects in S3Proxy to work around this limitation and you can test #333. This needs a little more work before I merge it, but I would appreciate it if you could give feedback.
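If avoiding multipart upload entirely is an option, the corresponding CLI setting is `multipart_threshold`. As a rough sketch (the 10 GB value is purely illustrative; pick something larger than the dump), raising it above the object size makes the CLI issue a single PUT instead of a multipart upload:

```sh
# Only use multipart uploads for objects larger than 10 GB (illustrative value)
aws configure set default.s3.multipart_threshold 10GB
```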
I was using a docker image (mysql-backup) that uploads to an "S3" backend using the aws cli. Trying to use it with s3proxy connected to GCP as the upload destination fails with the following error.
An error occurred (BadDigest) when calling the CompleteMultipartUpload operation
More details from the log (includes the shell commands executed and full cli response)
I assume a similar error could be triggered by configuring s3proxy to use GCP as the backend, using the aws s3 cli to do an upload, and setting `multipart_threshold` to force the cli to do multipart uploads for smaller files. I also presume GCP and Amazon have different interpretations of how hash digests of multipart uploads should work.
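For anyone trying to reproduce this without a multi-GB dump, a hedged sketch along those lines (endpoint, bucket, and sizes are placeholders): lowering both `multipart_threshold` and `multipart_chunksize` forces the CLI to split even a modest file into more than 32 parts, which should hit the same GCS limit.

```sh
# Force small multipart chunks so a 512 MB file becomes 64 parts (> GCS's 32-part limit)
aws configure set default.s3.multipart_threshold 8MB
aws configure set default.s3.multipart_chunksize 8MB

# Create a 512 MB test file and upload it through s3proxy (placeholder endpoint/bucket)
dd if=/dev/urandom of=test.bin bs=1M count=512
aws --endpoint-url http://localhost:80 s3 cp test.bin s3://my-bucket/test.bin
```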