
S3 storage truncated to 250000MB per object without error #10775

Open
walt-jones opened this issue Dec 17, 2024 · 6 comments
Labels
Priority:p1-urgent Consider a hotfix release with only that fix Type:Bug

Comments

@walt-jones

walt-jones commented Dec 17, 2024

We're facing an issue where objects stored on S3 are being truncated to exactly 262,144,000,000 bytes (exactly 250,000 MiB) without any error. oCIS accepts the upload without issue, processes it, pushes it to S3, and then complains when you try to download the file because the metadata doesn't match. We're storing everything on Wasabi S3, which has a single-object limit of 5 TB, so that limit isn't the problem.

We're running oCIS 6.6.1 and were hitting the same issue on oCIS 6.2.0, although there we also hit the NATS "max payload exceeded" error mentioned in #10377, which we believed at the time was the cause.

Error on retrieval attempt is:
2024-12-17T01:57:43.948902080Z 2024-12-17T01:57:43Z ERR unexpected error error="Decomposedfs: error download blob '4c6b7e96-beee-4495-929c-20a93b9127fd': blob has unexpected size. 480836121225 bytes expected, got 262144000000 bytes" action=download handler=download line=/ocis/vendor/github.com/cs3org/reva/v2/pkg/rhttp/datatx/utils/download/download.go:263 pkg=rhttp request-id=e39b8b312288/mNXHNfpjsK-238118 service=storage-users svc=datatx traceid=ab51ef8b716878b77617c6151ba2d079

The upload of the same file appears clean in the logs:
2024-12-16T19:20:00.956073270Z 2024-12-16T19:20:00Z INF user idp:"internal" opaque_id:"268dba74-6565-4481-97c8-4de03c27f989" type:USER_TYPE_PRIMARY authenticated line=/ocis/vendor/github.com/cs3org/reva/v2/internal/grpc/services/authprovider/authprovider.go:145 pkg=rgrpc service=storage-system traceid=e168080f4eb93a03bf17f8578dfcd1bc
2024-12-16T19:20:00.956696520Z 2024-12-16T19:20:00Z INF file download data-server=http://localhost:9216/data line=/ocis/vendor/github.com/cs3org/reva/v2/internal/grpc/services/storageprovider/storageprovider.go:296 pkg=rgrpc ref={"path":"./users/<redacted>/received.json","resource_id":{"opaque_id":"jsoncs3-share-manager-metadata","space_id":"jsoncs3-share-manager-metadata"}} service=storage-system traceid=e168080f4eb93a03bf17f8578dfcd1bc
2024-12-16T19:20:00.957223042Z 2024-12-16T19:20:00Z WRN http end="16/Dec/2024:19:20:00 +0000" host=127.0.0.1 line=/ocis/vendor/github.com/cs3org/reva/v2/internal/http/interceptors/log/log.go:112 method=GET pkg=rhttp proto=HTTP/1.1 service=storage-system size=0 start="16/Dec/2024:19:20:00 +0000" status=404 time_ns=178630 traceid=cc76f6a5d122311e3675d3edbe02018b uri=/data/spaces/jsoncs3-share-manager-metadata%21jsoncs3-share-manager-metadata/users/<redacted>/received.json url=/users/<redacted>/received.json
2024-12-16T19:20:00.968318716Z 2024-12-16T19:20:00Z INF file upload data-server=http://localhost:9158/data/simple/a5c4450e-9dfe-4d7d-ac1a-83bfbe1791c9 fn=./fileName.7z line=/ocis/vendor/github.com/cs3org/reva/v2/internal/grpc/services/storageprovider/storageprovider.go:470 pkg=rgrpc service=storage-users traceid=97bca1a34c6af966a577fb6b357d109b xs="map[md5:100 unset:1000]"
2024-12-16T19:20:00.968457815Z 2024-12-16T19:20:00Z INF file upload data-server=http://localhost:9158/data/tus/a5c4450e-9dfe-4d7d-ac1a-83bfbe1791c9 fn=./fileName.7z line=/ocis/vendor/github.com/cs3org/reva/v2/internal/grpc/services/storageprovider/storageprovider.go:470 pkg=rgrpc service=storage-users traceid=97bca1a34c6af966a577fb6b357d109b xs="map[md5:100 unset:1000]"
2024-12-16T19:20:00.969169589Z 2024-12-16T19:20:00Z INF access-log bytes=0 duration=36.192464 line=/ocis/services/proxy/pkg/middleware/accesslog.go:34 method=POST path=/remote.php/dav/files/markc@cayc.io proto=HTTP/1.1 remote-addr=10.0.0.2 request-id=e39b8b312288/mNXHNfpjsK-140447 service=proxy status=201 traceid=97bca1a34c6af966a577fb6b357d109b
2024-12-16T19:20:01.072372965Z 2024-12-16T19:20:01Z INF skipping auth check for: /data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNzM0NDYzMjAwLCJpYXQiOjE3MzQzNzY4MDAsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy9hNWM0NDUwZS05ZGZlLTRkN2QtYWMxYS04M2JmYmUxNzkxYzkifQ.r03a60qp5_BC7Af4TkB_DZdRGLZnst0gI6qD-qZQZaY line=/ocis/vendor/github.com/cs3org/reva/v2/internal/http/interceptors/auth/auth.go:195 pkg=rhttp service=frontend traceid=41d6ad1d2b93b973d88321a9a5cc5db1
2024-12-16T19:20:01.072401034Z 2024-12-16T19:20:01Z WRN core access token not set line=/ocis/vendor/github.com/cs3org/reva/v2/internal/http/interceptors/auth/auth.go:248 pkg=rhttp service=frontend traceid=41d6ad1d2b93b973d88321a9a5cc5db1

@micbar micbar added the Priority:p1-urgent Consider a hotfix release with only that fix label Dec 17, 2024
@micbar micbar moved this from Qualification to Prio 1 in Infinite Scale Team Board Dec 17, 2024
@jvillafanez
Member

jvillafanez commented Dec 17, 2024

Based on https://docs.wasabi.com/docs/how-does-wasabi-handle-multipart-uploads, Wasabi's multipart-upload implementation should follow the AWS S3 behaviour described in https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html

In particular, there is a hard limit of 10,000 parts per multipart upload. The default part size is 16 MB. Assuming you have configured the part size to 25 MiB (26,214,400 bytes), 10,000 parts give exactly 262,144,000,000 bytes, which is what you got.

As far as I know, it's impossible to modify the maximum number of parts (10,000), so the only option to upload bigger files is to increase the part size. Also note that the maximum part size allowed is 5 GiB (5 * 1024 * 1024 * 1024 bytes).

Disabling concurrent uploads (setting STORAGE_USERS_S3NG_PUT_OBJECT_CONCURRENT_STREAM_PARTS = false) should show the expected error.
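For what it's worth, the arithmetic is easy to check. The following is a minimal sketch (not oCIS code), assuming the deployment used a 25 MiB part size as inferred above and that STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE takes its value in bytes:

```go
// Back-of-the-envelope arithmetic for the limits discussed above (not oCIS code).
package main

import "fmt"

// maxParts is the S3 hard limit on parts per multipart upload.
const maxParts = 10_000

func main() {
	partSize := uint64(25 * 1024 * 1024) // 25 MiB, the part size inferred above
	fmt.Println(maxParts * partSize)     // 262144000000, exactly the reported truncation size

	// The 480,836,121,225-byte file from the error message needs parts of at
	// least 48,083,613 bytes (~46 MiB), so for example a 64 MiB part size
	// (STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE=67108864, assuming the value is
	// in bytes) raises the ceiling to:
	fmt.Println(maxParts * uint64(64*1024*1024)) // 671088640000 bytes (625 GiB)
}
```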

@mmattel
Contributor

mmattel commented Dec 17, 2024

I will add the parts * part_size limit to the admin docs.

@walt-jones
Author

Thanks @mmattel and @jvillafanez - increasing the part size did the trick and allowed us to get the file through successfully.

Remaining concern: why doesn't oCIS fail or at least log an error when the total upload size exceeds known limits?

@jvillafanez
Member

From our side, it's true that we aren't checking any limits ourselves (at least for now): we simply hand the whole file, together with the configured part size, to the underlying library (minio-go). It's the library that cuts the file into pieces and does the hard work, but it also fails to forward an error in this specific scenario.
In the end, the library doesn't return an error for the upload, so oCIS assumes the upload went through without any issue.
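For context, here is an illustrative sketch of that hand-off; this is not the actual reva/s3ng code path. The PutObjectOptions fields are taken from minio-go v7, the mapping of ConcurrentStreamParts to STORAGE_USERS_S3NG_PUT_OBJECT_CONCURRENT_STREAM_PARTS is an assumption, and bucket, key and sizes are placeholders. The final size comparison shows one way a caller could notice silent truncation at upload time:

```go
// Illustrative sketch only, not the actual reva/s3ng upload code.
package s3upload

import (
	"context"
	"fmt"
	"io"

	"github.com/minio/minio-go/v7"
)

func streamToS3(ctx context.Context, client *minio.Client, bucket, key string, r io.Reader, expectedSize int64, partSize uint64) error {
	// Size is passed as -1: the stream length is not announced to minio-go,
	// so the library splits the stream into parts of partSize bytes and,
	// in the scenario described in this issue, stops after the multipart
	// limit without reporting an error.
	info, err := client.PutObject(ctx, bucket, key, r, -1, minio.PutObjectOptions{
		PartSize:              partSize,
		ConcurrentStreamParts: true, // assumed to be what STORAGE_USERS_S3NG_PUT_OBJECT_CONCURRENT_STREAM_PARTS toggles
		NumThreads:            4,
	})
	if err != nil {
		return err // this is the error that never arrives in the reported case
	}
	// Post-upload sanity check against the size oCIS already knows from its
	// own metadata; this would flag the truncation at upload time instead of
	// only at download time.
	if info.Size != expectedSize {
		return fmt.Errorf("S3 upload truncated: expected %d bytes, stored %d bytes", expectedSize, info.Size)
	}
	return nil
}
```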

@mmattel
Contributor

mmattel commented Dec 18, 2024

@walt-jones I have updated the admin docs to describe the parts-number / part-size situation, including calculation examples.

A dev-relevant summary is described in issue #10780.

@walt-jones
Author

Great to have https://doc.owncloud.com/ocis/next/deployment/storage/s3.html#prevent-failing-uploads updated with details on the limits.

Regarding #10780, it would be incredibly helpful to have oCIS throw an error when it can tell from the current settings that the push to S3 cannot succeed, even if minio-go isn't passing along errors from its side. For example, given the hard limit of 10,000 parts and STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE=1000000, we know with absolute certainty that every file larger than 10,000 × 1,000,000 bytes (about 9.3 GiB) will be truncated and fail on retrieval. The impact of this issue grows with file size (an upload can take many hours before you can even check whether it succeeded), and nothing in any of the logs indicates that anything went wrong until the "blob has unexpected size" error appears when you try to access the file later.
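To make the suggestion concrete, here is a hypothetical pre-flight guard, not existing oCIS code: the helper name and the 16 MiB fallback are assumptions, while the 10,000-part limit and the env var name come from the discussion above.

```go
// Hypothetical pre-flight guard sketched against the settings discussed in
// this thread; reject an upload up front when the announced length cannot
// fit into 10,000 parts of the configured size.
package main

import (
	"fmt"
	"os"
	"strconv"
)

const maxMultipartParts = 10_000 // S3 hard limit on parts per multipart upload

func checkUploadFits(uploadLength uint64) error {
	partSize, err := strconv.ParseUint(os.Getenv("STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE"), 10, 64)
	if err != nil || partSize == 0 {
		partSize = 16 * 1024 * 1024 // fallback; the default is reported as 16 MB earlier in this thread
	}
	if maxObject := maxMultipartParts * partSize; uploadLength > maxObject {
		return fmt.Errorf(
			"upload of %d bytes cannot fit into %d parts of %d bytes (max %d bytes); increase STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE",
			uploadLength, maxMultipartParts, partSize, maxObject)
	}
	return nil
}

func main() {
	// With STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE=1000000, the ceiling is
	// 10,000,000,000 bytes (~9.3 GiB), so the 480,836,121,225-byte file from
	// the error above would be rejected before any data is pushed to S3.
	if err := checkUploadFits(480836121225); err != nil {
		fmt.Println(err)
	}
}
```

Such a check only needs information oCIS already has at upload time (the announced upload length and its own S3 settings), so it would not depend on minio-go surfacing the error.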
