API presign multipart upload experimental support for S3 #7246
Conversation
Thank you for adding this so quickly! Please add tests; other than that, mostly minor comments.
Initiates a multipart upload and returns an upload ID with presigned URLs for each part (optional).
Part numbers start at 1. Each part except the last has a minimum size that depends on the underlying blockstore implementation.
For example, with the S3 blockstore the minimum size is 5MB (excluding the last part).
responses:
We need a 400 response too. For example, when 'parts' is 0.
Added the missing 400 status code to all routes.
The SDK will fail the request in case of a bad request.
The number of parts on create is optional, in order to support an additional API call that provides presigned part URLs for streaming.
  required: true
  schema:
    type: string
- in: query
Why is the path needed when the uploadID is given?
Can two MPUs be available for the same part?
Isn't the uploadID unique across different paths?
We require the path just as the S3 protocol requires a key. Same for the upload ID: even though it appears unique, the S3 protocol requires bucket and key on every API call.
When we work with the physical address (which we also require), we use the path for the permission check on create and complete.
For each part, the request is signed with all of this information, including the upload ID and the path.
There can be two MPUs for the same logical address, which means two parts can be uploaded in parallel. In our case, because we provide different physical addresses, these parts end up at different addresses.
If you try to provide such a path to a different upload request, the complete call will not find that part. Hope I understood the concern.
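To make the parallel-MPU point concrete, here is a toy in-memory model (not the lakeFS implementation; all names are hypothetical): each create hands out its own uploadID and physical address, and complete only sees parts registered under its own uploadID.

```go
package main

import "fmt"

// mpu tracks one multipart upload: its own physical address and its parts.
type mpu struct {
	physicalAddress string
	parts           map[int]string // part number -> etag
}

// uploads maps uploadID -> mpu. Two MPUs for the same logical path can coexist
// because each create call hands out a distinct uploadID and physical address.
var uploads = map[string]*mpu{}

func create(uploadID, physicalAddress string) {
	uploads[uploadID] = &mpu{physicalAddress: physicalAddress, parts: map[int]string{}}
}

func uploadPart(uploadID string, partNumber int, etag string) {
	uploads[uploadID].parts[partNumber] = etag
}

// complete only sees parts registered under its own uploadID, so a part
// uploaded under the other MPU to the same logical path is simply not found.
func complete(uploadID string, partNumbers []int) error {
	u := uploads[uploadID]
	for _, n := range partNumbers {
		if _, ok := u.parts[n]; !ok {
			return fmt.Errorf("upload %s: part %d not found", uploadID, n)
		}
	}
	return nil
}

func main() {
	create("mpu-1", "phys/aaa")
	create("mpu-2", "phys/bbb") // second MPU to the same logical path
	uploadPart("mpu-1", 1, "etag-1")
	uploadPart("mpu-2", 1, "etag-x")
	fmt.Println(complete("mpu-1", []int{1})) // <nil>
	fmt.Println(complete("mpu-2", []int{2})) // error: part 2 not found
}
```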
  application/json:
    schema:
      $ref: "#/components/schemas/CompletePresignMultipartUpload"
responses:
A 400 is needed. For example, wrong part order.
  application/json:
    schema:
      $ref: "#/components/schemas/AbortPresignMultipartUpload"
responses:
A 400 is needed. For example, an uploadID that doesn't match the path.
@@ -609,6 +609,9 @@ func newCache(cfg CacheConfig) cache.Cache {
}

func (m *Manager) SetLinkAddress(ctx context.Context, repository *graveler.RepositoryRecord, physicalAddress string) error {
	if physicalAddress == "" {
Was this a bug? If so, can you add tests for it?
Just input validation, to make sure we don't store a value for an empty path.
const DefaultUploadPartSize = MinUploadPartSize

// DefaultUploadConcurrency is the default number of goroutines to spin up when uploading a multipart upload
const DefaultUploadConcurrency = 5
Is it enough? It seems too low to me.
This is the default the AWS SDK uses - I'll make it configurable for lakectl fs upload.
If it's the AWS SDK default, then it's good enough.
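For context on what "configurable for lakectl fs upload" could look like, here is a minimal sketch (function name and parameters are assumptions, not the PR's code) that bounds the number of in-flight part uploads with golang.org/x/sync/errgroup:

```go
package upload

import (
	"bytes"
	"context"
	"fmt"
	"net/http"

	"golang.org/x/sync/errgroup"
)

// uploadParts PUTs each part body to its presigned URL, running at most
// `concurrency` uploads in parallel. It returns the ETag per part, which the
// complete call later needs. Names and shapes here are illustrative.
func uploadParts(ctx context.Context, urls []string, parts [][]byte, concurrency int) ([]string, error) {
	if concurrency <= 0 {
		concurrency = 5 // match the AWS SDK default discussed above
	}
	etags := make([]string, len(parts))
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(concurrency)
	for i := range parts {
		i := i
		g.Go(func() error {
			req, err := http.NewRequestWithContext(ctx, http.MethodPut, urls[i], bytes.NewReader(parts[i]))
			if err != nil {
				return err
			}
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err
			}
			defer resp.Body.Close()
			if resp.StatusCode != http.StatusOK {
				return fmt.Errorf("part %d: unexpected status %s", i+1, resp.Status)
			}
			etags[i] = resp.Header.Get("ETag") // S3 returns the part's ETag in this header
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return etags, nil
}
```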
pkg/api/helpers/upload.go (outdated)
if uploader != nil {
	stats, err = uploader.Upload(ctx)
} else {
	stats, err = clientUploadPreSignHelper(ctx, client, repoID, branchID, objPath, metadata, contentType, contents, contentLength)
}
This is confusing. `uploader` is a `PreSignUploader`. If it's nil, use `clientUploadPreSignHelper` instead. Can you rename the vars or add a comment to help clarify?
Sure, I'll update the name or integrate them together.
Integrated the presign upload of a single object into the uploader functionality.
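As a rough sketch of that integration (names and the size threshold are assumptions, not the actual implementation), the uploader can dispatch on the content length: payloads that fit in a single part go through one presigned PUT, anything larger or of unknown size goes through the presigned multipart flow.

```go
package helpers

import "context"

const defaultPartSize int64 = 5 << 20 // 5 MiB, the assumed minimum/default part size

// uploadPresigned picks the flow. Both flows are passed in as callbacks so
// this sketch stays self-contained; a negative size (unknown length) falls
// through to the multipart path.
func uploadPresigned(ctx context.Context, size int64,
	single func(context.Context) error,
	multipart func(context.Context) error,
) error {
	if size >= 0 && size <= defaultPartSize {
		return single(ctx) // one presigned PUT is enough
	}
	return multipart(ctx) // split into parts and upload them separately
}
```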
const MinUploadPartSize int64 = 1024 * 1024 * 5

// DefaultUploadPartSize is the default part size to buffer chunks of a payload into
const DefaultUploadPartSize = MinUploadPartSize
Is this optimal?
These are the AWS SDK defaults.
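For reference, with the aws-sdk-go s3manager defaults (5 MiB minimum part size, at most 10,000 parts), the default part size covers objects up to roughly 48.8 GiB before the part size has to grow:

```go
package main

import "fmt"

func main() {
	const (
		minUploadPartSize int64 = 5 * 1024 * 1024 // 5 MiB
		maxUploadParts          = 10000
	)
	// Largest object that still fits in maxUploadParts parts at the default size.
	maxSizeAtDefault := minUploadPartSize * maxUploadParts
	fmt.Printf("max object size at the default part size: %d bytes (~%.1f GiB)\n",
		maxSizeAtDefault, float64(maxSizeAtDefault)/(1<<30))
	// Output: max object size at the default part size: 52428800000 bytes (~48.8 GiB)
}
```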
Co-authored-by: itaiad200 <itaiad200@gmail.com>
LGTM
My only concern is test coverage for the different upload-file flows. Given the urgency, it could be handled in a separate PR.
Looks awesome!
Some comments, and crucial tests for the uploader are still missing.
@@ -253,6 +253,7 @@ type Config struct {
	PreSignedExpiry           time.Duration `mapstructure:"pre_signed_expiry"`
	DisablePreSigned          bool          `mapstructure:"disable_pre_signed"`
	DisablePreSignedUI        bool          `mapstructure:"disable_pre_signed_ui"`
	DisablePreSignedMultipart bool          `mapstructure:"disable_pre_signed_multipart"`
Can we put this under "experimental" or "beta" please?
Added a configuration reference doc line marked as experimental - I didn't want to configure it outside the block adapter.
Let me know if you think there's a better place.
	return
}

repo, err := c.Catalog.GetRepository(ctx, repository)
You can skip this - BranchExists will already do that for you
Glad it's cached; we need it for BlockAdapter.ResolveNamespace.
func (u *presignUpload) initMultipart(ctx context.Context) (*apigen.PresignMultipartUpload, error) {
	// adjust part size
	u.partSize = DefaultUploadPartSize
	if u.size/u.partSize >= int64(MaxUploadParts) {
This is kind of awkward. I think we should enforce that at least (MaxUploadParts + x) * u.partSize > u.size, where x is a constant size.
Otherwise, this can lead to bad performance
I've used the same logic the S3 SDK uses to calculate the part size on the client side. It starts with the default part size and keeps it as long as the total stays under the maximum number of parts allowed.
If the size can't fit into that many parts, it adjusts the part size.
The CLI usually offers a way to configure the part size, but for now I kept the default, which is the minimum size.
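A sketch of that adjustment, mirroring the calculation the AWS SDK uploader performs when the object would not otherwise fit into the maximum number of parts (hypothetical function name; assumed to be the logic inside the branch shown above):

```go
// calcPartSize grows the part size only when the payload would otherwise need
// more than the maximum number of parts; otherwise the default (minimum) size
// is kept.
func calcPartSize(size int64) int64 {
	const (
		defaultUploadPartSize int64 = 5 * 1024 * 1024 // MinUploadPartSize
		maxUploadParts        int64 = 10000
	)
	partSize := defaultUploadPartSize
	if size/partSize >= maxUploadParts {
		// Add one so the last (possibly smaller) part still fits within the
		// part-count limit after integer division.
		partSize = (size / maxUploadParts) + 1
	}
	return partSize
}
```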
@@ -422,6 +431,42 @@ func (a *Adapter) GetPreSignedURL(ctx context.Context, obj block.ObjectPointer,
	return req.URL, expiry, nil
}

func (a *Adapter) GetPresignUploadPartURL(ctx context.Context, obj block.ObjectPointer, uploadID string, partNumber int) (string, error) {
	if a.disablePreSigned {
Suggested change:
-	if a.disablePreSigned {
+	if a.disablePreSigned || a.disablePreSignedMultipart {
I was thinking about this one - whether to block presign because of the multipart capability or not - but I kept it bound to the feature it is based on. So if we support presign, you can get a part presign URL, in case you need it for some other feature.
Added esti tests for presign multipart upload.
Some questions / requests regarding tests
require.NotNil(t, resp.JSON201)
require.NotEmpty(t, resp.JSON201.UploadId)
require.NotEmpty(t, resp.JSON201.PhysicalAddress)
if tt.parts != nil {
Correct me if I'm wrong, but this code is never reached.
Relevant for the `no_parts` case.
})
require.NoError(t, err, "CompletePresignMultipartUpload should succeed")
require.Equalf(t, http.StatusOK, resp.StatusCode(), "CompletePresignMultipartUpload status code mismatch: %s - %s", resp.Status(), resp.Body)
})
Where do we validate that the object actually exists?
Adding.
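A possible shape for that check in the esti test - a sketch only, which assumes the generated client exposes StatObjectWithResponse/StatObjectParams with these fields and that the usual esti imports are available:

```go
import (
	"context"
	"testing"

	"github.com/stretchr/testify/require"
	"github.com/treeverse/lakefs/pkg/api/apigen"
)

// requireObjectExists is a hypothetical esti helper: after
// CompletePresignMultipartUpload succeeds, stat the object to make sure it was
// actually created. The generated client call shape is assumed here.
func requireObjectExists(t *testing.T, ctx context.Context, client apigen.ClientWithResponsesInterface, repo, branch, objPath string) {
	t.Helper()
	statResp, err := client.StatObjectWithResponse(ctx, repo, branch, &apigen.StatObjectParams{Path: objPath})
	require.NoError(t, err, "StatObject should succeed after a completed multipart upload")
	require.NotNilf(t, statResp.JSON200, "object %q should exist: %s - %s", objPath, statResp.Status(), statResp.Body)
	// Optionally also compare statResp.JSON200.SizeBytes against the number of bytes uploaded.
}
```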
@@ -0,0 +1,246 @@
package esti |
Can we add a test for u.size / u.partSize >= maxParts?
:D sorry - I can test this one manually, but it would require a very big file for an automated test.
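One way to cover that branch without a huge file is to unit-test the part-size calculation itself instead of performing the upload. A sketch, assuming the calculation is extracted into a calcPartSize helper (hypothetical name, as in the sketch above) and testify's require is available in the package:

```go
func TestCalcPartSize(t *testing.T) {
	const (
		defaultPartSize int64 = 5 * 1024 * 1024
		maxParts        int64 = 10000
	)
	tests := []struct {
		name string
		size int64
	}{
		{name: "small", size: 10 * 1024 * 1024},
		{name: "at limit", size: defaultPartSize * maxParts},
		{name: "over limit", size: defaultPartSize*maxParts + 1},
		{name: "way over limit", size: 5 * defaultPartSize * maxParts},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			partSize := calcPartSize(tt.size)
			require.GreaterOrEqual(t, partSize, defaultPartSize)
			// The chosen part size must keep the part count within the limit.
			numParts := (tt.size + partSize - 1) / partSize
			require.LessOrEqual(t, numParts, maxParts)
		})
	}
}
```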
LGTM!
OpenAPI presign multipart upload, experimental support for S3, based on the following proposal:
https://github.com/treeverse/lakeFS/blob/master/design/accepted/openapi-multipart-upload.md
Closes #7253
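Per the proposal, the overall client-side flow is: create the presign multipart upload, PUT each part body to its presigned URL and record the returned ETag, then complete with the parts in ascending part-number order. A minimal sketch with placeholder callbacks standing in for whatever the generated SDK exposes (none of these names come from the PR):

```go
package flow

import "context"

// part pairs a part number with the ETag S3 returned for it; the complete
// call needs both.
type part struct {
	PartNumber int
	Etag       string
}

// presignMultipartFlow wires the three steps together. The create/uploadPart/
// complete callbacks are placeholders for the actual SDK operations.
func presignMultipartFlow(
	ctx context.Context,
	create func(ctx context.Context, path string, parts int) (uploadID string, urls []string, err error),
	uploadPart func(ctx context.Context, url string, body []byte) (etag string, err error),
	complete func(ctx context.Context, uploadID, path string, parts []part) error,
	path string, data [][]byte,
) error {
	uploadID, urls, err := create(ctx, path, len(data))
	if err != nil {
		return err
	}
	uploaded := make([]part, len(data))
	for i, body := range data {
		etag, err := uploadPart(ctx, urls[i], body) // PUT to the presigned URL; S3 returns the ETag header
		if err != nil {
			return err
		}
		uploaded[i] = part{PartNumber: i + 1, Etag: etag} // part numbers start at 1
	}
	// Parts must be passed in ascending part-number order; otherwise the API
	// responds with 400 (see the review comment about wrong part order).
	return complete(ctx, uploadID, path, uploaded)
}
```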