-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store/s3: try to guess size to enable multipart uploads #617
Conversation
I think we had some discussion offline on this. Definitely this PR is wrong as it will panic if io.Reader is not os.File. So we have 3 options:
As the 3rd option is most likely long and potentially not possible due to s3 API limitations and taking into consideration fact that in most cases we have os.File passed as reader I think option 1 makes sense... Option 2 is too hidden and it makes assumptions that are easy to miss. Thoughts @domgreen? |
e40f6cb
to
bec24ab
Compare
I went with option 1. I guess that should help with #467 @jojohappy ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @jojohappy would be unblocked.
It's interface change so it's quite major - affects all implementations, let's get some feedback from more people before merging.
From my side LGTM, just minor nits.
pkg/objstore/objstore.go
Outdated
fileSize := int64(-1) | ||
fileInfo, err := r.Stat() | ||
if err != nil { | ||
return errors.Wrapf(err, "Could not stat file %s", src) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errors should not be capitalized (as they will be wrapped)
pkg/objstore/objstore.go
Outdated
fileInfo, err := r.Stat() | ||
if err != nil { | ||
return errors.Wrapf(err, "Could not stat file %s", src) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this else as on error we reurn
pkg/objstore/swift/swift.go
Outdated
@@ -154,7 +154,7 @@ func (c *Container) IsObjNotFoundErr(err error) bool { | |||
} | |||
|
|||
// Upload writes the contents of the reader as an object into the container. | |||
func (c *Container) Upload(ctx context.Context, name string, r io.Reader) error { | |||
func (c *Container) Upload(ctx context.Context, name string, r io.Reader, sizeBytes int64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testutil.Ok(t, bkt.Upload(context.Background(), "id1/obj_2.some", testDataReader2, testDataReader2.Size())) | ||
testutil.Ok(t, bkt.Upload(context.Background(), "id1/obj_3.some", testDataReader3, testDataReader3.Size())) | ||
testutil.Ok(t, bkt.Upload(context.Background(), "id2/obj_4.some", testDataReader4, testDataReader4.Size())) | ||
testutil.Ok(t, bkt.Upload(context.Background(), "obj_5.some", testDataReader5, testDataReader5.Size())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is a good test to actually try to... put wrong size (too small or too big) to ensure we have a knowledge how each provider handles this scenario.. What do you think? Might be not worthy.
pkg/objstore/inmem/inmem.go
Outdated
@@ -124,7 +124,7 @@ func (b *Bucket) Exists(_ context.Context, name string) (bool, error) { | |||
} | |||
|
|||
// Upload writes the file specified in src to into the memory. | |||
func (b *Bucket) Upload(_ context.Context, name string, r io.Reader) error { | |||
func (b *Bucket) Upload(_ context.Context, name string, r io.Reader, sizeBytes int64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
pkg/objstore/gcs/gcs.go
Outdated
@@ -124,7 +124,7 @@ func (b *Bucket) Exists(ctx context.Context, name string) (bool, error) { | |||
} | |||
|
|||
// Upload writes the file specified in src to remote GCS location specified as target. | |||
func (b *Bucket) Upload(ctx context.Context, name string, r io.Reader) error { | |||
func (b *Bucket) Upload(ctx context.Context, name string, r io.Reader, sizeBytes int64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure we can reuse this knowledge, but you can leave that to others. Just please use _ for arg name that is not used.
pkg/objstore/azure/azure.go
Outdated
@@ -211,7 +211,7 @@ func (b *Bucket) Exists(ctx context.Context, name string) (bool, error) { | |||
} | |||
|
|||
// Upload the contents of the reader as an object into the bucket. | |||
func (b *Bucket) Upload(ctx context.Context, name string, r io.Reader) error { | |||
func (b *Bucket) Upload(ctx context.Context, name string, r io.Reader, sizeBytes int64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vglafirov is this something we can use in azure?
If not @adrien-f let's make it _
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
But I shouldn't know the size of Reader. Maybe the client should be handled that will be the best solution. Just my opinion.
Anyway, thank you for your PR. 👍
What do you mean? |
@bwplotka sorry for my poor English, |
Well but for COS you need size as well? Otherwise it does not work either: https://github.com/improbable-eng/thanos/pull/467/files#r209909773 |
Yes, actually the 1st option will solve the problem now and make sence. I accept it and will ask the cos-client maintainer to not make the size sensitively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree option 1 is better than the other options at the moment, my main concern is extending the interface with extra parameters for the func that are not used by other implementations. Doesnt feel to clean tbh.
If we start adding more parameters we would want to make this into an uploadOpts
that deal with all these extra pesky options for different providers, but lets wait till that becomes an issue.
Would be good to ensure that the maintainers of each provider know of this change and if it optimises uploads for them make correlating PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My other idea was to cast it to a *File
like in the COS implementation but again that doesnt feel like the best solution to the problem due recasting etc. not sure how much overhead this has in go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it makes sense to specify this size on the upload interface. I feel attempting to cast to a file and or doing an intermediate buffer copy might be better than leak this into the interface.
pkg/objstore/objstore.go
Outdated
@@ -83,7 +83,14 @@ func UploadFile(ctx context.Context, logger log.Logger, bkt Bucket, src, dst str | |||
} | |||
defer runutil.CloseWithLogOnErr(logger, r, "close file %s", src) | |||
|
|||
if err := bkt.Upload(ctx, dst, r); err != nil { | |||
fileSize := int64(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but this initialization doesn't seem to be necessary as the Stat
call must succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's remove it
The COS implementation does this here https://github.com/improbable-eng/thanos/pull/467/files#r209909773 |
It's not about cost of type assertion - it's marginal. It's about fact that provider has different logic or even fails if the passed io.Reader is NOT os.File. The fact that implmentation assumes So in future someone might add usage that will wrap More details why this problem can be categorized as abstraction leak: https://medium.com/@fagnerbrack/repairing-the-leaky-abstraction-e726baab91b5 So there is no other way than either:
For example, |
@stefan-improbable Why is that better? Also passing Also, implementation does not need |
I think we should consult the minio people, as to me it seems that this should be possible without a change in the interface, looking at their internal handling, this should actually work, as it should attempt to perform multi part on a best effort basis: https://github.com/minio/minio-go/blob/70799fe8dae6ecfb6c7d7e9e048fce27f23a1992/api-put-object.go#L157 (mind you I'm well jetlagged right now so may very well be missing something) |
While digging this issue, I looked into this piece of code I got to https://github.com/minio/minio-go/blob/70799fe8dae6ecfb6c7d7e9e048fce27f23a1992/api-put-object.go#L184 then https://github.com/minio/minio-go/blob/2508a98a820e993b0307bbac4db9731e4e6f89a0/api-put-object-common.go#L72 that goes with this:
The following issue minio/minio-go#848 (comment) propose an alternative solution that involves manually uploading the blob parts by parts. Adding such a relative complexity to Thanos felt unneeded. |
Passing reader and length seems a bit weird, but I don't have any other argument. Possibly just that all callers will have to calculate size as opposed to it being calculated in the one AWS-specific function. As for the accept-interfaces-and-return-structs, you can open the file for reading/writing only and test it using ioutil.TempFile. |
67d62d5
to
876e122
Compare
Any last minute changes I should make ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes @adrien-f there are minor outstanding comment.
So this is difficult decision. I totally agree with @brancz here that even if some magic is needed from our side to use mino library in bit more complicated way that deduce what upload operation it should use - it is worth in long term.
Additionally again, COS Tencent is depending on lenght as well: #467 and this might be something that @jojohappy could help us with longterm to put issues on https://github.com/mozillazg/go-cos (we don't know Chineese well) to make that happen/figure out how to use the library without pre known lenght.
I see this change as quick mitigation that does not hurt us in mid term but we should aim for having bucket providers that support working without known lenght for the sake of good patterns and need for streamed upload in future.
pkg/objstore/objstore.go
Outdated
@@ -22,7 +22,7 @@ type Bucket interface { | |||
BucketReader | |||
|
|||
// Upload the contents of the reader as an object into the bucket. | |||
Upload(ctx context.Context, name string, r io.Reader) error | |||
Upload(ctx context.Context, name string, r io.Reader, sizeBytes int64) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's comment properly here, why this has size and reader interface. Worth to mention PR #617 as we have reference discussion why etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also add a comment saying TODO(PR #617): Consider removing requirement on pre-known length when all providers will support this.
pkg/objstore/objstore.go
Outdated
@@ -83,7 +83,14 @@ func UploadFile(ctx context.Context, logger log.Logger, bkt Bucket, src, dst str | |||
} | |||
defer runutil.CloseWithLogOnErr(logger, r, "close file %s", src) | |||
|
|||
if err := bkt.Upload(ctx, dst, r); err != nil { | |||
fileSize := int64(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's remove it
@adrien-f Can you address small comments and change title/commit title to exclude |
@adrien-f And we need CHANGELOG maybe as this loosens our requirements towards client provider implementations (if someone wants to add one)? |
Does the update of the COS API effect this discussion? #679 I am happy to add the |
Re To further followup on my previous comment now it is only minio that needs the size ... maybe we treat this as the special case and do the casting? Sorry to keep throwing the cat amongst the pigeons. |
I have make a test and confirmed that the new version of COS provider had been supported uploading files without the length. |
876e122
to
3d55401
Compare
I made the last requested changes I think. |
Oh boy, COS news means that this interface change is indeed weird. Thanks for doing all this work @adrien-f and I feel like in this case casting to os.File and warn if we cannot do it so Even if we fail to cast it s3 will still work. The COS was different as this was a blocker for it (but not anymore) Thanks all for valuable discussion! (: |
3d55401
to
5c35b8e
Compare
There you go ! |
pkg/objstore/s3/s3.go
Outdated
// TODO(PR #617): Consider removing requirement on pre-known length when all providers will support this. | ||
fileInfo, err := r.(*os.File).Stat() | ||
if err != nil { | ||
return errors.Wrapf(err, "could not stat local file %s", name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we agreed for best effort logic, so we need warning instead I think, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. I really need holidays 🤕
5c35b8e
to
66093ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Signed-off-by: Adrien Fillon adrien.fillon@cdiscount.com
I noticed an issue where a S3 multipart upload would fail with a
413 Entity Too Large
error. After investigating, I noticed we were not sending the size of the buffer/chunk/file/... to the minio-go library and that would cause the multipart upload to be a single part upload.From what I've seen, the
Upload
method is only called with aos.File
object : https://github.com/improbable-eng/thanos/blob/a09d41dd6534f28365eede43fd024fed3dcc0750/pkg/objstore/objstore.go#L80I'm currently not satisfied with what I did but it works ! I open this PR so we can figure out the best way to handle this. In case of real buffers, to compute their size I would need to duplicate it in memory from what I read.
Changes
Verification
For your information, we use Ceph as our S3 provider.