Add a MultipartUploadStream IO object #46
Conversation
@Drvi I pushed a first version of the `MultipartUploadStream`.
Codecov Report (Attention):

@@           Coverage Diff           @@
##             main      #46      +/-  ##
==========================================
+ Coverage   83.13%   84.21%   +1.07%
==========================================
  Files           7        7
  Lines         587      646      +59
==========================================
+ Hits          488      544      +56
- Misses         99      102       +3
Hey Nantia and thanks for the PR. I think the implementation would work, but it will probably be noticeably slower than `CloudStore.put` because we don't attempt any parallelism here. Even though I think your main concern was memory overhead, it shouldn't be that hard to get even faster than `CloudStore.put`, since we should be able to overlap arrow serialization and uploading.
An approach inspired by `PrefetchedDownloadStream` would be:
a) When creating a `MultipartUploadStream`, you'd also spawn multiple tasks.
b) The spawned tasks would try to take buffers from a new `Channel` field we'd add to the `MultipartUploadStream`, and upload them.
c) When we call `Base.write(io::MultipartUploadStream, bytes)`, we put `bytes` to the `Channel` and let the tasks handle it.
d) The `close` method would have to block until all submitted buffers were successfully uploaded, and shut down the tasks.
For bonus points, once we're done uploading a buffer, we'd give it back to the caller, maybe to a user-provided channel, so you could safely reuse them. A minimal sketch of this design is below.
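Here is a rough sketch of that design; all field and helper names (e.g. `upload_queue`, `next_part_id`) are illustrative, not the final API:

mutable struct MultipartUploadStream <: IO
    upload_queue::Channel{Tuple{Int, Vector{UInt8}}}  # (part number, buffer)
    cond_wait::Threads.Condition                      # guards ntasks
    ntasks::Int                                       # parts still in flight
    # ... plus store, credentials, upload state, eTags, etc.
end

function Base.write(io::MultipartUploadStream, bytes::Vector{UInt8})
    Base.@lock io.cond_wait begin
        io.ntasks += 1
    end
    put!(io.upload_queue, (next_part_id(io), bytes))  # hypothetical id helper
    return nothing
end

function Base.close(io::MultipartUploadStream)
    # Uploader tasks decrement ntasks and notify cond_wait after each part,
    # so we block here until every submitted buffer has been uploaded...
    Base.@lock io.cond_wait begin
        while io.ntasks > 0
            wait(io.cond_wait)
        end
    end
    close(io.upload_queue)  # ...then shut the uploader tasks down
    # (and send the "complete multipart upload" request here)
end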
Hi @Drvi, thanks for the feedback! It's very helpful.
@nantiamak Ah, good point. I think the design space is a bit larger than I initially thought :) A couple of options: how many tasks to spawn? And we should target each buffer being 8MiB (`MULTIPART_SIZE`).
Yes, so for
Hey Nantia, exciting progress! Left a couple of concurrency-related notes. Thanks for working on this!
src/object.jl (Outdated)
    if x.ntasks == 0
        break
    end
    wait(x.cond_wait)
I think before we check the `ntasks` value, we should confirm that we're in a good state. If an upload task failed for whatever reason before it decremented here, and if it notified before we grabbed the lock, we'd wait forever. The upload task grabs the lock to error-notify -- if in the same block you also set some state that signals the error, then it would be safe, because here in `close` either:
a) we hadn't held the lock, so we'd wait for the upload task to error-notify and set the error state, then we'd grab the lock and check the error state, or
b) we did hold the lock, so when we'd get to `wait`, the upload task would successfully error-notify us.
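A sketch of that pattern, using the error flag that gets added later in this thread (names illustrative):

# Upload task: record the failure and notify while holding the lock.
Base.@lock io.cond_wait begin
    io.is_error = true    # set the error state first...
    notify(io.cond_wait)  # ...then wake anyone blocked in close()
end

# close(): consult the error state before (and after) every wait.
Base.@lock io.cond_wait begin
    while true
        io.is_error && error("an upload task failed")
        io.ntasks == 0 && break
        wait(io.cond_wait)
    end
end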
I added an `is_error` flag in `MultipartUploadStream`, which I'm setting to `true` here when catching an exception. I'm looking for a test case checking this code path. Any thoughts?
src/object.jl (Outdated)
    end
end
# atomically increment our part counter
@atomic io.cur_part_id += 1
Since we increment after we call `put!`, I think it could happen that two tasks could reach the `put!` call with the same value.
Indeed, I've seen this happening. On the other hand, if we increment before `put!`, two tasks might have incremented `io.cur_part_id` before reaching `put!`, and `io.sync.i`, which is incremented inside `put!`, will be `!= io.cur_part_id`, resulting in a deadlock.
I fixed this by assigning `io.cur_part_id` to a variable and passing it as a parameter to `_upload_task`, rather than getting its value from `io.cur_part_id` directly.
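A sketch of that fix (the `_upload_task` call shape is illustrative, and this assumes `write` is only ever called from a single task, since the stream expects parts in order):

part_id = @atomic io.cur_part_id               # capture this part's id first
@atomic io.cur_part_id += 1                    # then advance the counter
Threads.@spawn _upload_task(io, buf, part_id)  # each task gets its own id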
Yes, making this the caller's problem seems like a nice solution to me 👍
CI/Julia 1.6 is failing with
@Drvi About your comment "we should target each buffer being 8MiB (MULTIPART_SIZE)", the buffer is currently constructed outside
Hey @nantiamak, sorry for the delay.
You can see e.g. in the
I meant that in our usage code we should target roughly 8MiB chunks to be given to the
@Drvi Thanks for the pointer!
@Drvi Could you please take another look at this PR to see if we can merge it?
@nantiamak Sorry for the late reply, in short:
Also, could you try larger files for the benchmark, say 700MiB, and use a semaphore?
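For the semaphore, one way to bound the number of in-flight uploads is with `Base.Semaphore`; a sketch, where the limit of 8 and `upload_part` are arbitrary stand-ins:

sem = Base.Semaphore(8)   # at most 8 uploads running at once

Base.acquire(sem)         # blocks while 8 uploads are already in flight
Threads.@spawn try
    upload_part(...)      # stand-in for the actual upload call
finally
    Base.release(sem)     # free the slot even if the upload throws
end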
        notify(io.cond_wait)
    end
catch e
    isopen(io.upload_queue) && close(io.upload_queue, e)
@Drvi Could you help me with adding a test case that triggers an error during uploading and covers these lines? I tried uploading a part that was smaller than the minimum S3 size and got the following error, but it didn't exercise this catch block. 🤔
HTTP.Exceptions.StatusError(400, "POST", "/jl-minio-23869/test.csv?uploadId=7e57586f-e75d-4cf6-a14e-f480ebe655cd", HTTP.Messages.Response:
"""
HTTP/1.1 400 Bad Request
Accept-Ranges: bytes
Cache-Control: no-cache
Content-Length: 364
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin, Accept-Encoding
X-Accel-Buffering: no
X-Amz-Request-Id: 179736FE62556FB8
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Date: Mon, 13 Nov 2023 15:04:10 GMT
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>EntityTooSmall</Code><Message>Your proposed upload is smaller than the minimum allowed object size.</Message><Key>test.csv</Key><BucketName>jl-minio-23869</BucketName><Resource>/jl-minio-23869/test.csv</Resource><RequestId>179736FE62556FB8</RequestId><HostId>7b2b4a12-8baf-4d22-bc30-e4c3f2f12bff</HostId></Error>""")
Stacktrace:
[1] (::HTTP.ConnectionRequest.var"#connections#4"{HTTP.ConnectionRequest.var"#connections#1#5"{HTTP.TimeoutRequest.var"#timeouts#3"{HTTP.TimeoutRequest.var"#timeouts#1#4"{HTTP.ExceptionRequest.var"#exceptions#2"
...
I think you had a good idea: try writing a small file and then testing that the channel is closed and that the `exc` field is populated.
I tried that, but when the error happens I'm getting `Exiting on signal: TERMINATED`, and the test terminates before I get to check the channel and the exception.
src/object.jl (Outdated)
Base.@lock io.cond_wait begin
    while true
        io.ntasks == 0 && !io.is_error && break
        if io.is_error
@Drvi I added this `if` check here to account for the case where the upload task has error-notified before we enter `wait()`.
I think you don't need the `is_error` field; you can check `!isnothing(io.exc)` instead. Then you can just check for the error first and, only if we don't throw, do `io.ntasks == 0 && break`.
    end
end
catch e
    rethrow()
I think we are missing a way to signal to S3/Blobs that we're aborting the multipart upload, a la AbortMultipartUpload... can you open an issue about it?
Also, I think we should consider offering the following syntax to the user:
MultipartUploadStream(...) do io
    ...
end

something like

function MultipartUploadStream(f::Function, args...; kwargs...)
    io = MultipartUploadStream(args...; kwargs...)
    try
        f(io)
        wait(io)
        close(io)
    catch e
        abort(io, e)  # todo, we don't have this yet
        rethrow()
    end
end
There is already an issue for adding more low-level functionality regarding multipart uploads, including listing parts and listing parts that are in progress, which are related to aborting: #3. Should I add a comment there about abort?
For the alternative syntax, `f` would need to encapsulate a call to `MultipartUploadStream.write()` though, right?
ah, sorry, I didn't realize there was an issue already
For the alternative syntax, f would need to encapsulate a call to MultipartUploadStream.write() though, right?
Yes, similar to what one would do with a local file
open("file") do io
write(io, "my stuff")
end
src/object.jl (Outdated)
mutable struct MultipartUploadStream <: IO
    store::AbstractStore
    url::String
    credentials::Union{Nothing, AWS.Credentials, Azure.Credentials}
    uploadState
Should we specialize on the `store` type? What is the type of `uploadState`? Ideally, we'd specialize so that touching the credentials and store is inferrable.
I replaced `AbstractStore` with `Union{AWS.Bucket, Azure.Container}` for `store`. `uploadState` is either a `String` or `nothing`.
But do we want to specialize `MultipartUploadStream{T<:AbstractStore}`, similar to `PrefetchedDownloadStream`? It's possible that Julia will manage to union-split on small unions like these, but it would be nice to make sure we don't box unnecessarily. Also to enforce that we have compatible Credential and Store objects (i.e. both Azure or both AWS specific).
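For illustration, the specialization might look like this, abridging the field list from the snippet above:

mutable struct MultipartUploadStream{T <: AbstractStore} <: IO
    store::T   # concrete AWS.Bucket or Azure.Container once constructed
    url::String
    credentials::Union{Nothing, AWS.Credentials, Azure.Credentials}
    uploadState::Union{String, Nothing}
    # ...
end

so that e.g. an S3 stream has the fully concrete type `MultipartUploadStream{AWS.Bucket}`, and accesses to `store` are inferrable.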
I don't mind making `MultipartUploadStream` parametrized on `AbstractStore`, but I'm not quite following what the benefit is in comparison to `Union{AWS.Bucket, Azure.Container}`, as I'm not very familiar with the details of boxing. We do use a union for credentials. Is this because there isn't an abstract struct for those?
Maybe boxing is not the right CS term, but the idea is to make the code as inferrable as possible, so Julia can optimize things easily without having to do dynamic dispatch and so on. In these type-unstable cases, Julia tends to allocate defensively because it doesn't know which types it will encounter. In this specific example it might be OK, since the unions are small, but there is no harm in specializing these, since people tend to know in advance whether they want to talk to Azure or S3 anyway.
Thanks for the clarification! I admit I like the flexibility of `Union` 😄, but I see your point.
A couple more results on a larger file and with adding a semaphore for
There is a big difference now between
Hmm, the performance difference seems problematic, we should investigate. Can you share the benchmarking code again?
@Drvi Regarding the following:
Why should we change this behaviour for
I just think it is simpler to use one synchronization mechanism than two. Since we already do the locking for the condition, we might as well assign the eTag to the eTags vector without involving the `OrderedSynchronizer`.
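A sketch of that idea, assuming each upload task knows its 1-based part number and that `io.eTags` is grown as needed under the same lock:

Base.@lock io.cond_wait begin
    # grow the eTags vector if this part lands past the current end
    length(io.eTags) < part_id && resize!(io.eTags, part_id)
    io.eTags[part_id] = eTag   # slot the eTag in by part number
    io.ntasks -= 1
    notify(io.cond_wait)
end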
Ah, I get your point now. But what do you mean by "making sure the io.eTags is grown as needed"? I only know of
@Drvi Do you maybe mean to use
@Drvi I think I've addressed all of your feedback.
I think this looks good; I've left a couple of docs improvement ideas, but apart from that, I think this is ready to be used. I'm tempted to mark this API as experimental, since we'd probably want a story about reusing buffers and handing them back to the caller via a channel, and also a way to pre-spawn uploader tasks.
src/object.jl (Outdated)
Data chunks are written to a channel and spawned tasks read buffers from this channel.
We expect the chunks to be written in order.
I'd mention that this currently spawns one task per chunk, and let's explicitly mention the `write` method, and also `put` as an alternative.
Could you please elaborate a bit more on the use of `put` as an alternative here? Doesn't `put` require the total input to be known upfront?
Just to mention that if you don't need to upload files by part (or if you simply cannot, because your file is too small), you can always use `put`.
Btw, I think the benchmark results are heavily influenced by the performance of
And the copy got progressively slower over time:
Wow! Good catch. I didn't expect it to be something in the benchmark code.
Then I'll remove the comment about performance getting worse with larger files for now, as it might be misleading, but I've mentioned that the API is experimental.
This PR adds a `MultipartUploadStream` IO object in CloudStore.jl, which facilitates uploading a stream of data chunks to blob storage. After creating a `MultipartUploadStream` object, we can repeatedly call `write()` and upload each part until there are no more chunks, at which point we can complete the upload and close the stream.