artifact store: fix correctness bug; remove copy read timeout #7866
Conversation
This looks fine to me, but it's heavily using some sharp-edged stuff that I haven't used a lot (spawn_blocking, blocking_recv, etc.), so it might be nice to get another set of eyes on it. Maybe @jgallagher or @sunshowers?
sled-agent/src/artifact_store.rs (outdated)

        OverwriteBehavior::AllowOverwrite,
        temp_dir,
    );
    let (tx, mut rx) = mpsc::channel(16); // TODO
Is there something in particular you plan to do for this // TODO? If it's just that this seems like an arbitrary number, that's fair -- I'd just document that.
It's unfortunate that the buffering here is per-message and not based on the amount of bytes. I don't really see an easy way around that, though, and I'm not sure it's worth doing much work to improve it. Maybe some day it'll be worth creating an async atomicwrites.
It's maybe worth noting that this is the number of chunks from dropshot's StreamingBody we're willing to buffer up if the writing task(s) are slow? I'm not sure how big those chunks are in practice, nor how likely a slow writer is.
When I was implementing it, the typical chunk sizes I saw were around 50-100 KiB.
I meant to figure out what the expected chunk sizes are (we have both dropshot's StreamingBody and reqwest's Body to take into account here) and then go from there, but haven't yet.
StreamingBody just forwards the Bytes chunks returned from reqwest's Body -- there's no rechunking or additional copies.
As far as I can tell, the maximum chunk size from Nexus (which StreamingBody wraps) is bounded by the buffer sizes in Nexus, which by default are ~400 KB for both HTTP/1 and HTTP/2 clients. reqwest presumably uses hyper's defaults for clients, which are ~400 KB for HTTP/1 and 1 MB for HTTP/2.
I am not really sure what a good number here is, other than maybe "1". If we're only pulling bytes off the wire as fast as we can write them to both M.2s, we're not going to buffer a bunch of network traffic and then have backpressure later.
I do think I should adjust the code to write the chunk to both M.2s simultaneously, though; right now it's done serially. Roughly along the lines of the sketch below.
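For illustration, here is a minimal sketch of fanning one chunk out to both write tasks concurrently, assuming one mpsc sender per M.2 write task (the helper name is made up):

    use bytes::Bytes;
    use tokio::sync::mpsc;

    /// Hypothetical helper: hand one chunk to both M.2 write tasks at once
    /// instead of one after the other. `Bytes::clone` only bumps a refcount,
    /// so the chunk data itself is not copied.
    async fn send_to_both(
        tx_a: &mpsc::Sender<Bytes>,
        tx_b: &mpsc::Sender<Bytes>,
        chunk: Bytes,
    ) -> Result<(), mpsc::error::SendError<Bytes>> {
        // Both sends make progress concurrently, so we wait only for the
        // slower of the two channels rather than the sum of the two.
        tokio::try_join!(tx_a.send(chunk.clone()), tx_b.send(chunk))?;
        Ok(())
    }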
I think a max memory use of up to 64 MB is fine, so a capacity of 64 (assuming the 1 MB upper bound per chunk) seems reasonable.
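If the capacity does end up pinned like that, one way to address the earlier "I'd just document that" note is to name the constant and record the reasoning next to it (a sketch; the constant and function names are invented):

    use bytes::Bytes;
    use tokio::sync::mpsc;

    /// Number of `Bytes` chunks each write task will buffer. Chunk sizes are
    /// bounded by the hyper/dropshot buffer sizes discussed above (~400 KB
    /// for HTTP/1, up to ~1 MB for HTTP/2), so 64 chunks caps buffered data
    /// at roughly 64 MB per transfer.
    const CHUNK_CHANNEL_CAPACITY: usize = 64;

    fn chunk_channel() -> (mpsc::Sender<Bytes>, mpsc::Receiver<Bytes>) {
        mpsc::channel(CHUNK_CHANNEL_CAPACITY)
    }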
The async bits LGTM.
Makes uploading a TUF repo into a4x2 take 15 minutes instead of N hours, and doesn't fill up the disk once that's done. Ship it!
This fixes the two main problems identified in #7796:
In the previous implementation, we always wrote temporary files to tmp/{sha256} and returned an error if that file was already present. Due to the Drop behavior of Utf8TempFile, what I think was an attempt to simplify the code resulted in deleting another task's temporary file when returning an "already in progress" error. (This PR adds a regression test for this issue, which I've verified fails as expected on the current implementation.)

This replaces the temporary-file persisting logic with the atomicwrites crate, as suggested by @sunshowers, which is resistant to trivial mistakes like this. AtomicFile::write takes a function that must perform the entire write operation; if the function returns an error, the file is not renamed to the final path. To make it work in an asynchronous context, AtomicFile::write is placed on a blocking task and bytes are sent to it over an mpsc channel. The write task also computes a checksum of the data, returning an error to prevent persisting the file if the checksum is invalid.
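As a rough sketch of that shape (not the actual omicron code; the function name, signature, and error handling here are simplified for illustration):

    use std::io::Write;
    use std::path::Path;

    use atomicwrites::{AtomicFile, OverwriteBehavior};
    use bytes::Bytes;
    use sha2::{Digest, Sha256};
    use tokio::{sync::mpsc, task};

    /// Sketch of the write path: chunks arrive over an mpsc channel from the
    /// async handler, and `AtomicFile::write` runs on a blocking task. The
    /// temporary file is only renamed to `final_path` if the closure returns
    /// `Ok`, i.e. every write succeeded and the digest matched.
    /// (The PR's code also passes an explicit temp dir; plain `new` keeps
    /// this sketch short.)
    async fn write_artifact(
        final_path: &Path,
        expected_sha256: [u8; 32],
        mut chunks: mpsc::Receiver<Bytes>,
    ) -> std::io::Result<()> {
        let file = AtomicFile::new(final_path, OverwriteBehavior::AllowOverwrite);
        let writer = task::spawn_blocking(move || {
            file.write(|f| {
                let mut hasher = Sha256::new();
                // `blocking_recv` parks this blocking-pool thread until the
                // async side sends another chunk or drops the sender.
                while let Some(chunk) = chunks.blocking_recv() {
                    hasher.update(&chunk);
                    f.write_all(&chunk)?;
                }
                if hasher.finalize().as_slice() != expected_sha256.as_slice() {
                    // Returning an error here aborts the rename, so a corrupt
                    // artifact is never persisted at `final_path`.
                    return Err(std::io::Error::other("sha256 mismatch"));
                }
                Ok(())
            })
        });
        writer
            .await
            .expect("blocking writer task panicked")
            .map_err(|e| std::io::Error::other(e.to_string()))
    }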
atomicwrites also syncs the file as well as the parent directories, meaning we can remove that code from our implementation.
Since the logic for detecting multiple in-progress transfers relied on the predictable naming of temporary files, I removed that. The original reasoning for checking this was to avoid doing unnecessary work. The current implementation allows writing an artifact that already exists, though, so long as someone else isn't trying to write it at the same time. (I intentionally allowed overwriting an existing artifact so that Nexus could work around an incorrectly-written artifact in the store if it noticed such a thing happening; it doesn't currently.)

I don't know which is better; in theory, allowing multiple writers means that if one fails spuriously, the other that's running could still succeed. It would be simple enough to add the proposed change in #7860 to this PR as well.
The 15-second read timeout on copy requests was in place to keep a stalled transfer from holding up any other attempts to replicate the artifact. Since we no longer stop those other attempts, the read timeout can be removed.