-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor S3 store & support upload retry #854
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04 (waiting on @adam-singer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 10 files at r1, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and 3 discussions need to be resolved
nativelink-error/src/lib.rs
line 61 at r1 (raw file):
} pub fn set_code(mut self, code: Code) -> Self {
not used, can be removed
nativelink-store/src/s3_store.rs
line 111 at r1 (raw file):
buf: &[u8], ) -> Poll<Result<usize, tokio::io::Error>> { Pin::new(&mut Pin::get_mut(self).connection).poll_write(cx, buf)
Thoughts on moving this bit into a simpler function to help readability
fn get_pin_connection(self: Pin<&mut Self>) -> [...] {
Pin::new(&mut Pin::get_mut(self).connection)
}
[...]
get_pin_connection(self).poll_write(cx, buf);
Code quote:
Pin::new(&mut Pin::get_mut(self).connection)
nativelink-util/src/buf_channel.rs
line 246 at r1 (raw file):
/// received exceeds this size, the recent_data buffer will be cleared and /// no longer populated. pub fn set_max_recent_data_size(&mut self, size: usize) {
Are we doing this for convenience of the call sites not to think of machine size? Do we run any risk of upcasting below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 10 files at r1.
Reviewable status: 0 of 1 LGTMs obtained, and 3 discussions need to be resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained, and 3 discussions need to be resolved
* Upload failures for S3 will now retry * S3 tcp connections now counted as with file descriptor limits * Significantly refactored how uploads happen in S3 store
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: pre-commit-checks, and 4 discussions need to be resolved
nativelink-store/src/s3_store.rs
line 111 at r1 (raw file):
Previously, adam-singer (Adam Singer) wrote…
Thoughts on moving this bit into a simpler function to help readability
fn get_pin_connection(self: Pin<&mut Self>) -> [...] { Pin::new(&mut Pin::get_mut(self).connection) } [...] get_pin_connection(self).poll_write(cx, buf);
If I do this, it'll require a custom impl
just for this one function. To be honest, I agree it's super complicated and difficult to read, but in reality, it's quickly realized that it's just forwarding the call on to self.connection.poll_write()
and everything else is just to deal with rust's Pin
lifetime stuff.
I'd rather not factor this out and add +5 lines just for this one call.
nativelink-util/src/buf_channel.rs
line 246 at r1 (raw file):
Previously, adam-singer (Adam Singer) wrote…
Are we doing this for convenience of the call sites not to think of machine size? Do we run any risk of upcasting below?
To be honest I'm trying to get a better handle on usize
vs u64
. Any time it's Data will live in memory
it should be usize
. Any time it's data will live "somewhere" it should be i64 or u64 (depending on if it came from protos or not)`.
In this case it's caching data in memory, so it should be usize
and the caller should be the one to cast it.
nativelink-util/src/retry.rs
line 135 at r2 (raw file):
// function that it complained about was calling another function that called // this one. #[allow(clippy::manual_async_fn)]
no idea what changed here, but clippy was now giving some warnings here about this function being async fn
, but if I change it, we get this error:
error[E0308]: mismatched types
--> nativelink-store/src/grpc_store.rs:556:28
|
556 | ) -> Result<(), Error> {
| ____________________________^
557 | | if matches!(self.store_type, nativelink_config::stores::StoreType::ac) {
558 | | return self.update_action_result_from_bytes(digest, reader).await;
559 | | }
... |
623 | | Ok(())
624 | | }
| |_____^ one type is more general than the other
|
= note: expected `async` block `{async block@nativelink-store/src/grpc_store.rs:583:69: 613:10}`
found `async` block `{async block@nativelink-store/src/grpc_store.rs:583:69: 613:10}`
error: aborting due to 1 previous error
But I didn't change anything there and this seems to be the only way to shut it up.
nativelink-error/src/lib.rs
line 61 at r1 (raw file):
Previously, adam-singer (Adam Singer) wrote…
not used, can be removed
Ooops, forgot to factor this one out when cleaning up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r2, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and 1 discussions need to be resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r2.
Reviewable status: complete! 1 of 1 LGTMs obtained
* Upload failures for S3 will now retry * S3 tcp connections now counted as with file descriptor limits * Significantly refactored how uploads happen in S3 store
This change is