-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement worker api for killing running actions #840
Conversation
b91f05d
to
95edf4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Cargo Dev / macos-13, and 7 discussions need to be resolved
nativelink-proto/com/github/trace_machina/nativelink/remote_execution/worker_api.proto
line 136 at r1 (raw file):
/// Request to kill a running action sent from the scheduler to a worker. message KillActionRequest { /// The the hash of the unique qualifier for the action to be killed.
nit: Don't make it the hash, make it the hex.
nativelink-proto/com/github/trace_machina/nativelink/remote_execution/worker_api.proto
line 138 at r1 (raw file):
/// The the hash of the unique qualifier for the action to be killed. string action_id = 1; reserved 2;
nit: Add // NextId.
nativelink-worker/src/local_worker.rs
line 215 at r1 (raw file):
Update::KillActionRequest(kill_action_request) => { let kill_action_result = self.running_actions_manager .kill_action(kill_action_request.clone())
nit: Pass just the action_id by ref. If we need more later we can change it, but keep it simple for now.
nativelink-worker/src/local_worker.rs
line 217 at r1 (raw file):
.kill_action(kill_action_request.clone()) .await; if let Err(e) = kill_action_result {
Shouldn't we propagate this error back up the grpc stream? See how Update::ConnectionResult
does it.
nativelink-worker/src/running_actions_manager.rs
line 1792 at r1 (raw file):
assert_eq!( hex::decode_to_slice(kill_action_request.action_id, &mut action_id as &mut [u8]),
nit: Make the caller do this conversion and pass in an ActionId
.
nativelink-worker/src/running_actions_manager.rs
line 1799 at r1 (raw file):
let running_actions = self.running_actions.lock(); running_actions.get(&action_id).map_or( Err(make_err!(
nit: use make_input_err!()
nativelink-worker/src/running_actions_manager.rs
line 1805 at r1 (raw file):
)), |action| { action.upgrade().map(Self::kill_action);
If upgrade fails, we should return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Cargo Dev / macos-13, and 7 discussions need to be resolved
95edf4b
to
b9caf1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 LGTMs obtained, and 2 discussions need to be resolved
nativelink-proto/com/github/trace_machina/nativelink/remote_execution/worker_api.proto
line 136 at r1 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Don't make it the hash, make it the hex.
Done.
nativelink-proto/com/github/trace_machina/nativelink/remote_execution/worker_api.proto
line 138 at r1 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Add
// NextId.
Done.
nativelink-worker/src/local_worker.rs
line 215 at r1 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Pass just the action_id by ref. If we need more later we can change it, but keep it simple for now.
Done.
nativelink-worker/src/local_worker.rs
line 217 at r1 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
Shouldn't we propagate this error back up the grpc stream? See how
Update::ConnectionResult
does it.
Done.
nativelink-worker/src/running_actions_manager.rs
line 1792 at r1 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Make the caller do this conversion and pass in an
ActionId
.
Done.
nativelink-worker/src/running_actions_manager.rs
line 1799 at r1 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: use
make_input_err!()
Done.
nativelink-worker/src/running_actions_manager.rs
line 1805 at r1 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
If upgrade fails, we should return an error.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), macos-13, ubuntu-22.04, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04, and 3 discussions need to be resolved
nativelink-worker/src/local_worker.rs
line 215 at r2 (raw file):
Update::KillActionRequest(kill_action_request) => { let mut action_id = [0u8; 32]; let decode_result = hex::decode_to_slice(kill_action_request.action_id, &mut action_id as &mut [u8]);
nit:
hex::decode_to_slice(kill_action_request.action_id, &mut action_id as &mut [u8])
.map_err(|e| Err(make_input_err!(
"KillActionRequest failed to decode ActionId hex with error {}",
e
))?;
nativelink-worker/src/local_worker.rs
line 223 at r2 (raw file):
)); } let kill_action_result = self.running_actions_manager
nit:
self.running_actions_manager
.kill_action(action_id)
.await
.err_tip(|| format!("Kill action {} failed with error - {}", hex::encode(action_id), e))
nativelink-worker/src/running_actions_manager.rs
line 1790 at r2 (raw file):
async fn kill_action(&self, action_id: ActionId) -> Result<(), Error> { let upgrade_or_err: Result<Arc<RunningActionImpl>, Error> = {
nit:
async fn kill_action(&self, action_id: ActionId) -> Result<(), Error> {
let running_action = {
let running_actions = self.running_actions.lock();
running_actions
.get(&action_id)
.and_then(|action| action.upgrade())
.ok_or_else(|| make_input_err!(
"Failed to get running action {}",
hex::encode(action_id)
))?
};
Self::kill_action(running_action).await;
Ok(())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), macos-13, ubuntu-22.04, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04, and 3 discussions need to be resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), macos-13, ubuntu-22.04, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04, and 2 discussions need to be resolved
nativelink-worker/src/local_worker.rs
line 215 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit:
hex::decode_to_slice(kill_action_request.action_id, &mut action_id as &mut [u8]) .map_err(|e| Err(make_input_err!( "KillActionRequest failed to decode ActionId hex with error {}", e ))?;
Thanks, the error handling got really messy here. The compiler didn't like the Err
part but
hex::decode_to_slice(kill_action_request.action_id, &mut action_id as &mut [u8])
.map_err(|e| make_input_err!(
"KillActionRequest failed to decode ActionId hex with error {}",
e
))?;
seems to work. I'm not certain of what the make_input_err!
macro does (assume it expands to an error with some custom code) so want to check that it's actually functionally equivalent to your previous code.
nativelink-worker/src/local_worker.rs
line 223 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit:
self.running_actions_manager .kill_action(action_id) .await .err_tip(|| format!("Kill action {} failed with error - {}", hex::encode(action_id), e))
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), macos-13, ubuntu-22.04, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04, and 1 discussions need to be resolved
nativelink-worker/src/running_actions_manager.rs
line 1790 at r2 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit:
async fn kill_action(&self, action_id: ActionId) -> Result<(), Error> { let running_action = { let running_actions = self.running_actions.lock(); running_actions .get(&action_id) .and_then(|action| action.upgrade()) .ok_or_else(|| make_input_err!( "Failed to get running action {}", hex::encode(action_id) ))? }; Self::kill_action(running_action).await; Ok(()) }
Done.
9b72cad
to
faca603
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), macos-13, ubuntu-22.04, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04
nativelink-worker/src/local_worker.rs
line 215 at r2 (raw file):
Previously, zbirenbaum (Zach Birenbaum) wrote…
Thanks, the error handling got really messy here. The compiler didn't like the
Err
part buthex::decode_to_slice(kill_action_request.action_id, &mut action_id as &mut [u8]) .map_err(|e| make_input_err!( "KillActionRequest failed to decode ActionId hex with error {}", e ))?;
seems to work. I'm not certain of what the
make_input_err!
macro does (assume it expands to an error with some custom code) so want to check that it's actually functionally equivalent to your previous code.
Yeah it's the same thing.
make_input_err!
is just short for:
Error::new(
Code::InvalidArgument,
format!("{}", format_args!($($arg)+)),
)
Implements worker api for requesting a currently running action to be killed. Allows for action cancellation requests to be sent from the scheduler during scenarios such as client disconnection. towards TraceMachina#338
faca603
to
c55983d
Compare
Description
Implements worker api for requesting a currently running action to be killed. Allows for action cancellation requests to be sent from the scheduler during scenarios such as client disconnection.
towards: #338
Type of change
Please delete options that aren't relevant.
How Has This Been Tested?
Added a new test case to confirm that the request to kill an action is received and handled accordingly.
Checklist
bazel test //...
passes locallygit amend
see some docsThis change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)