Improvements to node scheduler #213

Merged: 1 commit merged into master from node on Jul 21, 2023

Conversation

mwylde (Member) commented Jul 20, 2023

This addresses a few issues with the node scheduler:

  • Previously we sent the entire pipeline binary as a single gRPC message, but as our pipeline binaries have grown in size this can bump up against the max message size limit. This PR moves the transfer to a multi-part stream of 2 MB chunks (see the sketch after this list).
  • If a node dies, the jobs running on it go into recovery. The first phase of recovery tries to clean up the old cluster, which for the node scheduler means telling the node to shut down the worker. But the node itself is down, so it can't do that. We now no longer treat this as an error, so recovery can continue under the assumption that if the node isn't responding, the worker probably isn't running either.
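
As a rough sketch of the chunked approach, the transfer might look like the following. This is illustrative only: `BinaryChunk` and `start_worker` are assumed names, not necessarily the message and RPC defined in this PR; `NodeGrpcClient` is the generated tonic client.

```rust
// Illustrative sketch (not the exact API in this PR): stream the pipeline
// binary to the node in 2 MB chunks over a client-streaming gRPC call.
// `BinaryChunk` and `start_worker` are assumed names for the proto message
// and RPC; `NodeGrpcClient` is the generated tonic client.
use tokio_stream::iter;

const CHUNK_SIZE: usize = 2 * 1024 * 1024; // 2 MB per gRPC message

async fn send_pipeline_binary(
    client: &mut NodeGrpcClient<tonic::transport::Channel>,
    binary: &[u8],
) -> Result<(), tonic::Status> {
    let chunks: Vec<BinaryChunk> = binary
        .chunks(CHUNK_SIZE)
        .map(|c| BinaryChunk { data: c.to_vec() })
        .collect();

    // Generated client-streaming methods accept any `impl Stream` of request
    // messages via `IntoStreamingRequest`, so an iterator-backed stream works.
    client.start_worker(iter(chunks)).await?;

    Ok(())
}
```

Keeping each message at 2 MB stays comfortably under any per-message limit, while the stream as a whole can carry an arbitrarily large binary.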

jacksonrnewhouse (Contributor) commented:

We could probably bump the message size limits up to the point where that isn't an issue. Have a preference?

Inline review on this diff excerpt:

        force,
    ) {
        let Ok(mut client) = NodeGrpcClient::connect(format!("http://{}", node.addr)).await else {
            warn!("Failed to connect to worker to stop; this likely means it is dead");

jacksonrnewhouse (Contributor):

If we wanted to be careful about making sure a pipeline was actually dead, we'd coordinate this behavior with timeouts/heartbeats. Might not be worth it for what is likely not the main mode of deployment.

mwylde (Member, Author):

Yeah, I think there are ways of making this more robust, but waiting for a heartbeat does mean taking longer to recover failed pipelines.

There are basically two cases:

  • The node process has died, in which case the workers will also have died (generally) because they are child processes of the node
  • We have a network partition between the controller and the node

In case 1, we're doing the right thing. In case 2, the worker and the node will each die on their own as they fail to talk to the controller.
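
For case 2, a hypothetical sketch of that self-termination behavior (the client type, RPC, and timings below are all illustrative assumptions, not the actual worker code):

```rust
use std::time::Duration;

const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(5);
const MAX_MISSED: u32 = 6; // give up after ~30s of failed heartbeats

// Hypothetical heartbeat loop on the node/worker side; `ControllerGrpcClient`
// and `HeartbeatReq` stand in for whatever the real controller API is.
async fn heartbeat_loop(mut controller: ControllerGrpcClient<tonic::transport::Channel>) {
    let mut missed = 0;
    loop {
        tokio::time::sleep(HEARTBEAT_INTERVAL).await;
        match controller.heartbeat(HeartbeatReq::default()).await {
            Ok(_) => missed = 0,
            Err(_) => {
                missed += 1;
                if missed >= MAX_MISSED {
                    // Controller unreachable for too long: exit rather than
                    // keep running a pipeline that can no longer be managed.
                    std::process::exit(1);
                }
            }
        }
    }
}
```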

mwylde (Member, Author) commented Jul 21, 2023

> We could probably bump the message size limits up to the point where that isn't an issue. Have a preference?

We actually can't: there's a hard limit in tonic, and we've always had it configured as high as it can be set. It also looks like that limit has just been lowered to 4 MB in the newest version of tonic.
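
For context, the knob in question is the per-message decode cap on the generated tonic client/server wrappers. A sketch of where it is set, assuming the tonic 0.9+ generated-client API; the address and 64 MB value are illustrative:

```rust
// Where the limit lives in tonic 0.9+: the generated client exposes a
// per-message decode cap (the generated server wrapper has a matching
// setting for requests it receives, which is the one that matters for a
// large binary upload). Even raised, a single message still has to hold the
// entire binary, which is why this PR streams 2 MB chunks instead.
async fn connect_node(
    addr: &'static str,
) -> Result<NodeGrpcClient<tonic::transport::Channel>, tonic::transport::Error> {
    let channel = tonic::transport::Channel::from_static(addr).connect().await?;
    Ok(NodeGrpcClient::new(channel).max_decoding_message_size(64 * 1024 * 1024))
}
```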

mwylde merged commit b32e657 into master on Jul 21, 2023
mwylde deleted the node branch on July 21, 2023 at 00:23