RFC: Shard refactoring and user control #1639
zeylahellyer started this conversation in Development & RFCs
Replies: 2 comments 1 reply
-
I like all of this except for |
-
One issue I can see is that |
-
What?
Refactor the gateway shard implementation to remove background tasks and bring control points directly to the user. Rather than spawn a processor and its tasks for shards in the background and use channels to communicate back and forth with the "shard" interface we provide users, we can instead bring the processing directly into the user's hands by merging the shard processor with the shard interface users control.
Why?
Users currently don't programmatically receive many of the errors the shard processor may encounter: failure to deserialize events, abnormal WebSocket closures, channels closing, events that are out of sequence, and so on. These are logged (which has a history of causing grievances due to liberal logging levels), but can't cleanly be sent to users. Meanwhile, all that the shard processor can send users are the properly deserialized events it receives.
Users have little control over this: shards can be used to ask the shard processor to shut down (with no real-time notification of when this happens, if it even does!), to send gateway commands over the WebSocket, and to start the WebSocket connection, but there's little or no feedback on whether and when these take place. By moving this logic directly into the shard, we can support these operations with real-time feedback and errors and open up new possibilities of control, such as pausing processing and allowing shard restarts.
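To make the problem concrete, here is a minimal sketch of the background-processor pattern described above, using only the standard library. Every name in it (`Event`, `parse_frame`, `spawn_processor`, `observed_events`) is a hypothetical stand-in and not the gateway crate's actual API; frames are plain strings and "deserialization" is just integer parsing.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical event type; a real gateway event carries far more data.
#[derive(Debug, PartialEq)]
pub struct Event(pub u64);

/// Stand-in for deserialization: parse a raw frame into an event.
fn parse_frame(raw: &str) -> Result<Event, std::num::ParseIntError> {
    raw.trim().parse::<u64>().map(Event)
}

/// Spawn a background "processor" that forwards only successful events.
/// Failures are logged and dropped, so the caller never sees them.
pub fn spawn_processor(frames: Vec<String>) -> Receiver<Event> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for raw in frames {
            match parse_frame(&raw) {
                Ok(event) => {
                    // Stop if the user dropped the receiving half.
                    if tx.send(event).is_err() {
                        break;
                    }
                }
                // The only trace of the failure is a log line.
                Err(source) => eprintln!("failed to parse frame {raw:?}: {source}"),
            }
        }
        // `tx` is dropped here, ending the stream without giving a reason.
    });
    rx
}

/// Collect everything the user is able to observe from the processor.
pub fn observed_events(frames: &[&str]) -> Vec<Event> {
    let frames = frames.iter().map(|frame| frame.to_string()).collect();
    spawn_processor(frames).into_iter().collect()
}
```

Calling `observed_events(&["1", "oops", "3"])` yields only the two events that parsed; the malformed frame shows up on stderr and nowhere else, which is the gap this RFC is about.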
There are three forms of operations that can be done with shards: status, sending, and receiving. Status operations have an API like so:
Meanwhile, sending looks like this:
and receiving like so:
This is mostly "fine": `start` can provide the user an error if the shard couldn't be started; `command` can tell the user if it couldn't serialize the command or the shard wasn't started; and receiving events can tell the user when the event stream ends due to the shard processor no longer processing events. However, where we run into problems is that `start` only handles some errors, while most of the work that can error is passed off into the background by the shard processor. The story is similar with `command` and receiving events: you can infer some basic information, but you can't possibly know the full story.
How?
Much of the infrastructure supporting the separation of shards and their processors can be removed by consolidating shards. Here's what such an API may look like:
At its core, this isn't fundamentally different in usage:
Notably, this will make users handle a `Result`. We don't need to offload the decision making on when to shut down or reconnect a shard to users; we can still do that ourselves. This `Result` is simply a way to present deserialization and processing errors to users, who can then take their own actions if they want to.
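As a rough illustration of what handling that `Result` could look like, here is a std-only sketch of a shard that does its own processing when asked for the next event. The names `Shard`, `ShardError`, `next_event`, and `drain` are assumptions made for this example, the method is synchronous for simplicity, and the "connection" is just a queue of string frames; it is not the proposed API itself.

```rust
use std::collections::VecDeque;
use std::fmt;

/// Hypothetical event type used only for this sketch.
#[derive(Debug, PartialEq)]
pub struct Event(pub u64);

/// Hypothetical error type surfaced directly to the caller.
#[derive(Debug, PartialEq)]
pub enum ShardError {
    /// A frame could not be deserialized; the raw payload is kept.
    Deserializing { raw: String },
    /// The connection ended; there is nothing left to read.
    Closed,
}

impl fmt::Display for ShardError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Deserializing { raw } => write!(f, "failed to deserialize frame {raw:?}"),
            Self::Closed => f.write_str("connection closed"),
        }
    }
}

impl std::error::Error for ShardError {}

/// The shard owns its connection (here, a queue of raw frames) and
/// processes a frame only when the user asks for the next event.
pub struct Shard {
    frames: VecDeque<String>,
}

impl Shard {
    pub fn new<I, S>(frames: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self { frames: frames.into_iter().map(Into::into).collect() }
    }

    /// Process one frame and hand the outcome straight to the caller.
    pub fn next_event(&mut self) -> Result<Event, ShardError> {
        let raw = self.frames.pop_front().ok_or(ShardError::Closed)?;
        let parsed = raw.trim().parse::<u64>();
        match parsed {
            Ok(id) => Ok(Event(id)),
            Err(_) => Err(ShardError::Deserializing { raw }),
        }
    }
}

/// A typical user loop: collect events and decide what to do per error.
pub fn drain(shard: &mut Shard) -> (Vec<Event>, Vec<ShardError>) {
    let (mut events, mut errors) = (Vec::new(), Vec::new());
    loop {
        match shard.next_event() {
            Ok(event) => events.push(event),
            Err(ShardError::Closed) => break,
            Err(other) => errors.push(other),
        }
    }
    (events, errors)
}
```

In this sketch the user loop sees the malformed frame as a value it can log, count, or ignore, instead of it disappearing into a background task's logs.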
This has the benefit of removing a significant amount of what would now be cruft in the gateway crate: the Heartbeater can now be inlined into event calls; the Socket Forwarder and Emitter no longer need to exist as intermediary messaging layers; Sessions can now be cleaned up and no longer need to use channels and cells for propagation; and implementations of what does remain will just end up being simpler to maintain and read.
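One way to picture an inlined heartbeater is a deadline check that runs whenever the user drives the shard, in place of a separate background task. The sketch below is only that picture under stated assumptions: `Shard`, `Action`, `observe_sequence`, and `poll` are hypothetical names, and the current time is passed in explicitly so the logic stays deterministic.

```rust
use std::time::{Duration, Instant};

/// What the shard should do on this call; hypothetical for this sketch.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// A heartbeat is due and should carry the last seen sequence.
    SendHeartbeat { sequence: u64 },
    /// Nothing is due yet.
    Idle,
}

/// Shard state needed for heartbeating, with no background task.
pub struct Shard {
    interval: Duration,
    last_heartbeat: Instant,
    sequence: u64,
}

impl Shard {
    pub fn new(interval: Duration, now: Instant) -> Self {
        Self { interval, last_heartbeat: now, sequence: 0 }
    }

    /// Record the sequence number of the latest event received.
    pub fn observe_sequence(&mut self, sequence: u64) {
        self.sequence = sequence;
    }

    /// Called from the user's event loop; the heartbeat check is inline.
    pub fn poll(&mut self, now: Instant) -> Action {
        if now.duration_since(self.last_heartbeat) >= self.interval {
            // Reset the deadline and report that a heartbeat is due.
            self.last_heartbeat = now;
            Action::SendHeartbeat { sequence: self.sequence }
        } else {
            Action::Idle
        }
    }
}
```

Because the state lives on the shard itself, no channel or shared cell is needed to tell a heartbeater task about the latest sequence number.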
One important detail is that we can increase our bus factor on the gateway. The gateway has historically been difficult to understand, read, and document, due to the layers of separation between its individual components. It's difficult to understand where and how the Socket Forwarder communicates with the Shard Processor, and how the chain of logic propagates down to end users. By implementing this change, we can remove all of these layers and end up only with the Shard implementation.
When?
There is nothing blocking this. I have already started on an implementation. Refactoring the gateway and its internal components, like the socket forwarder and the heartbeater, has been a long-term goal of mine as a way to refactor and document the shard processor as a whole. This is a way to do all of that in one fell swoop.