client/network-gossip: Redesign validator #6335
Comments
There should also be an opportunity to write something different alongside the existing code so we aren't required to rewrite finality-grandpa as a precondition for the refactoring to go in. Moving everything up to the higher layer introduces some complications, as you basically have to reimplement all the gossip logic at the higher level if you want different peers to receive different messages based on their perceived protocol state. To avoid that, we'd need these properties:
We'd also need some stuff for triggering predicate-gated message repropagations (analogous to …).
True. We could as well introduce …
Are you referring to the "Removing the concept of a Validator" suggestion here? My description might be misleading. I am not suggesting to move all the logic of `client/network-gossip/src/state_machine.rs` into the respective upper layers. The gossip system would still …
In order to make (2) and (3) aware of the upper-layer protocol state one can for example have …
I just want to point out that an alternative way to implement backpressure is not to "notify the caller" (if I understand correctly, the caller here is the Upper Layer), but rather for the caller to place all outgoing items on a bounded priority buffer. If the producer attempts to add to a full buffer, old obsolete items are dropped using producer/application-specific logic - since only the producer (i.e. Upper Layer using the terminology of this PR) knows precisely what is the best item to drop. The consumer takes items from the buffer in priority order, with the priority also being defined by the producer/application, whenever it is ready to receive work again. This can either be via polling the buffer, or via a blocking read of the buffer (which is effectively a notification in terms of OS primitives, but I gather you meant "notify" in a more general / application-level sense). Using this idea, all tricky synchronisation issues are pushed into the buffer implementation. Then it is also unnecessary for components to hold Arc references to each other.
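A minimal sketch of that idea, assuming the producer's `Ord` implementation encodes the priority (with the lowest-priority element being the most obsolete); the `BoundedPriorityBuffer` name is illustrative and consumer wake-up is omitted:

```rust
use std::collections::BTreeSet;

/// Illustrative bounded priority buffer: the producer never blocks,
/// and on overflow the producer's own ordering decides which item is
/// obsolete and gets dropped. Note that a set also deduplicates equal
/// items, which a real buffer may or may not want.
struct BoundedPriorityBuffer<T: Ord> {
    capacity: usize,
    items: BTreeSet<T>,
}

impl<T: Ord> BoundedPriorityBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { capacity, items: BTreeSet::new() }
    }

    /// Producer side: insert, then evict the lowest-priority item if
    /// the buffer overflowed (possibly the one just inserted).
    fn push(&mut self, item: T) {
        self.items.insert(item);
        if self.items.len() > self.capacity {
            self.items.pop_first();
        }
    }

    /// Consumer side: take the highest-priority item; called only
    /// when the consumer is ready to receive work again.
    fn pop(&mut self) -> Option<T> {
        self.items.pop_last()
    }
}
```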
Status quo
A gossip `Validator` is called by a `GossipEngine` to validate gossip messages. It can decide whether a given message should be:

- Passed to the higher layer and propagated to other nodes.
- Passed to the higher layer but not propagated to other nodes.
- Discarded.
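For reference, a condensed sketch of the shape this describes (the real trait in `client/network-gossip` is generic over the block type and also receives a `ValidatorContext` for sending messages; the `PeerId` placeholder stands in for libp2p's peer identifier):

```rust
/// Placeholder for libp2p's peer identifier.
struct PeerId([u8; 32]);

/// The three decisions listed above, keyed by a gossip topic.
enum ValidationResult<Topic> {
    /// Pass to the higher layer and propagate to other nodes.
    ProcessAndKeep(Topic),
    /// Pass to the higher layer but do not propagate.
    ProcessAndDiscard(Topic),
    /// Drop the message entirely.
    Discard,
}

/// Abbreviated Validator trait, called synchronously by the
/// GossipEngine for every incoming message.
trait Validator<Topic>: Send + Sync {
    fn validate(&self, sender: &PeerId, data: &[u8]) -> ValidationResult<Topic>;
}
```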
In order to make these decisions it needs to be aware of the state of the upper-layer gossip message consumer. Thus a validator communicates with two components: (1) the `GossipEngine` and (2) the upper-layer consumer.

[Sequence diagram of the above; reproducible at https://bramp.github.io/js-sequence-diagrams/]
All methods of the `Validator` trait are synchronous in the sense that they either return immediately or block the underlying OS thread. Communication between the three components (upper layer, validator, gossip engine) is thus encouraged to happen by wrapping the validator in an `Arc<Mutex<>>` and having both the upper layer and the gossip engine hold a reference (see finality-grandpa); a minimal sketch of this sharing pattern follows.
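A minimal sketch of that sharing pattern, with illustrative types:

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for a validator carrying upper-layer protocol state
/// (e.g. the current round in finality-grandpa).
struct MyValidator;

struct GossipEngine {
    /// The engine locks the validator on every incoming message.
    validator: Arc<Mutex<MyValidator>>,
}

struct UpperLayer {
    /// The upper layer locks the same validator to update its view of
    /// the protocol state (cf. note_round etc. in finality-grandpa).
    validator: Arc<Mutex<MyValidator>>,
}

fn wire_up() -> (GossipEngine, UpperLayer) {
    let validator = Arc::new(Mutex::new(MyValidator));
    (
        GossipEngine { validator: validator.clone() },
        UpperLayer { validator },
    )
}
```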
Sharing a gossip validator via an `Arc<Mutex<>>` is problematic because:

- Both sides (upper layer, gossip engine) operate via green threading (Rust futures). Using conventional locking can block a thread and thus block all tasks running on the same thread.
- There is no way for the validator to exercise back-pressure: while it can simply return an error when it is overloaded, there is no way for it to notify the caller that it is ready to receive work again.
- In case a validator is not just a pure validator but actually handles a lot more (e.g. sending neighbor packets), there is no way for it to communicate with other components other than via conventional `Arc<Mutex<>>` primitives or `futures::channel::mpsc` unbounded channels. One cannot use a bounded futures channel, as the validator has no way of yielding control to another task (`return Poll::Pending`). One cannot use a `std::sync::mpsc` channel, as one would otherwise block the entire OS thread and thus block all tasks running on the same thread.

Possible alternatives
Making `Validator` methods async

One could make the methods of the `Validator` trait async. This would enable a validator to exercise back-pressure and to talk to other components (even though I doubt the latter is a good idea in the first place); a sketch follows.
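One way to express this, returning boxed futures since trait methods cannot be `async fn` directly; the placeholder types are the ones from the status-quo sketch above:

```rust
use std::future::Future;
use std::pin::Pin;

// Placeholder types as in the status-quo sketch above.
struct PeerId([u8; 32]);
enum ValidationResult<Topic> {
    ProcessAndKeep(Topic),
    ProcessAndDiscard(Topic),
    Discard,
}

/// Async variant: `validate` returns a future. An overloaded
/// validator keeps the future Pending, back-pressuring the
/// GossipEngine, and it can `.await` bounded channels to other
/// components instead of blocking the OS thread.
trait AsyncValidator<Topic>: Send + Sync {
    fn validate<'a>(
        &'a self,
        sender: &'a PeerId,
        data: &'a [u8],
    ) -> Pin<Box<dyn Future<Output = ValidationResult<Topic>> + Send + 'a>>;
}
```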
Adjustments needed between `Validator` and `GossipEngine` are doable, as `GossipEngine` never clones its `Validator` and implements `Future`.

Adjustments needed between `Validator` and e.g. finality-grandpa are hard, as validator methods are called within functions that cannot easily yield (e.g. `note_set`, `note_round`, `note_commit_finalized`, ...) and the validator is cloned in many places.

Removing the concept of a Validator
[Sequence diagram of the above; reproducible at https://bramp.github.io/js-sequence-diagrams/]
Instead of having the `GossipEngine` ask a `Validator` to validate a message and then pass it on to an upper layer, remove the concept of a `Validator` and have the `GossipEngine` always pass messages to the upper layer, which validates each message and passes it (or a message id) back to the `GossipEngine` if it should be propagated further (libp2p gossipsub uses the same pattern). This reduces (1) the number of components, (2) the number of needed communication channels and (3) the amount of concurrency. A sketch of such an interface follows.
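One possible shape for this, assuming a bounded-channel design (all names here are hypothetical, not an existing API):

```rust
use futures::channel::mpsc;
use futures::SinkExt;

struct PeerId([u8; 32]); // placeholder, as above

/// Hypothetical id the upper layer uses to refer back to a message
/// it received from the engine.
struct MessageId(u64);

/// What the engine hands upward: no validation has happened yet.
struct RawGossip {
    id: MessageId,
    sender: PeerId,
    data: Vec<u8>,
}

/// Hypothetical handle held by the upper layer, created together
/// with the engine, e.g. via `mpsc::channel(16)`.
struct GossipEngineHandle {
    propagate_tx: mpsc::Sender<MessageId>,
}

impl GossipEngineHandle {
    /// Called by the upper layer once it has validated a message and
    /// decided it should be gossiped further. Because the channel is
    /// bounded, an overloaded engine back-pressures the upper layer
    /// right here.
    async fn propagate(&mut self, id: MessageId) -> Result<(), mpsc::SendError> {
        self.propagate_tx.send(id).await
    }
}
```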
Adjustments needed on the `GossipEngine` side are easy: instead of propagating after validation, it propagates on request by the upper layer.

Adjustments needed e.g. in finality-grandpa are hard, as (1) finality-grandpa does more than validating messages in its `Validator` (e.g. sending neighbor packets) and (2) both `Validator` and `GossipEngine` are cloned and thus shared many times across different finality-grandpa components.

Priority for this issue
While I find this to be definitely worth fixing, especially as more and more Polkadot components are building on top of this, there are more urgent things to do, e.g.:
- Introducing back-pressure into write-notifications. (The two are interdependent: today finality-grandpa sends messages via its `Validator`, which it would not be able to do if writing a message were `async`.)
- Introducing back-pressure into the notification-stream.