Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[subsystem] improve parallelization by consuming messages inbetween overseer signals in parallel #933

Open
drahnr opened this issue Oct 5, 2021 · 4 comments
Assignees
Labels
I4-refactor Code needs refactoring.

Comments

@drahnr
Copy link

drahnr commented Oct 5, 2021

The futures API we could consider to use more in order to avoid the stalling subsystems:

https://docs.rs/futures/0.3.17/futures/prelude/stream/trait.StreamExt.html#method.buffer_unordered

Currently we use a bunch of for msg in source.await() { match msg {} } style patterns which is rather prohibitive of any parallelization.
There are a few perils of Signals vs regular Messages (again) that need to be taken into account. The sequencing within the overseer is not sufficient. This might require some additional logic around BufferUnordered or a custom impl that is Signal/Message-semantic aware.

This will have two effects:

  • individual messages can no longer block the whole execution flow
  • easy tuning knobs for parallelization per subsystem
  • maintain per subsystem backpressure
@drahnr drahnr changed the title [subsystem] refactor how messages are consumed [subsystem] refactor how overseer messages are consumed Oct 5, 2021
@drahnr drahnr changed the title [subsystem] refactor how overseer messages are consumed [subsystem] improve parallelization by how overseer messages are consumed Oct 5, 2021
@drahnr drahnr changed the title [subsystem] improve parallelization by how overseer messages are consumed [subsystem] improve parallelization by consuming messages inbetween overseer signals in parallel Oct 5, 2021
@rphmeier
Copy link
Contributor

rphmeier commented Oct 5, 2021

There are certain instances where we actually want specific messages to block execution flow. A prime example is handling overseer signals. For instance, paritytech/polkadot#3924 only works because we ensure that the dispute coordinator handles collecting backing votes and candidates before we handle a message from a corresponding approval task that issues an IssueLocalStatement message. The overseer model is designed for synchronization around the advancement of the relay chain, and we should take care to preserve that property where needed.

@rphmeier
Copy link
Contributor

rphmeier commented Oct 5, 2021

As a design choice, I would advocate for non-parallel as a default and we should be explicit about which work we do in the background. There are likely diminishing returns for processing more and more messages in parallel, and it may even reduce performance at some point due to increased reliance on shared state.

@drahnr
Copy link
Author

drahnr commented Oct 6, 2021

The idea is not so much about the fact that compute power is limited or speeding things up, but rather being bottlenecked on a single subsystem that does some (relatively speaking) heavier computations and hence become a bottleneck when doing serial processing as we currently do. The above approach is particularly interesting for those.

I fully agree to stating things explicitly.

If we can guarantee on synchronization on Overseer::Signal, keeping the total ordering (rather use buffer() than buffer_unordered(), except for the heavy processing part, this has little chance of failure. This would imply splitting subsystem state updates into pre and post stage of doing the heavy and/or blocking processing part which indeed would require some additional investigations.

For a few individual subsystems this could play out very well, for others, less so.

Candidates to consider are those subsystems that are:

  • blocking on an oneshot::Receiver which corresponding oneshot::Sender was sent to another subsystem
  • dealing with erasure code recovery or signature verification operations

@drahnr
Copy link
Author

drahnr commented Apr 5, 2022

Re-prioritizing in face of paritytech/polkadot#4911

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
bkchr pushed a commit that referenced this issue Apr 10, 2024
* do not spawn additional task for on-demand relays

* compilation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I4-refactor Code needs refactoring.
Projects
Status: Backlog
Development

No branches or pull requests

6 participants