
[5 / 5] Introduce approval-voting-parallel #4849

Open: wants to merge 66 commits into base: master

Conversation

@alexggh (Contributor) commented Jun 20, 2024

This is the implementation of the approach described here: #1617 (comment) & #1617 (comment) & #1617 (comment).

Description of changes

The end goal is an architecture with a single subsystem (approval-voting-parallel) and multiple worker types that fulfill the work currently done by the approval-distribution and approval-voting subsystems. The main loop of the new subsystem only distributes work to the workers.

The new subsystem will have:

  • N approval-distribution workers: These do the work that is currently being done by the approval-distribution subsystem and, in addition, also perform the crypto-checks that an assignment is valid and that a vote is correctly signed. Work is assigned via the following formula: worker_index = msg.validator % WORKER_COUNT, which guarantees that all assignments and approvals from the same validator reach the same worker.
  • 1 approval-voting worker: This receives already-validated messages and does everything approval-voting currently does, except the crypto-checking, which has been moved to the approval-distribution workers.
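The routing formula above can be sketched as follows; `route_to_worker` and the `WORKER_COUNT` value are illustrative names for this sketch, not the PR's actual API.

```rust
// Hypothetical sketch of the worker routing described above.
const WORKER_COUNT: usize = 4;

/// Map a validator index to the approval-distribution worker that owns it,
/// so all assignments and approvals from one validator reach the same worker.
fn route_to_worker(validator_index: u32) -> usize {
    validator_index as usize % WORKER_COUNT
}

fn main() {
    // Validators 1 and 5 land on the same worker when WORKER_COUNT is 4.
    assert_eq!(route_to_worker(1), 1);
    assert_eq!(route_to_worker(5), 1);
    assert_eq!(route_to_worker(6), 2);
    println!("routing ok");
}
```

Because the mapping is a pure function of the validator index, no coordination between workers is needed to decide ownership.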

On the hot path of processing messages, no synchronisation or waiting is needed between the approval-distribution and approval-voting workers.
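A minimal sketch of this fan-out pattern, using standard-library channels rather than the PR's actual types: the main loop only routes messages to per-worker queues, and each worker owns its state, so no lock is shared on the hot path.

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative only: `fan_out` is a stand-in for the subsystem's main loop.
fn fan_out(messages: u32, worker_count: usize) -> usize {
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for _id in 0..worker_count {
        let (tx, rx) = mpsc::channel::<u32>();
        senders.push(tx);
        // Each worker drains its own queue independently of the others.
        handles.push(thread::spawn(move || rx.iter().count()));
    }

    // Hot path: pick the worker from the validator index, send, move on.
    for validator in 0..messages {
        let worker = validator as usize % worker_count;
        senders[worker].send(validator).unwrap();
    }
    drop(senders); // close the channels so workers exit

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // All 100 messages are processed, spread across 4 workers.
    assert_eq!(fan_out(100, 4), 100);
    println!("ok");
}
```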


Guidelines for reading

The full implementation is broken into five PRs. Each is self-contained and improves things incrementally, even without the parallelisation being implemented/enabled. This approach was taken instead of a big-bang PR to make things easier to review and to reduce the risk of breaking these critical subsystems.

After reading the full description of this PR, the changes should be read in the following order:

  1. [1 / 5] Optimize logic for gossiping assignments #4848, plus some other micro-optimizations for networks with a high number of validators. This change gives us a speed-up by itself, without any other changes.
  2. [2 / 5] Make approval-distribution logic runnable on a separate thread #4845, which contains only interface changes to decouple the subsystem from the Context and to be able to run multiple instances of the subsystem on different threads. No functional changes.
  3. [3 / 5] Move crypto checks in the approval-distribution #4928, which moves the crypto checks from approval-voting into approval-distribution, so that approval-distribution no longer has any reason to wait on approval-voting. This change gives us a speed-up by itself, without any other changes.
  4. [4 / 5] Make approval-voting runnable on a worker thread #4846, interface changes to make approval-voting runnable on a separate thread. No functional changes.
  5. This PR, where we instantiate an approval-voting-parallel subsystem that runs the logic currently in approval-distribution and approval-voting on different workers.
  6. The next step, after these changes get merged and deployed, would be to bring all the files from approval-distribution, approval-voting and approval-voting-parallel into a single Rust crate, to make the structure easier to maintain and understand.

Results

Running subsystem-benchmarks with 1000 validators, 100 fully occupied cores, and triggering all assignments and approvals for all tranches.

Approval does not lag behind.

Master

Chain selection approved  after 72500 ms hash=0x0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a

With this PoC

Chain selection approved  after 3500 ms hash=0x0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a

Gathering enough assignments

Enough assignments are gathered in less than 500 ms, which gives us a guarantee that unnecessary work does not get triggered. On master, on the same benchmark, that number goes above 32 seconds because the subsystems fall behind on work.


Cpu usage:

Master

CPU usage, seconds                     total   per block
approval-distribution                96.9436      9.6944
approval-voting                     117.4676     11.7468
test-environment                     44.0092      4.4009

With this PoC

CPU usage, seconds                     total   per block
approval-distribution                 0.0014      0.0001 --- unused
approval-voting                       0.0437      0.0044  --- unused
approval-voting-parallel              5.9560      0.5956
approval-voting-parallel-0           22.9073      2.2907
approval-voting-parallel-1           23.0417      2.3042
approval-voting-parallel-2           22.0445      2.2045
approval-voting-parallel-3           22.7234      2.2723
approval-voting-parallel-4           21.9788      2.1979
approval-voting-parallel-5           23.0601      2.3060
approval-voting-parallel-6           22.4805      2.2481
approval-voting-parallel-7           21.8330      2.1833
approval-voting-parallel-db          37.1954      3.7195  --- the approval-voting thread

Enablement strategy

Because only some trivial plumbing is needed in approval-distribution and approval-voting to be able to run things in parallel, and because these subsystems play a critical part in the system, this PR proposes that we keep both ways of running the approval work: as separate subsystems, and as a single subsystem (approval-voting-parallel) with multiple workers for the distribution work and one worker for the approval-voting work, switching between them with a command-line flag.

The benefits of this are twofold.

  1. With the same polkadot binary we can easily switch just a few validators to use the parallel approach and, if no issues arise, gradually make this the default way of running.
  2. In the worst-case scenario, where it becomes the default way of running things but we discover critical issues with it, we have a path to quickly disable it by asking validators to adjust their command-line flags.
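A rough sketch of the flag-based selection, under the assumption of a single boolean command-line flag; the names here (`enable_approval_voting_parallel`, `spawn_legacy_subsystems`, `spawn_parallel_subsystem`) are hypothetical, not the PR's actual API.

```rust
// Illustrative sketch only: pick the approval pipeline at node startup.
struct NodeConfig {
    enable_approval_voting_parallel: bool,
}

fn spawn_legacy_subsystems() -> &'static str {
    "approval-distribution + approval-voting"
}

fn spawn_parallel_subsystem() -> &'static str {
    "approval-voting-parallel"
}

/// Operators can roll back by restarting with the flag flipped; no new
/// binary release is needed to switch between the two modes.
fn select_pipeline(cfg: &NodeConfig) -> &'static str {
    if cfg.enable_approval_voting_parallel {
        spawn_parallel_subsystem()
    } else {
        spawn_legacy_subsystems()
    }
}

fn main() {
    let cfg = NodeConfig { enable_approval_voting_parallel: true };
    assert_eq!(select_pipeline(&cfg), "approval-voting-parallel");
    println!("{}", select_pipeline(&cfg));
}
```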

Next steps

  • Make sure, through various testing, that we are not missing anything.
  • Polish the implementations to make them production-ready.
  • Add unit tests for approval-voting-parallel.
  • Define and implement the strategy for rolling out this change, so that the blast radius is minimal (a single validator) in case there are problems with the implementation.
  • Versi long-running tests.
  • Add relevant metrics.

@ordian @eskimor @sandreim @AndreiEres, let me know what you think.

@alexggh alexggh marked this pull request as ready for review July 12, 2024 07:22
github-merge-queue bot pushed a commit that referenced this pull request Jul 16, 2024
This is part of the work to further optimize the approval subsystems. If you want to understand the full context, start with reading
#4849 (comment); however, that's not necessary, as this change is self-contained and nodes would benefit from it regardless of whether subsequent changes land.

While testing with 1000 validators I found out that the logic for determining which validators an assignment should be gossiped to is taking a lot of time, because it always iterated through all the peers to determine which are X and Y neighbours and to which we should randomly gossip (4 samples).

This can be optimised so that we don't have to iterate through all peers for each new assignment, by first fetching the list of X and Y peer ids from the topology and then stopping the loop once we have taken the 4 random samples.

With these improvements we reduce the total CPU time spent in approval-distribution by 15% on networks with 500 validators and by 20% on networks with 1000 validators.
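The optimisation can be sketched as follows, under simplifying assumptions: peers are plain `u64` ids, and the "random" sampling is replaced by a deterministic early-exit scan. `gossip_targets` and all parameter names are illustrative; the real code lives in the approval-distribution subsystem.

```rust
use std::collections::HashSet;

// Hedged sketch: start from the topology's X/Y neighbour lists instead of
// scanning every connected peer, and stop once 4 extra samples are chosen.
fn gossip_targets(
    x_neighbours: &[u64],
    y_neighbours: &[u64],
    all_peers: &[u64],
    random_samples: usize,
) -> HashSet<u64> {
    // X and Y neighbours come straight from the topology, no peer scan needed.
    let mut targets: HashSet<u64> =
        x_neighbours.iter().chain(y_neighbours).copied().collect();
    // Early-exit sampling: deterministic stand-in for a real RNG here.
    let mut sampled = 0;
    for &peer in all_peers {
        if sampled == random_samples {
            break; // stop the loop once we took the required samples
        }
        if targets.insert(peer) {
            sampled += 1;
        }
    }
    targets
}

fn main() {
    let peers: Vec<u64> = (1..=100).collect();
    let t = gossip_targets(&[1, 2], &[3, 4], &peers, 4);
    // 4 topology neighbours plus 4 sampled peers; the other 92 are never visited.
    assert_eq!(t.len(), 8);
    println!("targets: {}", t.len());
}
```

The key property is that the loop terminates after a handful of iterations instead of visiting all peers for every assignment.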

## Test coverage:

`propagates_assignments_along_unshared_dimension` and
`propagates_locally_generated_assignment_to_both_dimensions` already cover this logic and they passed, confirming that there is no breaking change.

Additionally, the approval voting benchmark measures the traffic sent to other peers, so I confirmed that for various network sizes there is no difference in the size of the traffic sent to other peers.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@sandreim (Contributor) left a comment

The last piece of the puzzle is looking good!

Something that I think could be improved is reducing the blast radius of the approval_voting_parallel_enabled flag. I see it in all subsystems that interact with approval voting.

Review comments (resolved) on:
  • polkadot/node/core/approval-voting/src/lib.rs
  • polkadot/node/core/approval-voting-parallel/src/lib.rs
  • polkadot/node/core/approval-voting-parallel/src/metrics.rs
  • polkadot/node/subsystem-types/src/messages.rs
  • polkadot/node/service/src/overseer.rs
  • polkadot/node/network/bridge/src/rx/mod.rs
@alexggh (Contributor, Author) commented Sep 10, 2024

Something that I think could be improved is reducing the blast radius of the approval_voting_parallel_enabled flag. I see it in all subsystems that interact with approval voting.

Thought about this a bit. One thing to keep in mind is that these flags are temporary; once the new subsystem becomes the norm they will be removed/cleaned up, so I don't think we will have them for more than one or two releases.

	if approval_voting_parallel_enabled {
		send_new_subsystem_message
	} else {
		send_old_subsystem_message
	}

The root cause for them is that with the new parallelised approach we want to:

  1. Consolidate the two subsystems, approval-distribution and approval-voting, into a single one (approval-voting-parallel). Hence, both the callers of approval-voting and of approval-distribution have to know to send the new messages. So even if we didn't introduce a new subsystem and instead used the existing approval-distribution as the orchestrator of the workers, as suggested here, we would still need to introduce the above if/else in all the callers of approval-voting.

  2. We want to keep both approaches of running, the original 2 subsystems and the parallelised one, and switch between them with a runtime knob. If we had only the parallelised mode, this logic would go away, and it will go away once we have built enough trust that the parallelised mode is hardened enough to be the default. My opinion here is that the benefits outweigh the temporary ugliness of plumbing this knob into the subsystems that interact with the approval voting subsystems.

So from my perspective we have three options here:

a) Don't support both modes of running. In this case we rely on discovering all the bugs in our QA phase on Versi, Westend and Rococo, and then everyone on Kusama and Polkadot starts running in the new parallel mode. The rollback path in that case would be to ask people to go back to the previous polkadot binary, and with that any other unrelated features and bug fixes would be rolled back too. I don't think we should go this route.

b) Do not add a new approval-voting-parallel subsystem and use approval-distribution as the orchestrator. This would reduce the touched surface a bit, but it won't cover all paths, because all approval-voting callers would still need this switch logic between the old and new mode. Additionally, I think this would make the current PR a bit harder to review and to distinguish between the normal mode with 2 subsystems and the new proposed parallel mode.

c) Take into consideration that this logic will be cleaned up/removed anyway, so I don't think it is worth investing in adding other abstractions on top of it; while the solution is not the most elegant, it is the fastest and simplest to understand and to clean up once we don't need it anymore. With that said, there are a few suggestions I can easily implement to drop the subsystem_disabled and subsystem_enabled flags, by having an alternative to validator_overseer_builder where we don't instantiate the unneeded subsystems and use the DummySubsystem like we do in collator_overseer_builder, but unfortunately we would still need the knobs that decide whether we send the ApprovalVotingParallelMessage or the older one.
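Suggestion (c) can be sketched roughly as below. This is a heavily simplified stand-in: the `Subsystem` trait, the `DummySubsystem` stand-in, and `build_overseer` are illustrative shapes, not the overseer's real builder API.

```rust
// Hypothetical sketch: register dummy stand-ins for the unused subsystems
// instead of threading per-subsystem enabled/disabled flags everywhere.
trait Subsystem {
    fn name(&self) -> &'static str;
}

struct Real(&'static str);
struct Dummy; // inert placeholder, like DummySubsystem in collator_overseer_builder

impl Subsystem for Real {
    fn name(&self) -> &'static str { self.0 }
}
impl Subsystem for Dummy {
    fn name(&self) -> &'static str { "dummy" }
}

fn build_overseer(parallel: bool) -> Vec<Box<dyn Subsystem>> {
    if parallel {
        vec![
            Box::new(Real("approval-voting-parallel")),
            // Old subsystems stay registered but inert.
            Box::new(Dummy),
            Box::new(Dummy),
        ]
    } else {
        vec![
            Box::new(Real("approval-distribution")),
            Box::new(Real("approval-voting")),
        ]
    }
}

fn main() {
    let subs = build_overseer(true);
    assert_eq!(subs[0].name(), "approval-voting-parallel");
    assert_eq!(build_overseer(false).len(), 2);
    println!("{} subsystems", subs.len());
}
```

As the comment notes, this removes the enabled/disabled flags from the builder, but the callers that choose between ApprovalVotingParallelMessage and the older messages still need the knob.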

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@ordian (Member) left a comment
Looking good overall from a light review

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
github-merge-queue bot pushed a commit that referenced this pull request Sep 12, 2024
This is part of the work to further optimize the approval subsystems. If you want to understand the full context, start with reading
#4849 (comment).

# Description
This PR contains changes to make it possible to run a single
approval-voting instance on a worker thread, so that it can be
instantiated by the approval-voting-parallel subsystem.

This does not contain any functional changes; it just decouples the
subsystem from the subsystem Context and introduces more specific trait
dependencies for each function, instead of all of them requiring a
context.

This change can be merged independently of the follow-up PRs.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
Base automatically changed from alexaggh/approval-voting-parallel-2-5 to master September 12, 2024 10:12
@ordian (Member) left a comment
Great work! 🚀

@sandreim (Contributor) commented Sep 12, 2024

Something that I think could be improved is reducing the blast radius of the approval_voting_parallel_enabled flag. I see it in all subsystems that interact with approval voting.

Thought about this a bit, one thing to have in mind here is that these are temporary, once the new subsystem becomes the norm they will be remove/cleaned up, so I don't think we will have them more than one/two releases.

I have no better idea on how to make this better, so best is we stick with current approach.

@sandreim (Contributor) left a comment
Excellent work @alexggh !

Looking forward to seeing this live. We should try to increase Kusama to 750 para validators after the para inherent runtime weight fix is applied.

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. T0-node This PR/Issue is related to the topic “node”. T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Projects
Status: Backlog
Status: Pending Audit
5 participants