-
Notifications
You must be signed in to change notification settings - Fork 689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not run unneeded subsystems on collator and its alongside node #3061
Conversation
The CI pipeline was cancelled due to failure one of the required jobs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quite like this! At the cost of some minor code duplication, we get rid of some hacky conditional logic and gain more flexibility in instantiating different type of overseer for collators.
Noticed a couple of things that can be cleaned up further, but other than that LGTM.
Nice! Do you see anything speaking against reusing this also for the minimal collator node? In the end we have the nearly identical overseer here again:
|
Yeah, that's a good point, and that's what I tried first. It seemed to be quite logical but came out as usual. The current "full scale" overseer is implemented through the |
looks like cumulus pov recovery test is failing, @skunert do you know why wouldn't it work with minimal overseer? |
@ordian looks like I forgot to include something required 🤔
|
Fixed it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point, and that's what I tried first. It seemed to be quite logical but came out as usual. The current "full scale" overseer is implemented through the OverseerBuilder which is generic over RuntimeClient, and over DefaultSubsystemClient as well. They are two different levels of API implementation and work fine like that.
Hmm I have not taken a deep dive on that code, but on first sight it looks like DefaultSubsystemClient
only implements the single trait RuntimeApiSubsystemClient
. So why does the overseer building codepath even need to know about it. Can we not modify that code to just take some type that implements RuntimeApiSubsystemClient
(and iirc AuxStore
and ChainApiBackend
). This is satisfied by both BlockChainRpcClient
and DefaultSubsystemClient
.
But anyway, I don't insist that this needs to be investigated right now. Code looks good 👍
@skunert makes sense! I'll try to follow up on that. |
resolve #3116 a follow-up on #3061 (review): - [x] reuse collator overseer builder for polkadot-node and collator - [x] run zombienet test (0001-parachains-smoke-test.toml) - [x] make wasm build errors more user-friendly for an easier problem detection when using different toolchains in Rust --------- Co-authored-by: ordian <write@reusable.software> Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
resolve paritytech#3116 a follow-up on paritytech#3061 (review): - [x] reuse collator overseer builder for polkadot-node and collator - [x] run zombienet test (0001-parachains-smoke-test.toml) - [x] make wasm build errors more user-friendly for an easier problem detection when using different toolchains in Rust --------- Co-authored-by: ordian <write@reusable.software> Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
Currently, collators and their alongside nodes spin up a full-scale overseer running a bunch of subsystems that are not needed if the node is not a validator. That was considered to be harmless; however, we've got problems with unused subsystems getting stalled for a reason not currently known, resulting in the overseer exiting and bringing down the whole node.
This PR aims to only run needed subsystems on such nodes, replacing the rest with
DummySubsystem
.It also enables collator-optimized availability recovery subsystem implementation.
Partially solves #1730.