Consider refactoring subsystem initialization in overseer #586
Comments
Yeah, indeed.
One thing that should definitely be expunged from the codebase is this offensive pattern of having subsystems disable themselves internally according to configuration. This defies all logic and should be removed as soon as possible.
As subsystems often need to be disabled not individually but together, I propose this refactoring strategy, which avoids duplication:
```rust
struct OptionalRequirements {
    // private fields for backwards compatibility
    // some suggestions without digging:
    execute_pvfs: bool,
    build_collations: bool,
    collator_protocol_mode: CollatorProtocolMode, // Sending or Receiving
    // other conditionals
}

// some baked-in defaults for things like validators, polkadot full nodes,
// collators, parachain full nodes, minimal, etc.
struct OptionalRequirementsBuilder {
    // builder pattern
}
```

The optional requirements should be as backwards compatible as possible, i.e. they should encapsulate all the optional tasks a node running on Polkadot should be able to do. This does shift some of the burden of thought onto collation libraries like Cumulus, but well-defined presets should help with this.
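To make the preset idea concrete, here is a minimal, hypothetical sketch of what the builder with baked-in role presets could look like. All names (`validator()`, `collator()`, `CollatorProtocolMode`, the field set) are illustrative assumptions, not the actual polkadot API:

```rust
// Hypothetical sketch of the proposed OptionalRequirements builder with
// role presets. Names and fields are illustrative, not real polkadot types.

#[derive(Debug, Clone, Copy, PartialEq)]
enum CollatorProtocolMode {
    Sending,
    Receiving,
    Disabled,
}

#[derive(Debug, Clone)]
struct OptionalRequirements {
    execute_pvfs: bool,
    build_collations: bool,
    collator_protocol_mode: CollatorProtocolMode,
}

#[derive(Default)]
struct OptionalRequirementsBuilder {
    execute_pvfs: Option<bool>,
    build_collations: Option<bool>,
    collator_protocol_mode: Option<CollatorProtocolMode>,
}

impl OptionalRequirementsBuilder {
    // Preset for validators: execute PVFs, receive collations.
    fn validator() -> Self {
        Self::default()
            .execute_pvfs(true)
            .build_collations(false)
            .collator_protocol_mode(CollatorProtocolMode::Receiving)
    }

    // Preset for collators: build and send collations, no PVF execution.
    fn collator() -> Self {
        Self::default()
            .execute_pvfs(false)
            .build_collations(true)
            .collator_protocol_mode(CollatorProtocolMode::Sending)
    }

    fn execute_pvfs(mut self, v: bool) -> Self {
        self.execute_pvfs = Some(v);
        self
    }

    fn build_collations(mut self, v: bool) -> Self {
        self.build_collations = Some(v);
        self
    }

    fn collator_protocol_mode(mut self, m: CollatorProtocolMode) -> Self {
        self.collator_protocol_mode = Some(m);
        self
    }

    // Unset fields fall back to conservative defaults.
    fn build(self) -> OptionalRequirements {
        OptionalRequirements {
            execute_pvfs: self.execute_pvfs.unwrap_or(false),
            build_collations: self.build_collations.unwrap_or(false),
            collator_protocol_mode: self
                .collator_protocol_mode
                .unwrap_or(CollatorProtocolMode::Disabled),
        }
    }
}

fn main() {
    // A collator starts from the preset and could tweak individual knobs.
    let reqs = OptionalRequirementsBuilder::collator().build();
    assert!(reqs.build_collations);
    assert!(!reqs.execute_pvfs);
    assert_eq!(reqs.collator_protocol_mode, CollatorProtocolMode::Sending);
    let _ = OptionalRequirementsBuilder::validator().build();
}
```

Presets keep downstream users like Cumulus on a well-known configuration while still allowing individual flags to be overridden.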
Sounds like a good idea. @skunert already looked into this for the minimal node work and can provide some feedback here. Some of his work on this: paritytech/orchestra#41 & paritytech/orchestra#32
For reference, my original approach was to get rid of the unneeded subsystems altogether, not just replace them with a `DummySubsystem`. The approach described in this issue looks good to me, even though the goal is a bit different. IIUC we are always spawning the same number of subsystems, but e.g. on the collator we replace disabled ones with a `DummySubsystem`.
Feature flags are going to be less flexible than requirements passed in at runtime, especially if we consider Polkadot-node-as-a-library.
As found by @alexggh, collators seem to connect to validators over the validation peer set. We could have a different network config for collators as well: https://github.com/paritytech/polkadot/blob/d74c60b6489f31f437edb1a19de0d0f6b8ad432e/node/service/src/lib.rs#L852
This is intentional. We need the validator peerset to recover PoVs. If we don't want them to be connected there, we would probably need to introduce an extra peerset for this (as normal full nodes also enable this).
Availability recovery doesn't use gossip, but a req/resp protocol. We're talking about gossip messages, such as approvals, assignments, bitfields, which are of no concern to collators. Having open slots on validators seems fine, but not on collators.
Yeah I know, I thought you would still need to allow it in the peerset, but my understanding was wrong ;)
This tick collator node on Versi seems to be sending out messages over the validation peer set. This seems like gossip to me. (screenshot omitted)
Looked a bit more and I think we are having two small independent problems:
However, they do take a random sample and will gossip the assignments based on it, but that is only about 4 samples, and that is what you see here. I don't think we should rely on that, since they aren't doing any consensus work.
In my opinion, 1) should be fixed as part of this refactoring, but independent of that I think we should still enforce 2) and make sure the random sample is taken from the active authorities for the given session, something along these lines: paritytech/polkadot@3ef84bf
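The second fix above amounts to filtering the candidate set before sampling. A minimal sketch, assuming hypothetical stand-in types (`AuthorityId` as a plain integer, a deterministic filter-and-truncate in place of the real seeded shuffle):

```rust
use std::collections::HashSet;

// Hypothetical sketch: restrict the gossip topology's random sample to the
// current session's authorities, so collators and other non-authorities are
// never chosen. AuthorityId is a stand-in type; real code would shuffle
// with a seeded RNG instead of taking the first matches.
type AuthorityId = u64;

fn sample_from_active(
    connected: &[AuthorityId],
    session_authorities: &HashSet<AuthorityId>,
    sample_size: usize,
) -> Vec<AuthorityId> {
    connected
        .iter()
        .copied()
        .filter(|id| session_authorities.contains(id))
        .take(sample_size)
        .collect()
}

fn main() {
    let session: HashSet<AuthorityId> = [1, 2, 3].into_iter().collect();
    // Peers 10 and 11 stand in for collators that are not session authorities.
    let connected = vec![10, 1, 11, 2];
    let sample = sample_from_active(&connected, &session, 4);
    // Non-authorities are filtered out before sampling.
    assert_eq!(sample, vec![1, 2]);
}
```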
Are these requests expensive to process, and is there any validation w.r.t. whom they're received from? The syncing protocol has a problem I would like to get rid of, in which anybody can send requests.
Loading the data from disk is probably the most expensive part of serving these requests. There is also no validation, as anyone should be able to make these requests.
@altonen they are open to any peers, but restricted in the total number of requests https://github.com/paritytech/polkadot/blob/d74c60b6489f31f437edb1a19de0d0f6b8ad432e/node/network/protocol/src/request_response/mod.rs#L226-L234 and in request sizes https://github.com/paritytech/polkadot/blob/d74c60b6489f31f437edb1a19de0d0f6b8ad432e/node/network/protocol/src/request_response/mod.rs#L160-L168 If Substrate makes use of this limited queue in a fair fashion (e.g. one node can't occupy all request slots), then we don't need to worry about a potential attack vector here.
Fair enough. There is no fairness (at least not currently) in who gets to occupy the queue, and it's filled FIFO. For syncing it was relatively easy to demonstrate the issue with the current request-response implementation: steal attention from the target and get other connected nodes disconnected due to request timeouts.
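The fairness gap described above is typically closed by replacing the single FIFO with per-peer queues drained round-robin, so one peer cannot monopolize the slots. A minimal sketch, with `PeerId`/`Request` as stand-in types and an assumed per-peer cap (none of this is the actual Substrate implementation):

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical sketch: round-robin over per-peer queues with a per-peer cap,
// so a single peer cannot occupy every slot of a bounded request queue.
type PeerId = u32;
type Request = String;

#[derive(Default)]
struct FairRequestQueue {
    queues: HashMap<PeerId, VecDeque<Request>>,
    // Rotation of peers that currently have pending requests.
    order: VecDeque<PeerId>,
    per_peer_cap: usize,
}

impl FairRequestQueue {
    fn new(per_peer_cap: usize) -> Self {
        Self { per_peer_cap, ..Default::default() }
    }

    // Returns false (request dropped) if the peer is over its cap.
    fn push(&mut self, peer: PeerId, req: Request) -> bool {
        let q = self.queues.entry(peer).or_default();
        if q.len() >= self.per_peer_cap {
            return false;
        }
        if q.is_empty() {
            self.order.push_back(peer);
        }
        q.push_back(req);
        true
    }

    // Pop the next request, rotating between peers with pending work.
    fn pop(&mut self) -> Option<(PeerId, Request)> {
        let peer = self.order.pop_front()?;
        let q = self.queues.get_mut(&peer)?;
        let req = q.pop_front()?;
        if !q.is_empty() {
            self.order.push_back(peer);
        }
        Some((peer, req))
    }
}

fn main() {
    let mut q = FairRequestQueue::new(2);
    // Peer 1 floods; peer 2 sends a single request afterwards.
    assert!(q.push(1, "a".into()));
    assert!(q.push(1, "b".into()));
    assert!(!q.push(1, "c".into())); // over per-peer cap, dropped
    assert!(q.push(2, "x".into()));
    // Despite arriving later, peer 2 is served second, not last.
    assert_eq!(q.pop(), Some((1, "a".into())));
    assert_eq!(q.pop(), Some((2, "x".into())));
    assert_eq!(q.pop(), Some((1, "b".into())));
    assert_eq!(q.pop(), None);
}
```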
Currently we include all subsystems in `RealOverseerGen`, regardless of whether they are needed or not. A couple of them are only initialized based on passed-in parameters. E.g. there is a conditional for whether pvf-checker is enabled, and if not we spawn a `DummySubsystem`, which is a bit clumsy.

Proposal

We could instead nix `RealOverseerGen` and have three new structs that implement `OverseerGen`:

- `ValidatorOverseerGen`
- `CollatorOverseerGen`
- `FullNodeOverseerGen`

Then for each we only initialize the subsystems that are relevant to that kind of node.

Edit: in the future there can be nodes that are both validators and collators. See paritytech/polkadot#7566 (comment). Perhaps a capabilities-based approach would be better, or a builder pattern, e.g. `new().is_validator(true).is_collator(true)`.

I'm not too familiar with the code so looking for feedback. cc @ordian
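To illustrate the per-role shape of the proposal, here is a hypothetical sketch with one `OverseerGen` implementor per node role. The trait signature and subsystem names are placeholders, not the real polkadot types; in the real code each impl would build and spawn an overseer rather than return a list:

```rust
// Hypothetical sketch: one OverseerGen trait, one impl per node role, each
// wiring up only the subsystems that role needs. Names are placeholders.

#[derive(Debug, PartialEq)]
enum Subsystem {
    CandidateValidation,
    PvfChecker,
    CollationGeneration,
    AvailabilityRecovery,
}

trait OverseerGen {
    // Stand-in for the real generator: just lists what would be spawned.
    fn subsystems(&self) -> Vec<Subsystem>;
}

struct ValidatorOverseerGen;
struct CollatorOverseerGen;
struct FullNodeOverseerGen;

impl OverseerGen for ValidatorOverseerGen {
    fn subsystems(&self) -> Vec<Subsystem> {
        vec![
            Subsystem::CandidateValidation,
            Subsystem::PvfChecker,
            Subsystem::AvailabilityRecovery,
        ]
    }
}

impl OverseerGen for CollatorOverseerGen {
    fn subsystems(&self) -> Vec<Subsystem> {
        vec![Subsystem::CollationGeneration, Subsystem::AvailabilityRecovery]
    }
}

impl OverseerGen for FullNodeOverseerGen {
    fn subsystems(&self) -> Vec<Subsystem> {
        vec![Subsystem::AvailabilityRecovery]
    }
}

fn main() {
    // No DummySubsystem placeholders: a collator simply never constructs
    // the validation-side subsystems.
    let overseer_gen: Box<dyn OverseerGen> = Box::new(CollatorOverseerGen);
    let subs = overseer_gen.subsystems();
    assert!(subs.contains(&Subsystem::CollationGeneration));
    assert!(!subs.contains(&Subsystem::PvfChecker));
}
```

The trade-off versus the capabilities/builder approach from the edit above is that fixed role structs cannot express hybrid nodes (validator and collator at once) without adding a fourth impl per combination.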
Followup

After this has been done, we can move the worker binary check (where we call `determine_workers_paths`) into `CandidateValidation::new`. This would ensure that the workers are only required for nodes that actually need them (validators).