[ADR] Light Client Architecture #185
Looks good! 👍 As an aside, Dropbox recently published a post on their experience rewriting their sync engine in Rust, and I found the two points below quite interesting, as they seem to match the conclusions you've drawn yourself from your experiments:
The light client requires an expanding set of peers to get headers
from. The larger and more diverse the set of peers, the better the
chances that the light client will detect a fork. The peer list can be
seeded with peers from configuration and can crawl the network.
Here is a question.
Should the light client run the light client protocol with only 1 peer, and run only some kind of minimal fork detection protocol with the other peers?
I don't think it makes any sense to be running the entire light client protocol on multiple peers.
By running the entire light client protocol on multiple peers, do you mean fetching headers from multiple peers? I think the idea is to get the primary header from a single peer and then asynchronously fetch the same header from multiple peers to perform fork detection. If the headers differ, then the differing header will also go through verification. If the header differs and also passes verification, we have a fork on the main chain. Is that what you mean?
Yes this is what I'm thinking.
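The flow agreed on in this thread (one primary header, the same height fetched from other peers, and a fork only when a differing header also verifies) could be sketched roughly as follows. This is a minimal illustration, not the ADR's design: `Header`, `ForkCheck`, `detect_fork` and the `verify` closure are hypothetical names, and the witness headers are assumed to have been fetched already.

```rust
/// Hypothetical, simplified header: only the fields this sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Header {
    pub height: u64,
    pub hash: String,
}

/// Outcome of comparing the primary's header with the witnesses' headers.
#[derive(Debug, PartialEq, Eq)]
pub enum ForkCheck {
    /// No witness returned a differing header that passed verification.
    NoFork,
    /// A witness returned a differing header that also passed verification.
    Fork { witness: usize, conflicting: Header },
}

/// Compare the primary's header against each witness header.
/// `verify` is a stand-in (assumption) for the real light client verification.
pub fn detect_fork<F>(primary: &Header, witnesses: &[Header], verify: F) -> ForkCheck
where
    F: Fn(&Header) -> bool,
{
    for (i, w) in witnesses.iter().enumerate() {
        // A differing header only counts as a fork if it verifies too.
        if w != primary && verify(w) {
            return ForkCheck::Fork { witness: i, conflicting: w.clone() };
        }
    }
    ForkCheck::NoFork
}
```

A differing header that fails verification is simply ignored here; what to do with such a peer is left open in the discussion above.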
Couple thoughts from review and discussions:
We also discussed mocking functions instead of data. So instead of traits flowing into the Verifier, we can use events that have all the fields we expect, plus some closures for any data-structure-specific computation we don't want to or can't do upfront (e.g. voting_power_in(), which verifies signatures). A larger question I have is how to better visualize the flow of events here, for instance how to handle cases where a component should output multiple events, or where the same event should go to multiple components.
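To make the "events plus closures" idea concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the `VerifyEvent` struct, its field names, and the two-thirds threshold are not taken from the ADR, and the closure only stands in for `voting_power_in()` (it does no signature checking).

```rust
/// Hypothetical event handed to the verifier: plain fields plus a closure
/// for the one computation that depends on concrete data structures.
pub struct VerifyEvent<'a> {
    pub trusted_height: u64,
    pub untrusted_height: u64,
    pub total_voting_power: u64,
    /// Stand-in for `voting_power_in()`: returns the voting power that signed.
    pub voting_power_in: Box<dyn Fn() -> u64 + 'a>,
}

/// Accept the event if the height increases and more than two thirds of the
/// total voting power signed (illustrative threshold, an assumption).
pub fn verify(event: &VerifyEvent<'_>) -> bool {
    if event.untrusted_height <= event.trusted_height {
        return false;
    }
    // The expensive, data-structure-specific work happens only here.
    let signed = (event.voting_power_in)();
    (signed as u128) * 3 > (event.total_voting_power as u128) * 2
}
```

In a test, the closure can be replaced by a constant such as `Box::new(|| 70)`, which is the kind of function mocking discussed above.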
The light client requests validator sets and signed headers from peers.
The order of data requested from peers is determined by the specific verification
algorithm being followed. Specifically either sequential or bisection.
Additionally the Requester is expected to request signed header for a
This may lead to mixed concerns between the Requester and Detector modules. If you are trying to achieve “better isolation of concerns”, it could be beneficial to leave the logic of picking which nodes to ask for headers (and when to ask for them) to the Detector module. This does not necessarily mean the Detector module would have to store a list of peers internally. It can use some kind of shortcuts (e.g. “all” - all backup nodes, “random” - a random backup node, “randomN{N}” - N random backup nodes).
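The shortcut idea above could look something like the following sketch. The names `PeerSelector` and `select_peers` are hypothetical, and the randomness source is passed in as a closure (`pick(n)` is assumed to return an index below `n`) so that the Detector does not need to own a peer list or a random number generator.

```rust
/// Hypothetical selector the Detector could hand to whoever owns the peer list.
pub enum PeerSelector {
    /// All backup nodes.
    All,
    /// One random backup node.
    Random,
    /// N random backup nodes.
    RandomN(usize),
}

/// Resolve a selector against the backup nodes.
/// `pick(n)` is an assumed randomness source returning an index in `0..n`.
pub fn select_peers<R>(selector: &PeerSelector, backups: &[String], mut pick: R) -> Vec<String>
where
    R: FnMut(usize) -> usize,
{
    let wanted = match selector {
        PeerSelector::All => return backups.to_vec(),
        PeerSelector::Random => 1,
        PeerSelector::RandomN(n) => *n,
    };
    let mut pool = backups.to_vec();
    let k = wanted.min(pool.len());
    // Partial Fisher-Yates shuffle: fix one random element per step.
    for i in 0..k {
        let remaining = pool.len() - i;
        let j = i + pick(remaining) % remaining;
        pool.swap(i, j);
    }
    pool.truncate(k);
    pool
}
```

Injecting `pick` also keeps the selection deterministic in tests; a real implementation would presumably plug in a proper random number generator.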
![Decomposition Diagram](assets/light-client-decomposition.png)

### State
The light client maintains state about commit/header/validators abbreviated as LightBlocks. It also has state pertaining to peers and evidence.
LightBlocks will need to maintain ValidatorSet and NextValidatorSet. NextValidatorSet is used for bisection using the trusting period, where we check the NextValidatorSet of an old block against the commit/ValidatorSet of the new block.
Out of curiosity: is there something wrong with using ValidatorSet (not NextValidatorSet)? (Other than that the validator set might have changed between ValidatorSet and NextValidatorSet.)
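As a rough illustration of the check described in this thread, the sketch below keeps both validator sets on a `LightBlock` and measures how much of the old block's NextValidatorSet signed the new block's commit. The types are heavily simplified and hypothetical (validators are plain names with a voting power, signers are assumed to be distinct and already signature-checked), and the one-third threshold is an assumption chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical validator set: validator name -> voting power.
pub type ValidatorSet = HashMap<String, u64>;

/// Simplified light block carrying both validator sets.
pub struct LightBlock {
    pub height: u64,
    pub validators: ValidatorSet,
    pub next_validators: ValidatorSet,
    /// Names of the validators whose signatures appear in the commit.
    pub signers: Vec<String>,
}

/// Voting power in `set` held by the given (distinct) signers.
pub fn voting_power_in(set: &ValidatorSet, signers: &[String]) -> u64 {
    signers.iter().filter_map(|s| set.get(s)).sum()
}

/// True if the signers of `new` hold more than one third of the voting power
/// in the NextValidatorSet of the older, trusted block.
pub fn has_sufficient_overlap(trusted: &LightBlock, new: &LightBlock) -> bool {
    let total: u64 = trusted.next_validators.values().sum();
    let signed = voting_power_in(&trusted.next_validators, &new.signers);
    (signed as u128) * 3 > total as u128
}
```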
If I understand correctly there is a single thread of execution, so all blocks shown are functional, correct?
I have many more questions :) Is it possible to have a review/working session for this?
Also, why is there a "Relayer" there?
Thanks for your thoughts @ancazamfir , I'll try to answer inline
Yes precisely; each flow is a synchronous method call and each delegation to a subcomponent is also a synchronous method call.
Do you mean the repeated references to the demuxer in each diagram? Yeah perhaps we can remove those and just emphasise the state changes.
I think in FastSync v2 we concluded that the scheduler didn't need to run in its own goroutine. We've integrated that learning here and put everything in the same execution thread. What the scheduler gives us here is a component which encapsulates "next request" concerns, while the demuxer encapsulates "delegation to multiple components" concerns. That's the thinking at least.
We spoke about this yesterday with @josef-widder and our thinking has continued to evolve. What we want to do now is to augment the TrustedStore to contain the light blocks it has received, verified, and failed to verify. The TrustedStore will also contain a mapping of which peer sent which light block in which state. This way, the scheduler can decide which light block to request next based on the highest trusted and untrusted light blocks it has seen.
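A minimal sketch of such a store and of a scheduler decision built on top of it might look like this. The `Status` enum, the method names, and in particular the midpoint policy in `next_request` are assumptions made for illustration; the comment above only says that the decision is based on the highest trusted and untrusted light blocks.

```rust
use std::collections::BTreeMap;

/// State of a light block held in the store.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Unverified,
    Verified,
    Failed,
}

/// Hypothetical store: height -> (status, peer that sent the block).
#[derive(Default)]
pub struct TrustedStore {
    blocks: BTreeMap<u64, (Status, String)>,
}

impl TrustedStore {
    pub fn insert(&mut self, height: u64, status: Status, peer: &str) {
        self.blocks.insert(height, (status, peer.to_string()));
    }

    /// Highest height currently stored with the given status.
    pub fn highest(&self, status: Status) -> Option<u64> {
        self.blocks
            .iter()
            .rev()
            .find(|(_, (s, _))| *s == status)
            .map(|(h, _)| *h)
    }

    /// Peer that sent the block at `height`, if any.
    pub fn peer_for(&self, height: u64) -> Option<&str> {
        self.blocks.get(&height).map(|(_, p)| p.as_str())
    }
}

/// Assumed bisection-style policy: request the midpoint between the highest
/// verified and the highest unverified block, if there is a gap between them.
pub fn next_request(store: &TrustedStore) -> Option<u64> {
    let trusted = store.highest(Status::Verified)?;
    let untrusted = store.highest(Status::Unverified)?;
    if untrusted <= trusted + 1 {
        return None;
    }
    Some(trusted + (untrusted - trusted) / 2)
}
```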
Yes it probably makes sense to just call this LightBlockRequest 🤦
Yes that detail is captured in the verifier but not presented in these diagrams.
Yes for sure. We are trying to get something working by the end of this week but then we can look forward to doing a full review as early as next week.
Thanks for the update 👍 A few thoughts from me:
What's the status on this? Should we close it, merge it, update/expand it?
I think at this point we should just close it and break it out into smaller ADRs which reflect the work we did with the Supervisor.
Co-authored-by: Romain Ruetschi <romain@informal.systems>
Looks good except for the duplicated line
My main concern with the Callback approach is that it necessitates closures, which have quite a few ergonomic drawbacks, i.e. try semantics, scope and borrow complexities. As the general pattern seems to be that a bounded channel is constructed to transport the response, it could be reworked so that the channel is the actual value sent with the event to the Supervisor, which in turn sends the response back on the typed channel to the Handle. As an added side effect, this could remove the need for the Callback abstraction altogether and fully embrace CSP-style concurrency.
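The suggestion above can be sketched with the standard library's channels. This is only an illustration of the pattern, not the existing Supervisor API: the `Event` variants, `Handle::verify_to_height`, `spawn_supervisor`, and the placeholder "verification" inside the loop are all hypothetical.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, Sender, SyncSender};
use std::thread;

/// Events carry the typed, bounded reply channel instead of a callback.
pub enum Event {
    VerifyToHeight {
        height: u64,
        reply: SyncSender<Result<u64, String>>,
    },
    Terminate,
}

/// Hypothetical handle used by callers to talk to the supervisor thread.
pub struct Handle {
    sender: Sender<Event>,
}

impl Handle {
    pub fn verify_to_height(&self, height: u64) -> Result<u64, String> {
        // Bounded channel of capacity 1 that transports exactly one response.
        let (reply, response) = sync_channel(1);
        self.sender
            .send(Event::VerifyToHeight { height, reply })
            .map_err(|e| e.to_string())?;
        response.recv().map_err(|e| e.to_string())?
    }

    pub fn terminate(&self) {
        let _ = self.sender.send(Event::Terminate);
    }
}

/// Spawn a supervisor loop that answers each event on the channel it carries.
pub fn spawn_supervisor() -> (Handle, thread::JoinHandle<()>) {
    let (sender, events): (Sender<Event>, Receiver<Event>) = channel();
    let join = thread::spawn(move || {
        for event in events {
            match event {
                Event::VerifyToHeight { height, reply } => {
                    // Placeholder for real verification (assumption).
                    let result = if height > 0 {
                        Ok(height)
                    } else {
                        Err("height must be positive".to_string())
                    };
                    let _ = reply.send(result);
                }
                Event::Terminate => break,
            }
        }
    });
    (Handle { sender }, join)
}
```

Because the reply travels back on an ordinary typed channel, the caller can use `?` on the result directly and no closure has to capture borrowed state.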
Some suggestions inline.
Co-authored-by: Alexander Simmerl <a.simmerl@gmail.com>
@xla That's a good point! Definitely something we'll need to revisit when we want to make things truly concurrent.
Co-authored-by: Romain Ruetschi <romain@informal.systems>
@xla I think you're right and I would look forward to such a simplification, but should those changes block the current PR, considering it's already implemented? It might be good to get this landed to align some level of documentation with the implementation. What do you think?
Yeah for sure! I wasn't sure if this ADR was still being worked on, but the implementation is there. Is there a follow-up issue already? I think a change to the internal Supervisor API could go in a separate ADR, accompanied by a change-set implementing it.
🚕 ➰ 🐺 🙎
Co-authored-by: Romain Ruetschi <romain@informal.systems>
Architecture refactor of the light client based on various discussions
Ref. #230, #167