Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[consensus] Add support for commonware-consensus::authority #111

Closed
wants to merge 186 commits into from

Conversation

patrick-ogrady
Copy link
Contributor

@patrick-ogrady patrick-ogrady commented Oct 8, 2024

TODO

  • Allow requester to provide a "hint" for how many blocks it is missing (so sender can send a batch back)
  • If message not sent to random target, don't wait for timeout
  • Request channel rate limit when fetching blocks (to avoid halting message processing): [consensus/authority] Add fetch rate limiting #136
  • Add rebroadcast when stuck as a difference
  • Make propose/verify/notarization/finalization ordered for application but don't block processing of other messages (particularly in voter)
  • Add rand_distr sleep to each function in mocks::Application
  • Ensure uptime/faults are externalizable (can be proven externally)
  • add example (clock): [examples/clock] commonware-consensus example #137
  • support (+ test) catching up with "null tip" (it is possible that a node may be missing a notarized block that isn't yet finalized or the requisite null block notariztions to verify a new block): [consensus/authority] "Slow Validator" and "Recovery Catchup" Tests #138
  • add test for long propose/verify (return result after timeout, so only vote null blocks): [consensus/authority] "Slow Validator" and "Recovery Catchup" Tests #138
  • immediate null vote for proposers that have been offline (no observed activity for X views) + add to description for differences (walk back over activity timeout): [consensus/authority] "Slow Validator" and "Recovery Catchup" Tests #138
  • Replace "Missing" resolver reply with an empty array of responses: [consensus/authority] "Slow Validator" and "Recovery Catchup" Tests #138
  • Re-trigger proposal build on failure (may receive a new block or notarization that makes possible) + add min rebuild gap: [consensus/authority] "Slow Validator" and "Recovery Catchup" Tests #138
  • Add "stop height" where stop proposing/voting Up to application
  • Add metrics + cleanup code (currently a mess)
    • Standardize request functionality (move to a separate channel to avoid impacting performance at tip?) + add concurrency?
    • Revisit naming of Voter/Resolver (given both serve requests for data)
  • Add support for cross-view faults? (i.e. vote for a block that conflicts with an earlier notarization? -> to be stateless + externalizable, these faults would need to be signed by consensus) -> May not be possible because you would need to prove the absence of a null notarization for a given view to say that someone couldn't have voted for a certain branch
  • Add an associated type to the Application/Supervisor/Finalizer trait that allows a consensus dialect to provide consensus-specific context (+ represent subtle differences between View/Epoch/Height/etc without abstraction hell): [consensus] Add trait objects #146
  • Move hasher to external crate: [cryptography] Hasher #145
  • Extract unrelated changes: [consensus] Extract unrelated changes #147
  • Add Storage implementation (powered by Runtime) to preserve blocks and vote record. This is particularly necessary to sync to a new network without OOMing (which requires iterating backwards from tip with commonware-consensus::authority) + ensure safety with voting (could restart and vote for conflicting block)
    • During instantiation, ask application for last_verified block and replay from there
  • Add option for application to start from tip (no backfill)
    • Modify clock to use this mode and begin voting as soon as have seen a finalization of some container (which floors the time).
  • Add option to fetch missing blocks externally (like an API)
  • Allow deterministic::Runtime to have a reset function for shutdown (not in trait)

Open Questions

  • Right now, the inner payload doesn't maintain the block metadata but this information will need to be accessed to verify signatures over this data during syncing/linking (although this is adequately provided by deserialize_payload). Should we remove this wrapper and require parse to be called on the entire block? This probably comes down to how useful it is for an application record such info in the payload (to support some VM)...almost always, that VM has some notion of height/parent to build compatible blocks and thus this data will be replicated. At the same time, however, things like view aren't always present (i.e. the EVM) and requiring this to be provided by the payload could require breaking changes in an external VM.
    • What can get particularly weird here is when there is a parent hash for the outer and a parent hash for the inner (that reflects some VM's encoding). If we assume the implementations are entirely separate, then there is no need to do any conversions/etc and the inner can operate obliviously (only providing a hash for the outer hash construction).
    • With threshold + epoch-based constructions, it is likely that there will need to be agreement over complex out-of-block data (like commitments) and that there may be blocks (at the end of an epoch) that have no meaning to the application (and it may not even be called). How would parent hashes work here (if the outer block points to a block that wasn't relevant to the application)? Consensus could entirely obfuscate things and provide only a parent hash (of the previous payload) -> leaves everything related to heights/views/etc to application.
    • We could get away with not providing any context to the application at all (other than the payload parent hash) but that might make it difficult to know which votes/proofs to include or hard for third-party mechanisms to index things (but that might be for naught as they could use the wrapper -> would simplify application logic)? I would call this "applications as a stream" of linked hashes. Other issue would be that we couldn't identify which votes were for the canonical chain of payloads (as they include hashes for the outer blocks, not the payloads).
  • Should we maintain any shared traits? Each consensus implementation will be slightly different and adhering to a set of generic interfaces may be more confusing than it is worth (i.e. complex Rosetta implementations)? In practice, it seems reasonable that applications need to implement traits for all the consensus constructions they want to support. For example, we could have a base mocks::Application and then implement consensus-specific traits in each consensus crate.
  • We can either make data from consensus accessible via push (calling methods on some trait supported by the application) or pass some callable context to the application to make use of however it sees fit. The benefit of the latter is that the application definition can be much simpler (and may require said application to track a lot less as it can call consensus to get whatever it needs). In any case, there are still events that require consensus to call the application (like to "parse" bytes) but not everything needs to be structured this way.
    • The tempting result of pursuing a pull-based design is that builders can increase complexity as they see fit + easier to add new functionality to consensus without breaking existing integrations.

Path to main

  1. Integrate storage (via ~Runtime::Disk trait): [runtime] Add support for Storage and Blob #148
  2. Add pruneable data structure for managing views (can divide index by 10 to reduce number of separate files): [storage] Introduce Journal #149
  3. Add pruneable data structure for managing blocks: [storage] Introduce Archive #152
  4. Add Closer to runtime for "graceful shutdown": [runtime] Implement Graceful Shutdown #157
  5. Add batched singleton storage: [storage] Add support for storing a (batched) singleton #168
  6. Add index storage (for block height -> digest): [storage] Add index to Archive #176
  7. Migrate to new Backfiller architecture (not stateful): [consensus] Migrate to Backfiller + Review Design #177
  8. Integrate storage constructions into consensus to support restart
  9. Dynamically increase null vote timeout (set min/max timeout in config)
  10. Add full network shutdown and restart test (run many times and ensure no safety failure)
  11. Skip backfill mode (load last_processed from application where None means start at tip, 0 means start at genesis, and n` means to start from some arbitrary height) -> provide to consensus init
  12. Add metrics + cleanup code

@patrick-ogrady patrick-ogrady changed the title [consensus] Add support for commonware-consensus::static [consensus] Add support for commonware-consensus::authority Oct 16, 2024
Copy link

coderabbitai bot commented Oct 28, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@patrick-ogrady
Copy link
Contributor Author

Replaced by #180

Copy link

codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 90.94274% with 367 lines in your changes missing coverage. Please review.

Project coverage is 93.08%. Comparing base (793d38e) to head (1c6ce0f).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
consensus/src/authority/actors/voter/actor.rs 88.03% 184 Missing ⚠️
consensus/src/authority/actors/resolver/actor.rs 86.44% 100 Missing ⚠️
consensus/src/authority/prover.rs 93.56% 32 Missing ⚠️
consensus/src/authority/mocks.rs 96.65% 28 Missing ⚠️
consensus/src/authority/byzantine/conflicter.rs 95.62% 7 Missing ⚠️
consensus/src/authority/engine.rs 91.86% 7 Missing ⚠️
consensus/src/authority/byzantine/nuller.rs 94.05% 6 Missing ⚠️
consensus/src/authority/actors/voter/ingress.rs 93.54% 2 Missing ⚠️
consensus/src/authority/actors/resolver/ingress.rs 96.66% 1 Missing ⚠️
@@            Coverage Diff             @@
##             main     #111      +/-   ##
==========================================
+ Coverage   93.04%   93.08%   +0.04%     
==========================================
  Files          56       67      +11     
  Lines       11505    17649    +6144     
==========================================
+ Hits        10705    16429    +5724     
- Misses        800     1220     +420     
Files with missing lines Coverage Δ
consensus/src/authority/encoder.rs 100.00% <100.00%> (ø)
consensus/src/authority/mod.rs 96.98% <ø> (ø)
consensus/src/authority/actors/resolver/ingress.rs 96.66% <96.66%> (ø)
consensus/src/authority/actors/voter/ingress.rs 93.54% <93.54%> (ø)
consensus/src/authority/byzantine/nuller.rs 94.05% <94.05%> (ø)
consensus/src/authority/byzantine/conflicter.rs 95.62% <95.62%> (ø)
consensus/src/authority/engine.rs 91.86% <91.86%> (ø)
consensus/src/authority/mocks.rs 96.65% <96.65%> (ø)
consensus/src/authority/prover.rs 93.56% <93.56%> (ø)
consensus/src/authority/actors/resolver/actor.rs 86.44% <86.44%> (ø)
... and 1 more

... and 6 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 793d38e...1c6ce0f. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant