feat(repair): link shred collector with snapshot - root slot, leader schedule, verify shreds #169
Merged
Conversation
…ot into shred verifier
- staked_nodes deserialization is not supported, so I get the same data from vote_accounts
- implementation for getSignedData was missing, so I added that plus a bunch of supporting functions
- also made some minor improvements to logger and service manager
dnut changed the title from "feat: derive leader schedule from snapshot, use it to verify shreds" to "feat(core, rand, shred-collector, utils): derive leader schedule from snapshot, use it to verify shreds" (Jun 10, 2024)
dnut changed the title to "feat(accountsdb, core, rand, shred-collector, utils): derive leader schedule from snapshot, use it to verify shreds" (Jun 10, 2024)
dnut changed the title back to "feat: derive leader schedule from snapshot, use it to verify shreds" (Jun 10, 2024)
bugs:
- infinite loop in iterator due to not incrementing
- adding in dataIndex instead of subtracting
- invalid proof should be index != 0
dnut changed the title to "feat: link shred collector with snapshot. get root slot, calculate leader schedule, and verify shred signatures" (Jun 13, 2024)
dnut changed the title to "feat: link shred collector with snapshot. root slot, leader schedule, and verify shreds" (Jun 13, 2024)
dnut changed the title to "feat: shred collector <-> snapshot: root slot, leader schedule, verify shreds" (Jun 13, 2024)
dnut changed the title to "feat: link shred collector and snapshot - root slot, leader schedule, verify shreds" (Jun 13, 2024)
dnut changed the title to "feat: link shred collector with snapshot - root slot, leader schedule, verify shreds" (Jun 13, 2024)
0xNineteen changed the title to "feat(repair): link shred collector with snapshot - root slot, leader schedule, verify shreds" (Jun 24, 2024)
0xNineteen requested changes (Jun 24, 2024):
overall lgtm - just a few things
…tSlotsInEpoch calculation
0xNineteen previously approved these changes (Jun 28, 2024):
lgtm - awesome pr 🔥
InKryption approved these changes (Jul 1, 2024)
Background
Previously, the shred collector would start collecting shreds from an arbitrary point defined on the CLI, and it did not verify signatures. The snapshot was loaded separately and had no impact on this process. There are two big problems with that approach: the starting point is arbitrary rather than derived from cluster state, and shred signatures go unverified.
New
This change links the snapshot with the shred collector. The snapshot is now processed first, and it is used to tell the shred collector where to start and how to verify shreds (via the leader schedule). This includes:
- using the snapshot's root slot as the starting point, instead of the `--test-repair-for-slot` CLI option
- deriving the leader schedule from the snapshot and using it to verify shred signatures
This required implementing a random number generator that is consistent with the leader schedule calculation done by the agave client. Zig's std includes a ChaCha rng, but it is not compatible with rust's rand_chacha: the rust crate is not IETF compliant, and it uses a novel approach to reuse previously generated random data.
Leader schedule flexibility
This introduces three ways to use the leader schedule (see the sketch after this list):
- The `validator` command will calculate the leader schedule from the snapshot and use it to verify shreds.
- With the `--leader-schedule` option you can pass in the leader schedule from the CLI, instead of calculating it.
- With the `leader-schedule` command you can calculate the leader schedule and print it to stdout.
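A minimal sketch of the three modes, assuming the binary is invoked as `sig` and that the schedule file path is arbitrary; the exact invocations are covered in the Run section below:

```bash
# 1. derive the leader schedule from the snapshot and use it to verify shreds
sig validator

# 2. pass a known leader schedule in from the CLI instead of calculating it
sig validator --leader-schedule leader-schedule.txt

# 3. calculate the leader schedule and print it to stdout
sig leader-schedule
```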
Issues

There is a major limitation with the current approach: the node stake weights in the snapshot are slightly different from the stakes that agave uses when it calculates the leader schedule, so the generated leader schedule is incorrect. If you derive the leader schedule from the snapshot, shreds will fail signature verification. This is why there is an option to supply a known leader schedule from the CLI.
The issue may be that the stakes in the snapshot represent values from a different point in time than the one the leader schedule should be derived from. I haven't dug into this yet, and it needs to be addressed later. For now, we can get the leader schedule from the solana CLI and provide it to sig with `sig validator --leader-schedule`.

Code changes
- `VoteAccounts.stakedNodes` for lazy deserialization of the staked node hashmap (same approach as agave)
- `EpochSchedule.getSlotsInEpoch`
- Calculate the leader schedule in the `validator` command and pass it to the shred collector.
- `leader-schedule` command to print the leader schedule in the same format as the solana CLI.
- `--leader-schedule` option for the `validator` command to allow passing a known leader schedule, instead of calculating it.
- `AppBase`: application-wide state that needs to be initialized for every command.
- `loadSnapshot`: function that is reused in the `validator` and `leader-schedule` commands to load the snapshot.
- `LoadedSnapshot`: all the state that is produced by `loadSnapshot`.
- `leader_schedule.zig` (new file):
  - `leaderSchedule`: calculate the leader schedule from the minimum required inputs.
  - `leaderScheduleFromBank`: conveniently determine the leader schedule from a bank.
- `SlotLeaderProvider`: abstraction to represent any approach of providing a slot leader.
- `SingleEpochLeaderSchedule`: basic slot leader provider that can only handle the epoch it was initialized with.
- `ChaChaRng`: generates the same stream as rust's rand_chacha crate. Composed of `ChaCha` (barebones chacha stream generator) with `BlockRng` (rng state manager).
- `WeightedRandomSampler`: randomly selects the same items as the rand crate's `WeightedIndex`.
- `Logger.logf`: lets you pass the log level as a runtime parameter.

Run
calculating leader schedule from snapshot
Currently, by default, sig uses "test_data" with snapshots that are not valid for any actual cluster, rather than downloading a snapshot. To run this code with a real snapshot, configure sig to use a different (empty) snapshot directory. This tells sig to download and load a fresh snapshot from the cluster, which it then uses to calculate the leader schedule.
You can create a new snapshot directory like this:
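The path below is only an example; any new, empty directory will do.

```bash
# create an empty directory that sig will populate with a downloaded snapshot (example path)
mkdir -p data/fresh-snapshot
```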
To download the snapshot and print the leader schedule derived from it (same output format as `solana leader-schedule`):
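A sketch of the invocation, assuming the binary is built at `zig-out/bin/sig`; the snapshot-directory and network flags shown here are assumptions, so check the CLI help for the exact names:

```bash
# flag names are assumptions; substitute whatever your build of sig uses to
# point at the snapshot directory and the cluster to download from
./zig-out/bin/sig leader-schedule --snapshot-dir data/fresh-snapshot --network testnet
```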
To download the snapshot and run the validator using the leader schedule derived from the snapshot:
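Same assumptions as above; the validator downloads the snapshot, derives the leader schedule from it, and uses the schedule to verify shreds:

```bash
# flag names are assumptions; see the CLI help for the exact options
./zig-out/bin/sig validator --snapshot-dir data/fresh-snapshot --network testnet
```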
use known leader schedule
If you want to provide a leader schedule to sig rather than calculate it, you can pass it in using the CLI option `--leader-schedule`. If you don't want to download a snapshot, you should also pass the start slot on the CLI.

You can create a leader schedule file using either of these commands:
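For example (the output file name is arbitrary; `solana leader-schedule` is the solana CLI command mentioned above):

```bash
# from the solana CLI
solana leader-schedule > leader-schedule.txt

# or from sig itself, using the same assumed flags as in the previous section
./zig-out/bin/sig leader-schedule --snapshot-dir data/fresh-snapshot --network testnet > leader-schedule.txt
```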
Pass the file to sig like this:
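A sketch, assuming the start slot is supplied via the `--test-repair-for-slot` option mentioned earlier (the slot value is a placeholder):

```bash
# read the leader schedule from a file instead of deriving it from a snapshot
./zig-out/bin/sig validator --leader-schedule leader-schedule.txt --test-repair-for-slot <start-slot>
```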
Or pipe it directly, specifying `--` to indicate stdin:
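For example, taking the schedule straight from the solana CLI:

```bash
# `--` tells sig to read the leader schedule from stdin
solana leader-schedule | ./zig-out/bin/sig validator --leader-schedule --
```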