-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: dump state to s3 by multiple nodes #9049
Conversation
when we roll out this, can we somehow include some baseline template to config file to give users some hint on how to enable the feature? |
Is it common for multiple nodes to end up trying to upload the same part? |
nearcore/src/state_sync.rs
Outdated
let mut existing_nums = HashSet::new(); | ||
for name in file_names { | ||
let splitted: Vec<_> = name.split("_").collect(); | ||
let part_id = splitted.get(2).unwrap().to_string().parse::<u64>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this to a function. String manipulation is error-prone, and moving it to a separate function:
- abstracts away the complexity
- lets us unit-test the complexity
For now, we don't expect users to dump state yet, we will start expecting that once the decentralized version is ready. |
this will happen from time to time, we expect this to be less common at the start of dump of an epoch, and become more frequent as number of parts left to dump for an epoch decreases. |
@@ -446,31 +447,43 @@ trait StatePartReader { | |||
fn get_state_part_reader( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file can be refactored to remove StatePartReader
and StatePartWriter
, but it's fine to do it in a followup PR.
nearcore/src/state_sync.rs
Outdated
async fn state_sync_dump( | ||
shard_id: ShardId, | ||
chain: Chain, | ||
epoch_manager: Arc<dyn EpochManagerAdapter>, | ||
shard_tracker: ShardTracker, | ||
runtime: Arc<dyn RuntimeAdapter>, | ||
chain_id: String, | ||
restart_dump_for_shards: Vec<ShardId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider removing restart_dump_for_shards
from ClientConfig
and from Config
(i.e. NearConfig::config
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on second thought, restart is still handy to have. In situations where latest epoch is marked as alldumped by error, unless we wipe out data from rocksdb, the node will never dump for the epoch again, restart_dump_for_shards
solves this issue.
nearcore/src/state_sync.rs
Outdated
parts_to_dump.len().try_into().unwrap(); | ||
let mut cnt_parts_dumped = num_parts - cnt_parts_to_dump; | ||
let timer = Instant::now(); | ||
while timer.elapsed().as_secs() <= 60 && !parts_to_dump.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please define a constant instead of magic value 60
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you!
56f5a85
to
5051015
Compare
To enable dumping with multiple nodes, do the following:
to
~/.near/config.json
on your node;2. run
AWS_ACCESS_KEY_ID="xxxx" AWS_SECRET_ACCESS_KEY="xxx" ./target/release/neard run
How the multi-node dump work?
Each node will have one thread per shard, and will check s3 for missing parts per shard for the epoch it's dumping every 60 seconds. After checking missing parts, it will draw one part_id without replacement from the missing parts, generate and upload to s3 repeatedly until time's up. Then the thread repeat the process until all parts of the specific shard and epoch is dumped. Then the thread switch to the latest epoch that their chain head is in.
The nodes that are dumping state do not need to communicate to each other, they use s3 to check what's dumped and what's not.
The frequency of checking s3 for missing parts will be a trade off between repeated dumps and s3 latency: the more often the check, the less repeats happen, but the more s3 check latency hurts the speed of dump. May end up using a frequency different than 60s.