feat: Dump state of every epoch to S3 #8661
Conversation
cc: @mm-near
```rust
// Create a connection to S3. Credentials are read from the environment
// (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY).
let s3_bucket = config.client_config.state_sync_s3_bucket.clone();
let s3_region = config.client_config.state_sync_s3_region.clone();
let bucket =
    s3::Bucket::new(&s3_bucket, s3_region.parse()?, s3::creds::Credentials::default()?)?;
```
here we assume that credentials are magically set in the environment?
Yes, added a comment.
Added a suggestion to an error message.
Will also add to the documentation.
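For example, the credentials lookup could fail with a hint along these lines (hypothetical wording, not necessarily the exact message added in the PR):

```rust
let credentials = s3::creds::Credentials::default().expect(
    "Failed to create S3 credentials: set the AWS_ACCESS_KEY_ID and \
     AWS_SECRET_ACCESS_KEY environment variables",
);
```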
```rust
// The actual dumping of state to S3.
tracing::info!(target: "state_sync_dump", shard_id, ?epoch_id, epoch_height, %sync_hash, %state_root, parts_dumped, num_parts, "Creating parts and dumping them");
let mut res = None;
for part_id in parts_dumped..num_parts {
    // ... (loop body elided in this excerpt: obtain each part and upload it)
}
```
nit: I'd move this code (until line 242) into a separate method
Moved as much as I can, because I can't move all of it. The function will need to be `async` because it calls `put_object`. But passing `chain` to an `async` function is not an option, because `Chain` isn't `Send`.
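A minimal sketch of the resulting split (hypothetical names; `put_state_part` is not the actual nearcore function): `chain` stays on the synchronous side, and only the serialized part bytes cross into the `async` code.

```rust
// Hypothetical helper: the caller obtains the part from `chain` synchronously,
// so only `Send` data (the part bytes) crosses the `.await` boundary.
async fn put_state_part(
    bucket: &s3::Bucket,
    location: &str,
    part: &[u8],
) -> anyhow::Result<()> {
    // `put_object` is async; this is what forces the helper to be async too.
    bucket.put_object(location, part).await?;
    Ok(())
}
```

The sync side would then obtain the part from `chain` first and block on `put_state_part(...)` afterwards, keeping `Chain` out of the future entirely.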
```rust
/// for example 'STATE_SYNC_DUMP:2' for shard_id=2.
fn state_sync_dump_progress_key(shard_id: ShardId) -> Vec<u8> {
    let mut key = b"STATE_SYNC_DUMP:".to_vec();
    key.extend(shard_id.to_le_bytes());
    key
}
```
I also wonder if this key shouldn't contain the epoch id information
Not including epoch id in the key, because only one epoch dump per shard is possible at a time.
Adding epoch id to the key could be useful for keeping a history of dumped epochs in external storage, but that doesn't seem necessary.
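A sketch of the shape this suggests, assuming (hypothetically; this is not necessarily the exact nearcore type) that the epoch id lives in the value stored under this per-shard key rather than in the key itself:

```rust
use near_primitives::types::EpochId;

// Hypothetical progress record: the key identifies only the shard, while the
// value names the epoch being dumped, so at most one dump per shard at a time.
#[derive(borsh::BorshSerialize, borsh::BorshDeserialize)]
enum StateSyncDumpProgress {
    InProgress { epoch_id: EpochId, parts_dumped: u64, num_parts: u64 },
    AllDumped { epoch_id: EpochId },
}
```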
* Start a thread per shard to do the dumping
* AWS credentials are provided as environment variables: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
* In `config.json` specify both `config.state_sync.s3_bucket` and `config.state_sync.s3_region` to enable the new behavior (see the example below).
* No changes to the behavior of the node if those options are not enabled in `config.json`.
* State is persisted to RocksDB such that restarts of the node are well handled.
* Some useful metrics are exported.
* The node assumes it's the only node in this and all alternative universes that does the dumping.
  * Unclear how to use multiple nodes to complete the dump faster
* TODO: Speed this up by doing things in parallel: obtain parts, upload parts, set tags
  * Do we even need tags?
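For reference, the two options might look like this in `config.json` (bucket name and region are placeholder values):

```json
{
  "state_sync": {
    "s3_bucket": "my-state-dump-bucket",
    "s3_region": "us-west-2"
  }
}
```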