feat: Dump state of every epoch to S3 #8661
Conversation
cc: @mm-near
```rust
// Create a connection to S3. Credentials are read from the environment
// (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY).
let s3_bucket = config.client_config.state_sync_s3_bucket.clone();
let s3_region = config.client_config.state_sync_s3_region.clone();
let bucket =
    s3::Bucket::new(&s3_bucket, s3_region.parse()?, s3::creds::Credentials::default()?)?;
```
here we assume that credentials are magically set in the environment?
Yes, added a comment.
Added a suggestion to an error message.
Will also add to the documentation.
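For example, the credentials lookup could fail with a hint along these lines (hypothetical wording, not necessarily the exact message added in the PR):

```rust
let credentials = s3::creds::Credentials::default().expect(
    "Failed to create S3 credentials: set the AWS_ACCESS_KEY_ID and \
     AWS_SECRET_ACCESS_KEY environment variables",
);
```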
```rust
// The actual dumping of state to S3.
tracing::info!(target: "state_sync_dump", shard_id, ?epoch_id, epoch_height, %sync_hash, %state_root, parts_dumped, num_parts, "Creating parts and dumping them");
let mut res = None;
for part_id in parts_dumped..num_parts {
    // ... (loop body elided in this excerpt: obtain each part and upload it)
}
```
nit: I'd move this code (until line 242) into a separate method
Moved as much as I can, because I can't move all of it. The function will need to be `async` because it calls `put_object`. But passing `chain` to an `async` function is not an option, because `Chain` isn't `Send`.
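A minimal sketch of the resulting split (hypothetical names; `put_state_part` is not the actual nearcore function): `chain` stays on the synchronous side, and only the serialized part bytes cross into the `async` code.

```rust
// Hypothetical helper: the caller obtains the part from `chain` synchronously,
// so only `Send` data (the part bytes) crosses the `.await` boundary.
async fn put_state_part(
    bucket: &s3::Bucket,
    location: &str,
    part: &[u8],
) -> anyhow::Result<()> {
    // `put_object` is async; this is what forces the helper to be async too.
    bucket.put_object(location, part).await?;
    Ok(())
}
```

The sync side would then obtain the part from `chain` first and block on `put_state_part(...)` afterwards, keeping `Chain` out of the future entirely.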
```rust
/// for example 'STATE_SYNC_DUMP:2' for shard_id=2.
fn state_sync_dump_progress_key(shard_id: ShardId) -> Vec<u8> {
    let mut key = b"STATE_SYNC_DUMP:".to_vec();
    key.extend(shard_id.to_le_bytes());
    key
}
```
I also wonder if this key shouldn't contain the epoch id information
Not including epoch id in the key, because only one epoch dump per shard is possible at a time.
Adding epoch id to the key could be useful for keeping a history of dumped epochs in external storage, but that doesn't seem necessary.
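A sketch of the shape this suggests, assuming (hypothetically; this is not necessarily the exact nearcore type) that the epoch id lives in the value stored under this per-shard key rather than in the key itself:

```rust
use near_primitives::types::EpochId;

// Hypothetical progress record: the key identifies only the shard, while the
// value names the epoch being dumped, so at most one dump per shard at a time.
#[derive(borsh::BorshSerialize, borsh::BorshDeserialize)]
enum StateSyncDumpProgress {
    InProgress { epoch_id: EpochId, parts_dumped: u64, num_parts: u64 },
    AllDumped { epoch_id: EpochId },
}
```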
* Start a thread per shard to do the dumping
* AWS credentials are provided as environment variables: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
* In `config.json` specify both `config.state_sync.s3_bucket` and `config.state_sync.s3_region` to enable the new behavior (see the example below).
* No changes to the behavior of the node if those options are not enabled in `config.json`.
* State is persisted to RocksDB such that restarts of the node are well handled.
* Some useful metrics are exported.
* The node assumes it's the only node in this and all alternative universes that does the dumping.
  * Unclear how to use multiple nodes to complete the dump faster
* TODO: Speed this up by doing things in parallel: obtain parts, upload parts, set tags
  * Do we even need tags?
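For reference, the two options might look like this in `config.json` (bucket name and region are placeholder values):

```json
{
  "state_sync": {
    "s3_bucket": "my-state-dump-bucket",
    "s3_region": "us-west-2"
  }
}
```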