-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[State Sync] Support transaction syncing from the account bootstrap version #318
Conversation
storage/aptosdb/src/lib.rs
Outdated
) -> Result<Box<dyn StateSnapshotReceiver<AccountStateBlob>>> { | ||
gauged_api("get_state_snapshot_receiver", || { | ||
// Get the target version and expected root hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msmouse -- this is the code that I mostly want to ensure is correct, i.e., once we've identified a version V
to download all account states, we update the various data stores and start downloading all the states. Once all states are downloaded, the db should be in the correct state to move forward. From the smoke test, I don't see any complaints from the executor, but let me know if there's a data store or something that's missed 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added logic doesn't really belong here.. (it's more specific to state-sync)
At least give it a right name if you see the whole set of logic indeed belongs to the same function?
Is it that the helpers requires access to the stores so you have to do it on the DB side? How about refactor the restore_handler and use it from state sync? (I think the handler was created so that DB internal details are not exposed to the restore tooling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I don't quite understand what this is doing -- the txns are AT OR AFTER the state snapshot version, right? In my mind, we should do the snapshot first and then the ledger. Also strictly speaking, the txn, txn_out and events AT the version should be skipped, right? (you don't need them to start applying the next version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, I'll handle this by exposing an additional method in DbWriter
that initializes all state once the accounts have all successfully synced 😄
@@ -261,6 +256,11 @@ impl<V: VMExecutor> ChunkExecutorTrait for ChunkExecutor<V> { | |||
)?; | |||
self.commit_chunk() | |||
} | |||
|
|||
fn reset(&self) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always hated the reset()
interface, and was willing to kill it after the legacy interfaces x_and_y()
gets removed together with state sync v1. Is it possible on the call site to just recreate the ChunkExecutor
?
(Doesn't feel too strong either, probably not worth a huge refactor.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it's gonna be a big-ish refactor to recreate it. If you're okay with it, let's leave it as is for now. We can clean it up when we come to handle the fine grain storage changes.
storage/aptosdb/src/backup/utils.rs
Outdated
/// This file contains utilities that are helpful for performing | ||
/// database restore operations, as required by db-restore and | ||
/// state sync v2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comments for the whole file better be with "///!" (and maybe brought up before the uses?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe call it "restore_utils" to be a bit more explicit.
// The channel through which to notify the state snapshot receiver of new data chunks | ||
state_snapshot_notifier: Option<mpsc::Sender<StorageDataChunk>>, | ||
|
||
// The writer to storage (required for account state syncing) | ||
storage: Arc<dyn DbWriter>, | ||
} | ||
|
||
impl StorageSynchronizer { | ||
pub fn new<ChunkExecutor: ChunkExecutorTrait + 'static>( | ||
// TODO(joshlind): the compiler isn't happy with automatically deriving this... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol.. any idea why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's because deriving Clone
only really works for simple cases. There's a bunch of cases where it doesn't, e.g.,: rust-lang/rust#64417 and rust-lang/rust#26925. In our case, we have to manually define it. I'll update the comment to mention this.
storage/aptosdb/src/lib.rs
Outdated
// Save the target ledger info | ||
utils::save_ledger_infos( | ||
self.db.clone(), | ||
self.ledger_store.clone(), | ||
&[target_ledger_info], | ||
)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point we should really make sure all the epoch ending ledger infos to be available in every node -- assuming we don't publish any waypoints other than the genesis, trust from any downstream can only be established with all of those available.
(or maybe I misread it, and you did that outside of this?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so this already happens in the bootstrapper. Before we get to syncing any data, the bootstrapper will fetch all epoch changes and verify them, e.g.,: https://github.com/aptos-labs/aptos-core/blob/main/state-sync/state-sync-v2/state-sync-driver/src/bootstrapper.rs#L32. It will also make sure the waypoint is verified along the way.
This function only gets given the most recent to write to the DB. So, we're holding all epoch changes in memory but only writing the latest when we do the account state sync. Does this make sense?
storage/aptosdb/src/lib.rs
Outdated
) -> Result<Box<dyn StateSnapshotReceiver<AccountStateBlob>>> { | ||
gauged_api("get_state_snapshot_receiver", || { | ||
// Get the target version and expected root hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added logic doesn't really belong here.. (it's more specific to state-sync)
At least give it a right name if you see the whole set of logic indeed belongs to the same function?
Is it that the helpers requires access to the stores so you have to do it on the DB side? How about refactor the restore_handler and use it from state sync? (I think the handler was created so that DB internal details are not exposed to the restore tooling)
storage/aptosdb/src/lib.rs
Outdated
) -> Result<Box<dyn StateSnapshotReceiver<AccountStateBlob>>> { | ||
gauged_api("get_state_snapshot_receiver", || { | ||
// Get the target version and expected root hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I don't quite understand what this is doing -- the txns are AT OR AFTER the state snapshot version, right? In my mind, we should do the snapshot first and then the ledger. Also strictly speaking, the txn, txn_out and events AT the version should be skipped, right? (you don't need them to start applying the next version)
✅ Deploy Preview for aptos-developer-docs canceled.
|
Thanks @msmouse, this is ready for another look. 😄 As discussed, the notable changes:
Let me know if I've missed anything! |
storage/aptosdb/src/lib.rs
Outdated
version: Version, | ||
output_with_proof: TransactionOutputListWithProof, | ||
) -> Result<()> { | ||
// Update the merkle accumulator using the given proof |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I ask is to assert output_with_proof
is of length 1, at this point.
And document it in the interface.
Better if we pass in types that imply that.. :D
/// Get a (stateful) state snapshot receiver. | ||
/// | ||
/// Chunk of accounts need to be added via `add_chunk()` before finishing up with `finish_box()` | ||
fn get_state_snapshot_receiver( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoshLind - You might want to rebase as this API has changed, we no longer expose AccountStateBlob
in the DbWriter anymore.
Thanks, all! /land |
@JoshLind ❗ Unable to run the provided command on a closed PR |
What you talking about Aptos-bot? :P /land |
Forge run: https://circleci.com/gh/aptos-labs/aptos-core/6436 |
Forge run: https://circleci.com/gh/aptos-labs/aptos-core/6457 |
Forge run: https://circleci.com/gh/aptos-labs/aptos-core/6470 |
Motivation
This PR updates the state sync driver to support transaction/output syncing from the account bootstrap version. For example, if a node downloads all account states at a version
V
, the driver will begin syncing transactions/outputs fromV+1
. The flow is as follows:V
and fetch the epoch ending ledger info atV
.V
(to identify the expected state root hash and bootstrap the account merkle accumulator).V
and write the accounts to the database via thestate_snapshot_receiver
. Once all account states are written, the chunk executor will be reset (to force a read from the db).V
and start syncing fromV+1
.The PR offers the following commits:
reset()
method to the ChunkExecutorTrait. This is so that we can reset the executor after performing an account state sync.restore_utils
module so that we can share it between db-restore and state sync and update the state sync driver to start syncing from the bootstrapping version.Remaining steps before this is finalized:
Have you read the Contributing Guidelines on pull requests?
Yes.
Test Plan
There is a smoke test that covers some of this functionality. I've also manually inspected the logs to ensure the correct execution paths are taken. However, there's a still of bunch of testing that needs to happen in relation to proofs and malicious data responses.
Related PRs
None, but this PR relates to: #245