-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle incorrect shutdown of the off-chain GraphQL worker #1584
Comments
xgreenx
added a commit
that referenced
this issue
Jan 19, 2024
Closes #1549 ## Overview The change extracts the off-chain-related logic from the executor and moves it to the GraphQL off-chain worker. It creates two new concepts - Off-chain and On-chain databases where the GraphQL worker has exclusive ownership of the database and may modify it without intersecting with the On-chain database. ## Challenges caused by the change Delegating updating of the state to something other than `BlockImporter` causes several new problems: - The commitment to the on-chain and off-chain databases is done in different places. The off-chain database may be out of sync with the on-chain database due to race conditions. - The result of the block execution(receipts, statuses) is not stored anywhere and may be lost due to emergency shutdown. We don't want to duplicate on-chain data inside of the off-chain database, so the GraphQL service works with two sources of data, which leads to two problems: - The off-chain database may be out of sync with the on-chain database due to race conditions causing failing requests. - The view of the databases during the GraphQL request may change, causing invalid responses with a mix of old and new data. We had this problem before, but now it is more critical. ## Solutions to the challenges ### Out of sync The change applies two steps to solve this issue. The main one is a new trait for the database: ```rust /// Provides a view of the storage at the given height. /// It guarantees to be atomic, meaning the view is immutable to outside modifications. pub trait AtomicView<View>: Send + Sync { /// Returns the view of the storage at the given `height`. fn view_at(&self, height: BlockHeight) -> StorageResult<View>; /// Returns the view of the storage for the latest block height. fn latest_view(&self) -> View; } ``` Another one to await on the `BlockCommiter` side finishing processing the `ImportResult` by all listeners. The goal of the trait is to provide an immutable read-only view of the database at a specific time. However, this trait has not yet been implemented properly during this PR and will be implemented in the following PRs. The `view_at` requires functionality from #451. We already can implement the `latest_view` method via [`RocksDB::Transaction`](https://github.com/facebook/rocksdb/wiki/Transactions#reading-from-a-transaction), but it is better to do it after merging #1576. Waiting on the `BlockImporter` side is a temporary solution to not escalate the problem. But maybe we can keep it later to guarantee the consistent state of the blockchain. ### Losing result of execution The `AtomicView` trait also solves the issue of losing the state of the execution because it is possible to get a view of the database at a specific block height and execute the block again receiving the same execution result. Waiting inside the `BlockImporter` guarantees that we will not lose more than one `ImportResult`. ### Inconsistent database view within GraphQL requests The GraphQL now has `ReadDatabase`: ```rust pub type OnChainView = Arc<dyn OnChainDatabase>; pub type OffChainView = Arc<dyn OffChainDatabase>; pub struct ReadDatabase { on_chain: Box<dyn AtomicView<OnChainView>>, off_chain: Box<dyn AtomicView<OffChainView>>, } ``` It implements the `view` method that returns the `ReadView` type. The `ReadView` implements all required methods by using internal on-chain view and off-chain view. The `AtomicView` allows us to get the `last_view` of the off-chain database and get the `view_at(off_chain_view.last_height())` of the on-chain database creating a consistent view for both databases at a specific height. The change also adds a `ViewExtension` to the GraphQL that creates a `ReadView` for each request. ```rust /// The extension that adds the `ReadView` to the request context. /// It guarantees that the request works with the one view of the database, /// and external database modification cannot affect the result. struct ViewExtension; #[async_trait::async_trait] impl Extension for ViewExtension { async fn prepare_request( &self, ctx: &ExtensionContext<'_>, request: Request, next: NextPrepareRequest<'_>, ) -> ServerResult<Request> { let database: &ReadDatabase = ctx.data_unchecked(); let view = database.view(); let request = request.data(view); next.run(ctx, request).await } } ``` ## Implementation details - The `ExecutionResult` now also has receipts for the transaction along with its status. The off-chain worker will insert them later into the database, while the `dry_run` can fetch them immediately. - All API requests now work with the `ReadView` instead of the `Database` type. The `ReadDatabase` is only used in one place in the `ViewExtension`. - The `BlockImpoerter::comit_result` now is `async` and awaits for the previous block to be processed by all listeners. The execution of the `execute_and_commit` now runs `verify_and_execute_block` in the spawned task in the `tokio_rayon`. ## Follow up - #1580 - #1581 - #1582 - #1583 - #1584
This was referenced Jul 2, 2024
xgreenx
added a commit
that referenced
this issue
Jul 5, 2024
Before, we required all services to process the import result before committing a new result. It was done to reduce the gap between on-chain and off-chain databases and minimize the damage from the #1584 problem. The #1584 is no longer a problem with #2004 fix. Because of that, we can allow committing more blocks in parallel while other services are processing old ones. It improves synchronization speed because we have a buffer before we wait for other services to catch up. It is very actual for cases when other services are busy right now with other work, but soon will be available to process `ImportResult`. The default value size of the buffer is `1024`. ### Before requesting review - [x] I have reviewed the code myself --------- Co-authored-by: Hannes Karppila <2204863+Dentosal@users.noreply.github.com>
crypto523
pushed a commit
to crypto523/fuel-core
that referenced
this issue
Oct 7, 2024
Closes FuelLabs/fuel-core#1549 ## Overview The change extracts the off-chain-related logic from the executor and moves it to the GraphQL off-chain worker. It creates two new concepts - Off-chain and On-chain databases where the GraphQL worker has exclusive ownership of the database and may modify it without intersecting with the On-chain database. ## Challenges caused by the change Delegating updating of the state to something other than `BlockImporter` causes several new problems: - The commitment to the on-chain and off-chain databases is done in different places. The off-chain database may be out of sync with the on-chain database due to race conditions. - The result of the block execution(receipts, statuses) is not stored anywhere and may be lost due to emergency shutdown. We don't want to duplicate on-chain data inside of the off-chain database, so the GraphQL service works with two sources of data, which leads to two problems: - The off-chain database may be out of sync with the on-chain database due to race conditions causing failing requests. - The view of the databases during the GraphQL request may change, causing invalid responses with a mix of old and new data. We had this problem before, but now it is more critical. ## Solutions to the challenges ### Out of sync The change applies two steps to solve this issue. The main one is a new trait for the database: ```rust /// Provides a view of the storage at the given height. /// It guarantees to be atomic, meaning the view is immutable to outside modifications. pub trait AtomicView<View>: Send + Sync { /// Returns the view of the storage at the given `height`. fn view_at(&self, height: BlockHeight) -> StorageResult<View>; /// Returns the view of the storage for the latest block height. fn latest_view(&self) -> View; } ``` Another one to await on the `BlockCommiter` side finishing processing the `ImportResult` by all listeners. The goal of the trait is to provide an immutable read-only view of the database at a specific time. However, this trait has not yet been implemented properly during this PR and will be implemented in the following PRs. The `view_at` requires functionality from FuelLabs/fuel-core#451. We already can implement the `latest_view` method via [`RocksDB::Transaction`](https://github.com/facebook/rocksdb/wiki/Transactions#reading-from-a-transaction), but it is better to do it after merging FuelLabs/fuel-core#1576. Waiting on the `BlockImporter` side is a temporary solution to not escalate the problem. But maybe we can keep it later to guarantee the consistent state of the blockchain. ### Losing result of execution The `AtomicView` trait also solves the issue of losing the state of the execution because it is possible to get a view of the database at a specific block height and execute the block again receiving the same execution result. Waiting inside the `BlockImporter` guarantees that we will not lose more than one `ImportResult`. ### Inconsistent database view within GraphQL requests The GraphQL now has `ReadDatabase`: ```rust pub type OnChainView = Arc<dyn OnChainDatabase>; pub type OffChainView = Arc<dyn OffChainDatabase>; pub struct ReadDatabase { on_chain: Box<dyn AtomicView<OnChainView>>, off_chain: Box<dyn AtomicView<OffChainView>>, } ``` It implements the `view` method that returns the `ReadView` type. The `ReadView` implements all required methods by using internal on-chain view and off-chain view. The `AtomicView` allows us to get the `last_view` of the off-chain database and get the `view_at(off_chain_view.last_height())` of the on-chain database creating a consistent view for both databases at a specific height. The change also adds a `ViewExtension` to the GraphQL that creates a `ReadView` for each request. ```rust /// The extension that adds the `ReadView` to the request context. /// It guarantees that the request works with the one view of the database, /// and external database modification cannot affect the result. struct ViewExtension; #[async_trait::async_trait] impl Extension for ViewExtension { async fn prepare_request( &self, ctx: &ExtensionContext<'_>, request: Request, next: NextPrepareRequest<'_>, ) -> ServerResult<Request> { let database: &ReadDatabase = ctx.data_unchecked(); let view = database.view(); let request = request.data(view); next.run(ctx, request).await } } ``` ## Implementation details - The `ExecutionResult` now also has receipts for the transaction along with its status. The off-chain worker will insert them later into the database, while the `dry_run` can fetch them immediately. - All API requests now work with the `ReadView` instead of the `Database` type. The `ReadDatabase` is only used in one place in the `ViewExtension`. - The `BlockImpoerter::comit_result` now is `async` and awaits for the previous block to be processed by all listeners. The execution of the `execute_and_commit` now runs `verify_and_execute_block` in the spawned task in the `tokio_rayon`. ## Follow up - FuelLabs/fuel-core#1580 - FuelLabs/fuel-core#1581 - FuelLabs/fuel-core#1582 - FuelLabs/fuel-core#1583 - FuelLabs/fuel-core#1584
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
It is possible that the node was shut down before we processed all imported blocks. It could lead to some missed blocks and the database's inconsistent state. Because the result of block execution is not stored on the chain, it is impossible to actualize the database without executing the block at the previous state of the blockchain. When
AtomicView<Storage>::view_at
is implemented, we can process all missed blocks and actualize the database here.Blocked by #451
The text was updated successfully, but these errors were encountered: