Supply data splitting in system crash scenario for Toolkit #5700
Comments
Of course not, the content description refers to the snapshot dataset generated by the lite tool, which is used for a fullnode quickstart.
One more question: Does it crash during snapshot generation using Litetool?
No, the crash means that the entire database may have experienced DbStore chaos due to disasters such as abnormal shutdowns, but the data of the entire database is complete. Titles don't convey everything well because of the length limitation. Your confusion can all be answered from the content, so I can now assume that the title is confusing to you. I haven't found a more suitable title right now, so do you have any suggestions?
@tomatoishealthy No more confusion for now.
I would like to present this issue at Core Devs Community Call 12.
Development has already begun and is expected to be completed next week. |
Development is basically completed: https://github.com/tomatoishealthy/java-tron/tree/hotfix/snapshot-inconsistent-for-litetool Plan to merge in release v4.8.0. |
Progress is tracked in #5876.
Background
The current database adopts a multi-instance model with a checkpoint mechanism to ensure the atomicity of write operations. The Checkpoint has evolved through two versions.
- v1: Retain all data changes of the latest block and write them atomically to the `tmp` database. When the service stops abnormally, restore the complete data of the latest block from the `tmp` database. However, `tmp` only retains the data changes of the latest block, so there may be data inconsistencies in the crash scenario.
- v2: Retain all data changes of the blocks within the last 2 minutes, which works similarly to a WAL and solves the problem of inconsistent data in the crash scenario.
Now, when using the Toolkit to split a snapshot, the data in the Checkpoint and DbStore need to be merged to obtain the complete snapshot data.
When generating a snapshot with Checkpoint v2, only the data of the latest block is read. However, Checkpoint v2 is composed of multiple consecutive blocks. This behavior may miss some data.
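The gap can be illustrated with a minimal sketch. It is not the Toolkit's code: the class name, method names, and `String` keys and values are hypothetical stand-ins, and each map stands for the change set of one block held in a v2 Checkpoint, ordered oldest to newest.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CheckpointLookupSketch {

    // Reads only the newest change set, mirroring the behavior described above.
    static String readLatestOnly(List<Map<String, String>> checkpoints, String key) {
        if (checkpoints.isEmpty()) {
            return null;
        }
        return checkpoints.get(checkpoints.size() - 1).get(key);
    }

    // Walks the change sets from newest to oldest and returns the first hit.
    static String readNewestFirst(List<Map<String, String>> checkpoints, String key) {
        for (int i = checkpoints.size() - 1; i >= 0; i--) {
            String value = checkpoints.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Three per-block change sets, oldest first (illustrative data only).
    static List<Map<String, String>> sampleCheckpoints() {
        List<Map<String, String>> checkpoints = new ArrayList<>();
        for (int n = 1; n <= 3; n++) {
            Map<String, String> changes = new HashMap<>();
            changes.put("blockBody:" + n, "body-of-block-" + n);
            changes.put("latestBlockNumber", String.valueOf(n));
            checkpoints.add(changes);
        }
        return checkpoints;
    }

    public static void main(String[] args) {
        List<Map<String, String>> checkpoints = sampleCheckpoints();
        System.out.println(readLatestOnly(checkpoints, "blockBody:2"));
        System.out.println(readNewestFirst(checkpoints, "blockBody:2"));
    }
}
```

In this sketch the latest-only read finds nothing for `blockBody:2`, while the newest-first walk returns the value recorded in the second change set. A key written in every block, such as the hypothetical `latestBlockNumber`, still resolves to its newest value.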
Example: A v2 Checkpoint retains the data of three blocks: `block1`, `block2`, and `block3`. Suppose we need the block body of `blocknumber = 2`, which exists only in `block2` (due to a crash, DbStore did not persist it in time). If only the data of `block3` is retrieved, the result will be null; all blocks should be traversed in reverse order to obtain the corresponding data.
Rationale
For security and convenience, a single external interface should be provided that is responsible for all database query operations. Accessing the database without going through this interface is prohibited.
The interface should meet the following conditions:
Specification
- `checkpointV2FlatMap`: Stores the merged Checkpoint v2 data
- `initFlatCheckpointV2()`: Initializes `checkpointV2FlatMap`
- `getDataFromSourceDB()`: The query interface; all data is queried through this interface
Test Specification
- Build a data set where the service stops normally (such as `kill -15`), split the data set with both the new version of the tool and the old one, and then compare the results for data consistency.
- Build a data set where the service stops abnormally (such as `kill -9`), split the data set with each version of the tool, and then compare the results for data consistency.
Scope Of Impact
This issue fixes the data inconsistency that arises when splitting the DB with the Toolkit and does not affect the fullnode.
Implementation
The changes:
- Add `checkpointV2FlatMap`.
- Merge the Checkpoint v2 data into `checkpointV2FlatMap` in order, which is done by `initFlatCheckpointV2()`. This logic needs to be placed in the first step of service startup to ensure that any subsequent read operations can get the correct data.
- Add `getDataFromSourceDB()`, and all queries from the original database are unified through this interface.
For `getDataFromSourceDB()`:
- With Checkpoint v2, read from `checkpointV2FlatMap`. If the result is empty, try to read it from the DbStore.
- With Checkpoint v1, read from `tmp`. If the result is empty, try to read it from the DbStore.
[Figure: `getDataFromSourceDB()` logic]
[Figure: Init `checkpointV2FlatMap`]
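The steps above can be sketched roughly as follows. This is a hedged illustration, not the code in the linked branch: the class `SourceDbSketch`, its constructor, the `String` key and value types, and the in-memory maps standing in for `tmp` and the DbStore are all assumptions. Only the names `checkpointV2FlatMap`, `initFlatCheckpointV2()`, and `getDataFromSourceDB()` come from the issue, and their real signatures may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SourceDbSketch {
    // Merged view of all Checkpoint v2 change sets.
    private final Map<String, String> checkpointV2FlatMap = new HashMap<>();
    // Hypothetical in-memory stand-ins for the v1 tmp database and the DbStore.
    private final Map<String, String> tmp;
    private final Map<String, String> dbStore;
    private final boolean checkpointV2;

    SourceDbSketch(boolean checkpointV2,
                   List<Map<String, String>> v2CheckpointsOldestFirst,
                   Map<String, String> tmp,
                   Map<String, String> dbStore) {
        this.checkpointV2 = checkpointV2;
        this.tmp = tmp;
        this.dbStore = dbStore;
        // Flatten before any read can happen, as the issue requires at startup.
        if (checkpointV2) {
            initFlatCheckpointV2(v2CheckpointsOldestFirst);
        }
    }

    // Applies the change sets oldest to newest so later blocks overwrite earlier ones.
    private void initFlatCheckpointV2(List<Map<String, String>> oldestFirst) {
        for (Map<String, String> changes : oldestFirst) {
            checkpointV2FlatMap.putAll(changes);
        }
    }

    // Single query entry point: checkpoint data first, then fall back to the DbStore.
    String getDataFromSourceDB(String key) {
        String value = checkpointV2 ? checkpointV2FlatMap.get(key) : tmp.get(key);
        return value != null ? value : dbStore.get(key);
    }
}
```

Merging once in block order keeps each later lookup to a single map read, and it gives the same answer as walking the checkpoints in reverse order on every query. The sketch ignores deletions; a real implementation would also need a way to represent a key that a later block removed.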