TwoCommit
Situations can occur in which blobbers end up in different states during a connection commit. A set of blobbers too small to recover the data may receive the write marker and commit the connection (state S1), while the rest of the blobbers remain in the previous state S0, without the new commit, leaving the allocation broken. If a client then decides to repair such an allocation, there are not enough blobbers to restore the committed state, and there is no way to roll back, since those blobbers have already merged the changes.
This can happen when blobbers crash during a connection commit, or when the client sends the commit to only some of the blobbers.
To provide a way to get back to the previous correct state (S0), even if the new state is committed on part of the network, we introduce a double-commit approach. We add a new filesystem state, "pre-committed" S1 (shown in yellow).
We should assume that client-blobber interactions (data upload, connection_commit) are decoupled from blobber-blockchain-validator interactions (write_marker submit, challenge complete). This is achieved mainly by using chains of writemarkers when completing challenges, which in turn allows blobbers to keep only the latest version of an allocation instead of storing every earlier version.
- user uploads 2 files, f1 and f2
- user commits them with writemarker W1
- files are moved to precommit directory with version 1
- user uploads f3 and deletes f2
- user commits with writemarker W2
- f1 and f2 are moved to filestore with version 1
- f3 and delete of f2 (REF table record) are moved to the precommit dir with version 2
- user uploads f4
- user commits with W3
- f2 is deleted in filestore, f3 is added changing fs version to 2
- f4 is moved to precommit changing the version to 3
- user uploads 2 files, f1 and f2
- user commits them with writemarker W1
- files are moved to precommit directory with version 1
- user uploads f3 and deletes f2
- user commits with writemarker W2
- f1 and f2 are moved to filestore with version 1
- f3 and delete of f2 (REF table record) are moved to the precommit dir with version 2
- user uploads f4
- user rolls back with empty writemarker W2e
- precommit dir is discarded as well as temp dir and allocation gets back to version 1
- f3 is uploaded and committed with writemarker W3
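The two scenarios above can be replayed on a small in-memory model. This is only a sketch: the `Allocation` type and its `Upload`, `Delete`, `Commit` and `Rollback` methods are hypothetical names introduced for illustration, and the three maps stand in for the temp, precommit and filestore directories rather than reflecting the blobber's actual storage code.

```go
package main

import "fmt"

// Allocation is a hypothetical in-memory model of one blobber's view of an
// allocation. In each map, true means "file present" and false marks a
// pending delete.
type Allocation struct {
	Temp       map[string]bool // uploaded but not yet committed
	PreCommit  map[string]bool // pre-committed changes (state S1)
	Store      map[string]bool // committed files (state S0)
	PreVersion int             // version of the precommit dir, 0 when empty
	Version    int             // version of the filestore
}

func NewAllocation() *Allocation {
	return &Allocation{
		Temp:      map[string]bool{},
		PreCommit: map[string]bool{},
		Store:     map[string]bool{},
	}
}

func (a *Allocation) Upload(name string) { a.Temp[name] = true }
func (a *Allocation) Delete(name string) { a.Temp[name] = false }

// Commit first applies the previously pre-committed changes to the filestore,
// then moves the current temp changes to precommit under the new version.
func (a *Allocation) Commit(version int) {
	if a.PreVersion != 0 {
		for name, present := range a.PreCommit {
			if present {
				a.Store[name] = true
			} else {
				delete(a.Store, name)
			}
		}
		a.Version = a.PreVersion
	}
	a.PreCommit, a.Temp = a.Temp, map[string]bool{}
	a.PreVersion = version
}

// Rollback discards the precommit and temp changes, so the allocation is
// back at the filestore version.
func (a *Allocation) Rollback() {
	a.PreCommit = map[string]bool{}
	a.Temp = map[string]bool{}
	a.PreVersion = 0
}

func main() {
	// First scenario: three consecutive commits W1, W2, W3.
	a := NewAllocation()
	a.Upload("f1")
	a.Upload("f2")
	a.Commit(1)
	a.Upload("f3")
	a.Delete("f2")
	a.Commit(2)
	a.Upload("f4")
	a.Commit(3)
	fmt.Println("filestore version:", a.Version, "precommit version:", a.PreVersion)
}
```

The model deliberately drops the rolled-back changes; keeping them around for pending challenges is a separate concern.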
The Write Marker is a cryptographic proof used to track and verify committed writes to blobbers, ensuring data integrity and synchronization across the network.
To maintain a consistent allocation state across blobbers, we introduce a common allocation versioning mechanism using the `timestamp` field of the Write Marker.
In cases where a user needs to undo a partial or incorrect write, we utilize an Empty Write Marker as a rollback mechanism. This allows blobbers to revert to a previously committed state, ensuring data consistency and preventing unintended writes.
A Write Marker consists of the following fields:
type WriteMarker struct {
	AllocationRoot string `json:"allocation_root"` // Root hash of the current allocation state
	PreviousAllocationRoot string `json:"prev_allocation_root"` // Root hash of the previous state
	FileMetaRoot string `json:"file_meta_root"` // Merkle root of the file metadata
	AllocationID string `json:"allocation_id"` // Unique identifier for the allocation
	Size int64 `json:"size"` // Size of the committed data
	ChainSize int64 `json:"chain_size"` // Total committed data size in the chain
	ChainHash string `json:"chain_hash"` // Cumulative hash of all Write Markers in the chain
	ChainLength int `json:"chain_length"` // Number of markers in the chain
	BlobberID string `json:"blobber_id"` // Blobber storing the data
	Timestamp int64 `json:"timestamp"` // Timestamp marking the version of the allocation
	ClientID string `json:"client_id"` // User issuing the Write Marker
	Signature string `json:"signature"` // Digital signature for validation
}
Allocation versioning ensures that all blobbers maintain a consistent state of the stored data by assigning a common version to each allocation. This versioning mechanism prevents inconsistencies across the storage network, especially when handling concurrent writes, rollbacks, and state synchronization.
The version of an allocation is determined by the timestamp of the corresponding Write Marker during a commit operation. By enforcing a uniform versioning scheme, we ensure seamless synchronization across all blobbers in the network.
Each allocation version is defined by the `timestamp` field of the Write Marker submitted during a write operation. This ensures that all blobbers in an allocation agree on the same version.
- All Write Markers issued for a single commit must have the exact same timestamp.
- The allocation version is determined by the `timestamp` of the Write Marker used at the pre-commit directory level.
- Blobbers rely on this timestamp to synchronize allocation states and ensure consistency across storage nodes.
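A client-side check of the first rule might look like the sketch below. The trimmed `Marker` type, `CommitVersion` and `ErrMixedTimestamps` are hypothetical names introduced here; only the `BlobberID` and `Timestamp` fields come from the Write Marker struct.

```go
package main

import (
	"errors"
	"fmt"
)

// Marker keeps only the Write Marker fields this check needs.
type Marker struct {
	BlobberID string
	Timestamp int64
}

// ErrMixedTimestamps is a hypothetical error for a commit whose markers disagree.
var ErrMixedTimestamps = errors.New("write markers of one commit carry different timestamps")

// CommitVersion returns the allocation version shared by all markers of a
// single commit, or an error if the timestamps are not identical.
func CommitVersion(markers []Marker) (int64, error) {
	if len(markers) == 0 {
		return 0, errors.New("no write markers")
	}
	version := markers[0].Timestamp
	for _, m := range markers[1:] {
		if m.Timestamp != version {
			return 0, ErrMixedTimestamps
		}
	}
	return version, nil
}

func main() {
	version, err := CommitVersion([]Marker{
		{BlobberID: "blobber-1", Timestamp: 1700000000},
		{BlobberID: "blobber-2", Timestamp: 1700000000},
	})
	fmt.Println(version, err)
}
```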
In cases where only a subset of blobbers receive a Write Marker, the allocation versioning mechanism allows the system to roll back to a previous state to maintain consistency. This prevents inconsistencies caused by network failures, incomplete writes, or misbehaving blobbers.
If blobbers operate with different versions of the same allocation, data inconsistency can arise. To mitigate this:
- Blobbers must reject Write Markers that do not match the expected allocation version.
- Clients must issue Write Markers with a uniform timestamp across all blobbers within an allocation.
- The previous allocation root is stored in the Write Marker, allowing verification of changes between versions.
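On the blobber side, the checks above could be sketched as follows. The text does not spell out how a blobber derives the "expected" version, so this sketch assumes two rules: the marker's previous allocation root must equal the blobber's current root, and its timestamp must be newer than the blobber's current version. `BlobberState`, `Validate` and both error values are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Marker keeps only the Write Marker fields used by the validation.
type Marker struct {
	AllocationRoot         string
	PreviousAllocationRoot string
	Timestamp              int64
}

// BlobberState is a hypothetical summary of what the blobber currently holds.
type BlobberState struct {
	AllocationRoot string // root of the latest accepted state
	Version        int64  // timestamp of the latest accepted Write Marker
}

var (
	ErrRootMismatch = errors.New("previous allocation root does not match blobber state")
	ErrStaleVersion = errors.New("write marker timestamp is not newer than current version")
)

// Validate applies the two assumed acceptance rules to an incoming marker.
func (s BlobberState) Validate(m Marker) error {
	if m.PreviousAllocationRoot != s.AllocationRoot {
		return ErrRootMismatch
	}
	if m.Timestamp <= s.Version {
		return ErrStaleVersion
	}
	return nil
}

func main() {
	state := BlobberState{AllocationRoot: "root-1", Version: 1700000000}
	marker := Marker{AllocationRoot: "root-2", PreviousAllocationRoot: "root-1", Timestamp: 1700000100}
	fmt.Println("accepted:", state.Validate(marker) == nil)
}
```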
A rollback is required when an allocation needs to be reverted to a previously committed state due to:
- Failed writes
- Partial commits (some blobbers accepted, others didn't)
- Incorrect data updates
An Empty Write Marker is used to perform a rollback, discarding uncommitted changes and restoring the last valid allocation state.
- Identify the Target State:
  - The rollback timestamp must match the `timestamp` of the pre-commit directory to be reverted.
  - The `allocation_root` in the rollback marker is set to the allocation root of the last valid state.
- Issue an Empty Write Marker:
  - The rollback marker contains zeroed-out data changes, effectively nullifying the write.
  - The `size` field in the rollback marker is set to the negative of the previous Write Marker size, canceling the write operation.
- Update Chain Hash and Chain Size:
  - The system recalculates the new chain hash and chain size, ensuring that the rollback is cryptographically linked to the write history.
This mechanism ensures that blobbers can independently verify and apply rollbacks, maintaining a consistent and reliable allocation state.
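The rollback steps above can be sketched as a function that derives an Empty Write Marker from the marker being rolled back. The field values follow the steps as described; the chaining rule in `nextChainHash` (SHA-256 over the previous chain hash and the new allocation root) is an assumption made for illustration, not the confirmed formula, and `EmptyWriteMarker` is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// WriteMarker keeps only the fields touched by a rollback.
type WriteMarker struct {
	AllocationRoot string
	Size           int64
	ChainSize      int64
	ChainHash      string
	ChainLength    int
	Timestamp      int64
}

// nextChainHash is an ASSUMED chaining rule, used here only to show that the
// rollback marker extends the chain instead of rewriting it.
func nextChainHash(prevChainHash, allocationRoot string) string {
	sum := sha256.Sum256([]byte(prevChainHash + ":" + allocationRoot))
	return hex.EncodeToString(sum[:])
}

// EmptyWriteMarker builds the rollback marker for the latest (pre-committed)
// marker, pointing back at the root of the last valid state.
func EmptyWriteMarker(latest WriteMarker, lastValidRoot string) WriteMarker {
	size := -latest.Size // cancels the rolled-back write
	return WriteMarker{
		AllocationRoot: lastValidRoot,
		Size:           size,
		ChainSize:      latest.ChainSize + size,
		ChainHash:      nextChainHash(latest.ChainHash, lastValidRoot),
		ChainLength:    latest.ChainLength + 1,
		Timestamp:      latest.Timestamp, // matches the pre-commit dir being reverted
	}
}

func main() {
	latest := WriteMarker{
		AllocationRoot: "root-1", Size: 100, ChainSize: 250,
		ChainHash: "abc", ChainLength: 2, Timestamp: 1700000000,
	}
	rb := EmptyWriteMarker(latest, "root-0")
	fmt.Println(rb.AllocationRoot, rb.Size, rb.ChainSize, rb.ChainLength)
}
```

Signing the marker and filling the remaining fields are left out, since the text does not describe how they change on rollback.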
A writemarker is used as proof of rollback, so instead of writemarker chains the validator and blobber should operate on writemarker trees, which remain fairly simple.
The only situation in which trees are used is when a rolled-back allocation root is challenged; in that case the validator provides the following proof:
This tree shows the validator that the blobber does not have allocation version 1 locally, nor any direct siblings of it, because it was rolled back; instead it has allocation version n, which was built on the common ancestor root 0.
At the moment, changes are stored in the `getAllocTempDir` (tmp) directory. We will add a new directory, `getAllocPreCommitDir` (precommit). After changes are pre-committed, they are moved to this directory from `getAllocTempDir`. This allows us to store S1 (pic) in a separate, isolated sandbox.
This is the main part of the double-commit protocol: we pre-commit on CommitWrite, but instead of calling `connectionObj.ApplyChanges` we move the changes to the `getAllocPreCommitDir` directory.
When a client commits a write, the blobber performs several actions:
- commits the previously pre-committed data with `connectionObj.ApplyChanges`, moving it from `getAllocPreCommitDir` to the filesystem, as before on commit
- pre-commits the current, speculative data by moving it from `getAllocTempDir` to `getAllocPreCommitDir`
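As a rough illustration of the two moves, the sketch below uses plain directories named `tmp`, `precommit` and `files` under a scratch root. The directory names and every function here (`moveAll`, `CommitWrite`, `upload`, `exists`, `newAllocRoot`, `discard`) are hypothetical; the real blobber resolves its paths through `getAllocTempDir` and `getAllocPreCommitDir` and applies changes through `connectionObj.ApplyChanges`, which also handles deletes and metadata that this sketch ignores.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// moveAll moves every entry of src into dst; a missing src is not an error.
func moveAll(src, dst string) error {
	entries, err := os.ReadDir(src)
	if os.IsNotExist(err) {
		return nil
	}
	if err != nil {
		return err
	}
	if err := os.MkdirAll(dst, 0o755); err != nil {
		return err
	}
	for _, e := range entries {
		if err := os.Rename(filepath.Join(src, e.Name()), filepath.Join(dst, e.Name())); err != nil {
			return err
		}
	}
	return nil
}

// CommitWrite first promotes the previous pre-commit to the filestore, then
// pre-commits the current temp changes. The order matters: swapping the two
// moves would push speculative data straight into the filestore.
func CommitWrite(root string) error {
	if err := moveAll(filepath.Join(root, "precommit"), filepath.Join(root, "files")); err != nil {
		return err
	}
	return moveAll(filepath.Join(root, "tmp"), filepath.Join(root, "precommit"))
}

// Helpers for the demo: a scratch allocation root, an upload into tmp, and an
// existence check relative to the root.
func newAllocRoot() (string, error) { return os.MkdirTemp("", "alloc-") }

func discard(root string) { _ = os.RemoveAll(root) }

func upload(root, name string) error {
	dir := filepath.Join(root, "tmp")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, name), []byte(name), 0o644)
}

func exists(root, rel string) bool {
	_, err := os.Stat(filepath.Join(root, rel))
	return err == nil
}

func main() {
	root, err := newAllocRoot()
	if err != nil {
		panic(err)
	}
	defer discard(root)
	if err := upload(root, "f1"); err != nil {
		panic(err)
	}
	if err := CommitWrite(root); err != nil {
		panic(err)
	}
	fmt.Println("f1 pre-committed:", exists(root, "precommit/f1"))
}
```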
## Rollback
We add Rollback to be able to roll back the pre-committed state S1 (pic) to the committed state S0 (pic). We do not delete the pre-committed changes, though, since they could already be committed to the blockchain and challenged; instead we keep them locally for some time, but move them from `getAllocPreCommitDir` to some other temp dir. These changes will not be used for anything in the future except completing challenges.
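A minimal sketch of this rollback on the filesystem level is shown below. The `rolledback/<version>` holding directory, the plain `precommit` and `tmp` directory names, and the functions `Rollback`, `seed`, `exists`, `newAllocRoot` and `discard` are all assumptions made for illustration; the text only says the changes go to "some other temp dir". Discarding `tmp` follows the rollback scenario described earlier.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// Rollback parks the pre-committed changes of the given version in a holding
// directory (so pending challenges can still be answered) and discards the
// uncommitted temp changes. It returns the holding directory path.
func Rollback(root string, version int64) (string, error) {
	pre := filepath.Join(root, "precommit")
	kept := filepath.Join(root, "rolledback", strconv.FormatInt(version, 10))
	if err := os.MkdirAll(filepath.Dir(kept), 0o755); err != nil {
		return "", err
	}
	// Move the whole precommit directory in one step instead of deleting it.
	if err := os.Rename(pre, kept); err != nil {
		return "", err
	}
	if err := os.RemoveAll(filepath.Join(root, "tmp")); err != nil {
		return "", err
	}
	return kept, nil
}

// Helpers for the demo: a scratch root, a file written at a relative path,
// and an existence check relative to the root.
func newAllocRoot() (string, error) { return os.MkdirTemp("", "alloc-") }

func discard(root string) { _ = os.RemoveAll(root) }

func seed(root, rel string) error {
	path := filepath.Join(root, rel)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, []byte(rel), 0o644)
}

func exists(root, rel string) bool {
	_, err := os.Stat(filepath.Join(root, rel))
	return err == nil
}

func main() {
	root, err := newAllocRoot()
	if err != nil {
		panic(err)
	}
	defer discard(root)
	for _, rel := range []string{"files/f1", "precommit/f3", "tmp/f4"} {
		if err := seed(root, rel); err != nil {
			panic(err)
		}
	}
	kept, err := Rollback(root, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println("rolled-back changes kept in:", kept)
}
```

How long the holding directory is retained, and when it is finally purged, is outside the scope of this sketch.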
`/v1/file` and `/v1/dir` operate on the pre-committed state, so instead of the merged state used before, we will build the state upon the files from the `connectionObj.ApplyChanges` section.