Repair
Hitenjain14 edited this page Mar 18, 2025 · 2 revisions
The repair process ensures that all blobbers in a decentralized storage allocation maintain a consistent version of data. This is necessary when:
- A blobber misses a commit, leading to data inconsistency.
- A user adds or replaces a blobber in the storage allocation.
Since data is erasure encoded using Reed-Solomon coding, the original data can be recovered as long as at least `data_shards` blobbers hold the correct data. The repair process synchronizes all blobbers to the same allocation root, ensuring consistency and integrity.
Decentralized storage relies on multiple independent blobbers to store data redundantly using erasure encoding. However, failures can leave blobbers inconsistent with one another:
- A blobber may have missed a commit, leading to data mismatches.
- When a new blobber is added or replaced, it starts with an empty or outdated state.
- Data integrity needs to be enforced by ensuring all blobbers maintain the same allocation root, representing the latest version of stored data.
To resolve this, a structured repair process is required to restore all blobbers to the same version.
- The client fetches the allocation roots from all participating blobbers.
- The client groups blobbers into sets based on their allocation roots.
- The set with at least `data_shards` blobbers that share the same allocation root is considered the master set.
- Blobbers not in the master set are secondary blobbers that require repair.
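The grouping step above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the function name `partition_blobbers`, the dict shape (blobber ID mapped to allocation root), and the tie-breaking rule (pick the largest group) are all assumptions made for the example.

```python
from collections import defaultdict


def partition_blobbers(roots, data_shards):
    """Split blobbers into a master set and a secondary set.

    roots: dict mapping blobber ID -> allocation root (hypothetical shape).
    Returns (master_root, master_set, secondary_set).
    Raises ValueError if no root is shared by at least data_shards blobbers.
    """
    if not roots:
        raise ValueError("no allocation roots were fetched")

    # Group blobber IDs by the allocation root they reported.
    groups = defaultdict(list)
    for blobber_id, root in roots.items():
        groups[root].append(blobber_id)

    # Assumption: when several groups qualify, prefer the largest one.
    master_root, master_set = max(groups.items(), key=lambda kv: len(kv[1]))
    if len(master_set) < data_shards:
        raise ValueError("no allocation root is shared by data_shards blobbers")

    # Every blobber outside the master set needs repair.
    master_ids = set(master_set)
    secondary_set = [b for b in roots if b not in master_ids]
    return master_root, master_set, secondary_set
```

With six blobbers where four report root `A` and two report root `B`, and `data_shards = 4`, the four blobbers on root `A` form the master set and the other two become secondary blobbers, which matches the scenario in the sequence diagram.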
- A lead blobber is chosen from each set to act as a representative.
- The lead blobber lists all files in a paginated manner.
- The client processes file lists using a diff function to determine:
  - Missing Files: Files present in the master set but absent in secondary blobbers.
  - Extra Files: Files present in secondary blobbers but missing in the master set.
  - Modified Files: Files with mismatched file hashes, indicating a need for update.
- Based on this analysis, file operations are queued for execution.
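A diff function of the kind described above might look like the sketch below. It assumes each listing has already been collected into a dict of file path to file hash; the name `diff_listings` and that data shape are hypothetical, and a real implementation would more likely compare listings page by page instead of holding them fully in memory.

```python
def diff_listings(master, secondary):
    """Classify files by comparing two listings.

    master, secondary: dicts mapping file path -> file hash
    (an assumed shape, chosen for illustration).
    Returns a dict with sorted lists under "missing", "extra", "modified".
    """
    missing = []   # in the master set, absent on the secondary blobber
    modified = []  # present on both sides, but the hashes differ
    for path, file_hash in master.items():
        if path not in secondary:
            missing.append(path)
        elif secondary[path] != file_hash:
            modified.append(path)

    # Present on the secondary blobber but unknown to the master set.
    extra = [path for path in secondary if path not in master]

    return {
        "missing": sorted(missing),
        "extra": sorted(extra),
        "modified": sorted(modified),
    }
```

Each category then maps to a queued operation. Which operation is chosen for each category (for example, how extra files are handled) is left open here, since the text does not specify it.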
- Batch processing is used for high throughput.
- Files requiring repair are downloaded from the master set and uploaded to secondary blobbers.
- Pipelining: Data is streamed from the master set directly to secondary blobbers, avoiding intermediate disk writes and maximizing throughput.
- The repair process iterates until all files are processed.
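The pipelining idea can be illustrated with a small sketch that copies data chunk by chunk from a download stream to an upload stream, so that only one chunk is held in memory at a time and nothing is written to disk. The names `stream_chunks` and `repair_file`, the chunk size, and the use of plain file-like objects are assumptions; the sketch also ignores erasure coding and network transport entirely.

```python
import io


def stream_chunks(reader, chunk_size=64 * 1024):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        yield chunk


def repair_file(download, upload, chunk_size=64 * 1024):
    """Pipe bytes from `download` to `upload` without buffering the whole file.

    download: file-like object with read() (stands in for the master set).
    upload: file-like object with write() (stands in for a secondary blobber).
    Returns the number of bytes copied.
    """
    total = 0
    for chunk in stream_chunks(download, chunk_size):
        upload.write(chunk)
        total += len(chunk)
    return total


# Usage with in-memory streams standing in for network connections.
source = io.BytesIO(b"example file contents" * 1000)
target = io.BytesIO()
copied = repair_file(source, target, chunk_size=4096)
```

Because each chunk is forwarded as soon as it is read, memory use stays bounded by the chunk size regardless of file size, which is the property the pipelining step relies on.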
- Once all files are synchronized, all blobbers should have the same allocation root as the master set.
- This ensures that all blobbers in the allocation are fully synchronized and maintain data consistency.
sequenceDiagram
participant SDK
participant LeadBlobberMaster(Blobber1) as Lead Blobber (Master Set - Root A)
participant LeadBlobberSecondary(Blobber5) as Lead Blobber (Secondary Set - Root B)
participant Blobber2 as Blobber 2 (Master Set - Root A)
participant Blobber3 as Blobber 3 (Master Set - Root A)
participant Blobber4 as Blobber 4 (Master Set - Root A)
participant Blobber6 as Blobber 6 (Secondary Set - Root B)
participant DiffFunction as Diff Function
participant Executor as Operation Executor
%% Step 1: Fetch Allocation Roots and Consensus
SDK->>LeadBlobberMaster(Blobber1): Fetch Allocation Root (A)
SDK->>Blobber2: Fetch Allocation Root (A)
SDK->>Blobber3: Fetch Allocation Root (A)
SDK->>Blobber4: Fetch Allocation Root (A)
SDK->>LeadBlobberSecondary(Blobber5): Fetch Allocation Root (B)
SDK->>Blobber6: Fetch Allocation Root (B)
SDK->>SDK: Take Consensus (4 Data Shards)
SDK->>SDK: Form Master Set (Blobber1-4, Root A) & Secondary Set (Blobber5-6, Root B)
loop Paginated File Listing
SDK->>LeadBlobberMaster(Blobber1): List Files (Paginated)
SDK->>LeadBlobberSecondary(Blobber5): List Files (Paginated)
SDK->>DiffFunction: Compare Files Across Master (A) & Secondary (B)
DiffFunction->>SDK: Return Batch of Repair Operations
SDK->>Executor: Process Batch of Operations
end
SDK->>SDK: Repeat Until All Files Are Processed
SDK->>SDK: All Blobbers Now Have Same Allocation Root (A)