A workspace caches all historical operations of a transaction.
- Bind a workspace for each transaction
- Before a transaction commits, an active abort only cleans up this workspace.
- On committing a transaction, push all accumulated changes to the relevant `DN` and execute the 2PC commit process.
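The lifecycle above can be sketched as a small per-transaction buffer. This is a minimal illustration, not the actual implementation: the `Workspace` class, its method names, and the `push_to_dn` callback are all hypothetical, and the 2PC protocol itself is stubbed out.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical per-transaction buffer; all names are illustrative."""
    txn_id: int
    ops: list = field(default_factory=list)  # cached operations, in order

    def record(self, op):
        self.ops.append(op)

    def abort(self):
        # An active abort before commit only touches local state.
        self.ops.clear()

    def commit(self, push_to_dn):
        # On commit, hand every accumulated change to the DN.
        # push_to_dn is a stand-in for the real push plus 2PC commit.
        pushed = list(self.ops)
        push_to_dn(self.txn_id, pushed)
        self.ops.clear()
        return pushed


received = []
ws = Workspace(txn_id=1)
ws.record(("insert", "TBLA", "1"))
ws.abort()                                  # nothing reaches the DN
ws.record(("insert", "TBLA", "2"))
ws.commit(lambda txn, ops: received.append((txn, ops)))
```

Only the operations recorded after the abort reach the stubbed `DN`, which is the property the list above describes.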
Workspace is created on committing.
- `PrePrepareCommit`: try to push changes to the statemachine
  - Any error, go to `PrepareRollback`
  - Else, go to `PrepareCommit`
- `PrepareCommit`
  - Bind prepare timestamp
  - Conflict check. Any error, go to `PrepareRollback`
  - Build WAL entry
  - Append to WAL
  - Enqueue flush waiting queue
- `PrepareRollback`
  - Notify coordinator aborted
  - Enqueue commit waiting queue
- Wait WAL
  - Notify coordinator prepared
  - Enqueue commit waiting queue
- Wait Committed|Aborted
  - `ApplyCommit` if committed
  - `ApplyRollback` if aborted
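To make the transitions easier to follow, here is a small sketch that returns the stages one transaction passes through. It assumes the two failure points are the statemachine push and the conflict check, and models them as plain booleans. The function name, its parameters, and the stage labels `WaitWAL` and `WaitCommittedOrAborted` are illustrative, not names from the codebase.

```python
def commit_trace(push_ok, conflict_free, decision):
    """Return the stages one transaction passes through on the DN.

    push_ok, conflict_free: hypothetical booleans standing in for the real
    statemachine push and conflict check.
    decision: the coordinator's verdict, "committed" or "aborted". It only
    matters when the transaction was prepared successfully.
    """
    trace = ["PrePrepareCommit"]
    if push_ok:
        trace.append("PrepareCommit")
    if not (push_ok and conflict_free):
        # Coordinator is told "aborted"; the txn joins the commit waiting queue.
        trace += ["PrepareRollback", "WaitCommittedOrAborted", "ApplyRollback"]
        return trace
    # WAL entry built and appended; the txn sits in the flush waiting queue.
    trace.append("WaitWAL")
    # Coordinator is told "prepared"; the txn joins the commit waiting queue.
    trace.append("WaitCommittedOrAborted")
    trace.append("ApplyCommit" if decision == "committed" else "ApplyRollback")
    return trace


happy_path = commit_trace(True, True, "committed")
conflict_path = commit_trace(True, False, "committed")
```

A prepared transaction can still end in `ApplyRollback` if the coordinator decides to abort, which is why the final stage depends on `decision` rather than on local checks.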
- Fetch the catalog snapshot from a `DN` on first access
+-----+ +-----+ +-----+
| DB1 | | DB2 | | DB3 |
+--+--+ +-----+ +-----+
- Check unique constraints based on the catalog snapshot. Return a duplicated error if violated.
- Fetch a unique database id and create a database entry
type DBEntry struct {
    // Unique identity
    Id uint64
    // Database name: should be unique
    Name string
    // Create timestamp
    CreatedAt []byte
    // Delete timestamp
    DeletedAt []byte
}
- Active abort: clean up the workspace only.
- Commit: run the 2PC commit process. Push all accumulated changes to the relevant `DN`.
- Fetch the catalog snapshot from a `DN` on first access
- Find the database entry based on the catalog snapshot. Return a not-found error if it does not exist.
- Update the entry as deleted
- Active abort: clean up the workspace only.
- Commit: run the 2PC commit process. Push all accumulated changes to the relevant `DN`.
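The create and drop flows can be sketched together against a local copy of the catalog snapshot. This is a rough illustration under stated assumptions: `CatalogWorkspace`, the two error classes, and the `next_id` counter (standing in for "fetch a unique database id") are hypothetical, and `DBEntry` is a Python rendering of the struct shown earlier.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class DBEntry:
    id: int
    name: str
    created_at: bytes
    deleted_at: Optional[bytes] = None


class DuplicatedError(Exception):
    pass


class NotFoundError(Exception):
    pass


class CatalogWorkspace:
    """Hypothetical workspace view over a catalog snapshot."""

    def __init__(self, snapshot, next_id):
        # Copy entries so the snapshot itself is never mutated.
        self.entries = {e.name: replace(e) for e in snapshot}
        self.next_id = next_id      # stand-in for a unique id allocator
        self.changes = []           # pushed to the DN on commit

    def _live(self, name):
        entry = self.entries.get(name)
        return entry if entry is not None and entry.deleted_at is None else None

    def create_database(self, name, ts):
        if self._live(name) is not None:
            raise DuplicatedError(name)
        entry = DBEntry(self.next_id, name, ts)
        self.next_id += 1
        self.entries[name] = entry
        self.changes.append(("create", entry))
        return entry

    def drop_database(self, name, ts):
        entry = self._live(name)
        if entry is None:
            raise NotFoundError(name)
        entry.deleted_at = ts       # mark as deleted, do not remove
        self.changes.append(("drop", entry))
        return entry

    def abort(self):
        self.changes.clear()        # active abort: local cleanup only


ws = CatalogWorkspace([DBEntry(1, "DB1", b"t0")], next_id=2)
ws.create_database("DB2", b"t1")
ws.drop_database("DB1", b"t2")
```

Both operations only modify workspace state. Nothing reaches a `DN` until the accumulated `changes` are pushed at commit time.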
Almost the same as Create|Drop database.
CN-Workspace
- Fetch the metadata snapshot and all cached data from the relevant `DN` on first access
- Dedup on the workspace local store
- Dedup on the snapshot
- Append to the workspace local store
- Active abort: clean up the workspace only.
- Commit: run the 2PC commit process. Push all accumulated changes to the relevant `DN`.
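The insert path above amounts to two dedup checks followed by a local append. The sketch below assumes primary keys are enough to model the snapshot. `TableWorkspace` and its fields are hypothetical names, and the rule that a key deleted earlier in the same transaction no longer counts as a duplicate is taken from the `Dedup` pseudocode later in this document.

```python
class DuplicatedError(Exception):
    pass


class TableWorkspace:
    """Hypothetical CN-side workspace for one table, keyed by primary key."""

    def __init__(self, snapshot_pks):
        self.snapshot_pks = frozenset(snapshot_pks)  # keys visible in the snapshot
        self.local_rows = {}                         # workspace local store
        self.deleted_pks = set()                     # snapshot keys deleted by this txn

    def delete(self, pk):
        if pk in self.local_rows:
            del self.local_rows[pk]
        elif pk in self.snapshot_pks:
            self.deleted_pks.add(pk)

    def insert(self, pk, row):
        # 1. Dedup on the workspace local store.
        if pk in self.local_rows:
            raise DuplicatedError(pk)
        # 2. Dedup on the snapshot, ignoring keys this txn already deleted.
        if pk in self.snapshot_pks and pk not in self.deleted_pks:
            raise DuplicatedError(pk)
        # 3. Append to the workspace local store.
        self.local_rows[pk] = row


ws = TableWorkspace(snapshot_pks={"1"})
ws.delete("1")
ws.insert("1", {"pk": "1", "v": "new"})   # allowed: old row was deleted first
ws.insert("2", {"pk": "2", "v": "x"})
```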
DN-Workspace
- Cache all writes
- In PrePrepareCommit, push all append nodes to the statemachine. Do delta dedup.
CN-Workspace
- Add delete node to the workspace local store
- Active abort: clean up the workspace only.
- Commit: run the 2PC commit process. Push all accumulated changes to the relevant `DN`.
DN-Workspace
- Cache all writes
- In PrePrepareCommit, push all delete nodes to the statemachine.
CN-Workspace
- Fetch the metadata snapshot and all cached data from the relevant `DN` on first access
- Provide a block-iterator
  - Workspace local store block
  - Snapshot blocks
    - In-memory block
    - Remote block (base block + delete file)
    - Remote block + in-memory delta
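A block-iterator over these sources might look like the sketch below. It assumes each block can be reduced to a list of rows plus a set of masked row offsets. The `Block` class and the choice to yield workspace-local blocks before snapshot blocks are illustrative assumptions, since the source does not specify an iteration order.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    source: str                                 # where the block lives
    rows: list
    deleted: set = field(default_factory=set)   # masked row offsets

    def visible(self):
        return [r for i, r in enumerate(self.rows) if i not in self.deleted]


def iter_blocks(local_blocks, snapshot_blocks):
    """Yield workspace-local blocks first, then snapshot blocks."""
    yield from local_blocks
    yield from snapshot_blocks


local = [Block("workspace", ["pk=9"])]
snapshot = [
    Block("in-memory", ["pk=1", "pk=2"]),
    Block("remote", ["pk=3", "pk=4"], deleted={0}),         # base block + delete file
    Block("remote+delta", ["pk=5", "pk=6"], deleted={1}),   # plus in-memory delta
]
rows = [r for blk in iter_blocks(local, snapshot) for r in blk.visible()]
```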
Database name is "DBA", Table name is "TBLA". Snapshot is of PK="1".
- Try delete PK="1"
- Scan and find physical address of PK="1"
- Delete by physical address
- Insert a tuple with PK="1"
- Scan one column
- Commit
- Get the database snapshot from one `DN` when building the plan. Store it in the transaction workspace.
                                     +----------+
                             +------>|   TBLA   |
                             |       +----------+
                             |       +----------+
                             |------>|   TBLB   |
                             |       +----------+
                             |       +----------+
                             |------>|   TBLC   |
                             |       +----------+
+----------------+           |       +----------+
|      DBA       | ----------+------>|   TBLD   |
+----------------+           |       +----------+
                             |       +----------+
                             |------>|   TBLE   |
                             |       +----------+
                             |       +----------+
                             |------>|   TBLF   |
                             |       +----------+
                             |       +----------+
                             +------>|   TBLG   |
                                     +----------+
- Get the metadata snapshot and log tail of `TBLA` from `DN`. Store them in the transaction workspace.
- Scan on the metadata snapshot. Refer to metadata for details.
Snapshot
+----------------------------------------------------------------------+
|                          Metadata Snapshot                           |
 +--------------+  +--------------+  +--------------+  +--------------+
 |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |
 +---+----+-----+  +---+----+-----+  +---+----+-----+  +---+----+-----+
 | 1 |xx/1|     |  | 2 |xx/2|yy/2 |  | 3 |xx/3|yy/3 |  | 4 |    |     |
 +---+----+-----+  +---+----+-----+  +---+----+-----+  +---+----+-----+
Log tail
+-------------+
| Deletes Map |
+-------------+            +---------+
|      1      |----------> | DelMask |
+-------------+            +---------+
|      3      |----------> | DelMask |
+-------------+            +---------+

+-------------+
|  Data Map   |
+-------------+          +---------+
|      4      |--------> |  Batch  |
+-------------+    |     +---------+
                   |     +-----------+
                   |     |  Zonemap  |
                   +---> +-----+-----+
                         | Min | Max |
                         +-----+-----+
Pseudocode
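def ScanCol($colIdx):
    for $metaInfo in $snapshot:
        # Column data: persisted block via the cache, else the in-memory batch
        if $metaInfo.BaseLoc != "":
            $blkMeta = $cache.Get($metaInfo.BaseLoc)
            $data = $cache.Get($blkMeta[$colIdx].DataLoc)
        else:
            $data = $dataMap.Load($metaInfo.Id)
        # Merge persisted deletes with in-memory deletes from the log tail
        $deletes = $deletesMap.Load($metaInfo.Id)
        if $metaInfo.DeltaLoc != "":
            $roDels = $cache.Get($metaInfo.DeltaLoc)
            $deletes = $roDels if $deletes == None else $deletes.Or($roDels)
        # Yield one column batch per block instead of returning after the first
        if $deletes == None:
            yield $data
            continue
        $cloned = $data.Clone()
        $cloned.ApplyDeletes($deletes)
        yield $cloned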
TODO: Build ART tree for unsorted block in `CN`?
- Delete by physical address
For example, delete row 10 on block 2
+-------------+
| Deletes Map |
+-------------+            +---------+
|      1      |----------> | DelMask |
+-------------+            +---------+
|      3      |----------> | DelMask |
+-------------+            +---------+
|      2      |----------> |  [10]   | --------- Newly added
+-------------+            +---------+
- Dedup
def Dedup($pk):
    for $metaInfo in $snapshot:
        if $metaInfo.BaseLoc != "":
            $blkMeta = $cache.Get($metaInfo.BaseLoc)
            $pkMeta = $blkMeta[$pkIdx]
            if $pk < $pkMeta.Min or $pk > $pkMeta.Max:
                continue
            $bf = $cache.Get($pkMeta.BfLoc)
            if not $bf.MayContains($pk):
                continue
            $data = $cache.Get($pkMeta.DataLoc)
            if not $data.Find($pk):
                continue
            if $pk is in deletes map:
                continue
            return Duplicated
        else:
            $data = $dataMap.Load($metaInfo.Id)
            $data apply deletes
            if not $data.Find($pk):
                continue
            return Duplicated
    return Ok
- Append a new tuple
- Add a transient block in the workspace
+----------------------------------------------------------------------------------------+
|                                   Metadata Snapshot                                    |
 +--------------+  +--------------+  +--------------+  +--------------+  +--------------+
 |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |
 +---+----+-----+  +---+----+-----+  +---+----+-----+  +---+----+-----+  +-----+----+---+
 | 1 |xx/1|     |  | 2 |xx/2|yy/2 |  | 3 |xx/3|yy/3 |  | 4 |    |     |  |Tid+0|    |   |
 +---+----+-----+  +---+----+-----+  +---+----+-----+  +---+----+-----+  +-----+-+--+---+
                                                                                 |
                                                                                 |
                                                                          Transient block
- Append the tuple into the transient block
+-------------+
|  Data Map   |
+-------------+            +---------+
|      4      |------+---> |  Batch  |
+-------------+      |     +---------+
|    Tid+0    |---+  |     +-----------+
+-------------+   |  |     |  Zonemap  |
                  |  +---> +-----+-----+
                  |        | Min | Max |
                  |        +-----+-----+
                  |        +---------+
                  |------> |  Batch  |
                  |        +---------+
                  |        +-----------+
                  |        |  Zonemap  |
                  +------> +-----+-----+
                           | Min | Max |
                           +-----+-----+
- Scan one column. Same as the scan on the metadata snapshot above.
- Commit
  - PreCommit
    - Collect delete nodes and transient blocks as commands
    - Send collected commands to the relevant `DN`
  - DoCommit
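The delete, append, and commit steps of this walkthrough can be tied together in one small sketch. It assumes a deletes map keyed by block id, a single transient block named after the transaction (`Tid+0`), and a zonemap tracked as a `(min, max)` pair. `TxnWorkspace` and the command tuples are hypothetical and do not reflect the real wire format.

```python
from collections import defaultdict


class TxnWorkspace:
    """Hypothetical CN-side workspace for one table."""

    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.deletes = defaultdict(set)   # block id -> deleted row offsets
        self.transient = {}               # transient block id -> [(pk, row)]
        self.zonemaps = {}                # transient block id -> (min pk, max pk)

    def delete_by_address(self, block_id, row_offset):
        self.deletes[block_id].add(row_offset)

    def append(self, pk, row):
        # This sketch keeps all appends in one transient block, "<txn>+0".
        block_id = f"{self.txn_id}+0"
        self.transient.setdefault(block_id, []).append((pk, row))
        lo, hi = self.zonemaps.get(block_id, (pk, pk))
        self.zonemaps[block_id] = (min(lo, pk), max(hi, pk))

    def pre_commit(self):
        # Collect delete nodes and transient blocks as commands.
        cmds = [("delete", blk, sorted(rows)) for blk, rows in self.deletes.items()]
        cmds += [("append", blk, list(batch)) for blk, batch in self.transient.items()]
        return cmds


ws = TxnWorkspace(txn_id="Tid")
ws.delete_by_address(2, 10)        # delete row 10 on block 2
ws.append("1", {"pk": "1"})        # insert PK="1" into the transient block
commands = ws.pre_commit()         # these would be sent to the relevant DN
```

The commands are produced in insertion order, deletes first. Whether the real protocol requires a particular order is not stated in the source.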
Database name is "DBA", table name is "TBLA".
- Insert tuples
- Bulk load a data block
- Delete a tuple
- Commit
- Get the database snapshot from one `DN` when building the plan. Store it in the transaction workspace.
- Get the metadata snapshot and log tail of `TBLA` from `DN`. Store them in the transaction workspace.
+----------------------------------------------------+
|                 Metadata Snapshot                  |
 +--------------+  +--------------+  +--------------+
 |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |
 +---+----+-----+  +---+----+-----+  +---+----+-----+
 | 1 |xx/1|     |  | 2 |xx/2|yy/2 |  | 3 |    |     |
 +---+----+-----+  +---+----+-----+  +---+----+-----+

+-------------+
| Deletes Map |
+-------------+            +---------+
|      1      |----------> | DelMask |
+-------------+            +---------+

+-------------+
|  Data Map   |
+-------------+          +---------+
|      3      |--------> |  Batch  |
+-------------+    |     +---------+
                   |     +-----------+
                   |     |  Zonemap  |
                   +---> +-----+-----+
                         | Min | Max |
                         +-----+-----+
- Dedup
- Append tuples
+----------------------------------------------------------------------+
|                          Metadata Snapshot                           |
 +--------------+  +--------------+  +--------------+  +--------------+
 |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |
 +---+----+-----+  +---+----+-----+  +---+----+-----+  +-----+----+---+
 | 1 |xx/1|     |  | 2 |xx/2|yy/2 |  | 3 |    |     |  |Tid+0|    |   |
 +---+----+-----+  +---+----+-----+  +---+----+-----+  +-----+----+---+

+-------------+
|  Data Map   |
+-------------+            +---------+
|      3      |------+---> |  Batch  |
+-------------+      |     +---------+
|    Tid+0    |---+  |     +-----------+
+-------------+   |  |     |  Zonemap  |
                  |  +---> +-----+-----+
                  |        | Min | Max |
                  |        +-----+-----+
                  |        +---------+
                  |------> |  Batch  |
                  |        +---------+
                  |        +-----------+
                  |        |  Zonemap  |
                  +------> +-----+-----+
                           | Min | Max |
                           +-----+-----+
- Load a data block
- Dedup
1. Fetch the block zonemap and bloomfilter
2. Dedup on each block of a snapshot
- Add into the metadata snapshot in the workspace
+----------------------------------------------------------------------------------------+
|                                   Metadata Snapshot                                    |
 +--------------+  +--------------+  +--------------+  +--------------+  +--------------+
 |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |  |   MetaInfo   |
 +---+----+-----+  +---+----+-----+  +---+----+-----+  +-----+----+---+  +-----+----+---+
 | 1 |xx/1|     |  | 2 |xx/2|yy/2 |  | 3 |    |     |  |Tid+0|    |   |  |Tid+1|    |   |
 +---+----+-----+  +---+----+-----+  +---+----+-----+  +-----+----+---+  +-----+----+---+

+-------------+
|  Data Map   |
+-------------+             +---------+
|      3      |-------+---> |  Batch  |
+-------------+       |     +---------+
|    Tid+0    |----+  |     +-----------+
+-------------+    |  |     |  Zonemap  |
|    Tid+1    |-+  |  +---> +-----+-----+
+-------------+ |  |        | Min | Max |
                |  |        +-----+-----+
                |  |        +---------+
                |  |------> |  Batch  |
                |  |        +---------+
                |  |        +-----------+
                |  |        |  Zonemap  |
                |  +------> +-----+-----+
                |           | Min | Max |
                |           +-----+-----+
                |           +---------+
                |---------> |  Batch  |
                |           +---------+
                |           +-----------+
                |           |  Zonemap  |
                |---------> +-----+-----+
                |           | Min | Max |
                |           +-----+-----+
                |           +-----------+
                +---------> |  BFIndex  |
                            +-----------+
- Delete a tuple
- Scan by filter and get a matched tuple on block `Tid+1`
- Delete by physical address
+-------------+
| Deletes Map |
+-------------+            +---------+
|    Tid+1    |----------> | DelMask |
+-------------+            +---------+
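The dedup performed when bulk loading a block (fetch the zonemap and bloomfilter, then check each snapshot block) can be sketched as follows. This is an illustration under assumptions: the `BloomFilter` class is a toy built on `hashlib`, and `block_index` and `conflicts` are hypothetical helpers. The exact key set kept per block stands in for reading the block's primary key column.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a Python int used as a bit set."""

    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, key):
        for i in range(self.nhashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def may_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


def block_index(pks):
    """Build the per-block index: zonemap, Bloom filter, and the keys."""
    bf = BloomFilter()
    for pk in pks:
        bf.add(pk)
    return {"min": min(pks), "max": max(pks), "bf": bf, "pks": set(pks)}


def conflicts(new_pks, existing_blocks):
    """Return True if any key of the new block already exists."""
    lo, hi = min(new_pks), max(new_pks)
    for blk in existing_blocks:
        if hi < blk["min"] or lo > blk["max"]:
            continue                      # zonemaps do not overlap: skip block
        for pk in new_pks:
            # The Bloom filter rules out most keys before the exact lookup.
            if blk["bf"].may_contain(pk) and pk in blk["pks"]:
                return True
    return False


existing = [block_index([1, 2, 3]), block_index([10, 11])]
ok_to_load = not conflicts([4, 5], existing)
rejected = conflicts([3, 4], existing)
```

The zonemap check discards whole blocks cheaply, and the Bloom filter avoids most exact lookups inside the remaining ones. A false positive only costs an extra lookup and never produces a wrong answer.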