Tapir Deep Dive

olesya13 edited this page Feb 23, 2018 · 4 revisions

Objective

  • Verify the correctness of Tapir
  • Evaluate the current implementation of Tapir

Questions

Sources for this section: the paper is the Tapir Paper Extended Version; the TLA+ spec for IR is in the GitHub repo (the TLA+ spec for TAPIR is not complete); the code is in the GitHub repo.

IR operation processing protocol with view change (paper, p.5)

| Paper | TLA Spec | Code | Answer/Decision |
| --- | --- | --- | --- |
| Item 1 in 3.2.2: If a client receives responses with different view numbers, it notifies the replicas in the older view | None | Not implemented | |
| Item 4 in 3.2.2: client waits until it receives f+1 CONFIRM responses in the same view before returning result to the application protocol. | client received a quorum of confirms and take the result in the biggest view number | Implemented (client.cc, lines 421-445): client waits until it receives f+1 CONFIRM responses. Not implemented: responses in the same view | |
| Consensus op processing, item 4 in 3.2.1: Otherwise, the client takes the slow path: once it receives f + 1 responses | the same as paper | client.cc, lines 183-231: Couldn't find Quorum size check | |
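
The "same view" condition from item 4 in 3.2.2 could be checked along the following lines. This is a minimal Python sketch, not code from the TAPIR repository: `Confirm`, `confirmed_view`, and the field names are hypothetical, and it assumes each CONFIRM response carries the replica's index and its view number.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Confirm(NamedTuple):
    """Hypothetical CONFIRM response: which replica sent it and in which view."""
    replica: int
    view: int


def confirmed_view(responses: list[Confirm], f: int) -> Optional[int]:
    """Return a view in which at least f + 1 distinct replicas confirmed, else None."""
    replicas_by_view: dict[int, set[int]] = defaultdict(set)
    for r in responses:
        replicas_by_view[r.view].add(r.replica)
    # Prefer the highest view that has a full quorum.
    for view in sorted(replicas_by_view, reverse=True):
        if len(replicas_by_view[view]) >= f + 1:
            return view
    return None


# f = 1: two confirms are enough in total, but here they are split across views.
mixed = [Confirm(replica=0, view=3), Confirm(replica=1, view=4)]
same = mixed + [Confirm(replica=2, view=4)]
print(confirmed_view(mixed, f=1))  # None
print(confirmed_view(same, f=1))   # 4
```

A check that only counts responses would accept `mixed`, which is the gap noted in the Code column for item 4.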

IR view change protocol (paper, pp.6-7) - not implemented

| Paper | TLA Spec | Code | Answer/Decision |
| --- | --- | --- | --- |
| Item 3: Once the new leader receives f records from f other replicas, it considers all records with the highest value of v' (last view number) and join these into a master record R. - which might be less than f+1, but later in the proof: The leader merges all operations from the records of f+1 non-faulty replicas into the master record | considers all records with the highest value of v' (last view number) | | |
| At the top of p.7: The view change is complete after at least f + 1 replicas have exchanged and merged records and SYNC’d with the master record. | None | | |
| Unclear how Merge(d, u) works on tentative consensus operations. Why do we need f/2 + 1 for d set? | | | |

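
One way to read the split into d and u is as a count over the records being merged. The following Python sketch is our interpretation, not the paper's pseudocode or the repository's code: records are modelled as plain dicts from operation id to tentative result, `partition_tentative` is a hypothetical helper, and "f/2 + 1" is read as ⌈f/2⌉ + 1.

```python
import math
from collections import Counter


def partition_tentative(records: list[dict[str, str]], f: int):
    """Split tentative consensus ops into d (has a majority result) and u (does not).

    records: one dict per replica record, mapping op id -> tentative result.
    Returns (d, u): d maps op id -> majority result, u is a set of op ids.
    """
    threshold = math.ceil(f / 2) + 1  # assumption: "f/2 + 1" means ceil(f/2) + 1
    d: dict[str, str] = {}
    u: set[str] = set()
    op_ids = {op for record in records for op in record}
    for op in sorted(op_ids):
        counts = Counter(record[op] for record in records if op in record)
        result, votes = counts.most_common(1)[0]
        if votes >= threshold:
            d[op] = result
        else:
            u.add(op)
    return d, u


# f = 2: assume the leader merges 3 records (f from other replicas plus its own),
# so the threshold is ceil(2 / 2) + 1 = 2.
records = [
    {"lock1": "OK", "lock2": "OK"},
    {"lock1": "OK", "lock2": "NO"},
    {"lock1": "OK"},
]
d, u = partition_tentative(records, f=2)
print(d)  # {'lock1': 'OK'}
print(u)  # {'lock2'}
```

In this sketch the application protocol would still have to decide the final result for every operation in d and u; the partition only records which operations have a candidate majority result.
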
  • Item 1 in the view change protocol (on page 6) says "A replica notices the need for a view change either based on a timeout, because it is a recovering replica, or because it received a DO-VIEW-CHANGE message for a view with a larger number than its own current view-number." Which timeout is this?

  • Item 3 in the view change protocol (on page 6) says "Once the new leader receives f records from f other replicas, it considers all records with the highest value of v'. It uses a merge function to join these into a master record R." Why does the leader wait for f records, rather than 1 record or f+n records?

  • In the Merging Records section (on page 6), the paper says "IR asks the application protocol to decide the consensus result for the remaining TENTATIVE consensus operations, which either: (1) have a matching result, which we define as the majority result, in at least f/2+1 records or (2) do not." Why f/2 + 1 and not f+1 or some other number? If the matching result comes from only f/2+1 replicas, why is it called the majority result?

  • On page 6, the paper says "For operations in d, IR cannot tell whether the operation succeeded with the majority result on the fast path, or whether it took the slow path and the application protocol decide'd a different result that was later lost. For example, in the lock server, OK could be the majority result if only f/2+1 replicas replied OK, but the other replicas might have accepted a conflicting lock request. However, it is also possible that the other replicas did respond OK, in which case OK would have been a successful response on the fast-path." We do not understand this part. By counting the matching results, can't we tell whether an operation in d took the fast path? And what does "the application protocol decide'd a different result that was later lost" mean?
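
The f/2 + 1 threshold can be motivated by a quorum-intersection count. The Python sketch below assumes 2f + 1 replicas, a fast quorum of ⌈3f/2⌉ + 1, and a view-change quorum of f + 1, which is how we read the paper; the function names are ours.

```python
import math


def fast_quorum(f: int) -> int:
    """Fast-path quorum size, assuming ceil(3f/2) + 1 out of 2f + 1 replicas."""
    return math.ceil(3 * f / 2) + 1


def min_overlap(f: int) -> int:
    """Smallest possible overlap between a fast quorum and any f + 1 replicas."""
    n = 2 * f + 1
    return fast_quorum(f) + (f + 1) - n


# Columns: f, fast quorum size, minimum overlap, ceil(f/2) + 1
for f in range(1, 6):
    print(f, fast_quorum(f), min_overlap(f), math.ceil(f / 2) + 1)
```

Under these assumptions the last two columns always agree: a result that succeeded on the fast path appears in at least ⌈f/2⌉ + 1 of any f + 1 merged records. Read this way, the threshold marks results that might have succeeded on the fast path rather than a majority of all replicas. It would also explain why counting alone is not conclusive: the merged records show only f + 1 replicas, and the answers of the remaining f are unknown to the leader.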

TAPIR transaction processing protocol and IR support (paper, pp.10-11)

| Paper | TLA Spec | Code | Answer/Decision |
| --- | --- | --- | --- |
| Tapir-Decide: why does it return Abort if any replica returns Abort? | | | |
| Tapir-Merge: IR operations are unordered, to merge d and u sets in which order should we take operations? Order might influence result of merge, since we do OCC-Check. We do not care? | | Tapir-Merge is not implemented | |
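
For reference while discussing the Tapir-Decide question, here is a Python transcription of the decide function as we read it from the paper. The result names, the tuple encoding of RETRY, and the exact order of the checks are our assumptions and should be verified against the paper.

```python
from typing import Union

PREPARE_OK, ABSTAIN, ABORT, RETRY = "PREPARE-OK", "ABSTAIN", "ABORT", "RETRY"

# A replica result is either a plain status or (RETRY, proposed_timestamp).
Result = Union[str, tuple[str, int]]


def tapir_decide(results: list[Result], f: int) -> Result:
    """Combine per-replica Prepare results into one decision (sketch)."""
    statuses = [r[0] if isinstance(r, tuple) else r for r in results]
    if ABORT in statuses:
        # Our understanding: a replica answers ABORT only for a conflict with
        # an already committed transaction, so a single ABORT is treated as final.
        return ABORT
    if statuses.count(PREPARE_OK) >= f + 1:
        return PREPARE_OK
    if statuses.count(ABSTAIN) >= f + 1:
        return ABORT
    retry_timestamps = [r[1] for r in results if isinstance(r, tuple) and r[0] == RETRY]
    if retry_timestamps:
        return (RETRY, max(retry_timestamps))
    return ABORT


print(tapir_decide([PREPARE_OK, PREPARE_OK, ABORT], f=1))    # ABORT
print(tapir_decide([PREPARE_OK, PREPARE_OK, ABSTAIN], f=1))  # PREPARE-OK
print(tapir_decide([ABSTAIN, (RETRY, 7), (RETRY, 9)], f=1))  # ('RETRY', 9)
```

If the comment on ABORT matches the paper's intent, it would answer the first question in the table: ABSTAIN signals a conflict with a merely prepared transaction and needs f + 1 votes, while ABORT signals a conflict that cannot be undone.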

TAPIR coordinator recovery (paper, pp.12-13) - no TLA+ spec, no implementation

| Paper | TLA Spec | Code | Answer/Decision |
| --- | --- | --- | --- |
| Coordinator changes: Is there any relation between coordinator view and replica view? When do we need Merge() and Sync() for coordinator change? Do we merge replica records into new master copy in backup shard when we do coordinator change? | | | |
| Coordinator changes, item 4: the replica sends StartCoordinator. Which replica, the one that initiated coordinator change? | | | |
| Cooperative termination, item 1: the backup coordinator polls the participants with Prepare. Where does backup coordinator get unfinalized transactions, from its prepared-list? If they are from prepared-list, what are the guarantees that this list is up-to-date? | | | |

TAPIR correctness

  • In Atomicity section (on page 14), the paper says "TAPIR replicas always execute Commit, even if they did not prepare the transaction, so Commit will eventually commit the transaction at every participant if it executes at one participant." If the replica did not prepare a transaction, how can it commit the transaction?

  • In the Durability section (on page 15), the paper says "On Commit, TAPIR replicas use the transaction timestamp included in Commit to order the transaction in their log, regardless of when they execute it, thus maintaining the original linearizable ordering." We are not sure how to interpret "regardless of when they execute it".
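
One reading of "regardless of when they execute it" is that the store keeps versions ordered by transaction timestamp rather than by arrival order. The Python sketch below illustrates that reading only; `VersionedStore` and its methods are hypothetical and do not mirror the repository's store.

```python
import bisect


class VersionedStore:
    """Keeps, per key, a list of (timestamp, value) pairs sorted by timestamp."""

    def __init__(self) -> None:
        self._versions: dict[str, list[tuple[int, str]]] = {}

    def commit_write(self, key: str, timestamp: int, value: str) -> None:
        # Insert at the position given by the timestamp, not at the end,
        # so a late-arriving Commit still lands in timestamp order.
        bisect.insort(self._versions.setdefault(key, []), (timestamp, value))

    def latest(self, key: str) -> str:
        """Value of the version with the highest timestamp."""
        return self._versions[key][-1][1]


# Two replicas execute the same two Commits in opposite orders.
a, b = VersionedStore(), VersionedStore()
a.commit_write("A", 1, "value1")
a.commit_write("A", 2, "value2")
b.commit_write("A", 2, "value2")
b.commit_write("A", 1, "value1")
print(a.latest("A"), b.latest("A"))  # value2 value2
```

Under this reading, the order in which a replica executes Commit does not affect the resulting version order, only the timestamps do.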

TAPIR Use Case Verification

  • View Change Mechanism: We would like to use an example to check our understanding of the view change mechanism in IR. Suppose we have a 5-host cluster where host h1 is in view v1, host h2 is in view v2, and hosts h3, h4, and h5 are in view v3. The cluster could be in several states, for instance: 1. all hosts in normal status (looks impossible), 2. h1 failed, all other hosts normal (looks possible), 3. h1 and h2 failed, all other hosts normal (looks possible). Are these configurations possible? If so, how do h1 and h2 discover that they need a view change, and how do they eventually move to view v3? From the paper we have an overall idea of how view change works, e.g. a client detects mismatched view numbers and triggers the view change process, the hosts elect one leader to coordinate the view change, etc. We would like someone to walk us through the process once to confirm that our understanding is correct.

  • Operation Order: IR does not guarantee the order of operations across replicas, but operation order matters for transactions. We need an example that demonstrates how TAPIR handles this situation.

| Client / IR event | Replica 1 | Replica 2 | Replica 3 |
| --- | --- | --- | --- |
| Client 1 sends Prepare for txn(put(A, value1), timestamp = 1, id = 1_1) | | | |
| | Prepare: res = OK, status = tentative | Prepare: res = OK, status = tentative | message lost |
| | Add txn1_1 to prepared list | Add txn1_1 to prepared list | |
| IR client takes slow path for txn1_1 | | | |
| Client 2 sends Prepare for txn(put(A, value2), timestamp = 2, id = 2_1) | | | |
| | OCC Check | OCC Check | |
| | Prepare: res = OK, status = tentative | Prepare: res = OK, status = tentative | Prepare: res = OK, status = tentative |
| | Add txn2_1 to prepared list | Add txn2_1 to prepared list | Add txn2_1 to prepared list |
| IR client takes fast path for txn2_1, sends Finalize | | | |
| Client2 commits txn2_1 | | | |
| | A = value2 | A = value2 | A = value2 |
| IR client decide for txn1_1, sends Finalize with res = OK | | | |
| Client1 commits txn1_1 | | | |
| | A = value1 | A = value1 | A = value1 |
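
The Prepare steps in this trace can be replayed against a simplified OCC check. The Python sketch below loosely follows our reading of the paper's OCC check for write-only transactions; the `Replica` class, its fields, and the omission of read-set validation and prepared reads are simplifying assumptions, not the repository's implementation.

```python
class Replica:
    """Toy replica for write-only transactions: committed versions plus a prepared list."""

    def __init__(self) -> None:
        self.latest_version: dict[str, int] = {}  # key -> timestamp of newest committed write
        self.prepared: dict[str, tuple[int, list[str]]] = {}  # txn id -> (timestamp, write keys)

    def prepare(self, txn_id: str, timestamp: int, write_keys: list[str]):
        for key in write_keys:
            committed = self.latest_version.get(key, 0)
            if timestamp < committed:
                # The write would land behind a committed version: ask for a retry.
                return ("RETRY", committed)
        self.prepared[txn_id] = (timestamp, write_keys)
        return "PREPARE-OK"

    def commit(self, txn_id: str) -> None:
        timestamp, write_keys = self.prepared.pop(txn_id)
        for key in write_keys:
            self.latest_version[key] = max(timestamp, self.latest_version.get(key, 0))


r1 = Replica()
print(r1.prepare("1_1", 1, ["A"]))  # PREPARE-OK
print(r1.prepare("2_1", 2, ["A"]))  # PREPARE-OK
r1.commit("2_1")
r1.commit("1_1")
print(r1.latest_version["A"])       # 2
print(r1.prepare("3_1", 1, ["A"]))  # ('RETRY', 2)
```

In this simplified model both write-only Prepares pass because neither conflicts with a committed version, which matches the OK results in the trace. Whether the value finally visible for A is value1 or value2 then depends on whether Commit orders writes by timestamp or by execution order.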

Running Tapir

Install dependencies

# install linuxbrew on ubuntu
sudo apt update && sudo apt install linuxbrew-wrapper

# install dependencies with brew
brew install pkg-config
brew install protobuf
brew install openssl
cd /usr/local/include 
ln -s ../opt/openssl/include/openssl .

Build

cd $TAPIR_HOME && make clean && make build

Run Test

cd $TAPIR_HOME/store/tools && ./run_test.sh

Pending Issues

  • run_test.sh requires three hosts to run
  • The config files and scripts in $TAPIR_HOME/store/tools need to be adjusted before run_test.sh will run