RFC: In-memory Pessimistic Locks #77
Conversation
Can you write a section reasoning about correctness explicitly?
There is a notable difference between pipelined pessimistic lock and in-memory pessimistic lock: with pipelined pessimistic lock, a pessimistic lock that has been written is never lost, while with in-memory pessimistic lock it may be lost. Can you elaborate on that?
Compare d4c5da7 to e772d05
The new commit adds more detailed handling of leader transfer and some explanations about correctness.
- Pessimistic locks are written into a region-level lock table.
- Pessimistic locks are sent to other peers before a voluntary leader transfer.
- Pessimistic locks in the source region are sent to the target region before a region merge.
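For illustration only, a minimal sketch of what such a region-level lock table might look like; all type and field names here are assumptions, not the actual TiKV definitions:

```rust
use std::collections::BTreeMap;

// Hypothetical per-region table of in-memory pessimistic locks.
// `valid` is cleared before a voluntary leader transfer so that no new
// locks are inserted while the existing ones are being handed over.
struct PeerPessimisticLocks {
    locks: BTreeMap<Vec<u8>, PessimisticLock>, // key -> lock
    valid: bool,        // whether new in-memory locks may be inserted
    term: u64,          // leader term this table belongs to
    memory_size: usize, // total encoded size of all locks
}

// Hypothetical lock payload; the real lock carries more fields.
struct PessimisticLock {
    primary: Vec<u8>,
    start_ts: u64,
    ttl: u64,
    min_commit_ts: u64,
}
```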
How about a TiKV store-level failure?
What if the leader encounters network isolation?
The behavior under these store-level failures is no different from using pipelined pessimistic lock. Pessimistic transactions are not guaranteed to commit successfully, but they can commit if no write conflicts have happened.
Is pipelined pessimistic lock off by default? So this feature will not be enabled by default either?
Pipelined pessimistic lock is enabled by default since 5.0. So, this feature will be enabled by default as well.
By default, a Raft message has a size limit of 1 MiB. We will guarantee that the total size of in-memory pessimistic locks in a single region will not exceed the limit. This will be discussed later.
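As an illustration of that bound, here is a hedged sketch continuing the illustrative `PeerPessimisticLocks` type from the earlier sketch; the constant and method names are assumptions:

```rust
const RAFT_MSG_MAX_SIZE: usize = 1024 * 1024; // 1 MiB, the default Raft message size limit

impl PeerPessimisticLocks {
    // Record a pessimistic lock in memory only if the table stays within the
    // Raft message size limit; otherwise return false so the caller can fall
    // back to proposing the lock through Raft, as pipelined pessimistic
    // locking does.
    fn try_insert(&mut self, key: Vec<u8>, lock: PessimisticLock, encoded_size: usize) -> bool {
        if !self.valid || self.memory_size + encoded_size > RAFT_MSG_MAX_SIZE {
            return false;
        }
        self.memory_size += encoded_size;
        self.locks.insert(key, lock);
        true
    }
}
```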
If the transfer leader message is lost or rejected, we need to revert `valid` to `true`. But it is not possible for the leader to know. So, we have to use a timeout to implement it. That means, we can revert `valid` to `true` if the leader is still not transferred after some period. And if the leader receives the `MsgTransferLeader` response from the follower after the timeout, it should ignore the message and not trigger a leader transfer.
I cannot understand. Because Raft will only append and apply log entries in the order they are sent, we just need to propose the `MsgTransferLeader` entry after the `lock` entries. Just like this:
// Sketch: propose all in-memory locks before proposing the transfer leader
// message, so the new leader applies them first (Raft applies entries in order).
fn propose_transfer_leader(&mut self, msg: RaftMessage) {
    if !self.locks.is_empty() {
        let mut lock_msg = RaftMessage::new();
        for lock in self.locks.take() {
            lock_msg.append(lock);
            // Keep each proposal under the 1 MiB Raft message size limit.
            if lock_msg.size() > 1024 * 1024 {
                self.propose(lock_msg);
                lock_msg = RaftMessage::new();
            }
        }
        if !lock_msg.is_empty() {
            self.propose(lock_msg);
        }
    }
    self.propose(msg);
}
Well, my initial thought was to not change the transfer leader procedure. Using a proposal is simpler.
Another difference is the impact of leader transfer on ongoing pessimistic transactions, such as the fallback processing described in this document. If the impact is acceptable, we could use the simpler way and make it a whole; that is easier to maintain.
The timeout mechanism is still needed. The leader transfer timeout is the election timeout in Raft. We can check it on each tick.
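A rough sketch of that tick-based check, under assumed names (`TransferLeaderState`, `on_tick`) that are not the actual raftstore API:

```rust
use std::time::{Duration, Instant};

// Hypothetical per-peer state recorded when a leader transfer starts.
struct TransferLeaderState {
    // Set when `valid` is flipped to false and the locks are proposed.
    transfer_leader_start: Option<Instant>,
    election_timeout: Duration,
}

impl TransferLeaderState {
    // Called on every raftstore tick. If the peer is still leader after the
    // election timeout, assume the transfer failed: revert `valid` to true
    // and ignore any late MsgTransferLeader response.
    fn on_tick(&mut self, still_leader: bool, locks_valid: &mut bool) {
        if let Some(start) = self.transfer_leader_start {
            if still_leader && start.elapsed() > self.election_timeout {
                *locks_valid = true;
                self.transfer_leader_start = None;
            }
        }
    }
}
```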
LGTM
LGTM
- Reading exactly the key whose pessimistic lock is lost is not affected, because the pessimistic lock is totally invisible to the reader.
- If a secondary 2PC lock is read while the primary lock is still in the pessimistic stage, the reader will call `CheckTxnStatus` on the primary lock:
  - If the primary lock exists, `min_commit_ts` of the lock is advanced, so the reader will not be blocked. **This operation must be replicated through Raft.** Otherwise, if the primary lock is lost, we may allow a smaller commit TS, breaking snapshot isolation.
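To make the reader-side path above concrete, here is a hedged sketch; the function names and the proposal stub are assumptions, and it reuses the illustrative `PessimisticLock` type from the earlier sketch:

```rust
// Hypothetical error type and Raft proposal stub, only for illustration.
struct Error;

fn propose_lock_update_through_raft(_lock: &PessimisticLock) -> Result<(), Error> {
    Ok(()) // stands in for proposing a command that rewrites the lock in the lock CF
}

// Sketch of CheckTxnStatus on a primary lock that is still pessimistic.
fn check_txn_status(primary: &mut PessimisticLock, caller_start_ts: u64) -> Result<(), Error> {
    if primary.min_commit_ts <= caller_start_ts {
        // Advance min_commit_ts so the reader is not blocked. The update must
        // be replicated through Raft: if it lived only in the in-memory table
        // and the lock were lost, the transaction could later commit with a
        // smaller ts and break snapshot isolation.
        primary.min_commit_ts = caller_start_ts + 1;
        propose_lock_update_through_raft(primary)?;
    }
    Ok(())
}
```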
> This operation must be replicated through Raft.

Maybe this can get some inspiration from CRDB... (every read cmd needing to go through Raft is a little expensive.)
It seems they also have a similar push-txn mechanism, but they only update the timestamp in an in-memory tsCache instead of updating a field in the lock (write intent), and the new leader also cannot get the old pushed timestamp info because the memory is lost.
In the previous version, it seems they chose to update the tsCache low-water ts to the new lease time to make conflicting write txns report an error and retry: https://github.com/cockroachdb/cockroach/blob/df826cdf700a79948d083827ca67967016a1a1af/pkg/kv/kvserver/replica_proposal.go#L383 (it's unfriendly to the user but safe.)
But it seems that this year they made some new improvements (cockroachdb/cockroach@a7472e3) to transfer the ts cache during leader transfer and range merge, so there may be less chance of forcing the user to retry in the commit phase. (PS: I haven't read the new code carefully, so I may be wrong 😄)
This does not mean every read cmd will go through Raft. It is rare in practice to update the `min_commit_ts` in the pessimistic lock. Keys are prewritten in parallel; modifying the pessimistic lock only happens when some secondary keys are prewritten while the primary key is not yet prewritten.
I haven't understood the new CRDB approach yet. We will lose the pessimistic locks under unexpected crashes. But then, the ts cache cannot be transferred either... Currently, I don't get how it will decrease the failure rate...
> This does not mean every read cmd will go through Raft.

Thanks for explaining 😄

> I don't get how it will decrease the failure rate

I mean it will help decrease the failure rate for "transfer leader": it seems that in the previous version, a leader transfer would also set the low-water mark to newLease.Start, just like a crash. But for real "unexpected crashes", the PR should bring no improvement to the failure rate.
- A different transaction can resolve the pessimistic lock when it encounters the pessimistic lock in `AcquirePessimisticLock` or `Prewrite`. So, if the lock is lost, `PessimisticRollback` will find no lock and do nothing. No change is needed.
- `TxnHeartBeat` will fail after the loss of pessimistic locks. But it will not affect correctness.
### Lock migration
When TiKV adds a learner for this region, it will send a snapshot to the learner node. But how can we keep consistency between the lock CF and the write CF?
I mean that this memory structure needs to provide a snapshot-like result, so that no transaction will affect the result while we are scanning the data to send to the learner node.
Pessimistic locks are totally ignored by all read requests. So, we only need to care about the leader.
* RFC: In-memory Pessimistic Locks
* clarify where to delete memory locks after writing a lock CF KV
* Elaborate transfer leader handlings and add correctness section
* add an addition step of proposing pessimistic locks before transferring leader
* clarify about new leaders of region split
* Add tracking issue link
* update design and correctness analysis of lock migration
* add configurations
This is a more aggressive optimization than pipelined pessimistic lock. It tries to keep pessimistic locks only in memory and not replicate them through Raft, while not decreasing the success rate of pessimistic transactions.
According to preliminary tests, this optimization reduces disk write flow and raftstore CPU by about 20%.
cc @HunDunDM @youjiali1995 @gengliqi @cfzjywxk @MyonKeminta @longfangsong