ResolveLocks for pessimistic transaction that has switched primary may break transaction atomicity #42937
The problem is confirmed using the following steps:

```sql
/* init */ create table t(id int primary key, v int unique);
/* init */ insert into t values (1, 10), (2, 20), (3, 30), (4, 40);
/* init */ create table t2 (id int primary key, v int);
/* init */ insert into t2 values (1, 1), (2, 2);
/* t1 */ set @@tidb_enable_async_commit = 0;
/* t1 */ set @@tidb_enable_1pc = 0;
/* t2 */ set @@tidb_enable_async_commit = 0;
/* t2 */ set @@tidb_enable_1pc = 0;
-- Enable failpoints:
-- * tikvclient/beforeAsyncPessimisticRollback = return("skip")
-- * tikvclient/twoPCRequestBatchSizeLimit = return
-- This simulates asyncPessimisticRollback failure and splits the pessimistic
-- lock requests into multiple batches.
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t2 */ update t set v = v + 1 where id = 2;
-- Enable failpoints:
-- * tikvclient/twoPCShortLockTTL = return
-- * tikvclient/shortPessimisticLockTTL = return
-- These give the locks acquired by transaction t1 a very short TTL, to quickly
-- simulate the case where the locks' TTL has expired.
/* t1 */ with c as (select /*+ MERGE() */ v from t2 where id = 1 or id = 2) update c join t on c.v = t.id set t.v = t.v + 1;
-- t1 is blocked by t2.
-- In my test environment, the key of index(v):10 is selected as the primary.
/* t3 */ update t2 set v = v + 2; -- Change the rows that will be updated by t1.
/* t2 */ commit;
-- t1 resumes; rows 3 and 4 are updated.
/* t1 */ update t set v = 0 where id = 1; -- This requires locking the key index(v):10 again.
-- Enable failpoint:
-- * tikvclient/beforeCommit = 1*return("delay")
-- which delays randomly between 0 and 5 seconds.
/* t1 */ commit;
-- t1 is blocked by the failpoint. Now the initial state described in the issue is constructed.
-- sleep 1s
/* t2 */ insert into t values (5, 11);
-- t2 encounters a lock on key index(v):11 that hasn't been cleaned up, whose primary
-- points to index(v):10, and it succeeds after resolving locks.
-- Wait for t1.
-- t1 is supposed to fail, since transaction t2 should have rolled it back. However,
-- it returns success to the client.
/* t1 */ admin check table t; -- Fails.
```

Note that this procedure doesn't stably reproduce the failure, since the delay time in the `beforeCommit` failpoint is random. Also note that data-index inconsistency is not the only phenomenon this bug may cause; I'm afraid that in most cases it causes data inconsistency that is hardly noticeable.

The admin-check failure:
The log of committing secondaries:
The test code: tidb/tests/realtikvtest/pessimistictest/pessimistic_test.go Lines 3390 to 3461 in 59c37c5
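The failure mode above can be modeled in a few lines. The following is a toy sketch (hypothetical names, not real TiKV or client-go code) of what the resolver does when it meets a secondary lock whose recorded primary is stale: it runs `check_txn_status` on the believed primary, finds neither a lock nor a commit record there (the old pessimistic lock was already gone), and so rolls the secondary back even though the transaction's real primary is about to commit.

```go
package main

import "fmt"

// lock is a simplified model of a TiKV lock record: each lock stores the
// primary key the writing transaction believed in at the time of writing.
type lock struct {
	txnID   uint64
	primary string // primary key recorded in this lock (may be stale)
}

// keyState is what a resolver finds at the believed primary key.
type keyState int

const (
	stateNothing   keyState = iota // no lock, no commit record
	stateCommitted                 // a commit record exists
)

// resolveStaleLock models the pre-fix behavior: the resolver trusts
// l.primary unconditionally. If check_txn_status on that key finds nothing,
// it concludes the transaction failed and rolls back the secondary lock,
// even though the transaction may have switched primary and still commit.
func resolveStaleLock(l lock, believedPrimaryState keyState) string {
	if believedPrimaryState == stateNothing {
		// Writes a rollback record on the believed primary and removes
		// the secondary lock: this is where atomicity breaks.
		return "rollback"
	}
	return "commit"
}

func main() {
	// The secondary lock on index(v):11 still points at the stale primary
	// index(v):10, where T1's old pessimistic lock has long been cleaned up.
	secondary := lock{txnID: 100, primary: "idx10"}
	fmt.Println(resolveStaleLock(secondary, stateNothing)) // rollback
}
```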
cc @cfzjywxk
close tikv#14636, ref pingcap/tidb#42937 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
… requests (tikv#14637) close tikv#14636, ref pingcap/tidb#42937 Makes TiKV support checking whether the lock is primary when handling check_txn_status. Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lidezhu <lidezhu@pingcap.com>
@shiyuhang0 I think you or your team also need to be aware of this, since TiSpark uses the Java client to resolve locks.
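Every client that resolves locks is affected the same way, because the target of `check_txn_status` is taken from the blocking lock itself. The sketch below (hypothetical names, not the real client-go or Java-client API) shows the shared shape of the client-side flow: nothing in the lock tells the client whether the recorded primary is still the transaction's real primary.

```go
package main

import "fmt"

// lockInfo models the information a reader or writer gets back when it is
// blocked by another transaction's lock.
type lockInfo struct {
	txnID      uint64
	key        string // the key the client actually hit
	primaryKey string // the primary recorded in that lock, possibly stale
}

// resolveTarget returns the key a generic client would send
// check_txn_status to. The client has no way to verify this key is still
// the transaction's primary, which is why the TiKV-side check is needed.
func resolveTarget(l lockInfo) string {
	return l.primaryKey
}

func main() {
	l := lockInfo{txnID: 100, key: "idx11", primaryKey: "idx10"}
	fmt.Println("check_txn_status goes to:", resolveTarget(l))
}
```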
…le pessimistic lock written before switching primary (pingcap#42990) close pingcap#42937
Bug Report
* I haven't strictly confirmed that the problem exists yet, but I think it's worth discussing.
Given the fact that:
Consider this case:
Initially, transaction T1 produces the following state of the data:
Steps to construct this state...
Then:
A `check_txn_status` request is sent to key2 (though it's not the real primary), and TiKV handles it as if `check_txn_status` were called on a primary lock. It finds that the lock (written during T1's prewrite) is expired, so it considers T1 to have failed and rolls back the lock.
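The fix merged for this issue (tikv#14637, per the commit message above) makes TiKV check whether the lock is actually the primary when handling `check_txn_status`. A minimal sketch of that guard, with hypothetical names rather than real TiKV code: before treating a TTL-expired lock as a primary and rolling the transaction back, verify that the lock records the queried key as its own primary; otherwise refuse and let the resolver retry.

```go
package main

import (
	"errors"
	"fmt"
)

// lock is a simplified model of a TiKV lock record.
type lock struct {
	txnID   uint64
	primary string // primary key recorded in this lock
}

var errPrimaryMismatch = errors.New("lock on queried key is not a primary lock")

// checkTxnStatus sketches the guarded handling: rolling the transaction back
// on TTL expiry is only allowed when the lock found on `key` really is the
// primary (l.primary == key). A mismatch means the caller followed a stale
// primary pointer, so the transaction's fate must not be decided here.
func checkTxnStatus(key string, l *lock, ttlExpired bool) (string, error) {
	if l == nil {
		return "no lock", nil // commit/rollback record lookup elided
	}
	if l.primary != key {
		return "", errPrimaryMismatch
	}
	if ttlExpired {
		return "rolled back", nil
	}
	return "locked", nil
}

func main() {
	// T1 switched primary; the lock found on idx10 belongs to T1 but its
	// recorded primary is elsewhere, so rollback is refused.
	stale := &lock{txnID: 100, primary: "idx11"}
	if _, err := checkTxnStatus("idx10", stale, true); err != nil {
		fmt.Println("resolver must retry instead of rolling back:", err)
	}
}
```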