Jepsen: multiple conflicting appends sometimes succeed #2765
@ulya-sidorina @ssmike @spuchin please take a look
One more case, with additional info about locks:
What's notable here is that the read for keys 259/260 was acquiring lock 844431139706471, yet the read result didn't report any acquired locks for some reason (maybe it wasn't the first read result or something), and the commit happily performed writes without checking any locks, since nobody asked it to.
I think I might have an idea of what happened. The read probably had keys 260 and 259 (in that order); the read result for key 260 was empty, but when reading key 259 we got a page fault. Since we don't want to repeat work performed so far, and don't want to send empty results either, we continued in a secondary local db transaction without sending the first result. But since we acquire locks in the first transaction, we were supposed to send their list; while migrating to the secondary transaction those acquired locks are lost and are never sent again in the result. So KQP doesn't know about them and never asks the datashard to check them again.
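The suspected failure mode can be sketched with a toy model (hypothetical names and a toy shard, not actual YDB code): locks acquired before the page fault vanish when the read restarts in a secondary transaction, so the result reports an incomplete lock list. In this simplified model only the lock taken before the restart is lost; in the report above the whole list went missing.

```python
class PageFault(Exception):
    """Simulates a read hitting a page that is not yet in memory."""

class Shard:
    def __init__(self):
        self.faulted = False

    def load(self, key):
        # The first read of key 259 hits a cold page, as in the report.
        if key == 259 and not self.faulted:
            self.faulted = True
            raise PageFault()
        return f"value-{key}"

def read(shard, keys, carry_locks):
    """Read keys in order, acquiring a lock per key.

    carry_locks=False models the suspected bug: on restart in a
    secondary local-db transaction the already-acquired locks are dropped.
    """
    locks, rows = [], []
    i = 0
    while i < len(keys):
        key = keys[i]
        if key not in locks:
            locks.append(key)      # lock is taken before the row is read
        try:
            rows.append(shard.load(key))
        except PageFault:
            if not carry_locks:
                locks = []         # BUG: locks from the first tx are lost
            continue               # restart the read of this key
        i += 1
    return rows, locks
```

With `carry_locks=False`, reading `[260, 259]` returns both rows but only the lock re-acquired after the restart; with `carry_locks=True`, both locks survive and would be reported back for commit-time validation.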
Kinda confirmed with more debug info (and even better query batching):
It is clearly visible that the second tx performed a read which produced two results, and the first one with
Must have been broken since ebcd04f (main, 24.1), which wasn't merged into current production stable versions (23.3 and 23.4).
I rewrote the Jepsen test to merge multiple micro-ops into a single query (it does an UPSERT from a SELECT with LEFT JOIN and GROUP BY) and started getting inconsistent read failures. It appears that multiple transactions claim the same list index and successfully overwrite it, even though only one of them should have succeeded.
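The anomaly can be modeled with a toy optimistic store (hypothetical, not YDB's actual commit path): each append reads the current list, reports what it observed as its lock set, and the commit validates that set. If the lock set arrives empty, the conflicting append is not rejected.

```python
class KV:
    """Toy optimistic key-value store with per-key write versions."""

    def __init__(self):
        self.rows = {}
        self.version = {}

    def commit(self, reads, writes):
        # reads maps key -> version observed; an empty dict means the
        # transaction reported no locks, so there is nothing to validate.
        if any(self.version.get(k, 0) != v for k, v in reads.items()):
            return False                    # a lock was broken
        for k, val in writes.items():
            self.rows[k] = val
            self.version[k] = self.version.get(k, 0) + 1
        return True

def start_append(kv, key, value, report_locks=True):
    """Read phase of an append; returns (reads, writes) for a later commit."""
    observed = kv.version.get(key, 0)
    new_list = kv.rows.get(key, []) + [value]
    reads = {key: observed} if report_locks else {}
    return reads, {key: new_list}
```

When both concurrent appends report their locks, the second commit fails validation and is rejected; when one append's lock set is lost, both commits succeed and the first append is silently overwritten, which is exactly the "multiple conflicting appends succeed" shape of the failure.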
A good example of such failure:
History has this info on two appends:
What I could infer from this:
- `[:append 192 630]` committed first at `v1710410838273/281481004377038`; it was a pretty straightforward batch, a single query with a separate commit, which applied deferred effects directly at commit time.
- `[:append 192 637]` was part of a batch query (without commit); it performed reads at `v1710410838260/18446744073709551615` with `:lock-tx-id 562955991686003`, so deferred effects should have been buffered in memory.
- `[:append 196 46]` came with a fused commit. Since the append reads from the table, it needed to flush deferred effects, and you can see that it happened with `:tx-id 562955991686035` and `:lock-tx-id 562955991686035`. Notice that LockTxId is different (not 562955991686003) and matches TxId. This means `AcquireLocksTxId` was 0, and that can only happen when the transaction has absolutely no locks in its list.
- There was also a `:read-id` that was not zero, so apparently the read was restarted for some reason, but current debug info is only passed on success.

I started seeing this after switching to the complex query, so JOIN and GROUP BY might be involved somehow.
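If that inference is right, the LockTxId selection during the effects flush behaves roughly like this (a hypothetical sketch of the rule implied above, not actual YDB code):

```python
def choose_lock_tx_id(tx_id, acquire_locks_tx_id):
    # When the transaction already holds locks, their lock tx id
    # (562955991686003 in this trace) should be reused; an
    # AcquireLocksTxId of 0 means there are no locks at all, and
    # LockTxId falls back to the current TxId.
    return acquire_locks_tx_id if acquire_locks_tx_id != 0 else tx_id
```

Seeing `:lock-tx-id` equal to `:tx-id` (562955991686035) is therefore a symptom that the lock list was empty at flush time, consistent with the locks having been lost earlier.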