Jepsen: multiple conflicting appends sometimes succeed #2765
@ulya-sidorina @ssmike @spuchin please take a look
One more case, with additional info about locks:
What's notable here is that the read for keys 259/260 was acquiring lock 844431139706471, yet the read result didn't report any acquired locks for some reason (maybe it wasn't the first read result or something), and the commit happily performed writes without checking any locks, since nobody asked it to.
I think I might have an idea of what happened. The read probably had keys 260 and 259 (in that order); the read result for key 260 was empty, but when reading key 259 we got a page fault. Since we don't want to repeat work performed so far, and don't want to send empty results either, we continued in a secondary local db transaction without sending the first result. But since we acquire locks in the first transaction, we were supposed to send their list; while migrating to the secondary transaction those acquired locks are lost and are never sent again in the result. So KQP doesn't know about them and never asks the datashard to check them again.
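The suspected failure mode can be sketched with a toy model (hypothetical names and a toy shard, not actual YDB code): locks acquired before the page fault vanish when the read restarts in a secondary transaction, so the result reports an incomplete lock list. In this simplified model only the lock taken before the restart is lost; in the report above the whole list went missing.

```python
class PageFault(Exception):
    """Simulates a read hitting a page that is not yet in memory."""

class Shard:
    def __init__(self):
        self.faulted = False

    def load(self, key):
        # The first read of key 259 hits a cold page, as in the report.
        if key == 259 and not self.faulted:
            self.faulted = True
            raise PageFault()
        return f"value-{key}"

def read(shard, keys, carry_locks):
    """Read keys in order, acquiring a lock per key.

    carry_locks=False models the suspected bug: on restart in a
    secondary local-db transaction the already-acquired locks are dropped.
    """
    locks, rows = [], []
    i = 0
    while i < len(keys):
        key = keys[i]
        if key not in locks:
            locks.append(key)      # lock is taken before the row is read
        try:
            rows.append(shard.load(key))
        except PageFault:
            if not carry_locks:
                locks = []         # BUG: locks from the first tx are lost
            continue               # restart the read of this key
        i += 1
    return rows, locks
```

With `carry_locks=False`, reading `[260, 259]` returns both rows but only the lock re-acquired after the restart; with `carry_locks=True`, both locks survive and would be reported back for commit-time validation.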
Kinda confirmed with more debug info (and even better query batching):
It is clearly visible that the second tx performed a read which produced two results, and the first one with
Must have been broken since ebcd04f (main, 24.1), which wasn't merged into current production stable versions (23.3 and 23.4).
I rewrote the Jepsen test to merge multiple micro-ops into a single query (it does an UPSERT from a SELECT with LEFT JOIN and GROUP BY) and started getting inconsistent read failures. It appears that multiple transactions claim the same list index and successfully overwrite it, even though only one of them should have succeeded.
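The anomaly can be modeled with a toy optimistic store (hypothetical, not YDB's actual commit path): each append reads the current list, reports what it observed as its lock set, and the commit validates that set. If the lock set arrives empty, the conflicting append is not rejected.

```python
class KV:
    """Toy optimistic key-value store with per-key write versions."""

    def __init__(self):
        self.rows = {}
        self.version = {}

    def commit(self, reads, writes):
        # reads maps key -> version observed; an empty dict means the
        # transaction reported no locks, so there is nothing to validate.
        if any(self.version.get(k, 0) != v for k, v in reads.items()):
            return False                    # a lock was broken
        for k, val in writes.items():
            self.rows[k] = val
            self.version[k] = self.version.get(k, 0) + 1
        return True

def start_append(kv, key, value, report_locks=True):
    """Read phase of an append; returns (reads, writes) for a later commit."""
    observed = kv.version.get(key, 0)
    new_list = kv.rows.get(key, []) + [value]
    reads = {key: observed} if report_locks else {}
    return reads, {key: new_list}
```

When both concurrent appends report their locks, the second commit fails validation and is rejected; when one append's lock set is lost, both commits succeed and the first append is silently overwritten, which is exactly the "multiple conflicting appends succeed" shape of the failure.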
A good example of such failure:
History has this info on two appends:
What I could infer from this:
- `[:append 192 630]` committed first at `v1710410838273/281481004377038`; it was a pretty straightforward batch, a single query with a separate commit, which applied deferred effects directly at commit time.
- `[:append 192 637]` was part of a batch query (without commit); it performed reads at `v1710410838260/18446744073709551615` with `:lock-tx-id 562955991686003`, so deferred effects should have been buffered in memory.
- `[:append 196 46]` came with a fused commit. Since the append reads from the table, it needed to flush deferred effects, and you can see that it happened with `:tx-id 562955991686035` and `:lock-tx-id 562955991686035`. Notice that LockTxId is different (not 562955991686003) and matches TxId. This means `AcquireLocksTxId` was 0, and that can only happen when the transaction has absolutely no locks in its list.
- There was also a `:read-id` that was not zero, so apparently the read was restarted for some reason, but current debug info is only passed on success.

I started seeing this after switching to the complex query, so JOIN and GROUP BY might be involved somehow.
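If that inference is right, the LockTxId selection during the effects flush behaves roughly like this (a hypothetical sketch of the rule implied above, not actual YDB code):

```python
def choose_lock_tx_id(tx_id, acquire_locks_tx_id):
    # When the transaction already holds locks, their lock tx id
    # (562955991686003 in this trace) should be reused; an
    # AcquireLocksTxId of 0 means there are no locks at all, and
    # LockTxId falls back to the current TxId.
    return acquire_locks_tx_id if acquire_locks_tx_id != 0 else tx_id
```

Seeing `:lock-tx-id` equal to `:tx-id` (562955991686035) is therefore a symptom that the lock list was empty at flush time, consistent with the locks having been lost earlier.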