We have a problem where, because events are persisted in a queue in a client_reader worker, there is no guarantee that they are available to read on other workers. So when we fire off a backfill request from /messages, the backfilled messages aren't necessarily available to paginate with after the backfill completes (even on the worker that put them in the persister queue).
Here is what happens:

1. serverB has event1 stored as an outlier from previous requests (specifically from MSC3030 jump to date pulling in a missing prev_event after backfilling).
2. A client on serverB calls /messages?dir=b.
3. serverB:client_reader1 accepts the request and drives things.
4. serverB:client_reader1 has some backward extremities in range and requests /backfill from serverA.
5. serverB:client_reader1 processes the events from backfill, including event1, and puts them in the _event_persist_queue.
6. serverB:master picks up the events from the _event_persist_queue, persists them to the database, de-outliers event1, invalidates its own cache, and sends the events over replication.
7. serverB:client_reader1 starts assembling the /messages response and gets event1 out of its stale cache, still marked as an outlier.
8. serverB:client_reader1 responds to the /messages request without event1, because outliers are filtered out.
9. serverB:client_reader1 finally gets the replication data and invalidates its own cache for event1 (too late: we already got the events from the stale cache and responded).
In a nutshell, we've written the test expecting "read-after-write consistency" but we don't have that.
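The sequence of steps above can be reproduced in miniature. The following is only a toy sketch, not Synapse code: `Database`, `Worker`, `run_race`, and the `replication` list are hypothetical stand-ins for the shared database, the per-worker event caches, and the replication stream.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: str
    outlier: bool


@dataclass
class Database:
    """Shared store that every worker reads from (hypothetical)."""
    rows: dict = field(default_factory=dict)


@dataclass
class Worker:
    """A worker with its own in-memory event cache (hypothetical)."""
    name: str
    db: Database
    cache: dict = field(default_factory=dict)

    def get_event(self, event_id):
        # Serve from the local cache when possible, otherwise load from the DB.
        if event_id not in self.cache:
            row = self.db.rows[event_id]
            self.cache[event_id] = Event(row.event_id, row.outlier)
        return self.cache[event_id]

    def invalidate(self, event_id):
        self.cache.pop(event_id, None)

    def messages(self, event_ids):
        # Outliers are filtered out of the /messages response.
        return [e for e in event_ids if not self.get_event(e).outlier]


def run_race():
    db = Database()
    db.rows["event1"] = Event("event1", outlier=True)
    master = Worker("master", db)
    reader = Worker("client_reader1", db)
    replication = []  # invalidations in flight to client_reader1

    # An earlier request leaves event1 cached on the reader as an outlier.
    reader.get_event("event1")

    # After backfill, master persists event1 as a non-outlier, invalidates
    # its own cache, and sends the invalidation over replication.
    db.rows["event1"] = Event("event1", outlier=False)
    master.invalidate("event1")
    replication.append("event1")

    # The reader assembles the response before replication arrives.
    stale_response = reader.messages(["event1"])

    # Replication finally arrives and the reader's cache is invalidated.
    for event_id in replication:
        reader.invalidate(event_id)
    fresh_response = reader.messages(["event1"])

    return stale_response, fresh_response
```

In this sketch the first response omits event1 even though the database already holds it as a non-outlier; only the read made after the invalidation is processed includes it.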
That's exactly the situation here, but it really sucks that calling /messages doesn't include events we just backfilled for that same request. This is a general problem with Synapse, though; see the issues labeled Z-Read-After-Write: https://github.com/matrix-org/synapse/labels/Z-Read-After-Write. In this case, it all happens within the same /messages request, so it's a little more insidious.
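For illustration, one generic way to get read-after-write behaviour in a setup like this is for the reading worker to wait until it has processed replication up to the position of the write before it reads. The sketch below is an assumption about how that could look, not a description of Synapse's implementation; `ReplicationPosition`, `advance`, `wait_for`, and `demo` are hypothetical names.

```python
import asyncio


class ReplicationPosition:
    """Tracks how far this worker has processed a replication stream (hypothetical)."""

    def __init__(self):
        self._position = 0
        self._changed = asyncio.Condition()

    async def advance(self, new_position):
        # Called when replication data (and its cache invalidations) has been applied.
        async with self._changed:
            self._position = max(self._position, new_position)
            self._changed.notify_all()

    async def wait_for(self, target, timeout=5.0):
        # Block until the worker has caught up to `target`, or time out.
        async with self._changed:
            await asyncio.wait_for(
                self._changed.wait_for(lambda: self._position >= target),
                timeout,
            )


async def demo():
    position = ReplicationPosition()
    log = []

    async def replication_arrives():
        # Simulate replication lagging behind the write.
        await asyncio.sleep(0.01)
        log.append("replication processed")
        await position.advance(1)

    async def handle_messages():
        # Assume the persist step reported stream position 1 for the write.
        await position.wait_for(1)
        log.append("response assembled")

    await asyncio.gather(handle_messages(), replication_arrives())
    return log
```

The trade-off in a design like this is latency: the request blocks until replication catches up (or the timeout fires) instead of answering from a possibly stale cache.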
That this is possible is even more reason for us to indicate gaps in /messages (MSC3871).
matrixbot changed the title from "Dummy issue" to "Race condition with replication means /messages backfill lacks read-after-write consistency between workers" on Dec 21, 2023.
This issue has been migrated from #14211.
CI failure: https://github.com/matrix-org/synapse/actions/runs/3182998161/jobs/5189731097#step:6:15343 (from discussion). This specific CI flake was addressed in matrix-org/complement#492.