Fix MSC3030 `/timestamp_to_event` returning `outliers` that it has no idea whether are near a gap or not #14215
/**
 * Make sure the event isn't an `outlier` because we have no way
 * to later check whether it's next to a gap. `outliers` do not
 * have entries in the `event_edges`, `event_forward_extremities`,
 * and `event_backward_extremities` tables to check against
 * (used by `is_event_next_to_backward_gap` and `is_event_next_to_forward_gap`).
 */
AND outlier = ? /* False */
The only new thing in `get_event_id_for_timestamp(...)` is here. The rest is just moving stuff around with the f-string.
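For anyone skimming the diff, here is a minimal sketch of what the added filter does, using `sqlite3` and a cut-down `events` table. The schema, the helper names `closest_non_outlier_event` and `make_demo_db`, and the sample rows are assumptions for illustration only; this is not the actual Synapse query, which handles more than is shown here.

```python
import sqlite3


def closest_non_outlier_event(conn, room_id, timestamp, direction="f"):
    """Return the event_id closest to `timestamp` in `direction`, skipping outliers.

    Hypothetical helper: "f" looks forwards in time, anything else looks backwards.
    """
    if direction == "f":
        comparison, order = ">=", "ASC"
    else:
        comparison, order = "<=", "DESC"

    # Only the comparison operator and sort order are interpolated;
    # all values are passed as bound parameters.
    sql = f"""
        SELECT event_id FROM events
        WHERE room_id = ?
            AND origin_server_ts {comparison} ?
            /* Skip outliers: we cannot check whether they sit next to a gap */
            AND outlier = ? /* False */
        ORDER BY origin_server_ts {order}
        LIMIT 1
    """
    row = conn.execute(sql, (room_id, timestamp, False)).fetchone()
    return row[0] if row else None


def make_demo_db():
    """Build an in-memory table with two normal events and one outlier."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_id TEXT, room_id TEXT, origin_server_ts INTEGER, outlier BOOLEAN)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [
            ("$a", "!room", 100, False),
            ("$outlier", "!room", 200, True),
            ("$b", "!room", 300, False),
        ],
    )
    return conn
```

In this toy data the outlier at `200` is the closest row to a query at `150`, but the filter skips it and the lookup lands on `$b` instead.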
Seems sane to me.
Thanks for the review @H-Shay 🦏
Slight side note: ignoring outliers is a good thing to do here. Though we might (in future) want to do more to actually check whether we think we have events around that timestamp (rather than a gaping hole). Consider the case where the server was down for a few weeks (or left and rejoined the room): in that period we obviously won't have any events, but queries to this endpoint will currently return events from one of the two "chunks" of DAG on either side of the hole. Instead, the server should try to detect this case and reach out over federation. This is hard right now, as we don't really track the "chunks" of DAG we have, which makes it hard to detect when we have holes.
Fix the MSC3030 `/timestamp_to_event` endpoint returning `outliers`, for which it has no idea whether they are near a gap or not (and therefore cannot determine whether the event is actually the closest one). The reason Synapse doesn't know whether an `outlier` is next to a gap is that our gap checks rely on entries in the `event_edges`, `event_forward_extremities`, and `event_backward_extremities` tables, and `outliers` have no entries there.

Also fixes the MSC3030 Complement `can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint` test flake. Although this acted flakey in Complement, if `sync_partial_state` raced and beat us before `/timestamp_to_event`, then even if we retried the failing `/context` request it wouldn't work until we made this Synapse change. With this PR, Synapse will never return an `outlier` event, so that test will always go and ask over federation.

Fix #13944
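To illustrate why an `outlier` can't be gap-checked, here is a toy in-memory model. `ToyStore` and its fields are hypothetical stand-ins for the `event_edges` and `event_backward_extremities` tables, and the check is a simplified assumption about what a backward gap check looks like, not the real `is_event_next_to_backward_gap` implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    # (event_id, prev_event_id) pairs, standing in for `event_edges`
    edges: set = field(default_factory=set)
    # event IDs standing in for `event_backward_extremities`
    backward_extremities: set = field(default_factory=set)

    def is_next_to_backward_gap(self, event_id: str) -> bool:
        """True if the event, or one of its known prev events, is a backward extremity."""
        prev_ids = {prev for (eid, prev) in self.edges if eid == event_id}
        return bool(({event_id} | prev_ids) & self.backward_extremities)


store = ToyStore(
    edges={("$b", "$a")},
    backward_extremities={"$a"},
)

# A normal event: its edge points at a backward extremity, so the gap is visible.
normal_result = store.is_next_to_backward_gap("$b")

# An outlier: no edges and no extremity rows were ever written for it, so the
# check returns False even though we know nothing about its neighbours.
outlier_result = store.is_next_to_backward_gap("$outlier")
```

The `False` for the outlier here means "no data", not "no gap", which is why the endpoint can't trust an outlier as the closest event.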
Why did this fail before? Why was it flakey?
Sleuthing the server logs on the CI failure, it looks like `hs2:/timestamp_to_event` found `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` event locally. Then when we went and asked for it via `/context`, since it's an `outlier`, it was filtered out of the results -> `You don't have permission to access that event.`
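The two-step failure can be sketched as a toy model. Everything here (`Event`, `local_timestamp_to_event`, `context_lookup`) is hypothetical and only mirrors the behaviour described in the logs; it is not Synapse code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    ts: int
    outlier: bool


def local_timestamp_to_event(events, ts, skip_outliers):
    """Closest local event at or after `ts` (forward direction only, for brevity)."""
    candidates = [
        e for e in events if e.ts >= ts and not (skip_outliers and e.outlier)
    ]
    return min(candidates, key=lambda e: e.ts, default=None)


def context_lookup(event):
    """Stand-in for `/context`: outliers are filtered out of the results."""
    if event.outlier:
        raise PermissionError("You don't have permission to access that event.")
    return event.event_id


# The only local event is one persisted as an outlier.
local_events = [Event("$outlier", 100, True)]

# Before the change: the lookup hands back the outlier, and `/context` rejects it.
before = local_timestamp_to_event(local_events, 50, skip_outliers=False)

# After the change: nothing usable is found locally, so the caller has to
# go and ask over federation instead.
after = local_timestamp_to_event(local_events, 50, skip_outliers=True)
```

Retrying `context_lookup(before)` can never succeed in this model, which matches why retrying the failing `/context` request didn't help.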
This is reproducible when `sync_partial_state` races and persists `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` before we evaluate `get_event_for_timestamp(...)`. To consistently reproduce locally, just add a delay at the start of `get_event_for_timestamp(...)` so it always runs after `sync_partial_state` completes.

In a run where it passes, on `hs2`, `get_event_for_timestamp(...)` finds a different event locally which is next to a gap, so we request a closer one from `hs1`, which gets backfilled. And since the backfilled event is not an `outlier`, it's returned as expected during `/context`.

With this PR, Synapse will never return an `outlier` event, so that test will always go and ask over federation.

Dev notes
Complement: