Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

When loading current ids, sort by stream_id to avoid incorrect overwrite and avoid errors caused by sorting alphabetical instance name which can be null #13585

Merged
merged 4 commits into from
Aug 24, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/13585.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix loading the current stream position behind the actual position.
13 changes: 11 additions & 2 deletions synapse/storage/util/id_generators.py
Original file line number Diff line number Diff line change
Expand Up @@ -460,8 +460,17 @@ def _load_current_ids(
# Cast safety: this corresponds to the types returned by the query above.
rows.extend(cast(Iterable[Tuple[str, int]], cur))

# Sort so that we handle rows in order for each instance.
Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the correct order that we want? "Sort by [what] so that ... because [why]"

The previous code sorted it by instance_name because that was the first column selected in the SQL. Why do we want to do that? That code has been there since it was introduced

instance_name can be null instead of master which blows up the sort code. Another question is why do we see it as null sometimes instead of master in monolith mode?


As @richvdh pointed out, it does work to sort worker instances. But it's unclear to me why we want that or if that is intended. It feels like we might not need to sort at all.

Further discussion: https://matrix.to/#/!vcyiEtMVHIhWXcJAfl:sw1v.org/$XjldnX6z-QfMK-eAS77tEIbwN5kNA3tq8s9temUwONs?via=matrix.org&via=element.io&via=beeper.com

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(requesting review for this comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My best guest is the line:

self._current_positions[instance] = stream_id

Where if we didn't ensure that we handled the rows in ascending stream ID order (per-instance) then we could end up setting the current position to a lower stream ID than we ought to.

I think its enough to do rows.sort(key=lambda _, stream_id: stream_id) or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context @erikjohnston!

I don't know of the symptoms this would manifest as or underlying problems this causes but I guess this fixes some other bugs with incorrect stream_ids (all the places we use MultiWriterIdGenerator).

Probably solving some duplicate primary key database errors at the very least. And maybe larger bugs like to_device messages not sending.

rows.sort()
# Sort by stream_id (ascending, lowest -> highest) so that we handle
# rows in order for each instance because we don't want to overwrite
# the current_position of an instance to a lower stream ID than
# we're actually at.
def sort_by_stream_id_key_func(row: Tuple[str, int]) -> int:
(instance, stream_id) = row
# If `stream_id` is ever `None`, we will see a `TypeError: '<'
# not supported between instances of 'NoneType' and 'X'` error.
return stream_id

rows.sort(key=sort_by_stream_id_key_func)

with self._lock:
for (
Expand Down