This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add ability to wait for replication streams #7542

Merged: erikjohnston merged 9 commits into develop on May 22, 2020

Conversation

erikjohnston
Member

The idea here is that if an instance persists an event via the replication HTTP API, the request can return before that instance has received the event over replication. This can lead to races where code assumes that persisting an event immediately updates various caches (e.g. the current state of the room).

Most of Synapse doesn't hit such races, so we don't do the waiting automagically; instead we wait only where necessary, to avoid unnecessary delays. We may change our minds here if it turns out there are a lot of subtle races going on.
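To illustrate the waiting described above, here is a minimal sketch and not the code in this PR. It assumes a hypothetical `StreamPositionWaiter` with hypothetical `wait_for_position` and `on_position` methods, and it uses `asyncio` in place of Synapse's Twisted machinery purely so the example is self-contained. A caller that has just persisted an event remotely blocks until the local replication stream has caught up to the returned position.

```python
import asyncio
import heapq
import itertools


class StreamPositionWaiter:
    """Hypothetical helper: lets callers wait until a named replication
    stream has advanced to at least a given position."""

    def __init__(self):
        self._positions = {}  # stream name -> latest position seen locally
        self._waiters = {}    # stream name -> heap of (position, tiebreak, future)
        self._tiebreak = itertools.count()

    async def wait_for_position(self, stream_name, position):
        # Fast path: replication has already caught up, so don't wait.
        if self._positions.get(stream_name, 0) >= position:
            return
        fut = asyncio.get_running_loop().create_future()
        heap = self._waiters.setdefault(stream_name, [])
        heapq.heappush(heap, (position, next(self._tiebreak), fut))
        await fut

    def on_position(self, stream_name, position):
        # Called when rows arrive over replication for this stream.
        current = max(position, self._positions.get(stream_name, 0))
        self._positions[stream_name] = current
        heap = self._waiters.get(stream_name, [])
        # Wake every waiter whose target position has now been reached.
        while heap and heap[0][0] <= current:
            _, _, fut = heapq.heappop(heap)
            if not fut.done():
                fut.set_result(None)


async def demo():
    waiter = StreamPositionWaiter()
    log = []

    async def caller():
        # Pretend the replication HTTP API returned stream position 5.
        await waiter.wait_for_position("events", 5)
        log.append("caches up to date")

    task = asyncio.create_task(caller())
    await asyncio.sleep(0)            # let the caller start and block
    waiter.on_position("events", 4)   # not far enough yet
    await asyncio.sleep(0)
    log.append("done early" if task.done() else "still waiting")
    waiter.on_position("events", 5)   # the persisted event arrives
    await task
    return log
```

Running `asyncio.run(demo())` shows the caller staying blocked at position 4 and resuming once position 5 arrives. A real implementation would presumably also want a timeout so that a stalled stream cannot block a request forever; that is left out here.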

People probably want to look at this commit by commit.

@erikjohnston erikjohnston requested a review from a team May 20, 2020 17:26
@erikjohnston
Member Author

Hmm, actually, I think I'm going to remove the reliance on internal_metadata.stream_ordering. I don't think it's the right way of doing this.

@erikjohnston
Member Author

Note that a lot of the diff is just passing stream_id through various layers.
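As a rough picture of what "passing stream_id through various layers" can look like, here is a small sketch. All class and method names (`Storage`, `Handler`, `Sender`, `persist`, `handle_new_event`, `send`) are hypothetical and are not taken from the diff. The point is only that each layer returns the stream position it got from the layer below, so the outermost caller ends up holding a position it could wait on.

```python
from typing import NamedTuple, Tuple


class Event(NamedTuple):
    event_id: str


class Storage:
    """Hypothetical bottom layer: assigns a stream position on persist."""

    def __init__(self) -> None:
        self._next_stream_id = 0

    def persist(self, event: Event) -> int:
        self._next_stream_id += 1
        return self._next_stream_id


class Handler:
    """Hypothetical middle layer: forwards the stream position upwards
    instead of discarding it."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def handle_new_event(self, event: Event) -> int:
        return self._storage.persist(event)


class Sender:
    """Hypothetical top layer: returns the event together with the
    stream position so the caller can wait for replication if needed."""

    def __init__(self, handler: Handler) -> None:
        self._handler = handler

    def send(self, event_id: str) -> Tuple[Event, int]:
        event = Event(event_id)
        stream_id = self._handler.handle_new_event(event)
        return event, stream_id
```

A caller would then write something like `event, stream_id = sender.send("$a")` and use `stream_id` only on the code paths that actually need the caches to be up to date.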

@erikjohnston
Member Author

Also, if people are happy with the first commit but panicking a bit about the large diff in the rest, then shout and I'll split it out (though I don't think it makes sense to look at the first without the second, or vice versa).

Member

@anoadragon453 anoadragon453 left a comment

Some small things, but overall looks good. Excited to see that this concept may prevent a class of future races.

synapse/replication/tcp/client.py (outdated)
synapse/replication/tcp/client.py
synapse/replication/tcp/client.py
synapse/replication/tcp/client.py (outdated)
synapse/replication/tcp/client.py
synapse/handlers/federation.py (outdated)
tests/test_federation.py (outdated)
erikjohnston and others added 2 commits May 22, 2020 13:34
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
@erikjohnston erikjohnston merged commit 1531b21 into develop May 22, 2020
@erikjohnston erikjohnston deleted the erkj/racey_sends branch May 22, 2020 13:21
phil-flex pushed a commit to phil-flex/synapse that referenced this pull request Jun 16, 2020
2 participants