This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Backfill in the background if we're doing it "just because" #15710
Merged
MadLittleMods
merged 4 commits into
develop
from
madlittlemods/backfill-anyway-in-the-background
Jun 9, 2023
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Speed up `/messages` by backfilling in the background when there are no backward extremities where we are directly paginating. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two things being conflated here:
The first thing around "risk" is only really just calling out the fact that there is a behavior change here. Before this PR, if there wasn't any backward extremities where we were paginating, we would backfill anyway with the most-recent backward extremities. The exact reason why we did this wasn't specified beyond "in case we do get relevant events" but I assume it was for eventual consistency and just in case we catch a new branch of history that extends back into the place where we are paginating. Now with this PR, we are acknowledging that it's still a possibility that some useful history could come back but we don't need to block the client in this case because it's a best effort thing that could just as easily not return anything relevant since it's unlikely that a random backward extremity will lead to anything relevant to the place we are paginating and impossible in situations where the most-recent backward extremities are so far ahead of where we are paginating. And there really isn't any consequence since next time they scrollback, the history will be there since we still did the work in the background.
The second thing here is the return value of
maybe_backfill()
which is completely separate. This just signals whether we actually backfilled anything. We've historically not used it for anything since it was introduced. Coincidentally, I am using it in a new PR in order to see if there is anything new that we should go back to the database and ask about. Now that I understand the nuance better with an actual use case, I've updated the comment.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm starting to understand what you are saying a bit; thanks for bearing with me, I don't ever touch backwards extremities so it's an unfamiliar topic for me.
So the root of the issue is that backwards extremities could point to events anywhere in the DAG (we don't know whereabouts they are because we don't have those events locally, which is why they are backwards extremities in the first place).
So when scrolling backwards in the event stream, we technically would have to try and fetch events from backwards extremities 'in case' they might appear in the scrollback window?
But this is not a very common occurrence, so most of the time it's a waste to wait for this to happen before letting the client go on its merry way?
And then the change introduced here means we don't block on that request. In the rare case that it should have produced events in the client's request window, it now won't.
Is that roughly the right idea?
The part I'm missing now is, which of these holds?:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main gist to take away 👍
We know approximately where any backward extremity is because we have the successor events persisted (see the backward extremity explanation in the docs). And we first pick from backward extremities that are approximately in range of where we are paginating already. This is all smart and good so far. But if there are no backward extremities at or further back in history than our current pagination position, then we just backfill with the most recent backward extremities (just because).
The backfill with the most recent backward extremities "just because" is the part we're deciding to put in the background now.
We could check whether these most recent backward extremities are "close" as well but it's just extra complexity and lookup time and we could just as easily be missing that single event vs a unknown chain. And maybe we even want to do something different, where we just choose the closest extremities to our window. But the current behavior of most recent extremities does fill in history that other people are most likely to hit first if they scrollback and is a bit simple to think about (although a bit arbitrary).
There is an argument where we might want to remove this logic completely, but I think it serves a good enough purpose to fill in history in rooms that people are visiting often and unstick things so they become eventually consistent.
This is the right answer. Which does mean that clients could miss out on history but that's already the case and one of the reasons I created MSC3871 so we could indicate to clients that there is a gap that they could try refetching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understandable compromises. That gap suggestion by MSC3871 could well be useful for clients, thanks.