This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Improve get auth chain difference algorithm. #7095
Merged
Merged
Changes from 7 commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
d128efc
Port event federation tests to new style
erikjohnston 278686c
Improve get auth chain difference algorithm.
erikjohnston 91a6f78
Newsfile
erikjohnston 0ebf292
Fix unit tests
erikjohnston bd2e18a
Really fix unit tests
erikjohnston d373279
Fix typos
erikjohnston fa04467
Review comments
erikjohnston f47a417
Apply suggestions from code review
erikjohnston b209466
pep8
erikjohnston File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Attempt to improve performance of state res v2 algorithm. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,7 +14,7 @@ | |
# limitations under the License. | ||
import itertools | ||
import logging | ||
from typing import List, Optional, Set | ||
from typing import Dict, List, Optional, Set | ||
|
||
from six.moves.queue import Empty, PriorityQueue | ||
|
||
|
@@ -103,6 +103,154 @@ def _get_auth_chain_ids_txn(self, txn, event_ids, include_given, ignore_events): | |
|
||
return list(results) | ||
|
||
def get_auth_chain_difference(self, state_sets: List[Set[str]]): | ||
"""Given sets of state events figure out the auth chain difference (as | ||
per state res v2 algorithm). | ||
|
||
This equivalent to fetching the full auth chain for each set of state | ||
and returning the events that don't appear in each and every auth | ||
chain. | ||
|
||
Returns: | ||
Deferred[Set[str]] | ||
""" | ||
|
||
return self.db.runInteraction( | ||
"get_auth_chain_difference", | ||
self._get_auth_chain_difference_txn, | ||
state_sets, | ||
) | ||
|
||
def _get_auth_chain_difference_txn( | ||
self, txn, state_sets: List[Set[str]] | ||
) -> Set[str]: | ||
|
||
# Algorithm Description | ||
# ~~~~~~~~~~~~~~~~~~~~~ | ||
# | ||
# The idea here is to basically walk the auth graph of each state set in | ||
# tandem, keeping track of which auth events are reachable by each state | ||
# set. If we reach an auth event we've already visited (via a different | ||
# state set) then we mark that auth event and all ancestors as reachable | ||
# by the state set. This requires that we keep track of the auth chains | ||
# in memory. | ||
# | ||
# Doing it in a such a way means that we can stop early if all auth | ||
# events we're currently walking are reachable by all state sets. | ||
# | ||
# *Note*: We can't stop walking an event's auth chain if it is reachable | ||
# by all state sets. This is because other auth chains we're walking | ||
# might be reachable only via the original auth chain. For example, | ||
# given the following auth chain: | ||
# | ||
# A -> C -> D -> E | ||
# / / | ||
# B -´---------´ | ||
# | ||
# and state sets {A} and {B} then walking the auth chains of A and B | ||
# would immediately show that C is reachable by both. However, if we | ||
# stopped at C then we'd only reach E via the auth chain of B and so E | ||
# would errornously get included in the returned difference. | ||
# | ||
# The other thing that we do is limit the number of auth chains we walk | ||
# at once, due to practical limits (i.e. we can only query the database | ||
# with a limited set of parameters). We pick the auth chains we walk | ||
# each iteration based on their depth, in the hope that events with a | ||
# lower depth are likely reachable by those with higher depths. | ||
# | ||
# We could use any ordering that we believe would give a rough | ||
# topological ordering, e.g. origin server timestamp. If the ordering | ||
# chosen is not topological then the algorithm still produces the right | ||
# result, but perhaps a bit more inefficiently. This is why it is safe | ||
# to use "depth" here. | ||
|
||
initial_events = set(state_sets[0]).union(*state_sets[1:]) | ||
|
||
# Dict from events in auth chains to which sets *cannot* reach them. | ||
# I.e. if the set is empty then all sets can reach the event. | ||
event_to_missing_sets = { | ||
event_id: {i for i, a in enumerate(state_sets) if event_id not in a} | ||
for event_id in initial_events | ||
} | ||
|
||
# We need to get the depth of the initial events for sorting purposes. | ||
sql = """ | ||
SELECT depth, event_id FROM events | ||
WHERE %s | ||
ORDER BY depth ASC | ||
""" | ||
clause, args = make_in_list_sql_clause( | ||
txn.database_engine, "event_id", initial_events | ||
) | ||
txn.execute(sql % (clause,), args) | ||
|
||
# The sorted list of events whose auth chains we should walk. | ||
search = txn.fetchall() | ||
erikjohnston marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# Map from event to its auth events | ||
event_to_auth_events = {} # type: Dict[str, Set[str]] | ||
|
||
base_sql = """ | ||
SELECT a.event_id, auth_id, depth | ||
FROM event_auth AS a | ||
INNER JOIN events AS e ON (e.event_id = a.auth_id) | ||
WHERE | ||
""" | ||
|
||
while search: | ||
# Check whether all our current walks are reachable by all state | ||
# sets. If so we can bail. | ||
if all(not event_to_missing_sets[eid] for _, eid in search): | ||
break | ||
|
||
# Fetch the auth events and their depths of the N last events we're | ||
# currently walking | ||
search, chunk = search[:-100], search[-100:] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. vaguely wondering if a heap would be more efficient for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Me too, but heapq seems to only let you pop items one at a time I think? Which sounds inefficient There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. agreed. we might have to roll our own impl, which may or may not work out as a winner over a sorted list. |
||
clause, args = make_in_list_sql_clause( | ||
txn.database_engine, "a.event_id", [e_id for _, e_id in chunk] | ||
) | ||
txn.execute(base_sql + clause, args) | ||
|
||
for event_id, auth_event_id, auth_event_depth in txn: | ||
event_to_auth_events.setdefault(event_id, set()).add(auth_event_id) | ||
|
||
sets = event_to_missing_sets.get(auth_event_id) | ||
if sets is None: | ||
# First time we're seeing this event, so we add it to the | ||
# queue of things to fetch. | ||
search.append((auth_event_depth, auth_event_id)) | ||
|
||
# Assume that this event is unreachable from any of the | ||
# state sets until proven otherwise | ||
sets = event_to_missing_sets.setdefault( | ||
auth_event_id, set(range(len(state_sets))) | ||
) | ||
erikjohnston marked this conversation as resolved.
Show resolved
Hide resolved
|
||
else: | ||
# We've previously seen this event, so look up its auth | ||
# events and recursively mark all ancestors as reachable | ||
# by the current event's state set. | ||
a_ids = event_to_auth_events.get(auth_event_id) | ||
while a_ids: | ||
new_aids = set() | ||
for a_id in a_ids: | ||
event_to_missing_sets[a_id].intersection_update( | ||
event_to_missing_sets[event_id] | ||
) | ||
|
||
b = event_to_auth_events.get(a_id) | ||
if b: | ||
new_aids.update(b) | ||
|
||
a_ids = new_aids | ||
|
||
# Mark that the auth event is reachable by the approriate sets. | ||
sets.intersection_update(event_to_missing_sets[event_id]) | ||
|
||
search.sort() | ||
|
||
# Return all events where not all sets can reach them. | ||
return {eid for eid, n in event_to_missing_sets.items() if n} | ||
|
||
def get_oldest_events_in_room(self, room_id): | ||
return self.db.runInteraction( | ||
"get_oldest_events_in_room", self._get_oldest_events_in_room_txn, room_id | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we trust this value of depth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope! Though hopefully as explained above we're using it as a hint, and so worst case if its wrong we basically pull the full auth chains out of the DB as we're currently doing