This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Consistently exclude users from directory #10914

Closed
DMRobertson wants to merge 9 commits from dmr/consistent-user-dir-exclusion

Contributor

@DMRobertson DMRobertson commented Sep 24, 2021

Gather the logic which determines if we should exclude a user from the directory into one place. I think this helps to make the code a lot more followable!

I think of this primarily as a tidy-up. But it does have a perceptible change. With this patch, rebuilding the directory will not include support and deactivated users, but it did before AFAICS.

David Robertson added 7 commits September 24, 2021 16:03
To do this:
 - move required helpers to the handlers test dir and pull them in via
 inheritance (ew)

Additionally, we no longer rebuild the user dir in the user dir
handlers tests. Instead, tests register their own users and that creates
the required user_dir entries.
I think this now fails because I've changed the user directory code to
require that users are not deactivated. I think this test was relying on
some magic to sort of create a user based on the test case attribute
"user_id". I don't think it was doing it properly by actually
registering a user proper.
This makes mypy happier
@DMRobertson DMRobertson force-pushed the dmr/consistent-user-dir-exclusion branch from 83c4b6d to ff89fb8 Compare September 24, 2021 16:54
@DMRobertson DMRobertson requested review from reivilibre, richvdh and a team September 24, 2021 17:49
@DMRobertson DMRobertson marked this pull request as ready for review September 24, 2021 17:49
Contributor

@reivilibre reivilibre left a comment

Thanks for taking the time to unify these! Seems good to me.

Comment on lines +386 to +387
    if self.get_if_app_services_interested_in_user(user):
        return True
Contributor

I'm unsure what to think about this.
I have some appservice-generated users in my homeserver's search directory. I can't confirm or deny whether I've done something wrong to wind up with that happening, but thought it was worth mentioning in case this should trigger the question of 'do we really intend to exclude AS users from the directory?'.

Both yes and no make sense here. No, on a lot of networks (IRC probably? Discord when using a guild-bot.), the users aren't contactable.
But yes, on some networks, users are contactable and there's nothing inherently wrong with letting you discover them when trying to start a DM ...

I would guess that the 'No' cases outweigh the 'Yes' cases, so I think what you're doing is sensible.

Member

It seems like most of the code currently excludes AS users from the directory, but there are enough bugs that they leak through.

At least we're making it consistent here. There might be scope for a future change where we make this configurable per-appservice, but let's leave it for now.

Member

@richvdh richvdh left a comment

I'm struggling to follow what's moving where in the unit tests, and why the changes you're making are safe - but I'm happy to take your word that it's still testing the right sort of thing. (For future reference: it might be preferable to split this into two PRs, where first of all we refactor all the tests, and secondly we clean up the code, as a way of demonstrating that we're not fundamentally changing behaviour.)

A few thoughts and suggestions.

"""Certain classes of local user are omitted from the user directory.
Is this user one of them?
"""
if self.hs.is_mine_id(user):
Member

please can we do an early return rather than a big nested if statement?
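
For illustration, a minimal sketch of the early-return shape being asked for. The class, its fields and the `is_excluded` name are hypothetical stand-ins rather than Synapse's actual code; only the control flow is the point.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FakeDirectoryRules:
    """Hypothetical stand-in for the handler; not Synapse's real class."""

    server_name: str
    support_users: Set[str] = field(default_factory=set)
    deactivated_users: Set[str] = field(default_factory=set)

    def is_mine_id(self, user_id: str) -> bool:
        # Simplified localness test: the ID ends in ":<our server name>".
        return user_id.endswith(":" + self.server_name)

    def is_excluded(self, user_id: str) -> bool:
        # Early return: deal with remote users first, so the remaining
        # checks are not nested under one big `if`.
        if not self.is_mine_id(user_id):
            return False
        if user_id in self.support_users:
            return True
        return user_id in self.deactivated_users


rules = FakeDirectoryRules("test", support_users={"@support:test"})
print(rules.is_excluded("@support:test"), rules.is_excluded("@alice:remote"))
```

Each condition reads top to bottom at the same indentation level, which is the readability gain the comment is after.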

is_deactivated = await self.store.get_user_deactivated_status(user_id)

if not (is_support or is_deactivated):
if not await self.store.is_excluded_from_user_dir(user_id):
Member

I find the double-negative here a bit mind-bending. Perhaps we should instead have a should_include_in_user_dir?

Contributor Author

Glad it's not just me. I was erring towards preserving the original logic below as much as possible, but think it'll be better this way round.

@@ -357,12 +352,8 @@ async def _handle_new_user(
# First, if they're our user then we need to update for every user
if self.is_mine_id(user_id):
Member

think this is redundant?

Contributor Author

Because we'll check to see if they're local within is_excluded_from_user_dir? I agree...

But. I think putting the localness check into is_excluded_from_user_dir makes things more confusing. To a first approximation, it's true that we'll never exclude a remote user from the user directory. With one exception: users_who_share_private_rooms. This contains triples (local_user_id, any_user_id, room_id) where the two users both belong to the given private room. Only local users live in the first entry of those triples. (I think this is an optimisation?) To maintain that, there's extra logic which needs to ensure we don't put remote users into that column.

I think it's going to be clearer if we have a function should_include_local_user_in_dir and make the local-vs-remote logic explicit.
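
A rough sketch of what "explicit local-vs-remote logic at the call site" could look like. Everything here is assumed for illustration: the server name, the in-memory `EXCLUDED_LOCAL_USERS` set standing in for database lookups, and the `record_shared_private_room` helper. Only the function name `should_include_local_user_in_dir` and the triple layout come from the comment above.

```python
import asyncio
from typing import Set, Tuple

SERVER_NAME = "test"  # assumed homeserver name, for illustration only
EXCLUDED_LOCAL_USERS = {"@support:test"}  # stand-in for the real database lookups


def is_mine_id(user_id: str) -> bool:
    # A user is local when the part after the first ":" is our server name.
    return user_id.split(":", 1)[1] == SERVER_NAME


async def should_include_local_user_in_dir(user_id: str) -> bool:
    # Deliberately says nothing about remote users: the caller decides that.
    return user_id not in EXCLUDED_LOCAL_USERS


async def record_shared_private_room(
    table: Set[Tuple[str, str, str]], user_id: str, other_user_id: str, room_id: str
) -> None:
    # The first column may only hold local users that belong in the directory.
    if not is_mine_id(user_id):
        return
    if not await should_include_local_user_in_dir(user_id):
        return
    table.add((user_id, other_user_id, room_id))


async def main() -> Set[Tuple[str, str, str]]:
    table: Set[Tuple[str, str, str]] = set()
    await record_shared_private_room(table, "@alice:test", "@bob:remote", "!room:test")
    await record_shared_private_room(table, "@bob:remote", "@alice:test", "!room:test")
    await record_shared_private_room(table, "@support:test", "@alice:test", "!room:test")
    return table


rows = asyncio.run(main())
print(rows)
```

Keeping the localness test outside the inclusion function means the function never has to answer a question about remote users.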

await self.update_profile_in_user_dir(
user_id, profile.display_name, profile.avatar_url
)
if not await self.is_excluded_from_user_dir(user_id):
Member

it's probably fine, but for the record: what's the logic behind adding this here, and at line 353, where there was no check before?

Contributor Author

Consistency with _handle_deltas in the handlers code.

Rebuilding the user directory from scratch (here) should have the same outcome as starting with an empty directory and building it by listening for membership events (_handle_deltas). Before my changes, the latter excluded support and deactivated users, so I think the former should too.
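
The invariant described here can be sketched as a small check: both code paths go through one shared predicate, so they cannot disagree. The user IDs, the `EXCLUDED` set and the two function names are hypothetical, and the directory is modelled as a plain set rather than database tables.

```python
from typing import Iterable, Set

EXCLUDED = {"@support:test", "@deactivated:test"}  # hypothetical excluded users


def should_include(user_id: str) -> bool:
    # Single shared predicate used by both code paths.
    return user_id not in EXCLUDED


def build_incrementally(join_events: Iterable[str]) -> Set[str]:
    # Models the delta-handling path: one membership event at a time.
    directory: Set[str] = set()
    for user_id in join_events:
        if should_include(user_id):
            directory.add(user_id)
    return directory


def rebuild_from_scratch(all_users: Iterable[str]) -> Set[str]:
    # Models the background rebuild: scan every known user in one pass.
    return {user_id for user_id in all_users if should_include(user_id)}


users = ["@alice:test", "@support:test", "@bob:test", "@deactivated:test"]
incremental = build_incrementally(users)
rebuilt = rebuild_from_scratch(users)
print(incremental == rebuilt)
```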

Comment on lines +386 to +387
    if self.get_if_app_services_interested_in_user(user):
        return True
Member

wonder if we should do this test first, since it doesn't require a database hit and should be pretty quick.
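
A small sketch of the ordering idea, assuming a fake store that counts database round trips. `FakeStore`, its `is_support_user` method, the `@irc_` prefix rule and `app_service_interested` are invented for the example; the real check in the diff is `get_if_app_services_interested_in_user`.

```python
import asyncio


class FakeStore:
    """Hypothetical store that counts how often the 'database' is consulted."""

    def __init__(self) -> None:
        self.db_hits = 0

    async def is_support_user(self, user_id: str) -> bool:
        self.db_hits += 1
        return user_id == "@support:test"

    async def get_user_deactivated_status(self, user_id: str) -> bool:
        self.db_hits += 1
        return False


def app_service_interested(user_id: str) -> bool:
    # In-memory check; the prefix rule is an assumption for this sketch.
    return user_id.startswith("@irc_")


async def is_excluded(store: FakeStore, user_id: str) -> bool:
    # Cheapest check first: it can short-circuit before any database query.
    if app_service_interested(user_id):
        return True
    if await store.is_support_user(user_id):
        return True
    return await store.get_user_deactivated_status(user_id)


store = FakeStore()
result = asyncio.run(is_excluded(store, "@irc_bob:test"))
print(result, store.db_hits)
```

For an appservice user the function returns without touching the store, which is the saving the comment points at.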

tests/handlers/test_user_directory.py
    self.store, "remove_from_user_dir", return_value=defer.succeed(None)
) as mock:
    self.get_success(self.handler.handle_local_user_deactivated(s_user_id))
    self.get_success(mock.not_called())
Member

Does this do what it's supposed to? Mock doesn't seem to have a not_called method?

Contributor Author

I don't think so. Has this been broken for a while then!?

Contributor Author

changing to assert_not_called makes the test fail!

My reading: the test is trying to ensure that handle_local_user_deactivated doesn't try to delete a user from the db who would never have been in the directory. But I think handle_local_user_deactivated has never had any logic in it, so I think this test has been incorrectly passing forever.

We could make handle_local_user_deactivated call is_excluded_from_user_dir, but that involves a SELECT from the db anyway to see if they're deactivated or a support user. I'm guessing that's the same order of magnitude of delay as doing a no-op deletion? And so I think the current behaviour is fine.

I guess it's worth having this kind of test here though, to make sure that synapse doesn't fall over when we deactivate a support user?
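
To make the failure mode concrete, here is a self-contained sketch using only `unittest.mock`. The mock name and user ID are placeholders. On older Python versions the misspelled `not_called()` simply creates and calls a child mock, so nothing fails; newer versions may reject such names with `AttributeError`, which is why the sketch tolerates both outcomes.

```python
from unittest.mock import Mock

remove_from_user_dir = Mock()
remove_from_user_dir("@support:test")  # the call the test meant to rule out

# Misspelled check: on older Pythons this just creates and calls a child mock,
# so nothing fails. Newer Pythons may raise AttributeError for such names.
try:
    remove_from_user_dir.not_called()
    typo_was_silent = True
except AttributeError:
    typo_was_silent = False

# Real assertion: fails because the mock was called once above.
try:
    remove_from_user_dir.assert_not_called()
    real_assertion_failed = False
except AssertionError:
    real_assertion_failed = True

print(typo_was_silent, real_assertion_failed)
```

Only the `assert_`-prefixed method inspects the call record, which matches the observation that switching to `assert_not_called` makes the test fail.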

Contributor Author

> I guess it's worth having this kind of test here though, to make sure that synapse doesn't fall over when we deactivate a support user?

Additional: maybe test_user_directory isn't the right place for such a test.

    self.store, "remove_from_user_dir", return_value=defer.succeed(None)
) as mock:
    self.get_success(self.handler.handle_local_user_deactivated(r_user_id))
    self.get_success(mock.called_once_with(r_user_id))
Member

... or a called_once_with method?

Contributor Author

I guess the original author was missing an assert_ from the method names.

Contributor

Hm, this is concerning (and it was very possibly me, though I can't remember). Is there any way we can do this that will properly result in a failure if a typo is made?

Contributor Author

@DMRobertson DMRobertson Sep 28, 2021

Different typing stubs for Mock maybe---removing its __getattr__?

Edit: or otherwise we could make get_success throw if it's called with or encounters a Mock object?

Member

as of python 3.7, you can call mock.seal (https://docs.python.org/3/library/unittest.mock.html#sealing-mocks). But that doesn't really help you now, and I think it just replaces one thing to forget with another thing to forget.
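
For reference, a short sketch of how sealing behaves, assuming a bare `Mock` standing in for the store. The attribute name mirrors the test above, but the setup is otherwise invented. Note that the return value is configured before sealing, since a sealed mock no longer creates child mocks on demand.

```python
from unittest.mock import Mock, seal

store = Mock()
store.remove_from_user_dir.return_value = None  # configure before sealing
seal(store)

# Attributes that existed before sealing still work as usual.
store.remove_from_user_dir("@alice:test")
store.remove_from_user_dir.assert_called_once_with("@alice:test")

# A misspelled assertion is now an unknown attribute, so it raises.
try:
    store.remove_from_user_dir.called_once_with("@alice:test")
    typo_rejected = False
except AttributeError:
    typo_rejected = True

print(typo_rejected)
```

As the comment says, this only helps if every test remembers to call `seal`, so it trades one thing to forget for another.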

DMRobertson pushed a commit that referenced this pull request Sep 30, 2021
We exclude three kinds of local users from the user_directory tables. At
present we don't consistently exclude all three in the same places. This
commit introduces a new function to gather those exclusion conditions
together. Because we have to handle local and remote users in different
ways, I've made that function only consider the case of local users.
It's the caller's responsibility to make the local versus remote
distinction clear and correct.

A test fixup is required. The test now hits a path which makes db
queries against the users table. The expected rows were missing, because
we were using a dummy user that hadn't actually been registered.

Broken-off from #10914.

Changes

By my reading this makes these changes:

Incremental updates:

* When an app service user registers or changes their profile, they will _not_ be added to the user directory. (Previously only support users were excluded). This is consistent with the logic that rebuilds the user directory. See also [the discussion here](#10914 (comment)).
* When a deactivated user joins a room, they will _not_ be added to the user directory. Previously they were. (This probably never happens, but the previous source code allows it.)
* When a room changes from public to private or vice versa, any deactivated or support users will _not_ be added to the directory. Previously they were.

Rebuild:

* When rebuilding the directory, exclude support and disabled users from room sharing tables. Previously only appservice users were excluded.
* Exclude all three categories of local users when rebuilding the directory. Previously `_populate_user_directory_process_users` didn't do any exclusion.
DMRobertson pushed a commit that referenced this pull request Sep 30, 2021
We exclude three kinds of local users from the user_directory tables. At
present we don't consistently exclude all three in the same places. This
commit introduces a new function to gather those exclusion conditions
together. Because we have to handle local and remote users in different
ways, I've made that function only consider the case of local users.
It's the caller's responsibility to make the local versus remote
distinction clear and correct.

A test fixup is required. The test now hits a path which makes db
queries against the users table. The expected rows were missing, because
we were using a dummy user that hadn't actually been registered.

Broken-off from #10914.

----

By my reading this makes these changes:

* When an app service user registers or changes their profile, they will
  _not_ be added to the user directory. (Previously only support and
  deactivated users were excluded). This is consistent with the logic that
  rebuilds the user directory. See also [the discussion
  here](#10914 (comment)).
* When rebuilding the directory, exclude support and disabled users from
  room sharing tables. Previously only appservice users were excluded.
* Exclude all three categories of local users when rebuilding the
  directory. Previously `_populate_user_directory_process_users` didn't do
  any exclusion.
Contributor Author

DMRobertson commented Sep 30, 2021

Thanks both for your reviews. I've chopped this up and applied your review comments, so closing this in favour of #10891 and #10960.

DMRobertson pushed a commit that referenced this pull request Oct 4, 2021
* Introduce `should_include_local_users_in_dir`

We exclude three kinds of local users from the user_directory tables. At
present we don't consistently exclude all three in the same places. This
commit introduces a new function to gather those exclusion conditions
together. Because we have to handle local and remote users in different
ways, I've made that function only consider the case of local users.
It's the caller's responsibility to make the local versus remote
distinction clear and correct.

A test fixup is required. The test now hits a path which makes db
queries against the users table. The expected rows were missing, because
we were using a dummy user that hadn't actually been registered.

We also add new test cases to cover the exclusion logic.

----

By my reading this makes these changes:

* When an app service user registers or changes their profile, they will
  _not_ be added to the user directory. (Previously only support and
  deactivated users were excluded). This is consistent with the logic that
  rebuilds the user directory. See also [the discussion
  here](#10914 (comment)).
* When rebuilding the directory, exclude support and disabled users from
  room sharing tables. Previously only appservice users were excluded.
* Exclude all three categories of local users when rebuilding the
  directory. Previously `_populate_user_directory_process_users` didn't do
  any exclusion.

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
@DMRobertson DMRobertson deleted the dmr/consistent-user-dir-exclusion branch October 5, 2021 09:53