Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Move update_client_ip background job from the main process to the background worker. #12251

Merged
merged 13 commits into from
Apr 1, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 18 additions & 4 deletions synapse/replication/tcp/handler.py
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,8 @@ def __init__(self, hs: "HomeServer"):
if self._is_master:
self._server_notices_sender = hs.get_server_notices_sender()

self._is_background_worker = hs.config.worker.run_background_tasks

def _add_command_to_stream_queue(
self, conn: IReplicationConnection, cmd: Union[RdataCommand, PositionCommand]
) -> None:
Expand Down Expand Up @@ -401,12 +403,27 @@ def on_USER_IP(
) -> Optional[Awaitable[None]]:
user_ip_cache_counter.inc()

if self._is_master:
if self._is_master or self._is_background_worker:
return self._handle_user_ip(cmd)
else:
return None

async def _handle_user_ip(self, cmd: UserIpCommand) -> None:
"""
Handles a User IP, branching depending on whether we are the main process
and/or the background worker.
"""
if self._is_master:
await self._handle_user_ip_as_master(cmd)

if self._is_background_worker:
await self._handle_user_ip_as_background_worker(cmd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer to inline _handle_user_ip into on_USER_IP? Otherwise its a bit confusing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only concern with that is that it will mean that uninvolved workers end up creating and scheduling coroutines as background processes as well, whereas they didn't before.
I don't know how much that worries us, but since the code was already structured this way for some reason, I didn't want to remove it without understanding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, I'm not talking about a functional change here? _handle_user_ip is only called by on_USER_IP?

What's confusing me really is that the self._is_master or self._should_insert_client_ips a) looks odd by itself, and b) is entirely redundant given _handle_user_ip then does the exact same checks. You may as well just always call self._handle_user_ip in on_USER_IP, at which point having it as a separate function seems a bit silly?

Or am I missing something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on_USER_IP returns either None or an awaitable. If it's an awaitable, it goes through extra stuff to schedule it as a background process; if it's just None then nothing happens.

Notably, the function can't (easily/nicely) be inlined and still preserve this behaviour of returning None if there's nothing to do, as I'd have to make on_USER_IP an async def in order to run potentially two coroutines back-to-back.

It may not be a big deal, but I wasn't sure and since replication overhead is pretty sensitive, I was a bit uncomfortable taking a leap of faith.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH. Right, with you now. How obscure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and it took me forever where to find the code that calls on_%s because the name is dynamically constructed and I forgot that we like %s-formatting and was searching for "on_" :')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment in on_USER_IP about the whole awaitable thing please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — I also inlined the two specific handle_user_ip functions, since looking at them now, they're so short it wasn't buying any clarity.


async def _handle_user_ip_as_master(self, cmd: UserIpCommand) -> None:
assert self._server_notices_sender is not None
await self._server_notices_sender.on_user_ip(cmd.user_id)

async def _handle_user_ip_as_background_worker(self, cmd: UserIpCommand) -> None:
await self._store.insert_client_ip(
cmd.user_id,
cmd.access_token,
Expand All @@ -416,9 +433,6 @@ async def _handle_user_ip(self, cmd: UserIpCommand) -> None:
cmd.last_seen,
)

assert self._server_notices_sender is not None
await self._server_notices_sender.on_user_ip(cmd.user_id)

def on_RDATA(self, conn: IReplicationConnection, cmd: RdataCommand) -> None:
if cmd.instance_name == self._instance_name:
# Ignore RDATA that are just our own echoes
Expand Down