
Add some comments about how event push actions are stored #13445

Merged · merged 7 commits on Aug 4, 2022
1 change: 1 addition & 0 deletions changelog.d/13445.misc
@@ -0,0 +1 @@
Add some comments about how event push actions are stored.
61 changes: 61 additions & 0 deletions synapse/storage/databases/main/event_push_actions.py
@@ -12,6 +12,67 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Responsible for storing and fetching push actions / notifications.

There are two main uses for push actions:
1. Sending out push to a user's device; and
2. Tracking per-room per-user notification counts (used in sync requests).

For the former we simply use the `event_push_actions` table, which contains all
the push actions calculated for a given user by the `BulkPushRuleEvaluator`.
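
For illustration, fetching the actions to push for a user boils down to a range
query over that table. The sketch below uses a generic DB-API cursor with
sqlite3-style placeholders and assumed column names; it is not Synapse's actual
storage API:

    def get_push_actions_for_user(cur, user_id, min_stream_ordering):
        # Hypothetical sketch: the column names are assumptions.
        cur.execute(
            "SELECT event_id, actions FROM event_push_actions"
            " WHERE user_id = ? AND stream_ordering > ?"
            " ORDER BY stream_ordering ASC",
            (user_id, min_stream_ordering),
        )
        return cur.fetchall()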

For the latter we could simply count the number of rows in the
`event_push_actions` table for a given room/user, but in practice this is
*very* heavyweight when there are a large number of notifications (e.g. because
the user never reads a room). Plus, keeping all push actions indefinitely uses
a lot of disk space.

To fix these issues, we add a new table `event_push_summary` that tracks
per-user per-room counts of all notifications that happened before a stream
ordering S. Thus, to get the notification count for a user / room we can simply
query the single row in `event_push_summary` and add to its count the number of
rows in `event_push_actions` with a stream ordering larger than S (and as long
as S is "recent", the number of rows needing to be scanned will be small).

The `event_push_summary` table is updated via a background job that periodically
chooses a new stream ordering S' (usually the latest stream ordering), counts
all notifications in `event_push_actions` between the existing S and S', and
adds them to the existing counts in `event_push_summary`.

This allows us to delete old rows from `event_push_actions` once those rows have
been counted and added to `event_push_summary` (we call this process
"rotation").


Comment on lines +46 to +47

Member: Is this double line break on purpose? Are you trying to separate it
into two sections (one for counting notifications and one for clearing them)?

Member (Author): Yeah, it was vaguely intentional. Not sure if it helps others
or if it's just a distraction?

Member: I think it could be clearer to put a heading:

Suggested change
Clearing notifications with read receipts
=========================================

We need to handle when a user sends a read receipt to a room. Again this is
done as a background process. For each receipt we clear the row in
`event_push_summary` and count the number of notifications in
`event_push_actions` that happened after the receipt but before S, and insert
that count into `event_push_summary` (if the receipt happened *after* S then we
simply clear the row in `event_push_summary`).
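
Sketched out under the same assumptions (the real background process batches
receipts and updates more columns; `last_receipt_stream_ordering` here records
which receipt the summary reflects, which is used below to detect staleness):

    def handle_read_receipt(cur, user_id, room_id, receipt_so, s):
        # Re-count notifications between the receipt and the cut-off S;
        # this is zero if the receipt's stream ordering is at or after S.
        cur.execute(
            "SELECT COUNT(*) FROM event_push_actions"
            " WHERE user_id = ? AND room_id = ?"
            " AND stream_ordering > ? AND stream_ordering <= ?",
            (user_id, room_id, receipt_so, s),
        )
        (count,) = cur.fetchone()
        cur.execute(
            "UPDATE event_push_summary"
            " SET notif_count = ?, last_receipt_stream_ordering = ?"
            " WHERE user_id = ? AND room_id = ?",
            (count, receipt_so, user_id, room_id),
        )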

Note that it's possible that, if the read receipt is for an old event, the
relevant `event_push_actions` rows will have been rotated away and we will get
the wrong count (it'll be too low). We accept this as a rare edge case that is
unlikely to impact the user much (since the vast majority of read receipts will
be for the latest event).

The last complication is handling the race where we request the notification
counts after a user sends a read receipt into the room, but *before* the
background update handles the receipt (without any special handling the counts
would be outdated). We fix this by storing in `event_push_summary` the read
receipt we used when updating it, and every time we query the table we check
whether that matches the most recent read receipt in the room. If it does, we
continue as above; if not, we simply query the `event_push_actions` table
directly.
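
The check might look like the following sketch (same assumptions as the
earlier snippets):

    def get_notif_count_checked(cur, user_id, room_id, latest_receipt_so):
        cur.execute(
            "SELECT notif_count, stream_ordering, last_receipt_stream_ordering"
            " FROM event_push_summary WHERE user_id = ? AND room_id = ?",
            (user_id, room_id),
        )
        row = cur.fetchone()
        if row is not None and row[2] == latest_receipt_so:
            # The summary reflects the latest receipt: use it plus anything
            # newer than S, exactly as described above.
            summary_count, start = row[0], row[1]
        else:
            # Stale (or missing) summary: ignore it and count notifications
            # after the latest receipt directly from `event_push_actions`.
            summary_count, start = 0, latest_receipt_so
        cur.execute(
            "SELECT COUNT(*) FROM event_push_actions"
            " WHERE user_id = ? AND room_id = ? AND stream_ordering > ?",
            (user_id, room_id, start),
        )
        (recent,) = cur.fetchone()
        return summary_count + recent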

Since read receipts are almost always for recent events, scanning the
`event_push_actions` table in this case is unlikely to be a problem. Even if it
is, the slowdown is temporary: it lasts only until the background job handles
the new read receipt.
"""

import logging
from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union, cast
