Speed up push actions/unread counting #13846

Fizzadar · 2022-09-20T10:20:54Z

I've been spending a bunch of time looking into push action processing & badge counting and I think there'd be a real benefit to separating out push actions from summaries (notification/unread/highlight counts). There is a lot of code and complexity introduced in the mechanism of rolling event push actions into summaries, and the push actions table conflates push notifications and push counters. Suggestions:

Push actions

Firstly push actions stay as is using event_push_actions, this gets deleted on receipt and read/deleted by the pusher instances, no need for any background work. No longer used for any badge counting.

Badge counts

For badge counts we add a new table, event_push_counts, that looks roughly like (pseudo-SQL):

CREATE TABLE event_push_counts (
    user_id text,
    room_id text,
    thread_id text,
    event_stream_ordering bigint,
    notifs bigint,
    unreads bigint,
    highlights bigint,
)
ALTER TABLE `event_push_counts` ADD CONSTRAINT uniq (user_id, room_id, thread_id, event_stream_ordering);

The key here is that this is not unique per user/room/thread but also event stream ordering. This means that as events come in new rows simply get appended according to the push actions per user. This prevents any contention issues during the critical event insertion path.

This makes counting a users total unreads very simple - instead of the current loop & count per room, simply:

# Counting all events for push badges
SELECT SUM(unreads)
FROM event_push_counts
WHERE user_id = '@blah:matrix.org'

# Counting unread rooms for push badges
SELECT COUNT(room_id)
FROM event_push_counts
WHERE user_id = '@blah:matrix.org'
GROUP BY room_id

# Separated room/thread counts for sync responses
SELECT SUM(unreads), SUM(notifs), SUM(highlights), room_id, thread_id
FROM event_push_counts
WHERE user_id = '@blah:matrix.org'
AND room_id IN ('!abc:matrix.org', '!def:beeper.com')
GROUP BY room_id, thread_id

The same applies to counting unreads for rooms in sync responses.

It is still possible to summarise these by merging rows into a higher stream ordering. Like the current system this doesn't account for receipts not at the latest stream ordering, but the summarisaton could be delayed to provide a window of support for this if desired. The table is leaner than the push actions table so this shouldn't be such an issue (but still important to do to keep the table fast).

Finally, rows could be cleaned out either on receipt on as a background job processing receipts. If there was sufficient (24h?) delay before any summarisation phase, deleting on receipt shouldn't result in much contention on the table.

(If this seems sensible, I can invest time to implement over the next few weeks)

The text was updated successfully, but these errors were encountered:

clokep · 2022-09-20T12:46:48Z

@Fizzadar Just a heads up that I've been doing very invasive work on push actions & summaries to add support for threads and would hate to see us generate massive conflicts! See #12550 for a jumping off point.

Fizzadar · 2022-09-21T10:36:27Z

@clokep 👍 ty for the pointer! Have updated my post to reflect inclusion of thread_id column and query to fetch the sync unread counts!

erikjohnston · 2022-09-27T15:50:07Z

instead of the current loop & count per room

Argh that is terrible loop and I want to kill it with fire. Though surely we can do something better with the current set up? Though might be annoyingly complicated.

Finally, rows could be cleaned out either on receipt on as a background job processing receipts. If there was sufficient (24h?) delay before any summarisation phase, deleting on receipt shouldn't result in much contention on the table.

Ah, so the penny has finally dropped in my head. This is very similar to what we had before (except re-using the event_push_actions table and doing two queries for lookup), but it had an annoying edge case that absolutely tanked performance: if a user sends a bunch of messages into a room with a huge number of members it resulted in a huge number of entries in event_push_actions, which was enough to cause serious slowness when summing over the table.

Having a dedicated event_push_counts would likely be a bit better (since it'd be smaller), but I think would suffer the same failure mode, if I'm understanding correctly.

As a side note: I wonder if most of the complexity comes from the sheer amount of compat code going on, and if just clearing that out would make things much easier to reason about?

Fizzadar · 2022-10-04T10:53:30Z

Argh that is terrible loop and I want to kill it with fire. Though surely we can do something better with the current set up? Though might be annoyingly complicated.

I think this is a good point, I did briefly look into this I wonder how we could get the aggregate unread count for a user in a single DB txn vs. the loop.

Having a dedicated event_push_counts would likely be a bit better (since it'd be smaller), but I think would suffer the same failure mode, if I'm understanding correctly.

Yeah this sounds about right, smaller table but would still have to be regularly aggregated for performance (I think) which would roughly end up where we are anyway.

As a side note: I wonder if most of the complexity comes from the sheer amount of compat code going on, and if just clearing that out would make things much easier to reason about?

I think so! Wonder if removing some of that would open new doors for optimising the current tables ...

Also thinking moving some of the deletion logic into receipts would help simplify things here (#13834), as would no longer need the background worker to process receipts. Need to confirm that the change doesn't break things though.

Fizzadar · 2022-10-21T12:51:34Z

Pulled together the aggregate query to remove that loop in #14255. I think the queries there highlight well the complex nature of the current table -> summarisation table setup. I don't have any idea of how to simplify it yet though!

clokep · 2023-02-09T12:29:43Z

@Fizzadar Is this all done then with the work in #14255 or do you think there's more to do?

Fizzadar · 2023-02-21T20:55:20Z

@clokep I'd still like to PR my suggested schema in the original post. I have an experimental implementation in our fork (not currently in use for users, but POC it works). Still believe separating out the counting from push actions is the right direction here, just need to find some time to pull together something more concrete :)

Fizzadar changed the title ~~Separate push actions & summary table~~ Speed up push actions/unread counting Oct 10, 2022

Fizzadar mentioned this issue Oct 10, 2022

Optimise push action processing #13448

Open

Fizzadar mentioned this issue Oct 21, 2022

Aggregate unread notif count query for badge count calculation #14255

Merged

4 tasks

matrixbot mentioned this issue Dec 21, 2023

Speed up push actions/unread counting element-hq/synapse#13846

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up push actions/unread counting #13846

Speed up push actions/unread counting #13846

Fizzadar commented Sep 20, 2022 •

edited

Loading

clokep commented Sep 20, 2022

Fizzadar commented Sep 21, 2022

erikjohnston commented Sep 27, 2022

Fizzadar commented Oct 4, 2022

Fizzadar commented Oct 21, 2022

clokep commented Feb 9, 2023

Fizzadar commented Feb 21, 2023

Speed up push actions/unread counting #13846

Speed up push actions/unread counting #13846

Comments

Fizzadar commented Sep 20, 2022 • edited Loading

Push actions

Badge counts

clokep commented Sep 20, 2022

Fizzadar commented Sep 21, 2022

erikjohnston commented Sep 27, 2022

Fizzadar commented Oct 4, 2022

Fizzadar commented Oct 21, 2022

clokep commented Feb 9, 2023

Fizzadar commented Feb 21, 2023

Fizzadar commented Sep 20, 2022 •

edited

Loading