There is no retry logic for sending EDUs to application services #11150

matrixbot · 2023-12-19T03:29:50Z

This issue has been migrated from #11150.

When we send ephemeral events to appservices, we keep track of a stream token per appservice per EDU type. We do this for read receipts and presence updates (but not for typing, those are a little ephemeral to care about).

When a new read receipt comes through the server, we consider whether that should be sent to any connected application services. If so, we send those read receipts off. We then record the stream token of the read receipt for this appservice, even if we ended up determining that the appservice wasn't interested in the event (so that we wouldn't have to check that again later on).

This system works (although has scalability concerns), however there is no retry logic. We record the updated stream token even if sending to the application service fails:

https://github.com/matrix-org/synapse/blob/e584534403b55ad3f250f92592e30b15b01f0201/synapse/handlers/appservice.py#L239-L245

submit_ephemeral_events_for_as kicks off a background task which logs and then ignores exceptions:

https://github.com/matrix-org/synapse/blob/4b965c862dc66c0da5d3240add70e9b5f0aa720b/synapse/appservice/scheduler.py#L156-L159

The stream token is stored and updated for this appservice immediately after the background task kicks off. It makes sense to do this so that we do not end up with duplicated work while processing subsequent calls, even if sending to the appservice is slow, however we still need a way of deprecated/holding off updating the stored stream id until a 200 is returned from the appservice. Only then should we mark the appservice as successfully having processed up to that stream token.

This doesn't matter so much with read receipts or presence updates, but will become much more important when we start passing things like device lists over this channel. A blip in the network will lead to decryption errors down the line.

Note that this behaviour for the current set of supported EDUs is intentional:

https://github.com/matrix-org/synapse/blob/e584534403b55ad3f250f92592e30b15b01f0201/synapse/storage/databases/main/appservice.py#L192-L200

The stream tokens are stored in the application_services_state table, which has the schema:

synapse=# \d application_services_state
                   Table "public.application_services_state"
         Column         |         Type         | Collation | Nullable | Default 
------------------------+----------------------+-----------+----------+---------
 as_id                  | text                 |           | not null | 
 state                  | character varying(5) |           |          | 
 last_txn               | bigint               |           |          | 
 read_receipt_stream_id | bigint               |           |          | 
 presence_stream_id     | bigint               |           |          | 

Indexes:
    "application_services_state_pkey" PRIMARY KEY, btree (as_id)

The text was updated successfully, but these errors were encountered:

matrixbot closed this as completed Dec 19, 2023

matrixbot changed the title ~~Dummy issue~~ There is no retry logic for sending EDUs to application services Dec 21, 2023

matrixbot added A-Application-Service T-Enhancement labels Dec 21, 2023

matrixbot reopened this Dec 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

There is no retry logic for sending EDUs to application services #11150

There is no retry logic for sending EDUs to application services #11150

matrixbot commented Dec 19, 2023 •

edited

Loading

There is no retry logic for sending EDUs to application services #11150

There is no retry logic for sending EDUs to application services #11150

Comments

matrixbot commented Dec 19, 2023 • edited Loading

matrixbot commented Dec 19, 2023 •

edited

Loading