Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There is no retry logic for sending EDUs to application services #11150

Open
matrixbot opened this issue Dec 19, 2023 · 0 comments
Open

There is no retry logic for sending EDUs to application services #11150

matrixbot opened this issue Dec 19, 2023 · 0 comments

Comments

@matrixbot
Copy link
Collaborator

matrixbot commented Dec 19, 2023

This issue has been migrated from #11150.


When we send ephemeral events to appservices, we keep track of a stream token per appservice per EDU type. We do this for read receipts and presence updates (but not for typing, those are a little ephemeral to care about).

When a new read receipt comes through the server, we consider whether that should be sent to any connected application services. If so, we send those read receipts off. We then record the stream token of the read receipt for this appservice, even if we ended up determining that the appservice wasn't interested in the event (so that we wouldn't have to check that again later on).

This system works (although has scalability concerns), however there is no retry logic. We record the updated stream token even if sending to the application service fails:

https://github.com/matrix-org/synapse/blob/e584534403b55ad3f250f92592e30b15b01f0201/synapse/handlers/appservice.py#L239-L245

submit_ephemeral_events_for_as kicks off a background task which logs and then ignores exceptions:

https://github.com/matrix-org/synapse/blob/4b965c862dc66c0da5d3240add70e9b5f0aa720b/synapse/appservice/scheduler.py#L156-L159

The stream token is stored and updated for this appservice immediately after the background task kicks off. It makes sense to do this so that we do not end up with duplicated work while processing subsequent calls, even if sending to the appservice is slow, however we still need a way of deprecated/holding off updating the stored stream id until a 200 is returned from the appservice. Only then should we mark the appservice as successfully having processed up to that stream token.

This doesn't matter so much with read receipts or presence updates, but will become much more important when we start passing things like device lists over this channel. A blip in the network will lead to decryption errors down the line.

Note that this behaviour for the current set of supported EDUs is intentional:

https://github.com/matrix-org/synapse/blob/e584534403b55ad3f250f92592e30b15b01f0201/synapse/storage/databases/main/appservice.py#L192-L200

The stream tokens are stored in the application_services_state table, which has the schema:

synapse=# \d application_services_state
                   Table "public.application_services_state"
         Column         |         Type         | Collation | Nullable | Default 
------------------------+----------------------+-----------+----------+---------
 as_id                  | text                 |           | not null | 
 state                  | character varying(5) |           |          | 
 last_txn               | bigint               |           |          | 
 read_receipt_stream_id | bigint               |           |          | 
 presence_stream_id     | bigint               |           |          | 

Indexes:
    "application_services_state_pkey" PRIMARY KEY, btree (as_id)
@matrixbot matrixbot changed the title Dummy issue There is no retry logic for sending EDUs to application services Dec 21, 2023
@matrixbot matrixbot reopened this Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant