This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Appservice stream position got stuck ~10k behind current, preventing requests to appservices #13950

Open
turt2live opened this issue Sep 29, 2022 · 5 comments
Labels
A-Application-Service: Related to AS support
O-Occasional: Affects or can be seen by some users regularly or most users rarely
O-Uncommon: Most users are unlikely to come across this or unexpected workflow
S-Minor: Blocks non-critical functionality, workarounds exist.
T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@turt2live
Member

Description

A repeat of #1834 essentially

Steps to reproduce

Unclear - it suddenly became sad.

Homeserver

t2bot.io

Synapse Version

1.68.0 + custom patches

Installation Method

pip (from PyPI)

Platform

Ubuntu physical hardware.

Relevant log output

Available upon request.

Anything else that would be useful to know?

After manually fast-forwarding the stream position and restarting the worker, it appeared to run about 3-5 minutes behind for longer than expected. This may have been due to a larger server restart evicting caches during peak hours, though.

@turt2live
Member Author

Suppose the stream position information itself would be useful:

synapse=# select max(stream_ordering) from events;
    max
-----------
 806649123
(1 row)

synapse=# select * from appservice_stream_position;
 lock | stream_ordering
------+-----------------
 X    |       806639937
(1 row)

synapse=# update appservice_stream_position set stream_ordering = (select max(stream_ordering) from events);
UPDATE 1
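
For anyone wanting to reproduce the check, the queries above can be wrapped in a small script. This is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the Postgres `synapse` database (an assumption, for self-containment), and the helper names `appservice_lag`, `fast_forward`, and `demo_connection` are hypothetical, not part of Synapse. The table and column names and the two stream positions are taken from the output above.

```python
import sqlite3


def appservice_lag(conn):
    """Return how far the appservice position trails the newest event."""
    row = conn.execute(
        "SELECT (SELECT MAX(stream_ordering) FROM events)"
        " - (SELECT stream_ordering FROM appservice_stream_position)"
    ).fetchone()
    return row[0]


def fast_forward(conn):
    """Apply the same manual workaround as the UPDATE shown above."""
    conn.execute(
        "UPDATE appservice_stream_position"
        " SET stream_ordering = (SELECT MAX(stream_ordering) FROM events)"
    )
    conn.commit()


def demo_connection():
    """Build a throwaway database holding the two values from this issue."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (stream_ordering INTEGER)")
    conn.execute(
        'CREATE TABLE appservice_stream_position ("lock" TEXT, stream_ordering INTEGER)'
    )
    # Only the newest event and the stuck position matter for the lag.
    conn.executemany(
        "INSERT INTO events (stream_ordering) VALUES (?)",
        [(806639937,), (806649123,)],
    )
    conn.execute(
        "INSERT INTO appservice_stream_position VALUES ('X', 806639937)"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print("lag before fast-forward:", appservice_lag(conn))
    fast_forward(conn)
    print("lag after fast-forward:", appservice_lag(conn))
```

With the values from this incident the lag works out to 806649123 - 806639937 = 9186, which matches the "~10k behind" in the title. Note that fast-forwarding skips the events in that gap rather than delivering them to the appservices.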

@richvdh
Member

richvdh commented Sep 29, 2022

duplicate of #11629 ?

@DMRobertson
Contributor

Is this correlated with an upgrade to 1.68.0?

> custom patches

Are these publicly shareable?

@turt2live
Member Author

> duplicate of #11629 ?

Aside from the title, possibly. This was all bridges on t2bot.io, which are not new.

> Is this correlated with an upgrade to 1.68.0?

Negative. 1.68.0 was applied on Tuesday (2 days ago)

> > custom patches
>
> Are these publicly shareable?

Yup: develop...t2bot:synapse:t2bot.io (relevant patches might be around the appservice transaction optimization, but it was working "fine" up until the incident, and was working fine once fast-forwarded)

@DMRobertson added the A-Application-Service, S-Minor, T-Defect, O-Occasional, and O-Uncommon labels on Sep 29, 2022

@turt2live
Member Author

ftr, ran into a variation of this today where the stream position was ~3600 behind, fluctuating towards worse badness. It eventually caught up on its own, however.

Not sure how the appservice sender is able to fall behind like this.
