This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

[sqlite] Upgrade to 1.68 leads to bootlooping Synapse systemd service #14100

Open
axelsimon opened this issue Oct 7, 2022 · 6 comments
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Occasional Affects or can be seen by some users regularly or most users rarely S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@axelsimon
Contributor

axelsimon commented Oct 7, 2022

Description

After upgrading Debian packages for matrix-synapse-py3 (from 1.61.1 to 1.68.0 in this case), Synapse fails to run.

Upon further inspection, it appears that on startup Synapse tries to run a database schema update but doesn't finish and then restarts or gets restarted:

synapse.storage.prepare_database - 527 - INFO - main - Applying engine-specific schema 71/01rebuild_event_edges.sql.sqlite

Synapse gets killed before being able to finish the db migration.

Steps to reproduce

  • apt upgrade

Restarting the matrix-synapse service makes no difference and leads to the same result.

Solution

The fix is very simple, thanks to @reivilibre for suggesting it: systemd kills the matrix-synapse service unit for taking too long to report that it has started OK (a combination of Type=notify and the long execution time of that migration).

Run systemctl edit matrix-synapse.service and add the following override, which raises the start timeout to 300 seconds and gives the database upgrade enough time to finish:

[Service]
TimeoutStartSec=300


Homeserver

personal homeserver

Synapse Version

1.68.0

Installation Method

Debian packages from packages.matrix.org

Platform

Debian 11.5

Relevant log output

405072-2022-10-06 14:38:09,765 - synapse.storage.prepare_database - 119 - INFO - main - ['main', 'state']: Existing schema is 71 (+0 deltas)
405073-2022-10-06 14:38:09,765 - synapse.storage.databases.main - 304 - INFO - main - Checking database for consistency with configuration...
405074-2022-10-06 14:38:09,767 - synapse.storage.prepare_database - 411 - INFO - main - Applying schema deltas for v71
405075:2022-10-06 14:38:09,767 - synapse.storage.prepare_database - 527 - INFO - main - Applying engine-specific schema 71/01rebuild_event_edges.sql.sqlite
405076-2022-10-06 17:28:44,945 - root - 343 - WARNING - main - ***** STARTING SERVER *****
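
For anyone wanting to check their own homeserver log for the same pattern, here is a minimal sketch. It assumes that an "Applying engine-specific schema" line followed directly by a "STARTING SERVER" line means the previous start was cut off mid-migration, as in the excerpt above. The function name and marker constants are made up for illustration and are not part of Synapse; a crash for some unrelated reason would produce the same pattern.

```python
# Message fragments copied from the log excerpt above (assumed stable across versions).
MIGRATION_MARKER = "Applying engine-specific schema"
STARTUP_MARKER = "***** STARTING SERVER *****"


def find_interrupted_migrations(lines):
    """Return delta names whose 'Applying' line is directly followed by a server start."""
    interrupted = []
    pending = None  # delta named on the previous line, if that line was a migration line
    for line in lines:
        if STARTUP_MARKER in line and pending is not None:
            interrupted.append(pending)
        if MIGRATION_MARKER in line:
            pending = line.split(MIGRATION_MARKER, 1)[1].strip()
        else:
            pending = None
    return interrupted


sample = [
    "2022-10-06 14:38:09,767 - synapse.storage.prepare_database - 411 - INFO - main - Applying schema deltas for v71",
    "2022-10-06 14:38:09,767 - synapse.storage.prepare_database - 527 - INFO - main - Applying engine-specific schema 71/01rebuild_event_edges.sql.sqlite",
    "2022-10-06 17:28:44,945 - root - 343 - WARNING - main - ***** STARTING SERVER *****",
]
print(find_interrupted_migrations(sample))
# ['71/01rebuild_event_edges.sql.sqlite']
```

To scan a real log, pass an open file object instead of the sample list; the log path depends on your logging configuration.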

Anything else that would be useful to know?

See Solution above

@axelsimon axelsimon changed the title [sqlite] Upgrade to 1.68 fails (timeout) on 71/01rebuild_event_edges.sql.sqlite [sqlite] Upgrade to 1.68 leads to bootlooping Synapse systemd service on 71/01rebuild_event_edges.sql.sqlite Oct 7, 2022
@axelsimon axelsimon changed the title [sqlite] Upgrade to 1.68 leads to bootlooping Synapse systemd service on 71/01rebuild_event_edges.sql.sqlite [sqlite] Upgrade to 1.68 leads to bootlooping Synapse systemd service Oct 7, 2022
@richvdh
Member

richvdh commented Oct 7, 2022

This is a repeat of #13193

@squahtx
Contributor

squahtx commented Oct 7, 2022

As a workaround, you could try running the command described in #13193 (comment) before trying to start the server:

sudo update_synapse_database --database-config /etc/synapse/homeserver.yaml

@squahtx squahtx added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. O-Occasional Affects or can be seen by some users regularly or most users rarely A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db labels Oct 7, 2022
@richvdh
Member

richvdh commented Oct 7, 2022

I don't think we should consider this closed. We can do more to help people here (docs? Increase the timeout?)

@richvdh richvdh reopened this Oct 7, 2022
@DMRobertson
Contributor

Does systemd send us a signal in this situation? Could we catch it and log a warning "maybe migrations are too slow, try update_synapse_database"?

@DMRobertson
Contributor

Does systemd send us a signal in this situation?

Aha. From man systemd.service:

TimeoutStartFailureMode=, TimeoutStopFailureMode=
These options configure the action that is taken in case a daemon service does not signal start-up within its configured
TimeoutStartSec=, respectively if it does not stop within TimeoutStopSec=. Takes one of terminate, abort and kill. Both options 
default to terminate.

If terminate is set the service will be gracefully terminated by sending the signal specified in KillSignal= (defaults to 
SIGTERM, see systemd.kill(5)). If the service does not terminate the FinalKillSignal= is sent after TimeoutStopSec=. If abort
is set, WatchdogSignal= is sent instead and TimeoutAbortSec= applies before sending FinalKillSignal=. This setting may be used
to analyze services that fail to start-up or shut-down intermittently. By using kill the service is immediately terminated by
sending FinalKillSignal= without any further timeout. This setting can be used to expedite the shutdown of failing services.

Can't see that set in https://github.com/matrix-org/synapse/blob/47db2c3673290ca1e0dff3bd4fb9f461c97c67c3/contrib/systemd/matrix-synapse.service, so we must get a SIGTERM.

@richvdh
Member

richvdh commented Oct 10, 2022

Yes, we'll get a SIGTERM in this case. We could probably add a check in our SIGTERM handler (aka hs.get_reactor().addSystemEventTrigger( "before", "shutdown",... )) to see if the migrations are still running and log a warning if so. Does feel like we're dealing with the symptoms rather than the cause though.
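
A rough sketch of what such a check could look like, using only the standard library rather than Synapse's actual code. The migrations_running flag, the logger name and both function names are hypothetical; in Synapse the warning function would presumably be registered through the reactor's "before shutdown" trigger mentioned above instead of signal.signal.

```python
import logging
import signal
import threading

logger = logging.getLogger("startup_guard")  # hypothetical logger name

# Hypothetical flag: set before schema migrations start, cleared once they finish.
migrations_running = threading.Event()


def warn_if_migrating():
    """Log a hint if shutdown is requested mid-migration. Returns True if it warned."""
    if migrations_running.is_set():
        logger.warning(
            "Shutdown requested while database migrations are still running. "
            "If systemd stopped the service for exceeding TimeoutStartSec, "
            "run update_synapse_database before starting the service, "
            "or raise TimeoutStartSec."
        )
        return True
    return False


def handle_sigterm(signum, frame):
    warn_if_migrating()
    # Still exit: the warning must not keep the process alive.
    raise SystemExit(0)


def install_handler():
    # signal.signal() may only be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)
```

One caveat: Python runs signal handlers in the main thread between bytecode instructions, so a long call into C code (such as a slow SQLite statement) can delay the warning until that call returns. As noted, this only addresses the symptom.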
