Do not keep going if there are 5 back-to-back background update failures. #12781
If we do fail 5 times, would there be any indication in CI?
Yeah, the test will fail. (Before this, it would spin, seemingly forever.)
Will this cause users' homeservers to potentially shut down after running for a bit? That seems quite confusing. Additionally, will this stop all background processes from running, or will the ones after the failed one continue?
No. The logs have an error along the lines of

    StopIteration: [{'update_name': 'add_rooms_room_version_column', 'depends_on': None}, {'update_name': 'current_state_events_membership', 'depends_on': None}, {'update_name': 'delete_old_current_state_events', 'depends_on': 'current_state_events_membership'}, {'update_name': 'devices_last_seen', 'depends_on': None}, {'update_name': 'event_store_labels', 'depends_on': None}, {'update_name': 'insert_room_retention', 'depends_on': None}, {'update_name': 'populate_stats_process_users', 'depends_on': 'populate_stats_process_rooms'}, {'update_name': 'redactions_have_censored_ts_idx', 'depends_on': None}, {'update_name': 'redactions_received_ts', 'depends_on': None}, {'update_name': 'remove_tombstoned_rooms_from_directory', 'depends_on': None}, {'update_name': 'room_membership_forgotten_idx', 'depends_on': None}, {'update_name': 'state_groups_room_id_idx', 'depends_on': None}, {'update_name': 'users_set_deactivated_flag', 'depends_on': None}, {'update_name': 'remove_dup_outbound_pokes', 'depends_on': None}, {'update_name': 'populate_stats_process_rooms', 'depends_on': None}, {'update_name': 'users_have_local_media', 'depends_on': None}, {'update_name': 'e2e_cross_signing_keys_idx', 'depends_on': None}, {'update_name': 'user_external_ids_user_id_idx', 'depends_on': None}, {'update_name': 'rejected_events_metadata', 'depends_on': None}, {'update_name': 'chain_cover', 'depends_on': 'rejected_events_metadata'}, {'update_name': 'remove_deactivated_pushers', 'depends_on': None}, {'update_name': 'remove_stale_pushers', 'depends_on': None}, {'update_name': 'purged_chain_cover', 'depends_on': None}, {'update_name': 'populate_rooms_creator_column', 'depends_on': None}, {'update_name': 'remove_deleted_email_pushers', 'depends_on': None}, {'update_name': 'presence_stream_not_offline_index', 'depends_on': None}, {'update_name': 'remove_hidden_devices_from_device_inbox', 'depends_on': None}, {'update_name': 'local_group_updates_index', 'depends_on': None}, {'update_name': 'remove_deleted_devices_from_device_inbox', 'depends_on': None}, {'update_name': 'event_arbitrary_relations', 'depends_on': None}, {'update_name': 'remove_dead_devices_from_device_inbox', 'depends_on': None}, {'update_name': 'delete_account_data_for_deactivated_users', 'depends_on': None}, {'update_name': 'event_search_sqlite_delete_non_strings', 'depends_on': None}, {'update_name': 'cache_invalidation_index_by_instance', 'depends_on': None}]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/rei/work/synapse/synapse/storage/background_updates.py", line 291, in run_background_updates
        result = await self.do_next_background_update(sleep)
      File "/home/rei/work/synapse/synapse/storage/background_updates.py", line 413, in do_next_background_update
        update_info = self._background_update_handlers[self._current_background_update + 'x']
    KeyError: 'add_rooms_room_version_columnx'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/rei/work/synapse/synapse/metrics/background_process_metrics.py", line 243, in run
        return await func(*args, **kwargs)
      File "/home/rei/work/synapse/synapse/storage/background_updates.py", line 296, in run_background_updates
        raise RuntimeError(
    RuntimeError: 5 back-to-back background update failures; aborting.

but the homeserver otherwise continues to operate.
Yes, I think so. But that's no worse than before as no later background updates will get any time if the first one spins. This is essentially the same failure mode as before, except:
This is what we want! Sorry if it sounded like I was implying that was bad. This is a good thing since updates are ordered.
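For readers following the thread, the behaviour being discussed can be sketched roughly as below. This is a minimal, standalone illustration and not the code from this PR: `run_updates`, `do_next_update` and `MAX_BACK_TO_BACK_FAILURES` are hypothetical names, and only the threshold of 5 and the error message are taken from the PR title and the traceback above. The assumption is that a successful step resets the failure streak, so only consecutive failures count.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical constant name; the value 5 comes from the PR title.
MAX_BACK_TO_BACK_FAILURES = 5


async def run_updates(do_next_update: Callable[[], Awaitable[bool]]) -> None:
    """Repeatedly call do_next_update() until it returns True (all done).

    Raises RuntimeError once MAX_BACK_TO_BACK_FAILURES calls in a row have
    raised, instead of retrying forever.
    """
    back_to_back_failures = 0
    while True:
        try:
            finished = await do_next_update()
        except Exception:
            back_to_back_failures += 1
            if back_to_back_failures >= MAX_BACK_TO_BACK_FAILURES:
                raise RuntimeError(
                    f"{MAX_BACK_TO_BACK_FAILURES} back-to-back background "
                    "update failures; aborting."
                )
            continue
        # Assumption: any successful step resets the streak.
        back_to_back_failures = 0
        if finished:
            return
```

Because the loop stops at the first update that keeps failing, later updates never start, which matches the point above that updates are ordered.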
lgtm
Synapse 1.60.0rc1 (2022-05-24)
==============================

This release of Synapse adds a unique index to the `state_group_edges` table, in order to prevent accidentally introducing duplicate information (for example, because a database backup was restored multiple times). If your Synapse database already has duplicate rows in this table, this could fail with an error and require manual remediation.

Additionally, the signature of the `check_event_for_spam` module callback has changed. The previous signature has been deprecated and remains working for now. Module authors should update their modules to use the new signature where possible.

See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600) for more details.

Features
--------

- Measure the time taken in spam-checking callbacks and expose those measurements as metrics. ([\#12513](#12513))
- Add a `default_power_level_content_override` config option to set default room power levels per room preset. ([\#12618](#12618))
- Add support for [MSC3787: Allowing knocks to restricted rooms](matrix-org/matrix-spec-proposals#3787). ([\#12623](#12623))
- Send `USER_IP` commands on a different Redis channel, in order to reduce traffic to workers that do not process these commands. ([\#12672](#12672), [\#12809](#12809))
- Synapse will now reload [cache config](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#caching) when it receives a [SIGHUP](https://en.wikipedia.org/wiki/SIGHUP) signal. ([\#12673](#12673))
- Add a config options to allow for auto-tuning of caches. ([\#12701](#12701))
- Update [MSC2716](matrix-org/matrix-spec-proposals#2716) implementation to process marker events from the current state to avoid markers being lost in timeline gaps for federated servers which would cause the imported history to be undiscovered. ([\#12718](#12718))
- Add a `drop_federated_event` callback to `SpamChecker` to disregard inbound federated events before they take up much processing power, in an emergency. ([\#12744](#12744))
- Implement [MSC3818: Copy room type on upgrade](matrix-org/matrix-spec-proposals#3818). ([\#12786](#12786), [\#12792](#12792))
- Update to the `check_event_for_spam` module callback. Deprecate the current callback signature, replace it with a new signature that is both less ambiguous (replacing booleans with explicit allow/block) and more powerful (ability to return explicit error codes). ([\#12808](#12808))

Bugfixes
--------

- Fix a bug introduced in Synapse 1.7.0 that would prevent events from being sent to clients if there's a retention policy in the room when the support for retention policies is disabled. ([\#12611](#12611))
- Fix a bug introduced in Synapse 1.57.0 where `/messages` would throw a 500 error when querying for a non-existent room. ([\#12683](#12683))
- Add a unique index to `state_group_edges` to prevent duplicates being accidentally introduced and the consequential impact to performance. ([\#12687](#12687))
- Fix a long-standing bug where an empty room would be created when a user with an insufficient power level tried to upgrade a room. ([\#12696](#12696))
- Fix a bug introduced in Synapse 1.30.0 where empty rooms could be automatically created if a monthly active users limit is set. ([\#12713](#12713))
- Fix push to dismiss notifications when read on another client. Contributed by @SpiritCroc @ Beeper. ([\#12721](#12721))
- Fix poor database performance when reading the cache invalidation stream for large servers with lots of workers. ([\#12747](#12747))
- Delete events from the `federation_inbound_events_staging` table when a room is purged through the admin API. ([\#12770](#12770))
- Give a meaningful error message when a client tries to create a room with an invalid alias localpart. ([\#12779](#12779))
- Fix a bug introduced in 1.43.0 where a file (`providers.json`) was never closed. Contributed by @arkamar. ([\#12794](#12794))
- Fix a long-standing bug where finished log contexts would be re-started when failing to contact remote homeservers. ([\#12803](#12803))
- Fix a bug, introduced in Synapse 1.21.0, that led to media thumbnails being unusable before the index has been added in the background. ([\#12823](#12823))

Updates to the Docker image
---------------------------

- Fix the docker file after a dependency update. ([\#12853](#12853))

Improved Documentation
----------------------

- Fix a typo in the Media Admin API documentation. ([\#12715](#12715))
- Update the OpenID Connect example for Keycloak to be compatible with newer versions of Keycloak. Contributed by @nhh. ([\#12727](#12727))
- Fix typo in server listener documentation. ([\#12742](#12742))
- Link to the configuration manual from the welcome page of the documentation. ([\#12748](#12748))
- Fix typo in `run_background_tasks_on` option name in configuration manual documentation. ([\#12749](#12749))
- Add information regarding the `rc_invites` ratelimiting option to the configuration docs. ([\#12759](#12759))
- Add documentation for cancellation of request processing. ([\#12761](#12761))
- Recommend using docker to run tests against postgres. ([\#12765](#12765))
- Add missing user directory endpoint from the generic worker documentation. Contributed by @olmari. ([\#12773](#12773))
- Add additional info to documentation of config option `cache_autotuning`. ([\#12776](#12776))
- Update configuration manual documentation to document size-related suffixes. ([\#12777](#12777))
- Fix invalid YAML syntax in the example documentation for the `url_preview_accept_language` config option. ([\#12785](#12785))

Deprecations and Removals
-------------------------

- Require a body in POST requests to `/rooms/{roomId}/receipt/{receiptType}/{eventId}`, as required by the [Matrix specification](https://spec.matrix.org/v1.2/client-server-api/#post_matrixclientv3roomsroomidreceiptreceipttypeeventid). This breaks compatibility with Element Android 1.2.0 and earlier: users of those clients will be unable to send read receipts. ([\#12709](#12709))

Internal Changes
----------------

- Improve event caching mechanism to avoid having multiple copies of an event in memory at a time. ([\#10533](#10533))
- Preparation for faster-room-join work: return subsets of room state which we already have, immediately. ([\#12498](#12498))
- Add `@cancellable` decorator, for use on endpoint methods that can be cancelled when clients disconnect. ([\#12586](#12586), [\#12588](#12588), [\#12630](#12630), [\#12694](#12694), [\#12698](#12698), [\#12699](#12699), [\#12700](#12700), [\#12705](#12705))
- Enable cancellation of `GET /rooms/$room_id/members`, `GET /rooms/$room_id/state` and `GET /rooms/$room_id/state/$event_type/*` requests. ([\#12708](#12708))
- Improve documentation of the `synapse.push` module. ([\#12676](#12676))
- Refactor functions to on `PushRuleEvaluatorForEvent`. ([\#12677](#12677))
- Preparation for database schema simplifications: stop writing to `event_reference_hashes`. ([\#12679](#12679))
- Remove code which updates unused database column `application_services_state.last_txn`. ([\#12680](#12680))
- Refactor `EventContext` class. ([\#12689](#12689))
- Remove an unneeded class in the push code. ([\#12691](#12691))
- Consolidate parsing of relation information from events. ([\#12693](#12693))
- Convert namespace class `Codes` into a string enum. ([\#12703](#12703))
- Optimize private read receipt filtering. ([\#12711](#12711))
- Drop the logging level of status messages for the URL preview cache expiry job from INFO to DEBUG. ([\#12720](#12720))
- Downgrade some OIDC errors to warnings in the logs, to reduce the noise of Sentry reports. ([\#12723](#12723))
- Update configs used by Complement to allow more invites/3PID validations during tests. ([\#12731](#12731))
- Fix a long-standing bug where the user directory background process would fail to make forward progress if a user included a null codepoint in their display name or avatar. ([\#12762](#12762))
- Tweak the mypy plugin so that `@cached` can accept `on_invalidate=None`. ([\#12769](#12769))
- Move methods that call `add_push_rule` to the `PushRuleStore` class. ([\#12772](#12772))
- Make handling of federation Authorization header (more) compliant with RFC7230. ([\#12774](#12774))
- Refactor `resolve_state_groups_for_events` to not pull out full state when no state resolution happens. ([\#12775](#12775))
- Do not keep going if there are 5 back-to-back background update failures. ([\#12781](#12781))
- Fix federation when using the demo scripts. ([\#12783](#12783))
- The `hash_password` script now fails when it is called without specifying a config file. Contributed by @jae1911. ([\#12789](#12789))
- Improve and fix type hints. ([\#12567](#12567), [\#12477](#12477), [\#12717](#12717), [\#12753](#12753), [\#12695](#12695), [\#12734](#12734), [\#12716](#12716), [\#12726](#12726), [\#12790](#12790), [\#12833](#12833))
- Update EventContext `get_current_event_ids` and `get_prev_event_ids` to accept state filters and update calls where possible. ([\#12791](#12791))
- Remove Caddy from the Synapse workers image used in Complement. ([\#12818](#12818))
- Add Complement's shared registration secret to the Complement worker image. This fixes tests that depend on it. ([\#12819](#12819))
- Support registering Application Services when running with workers under Complement. ([\#12826](#12826))
- Disable 'faster room join' Complement tests when testing against Synapse with workers. ([\#12842](#12842))
Synapse 1.60.0 (2022-05-31)
===========================

This release of Synapse adds a unique index to the `state_group_edges` table, in order to prevent accidentally introducing duplicate information (for example, because a database backup was restored multiple times). If your Synapse database already has duplicate rows in this table, this could fail with an error and require manual remediation.

Additionally, the signature of the `check_event_for_spam` module callback has changed. The previous signature has been deprecated and remains working for now. Module authors should update their modules to use the new signature where possible.

See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600) for more details.

Bugfixes
--------

- Fix a bug introduced in Synapse 1.60.0rc1 that would break some imports from `synapse.module_api`. ([\#12918](matrix-org#12918))

Synapse 1.60.0rc2 (2022-05-27)
==============================

Features
--------

- Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](matrix-org#12883))

Bugfixes
--------

- Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](matrix-org#12875))

Internal Changes
----------------

- Improve URL previews by not including the content of media tags in the generated description. ([\#12887](matrix-org#12887))
Fixes #12780 — see that issue for context.
Up for debate whether this is a fair solution or not.
At least it makes tests fail rather than hang (personally confirmed).