Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix background updates failing to add unique indexes on receipts #14453

Merged
merged 7 commits into from
Nov 16, 2022

Conversation

squahtx
Copy link
Contributor

@squahtx squahtx commented Nov 15, 2022

As part of the database migration to support threaded receipts, there is
a possible window in between
73/08thread_receipts_non_null.sql.postgres removing the original
unique constraints on receipts_linearized and receipts_graph and the
reeipts_linearized_unique_index and receipts_graph_unique_index
background updates from 72/08thread_receipts.sql completing where
the unique constraints on receipts_linearized and receipts_graph are
missing. Any emulated upserts on these tables must therefore be
performed with a lock held, otherwise duplicate rows can end up in the
tables when there are concurrent emulated upserts. Fix the missing lock.

Note that emulated upserts no longer happen by default on sqlite, since
the minimum supported version of sqlite supports native upserts by
default now.

Finally, clean up any duplicate receipts that may have crept in before
trying to create the receipts_graph_unique_index and
receipts_linearized_unique_index unique indexes.

Signed-off-by: Sean Quah seanq@matrix.org


Resolves #14406, but not the follow on to remove lock=False.

The first commit fixes the race that creates the duplicates.
The second commit changes the background update to clean up the duplicates.

As part of the database migration to support threaded receipts, there is
a possible window in between
`73/08thread_receipts_non_null.sql.postgres` removing the original
unique constraints on `receipts_linearized` and `receipts_graph` and the
`reeipts_linearized_unique_index` and `receipts_graph_unique_index`
background updates from `72/08thread_receipts.sql` completing where
the unique constraints on `receipts_linearized` and `receipts_graph` are
missing. Any emulated upserts on these tables must therefore be
performed with a lock held, otherwise duplicate rows can end up in the
tables when there are concurrent emulated upserts.

Note that emulated upserts no longer happen by default on sqlite, since
the minimum supported version of sqlite supports native upserts by
default now.

Signed-off-by: Sean Quah <seanq@matrix.org>
@squahtx squahtx requested a review from a team as a code owner November 15, 2022 12:48
Comment on lines 947 to 960
# Then remove all duplicate receipts.
# We could be clever and try to keep the latest receipt out of every set of
# duplicates, but it's far simpler to remove them all.
for room_id, receipt_type, user_id in duplicate_keys:
sql = f"""
DELETE FROM {table}
WHERE
room_id = ? AND
receipt_type = ? AND
user_id = ? AND
thread_id IS NULL
"""
txn.execute(sql, (room_id, receipt_type, user_id))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth trying to preserve these read receipts?

For receipts_linearized, we can choose to keep the receipt with the highest stream_id.
For receipts_graph, we have to dig into data, which is a user-provided json field, and extract out the ts field. This is a lot more painful, as we have to handle invalid data and ties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hope is that they're completely identical but with thread_id being NULL (where NULL != NULL)? I guess that's hard to check though.

Honestly the receipts_graph isn't read from right now so I think it is probably safe to just delete them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although deleting from receipts_linearized will have a user-visible impact, so I'm unsure if that's a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a bit of rejigging to preserve the receipt with the highest stream_id in receipts_linearized. Users shouldn't notice anything now.

@squahtx squahtx changed the title Fix failing background updates to add unique indexes on receipts Fix background updates failing to add unique indexes on receipts Nov 15, 2022
@squahtx squahtx force-pushed the squah/fix_receipts_constraint_background_update branch from 7d847ac to dc8bdbd Compare November 15, 2022 12:57
Sean Quah added 2 commits November 15, 2022 13:41
Before creating the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes, we have to clean up
any duplicate receipts that may have crept in due to
#14406.

Signed-off-by: Sean Quah <seanq@matrix.org>
Signed-off-by: Sean Quah <seanq@matrix.org>
@squahtx squahtx force-pushed the squah/fix_receipts_constraint_background_update branch from dc8bdbd to dfe9946 Compare November 15, 2022 13:41
@@ -113,24 +113,6 @@ def __init__(
prefilled_cache=receipts_stream_prefill,
)

self.db_pool.updates.register_background_index_update(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move these to the non-worker store?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved them to the *BackgroundUpdateStore. I thought that's where we usually put the background updates?
Is there a motivation for having these on the worker store that I've completely missed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no motivation besides it being consistent with other examples I saw. 🤷

@clokep clokep requested a review from a team November 15, 2022 14:59
@clokep
Copy link
Member

clokep commented Nov 15, 2022

(I think someone non-me should review this since I made the original issue causing this. Overall I think the fix looks reasonable though.)

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems sane to me

Comment on lines -705 to -707
# receipts_linearized has a unique constraint on
# (user_id, room_id, receipt_type), so no need to lock
lock=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To double check: is the table correctly deemed safe to upsert into when the relevant background updates have run? (Wasn't sure how the second commit would affect this, if at all)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, once the unique index has been added by the background update, we will be able to rely on native upserts again (and the value of lock won't matter).

@squahtx
Copy link
Contributor Author

squahtx commented Nov 15, 2022

This is failing trial on postgres with psycopg2.errors.ActiveSqlTransaction: CREATE INDEX CONCURRENTLY cannot run inside a transaction block

@squahtx
Copy link
Contributor Author

squahtx commented Nov 15, 2022

I've updated and refactored things a bit to

  • fix the background update failing on postgres, because autocommit needs to be turned on when creating concurrent indexes.
  • and keep the receipt with the highest stream_id in receipt_linearized

@squahtx squahtx requested review from a team and erikjohnston November 15, 2022 21:25
@squahtx squahtx merged commit 8822770 into develop Nov 16, 2022
@squahtx squahtx deleted the squah/fix_receipts_constraint_background_update branch November 16, 2022 15:01
bradtgmurray added a commit to beeper/synapse-legacy-fork that referenced this pull request Nov 22, 2022
Synapse 1.72.0 (2022-11-22)
===========================

Please note that Synapse now only supports PostgreSQL 11+, because PostgreSQL 10 has reached end-of-life, c.f. our [Deprecation Policy](https://github.com/matrix-org/synapse/blob/develop/docs/deprecation_policy.md).

Bugfixes
--------

- Update forgotten references to legacy metrics in the included Grafana dashboard. ([\matrix-org#14477](matrix-org#14477))

Synapse 1.72.0rc1 (2022-11-16)
==============================

Features
--------

- Add experimental support for [MSC3912](matrix-org/matrix-spec-proposals#3912): Relation-based redactions. ([\matrix-org#14260](matrix-org#14260))
- Build Debian packages for Ubuntu 22.10 (Kinetic Kudu). ([\matrix-org#14396](matrix-org#14396))
- Add an [Admin API](https://matrix-org.github.io/synapse/latest/usage/administration/admin_api/index.html) endpoint for user lookup based on third-party ID (3PID). Contributed by @ashfame. ([\matrix-org#14405](matrix-org#14405))
- Faster joins: include heroes' membership events in the partial join response, for rooms without a name or canonical alias. ([\matrix-org#14442](matrix-org#14442))

Bugfixes
--------

- Faster joins: do not block creation of or queries for room aliases during the resync. ([\matrix-org#14292](matrix-org#14292))
- Fix a bug introduced in Synapse 1.64.0rc1 which could cause log spam when fetching events from other homeservers. ([\matrix-org#14347](matrix-org#14347))
- Fix a bug introduced in 1.66 which would not send certain pushrules to clients. Contributed by Nico. ([\matrix-org#14356](matrix-org#14356))
- Fix a bug introduced in v1.71.0rc1 where the power level event was incorrectly created during initial room creation. ([\matrix-org#14361](matrix-org#14361))
- Fix the refresh token endpoint to be under /r0 and /v3 instead of /v1. Contributed by Tulir @ Beeper. ([\matrix-org#14364](matrix-org#14364))
- Fix a long-standing bug where Synapse would raise an error when encountering an unrecognised field in a `/sync` filter, instead of ignoring it for forward compatibility. ([\matrix-org#14369](matrix-org#14369))
- Fix a background database update, introduced in Synapse 1.64.0, which could cause poor database performance. ([\matrix-org#14374](matrix-org#14374))
- Fix PostgreSQL sometimes using table scans for queries against the `event_search` table, taking a long time and a large amount of IO. ([\matrix-org#14409](matrix-org#14409))
- Fix rendering of some HTML templates (including emails). Introduced in v1.71.0. ([\matrix-org#14448](matrix-org#14448))
- Fix a bug introduced in Synapse 1.70.0 where the background updates to add non-thread unique indexes on receipts could fail when upgrading from 1.67.0 or earlier. ([\matrix-org#14453](matrix-org#14453))

Updates to the Docker image
---------------------------

- Add all Stream Writer worker types to `configure_workers_and_start.py`. ([\matrix-org#14197](matrix-org#14197))
- Remove references to legacy worker types in the multi-worker Dockerfile. ([\matrix-org#14294](matrix-org#14294))

Improved Documentation
----------------------

- Upload documentation PRs to Netlify. ([\matrix-org#12947](matrix-org#12947), [\matrix-org#14370](matrix-org#14370))
- Add addtional TURN server configuration example based on [eturnal](https://github.com/processone/eturnal) and adjust general TURN server doc structure. ([\matrix-org#14293](matrix-org#14293))
- Add example on how to load balance /sync requests. Contributed by [aceArt](https://aceart.de). ([\matrix-org#14297](matrix-org#14297))
- Edit sample Nginx reverse proxy configuration to use HTTP/1.1. Contributed by Brad Jones. ([\matrix-org#14414](matrix-org#14414))

Deprecations and Removals
-------------------------

- Remove support for PostgreSQL 10. ([\matrix-org#14392](matrix-org#14392), [\matrix-org#14397](matrix-org#14397))

Internal Changes
----------------

- Run unit tests against Python 3.11. ([\matrix-org#13812](matrix-org#13812))
- Add TLS support for generic worker endpoints. ([\matrix-org#14128](matrix-org#14128), [\matrix-org#14455](matrix-org#14455))
- Switch to a maintained action for installing Rust in CI. ([\matrix-org#14313](matrix-org#14313))
- Add override ability to `complement.sh` command line script to request certain types of workers. ([\matrix-org#14324](matrix-org#14324))
- Enabling testing of [MSC3874](matrix-org/matrix-spec-proposals#3874) (filtering of `/messages` by relation type) in complement. ([\matrix-org#14339](matrix-org#14339))
- Concisely log a failure to resolve state due to missing `prev_events`. ([\matrix-org#14346](matrix-org#14346))
- Use a maintained Github action to install Rust. ([\matrix-org#14351](matrix-org#14351))
- Cleanup old worker datastore classes. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#14375](matrix-org#14375))
- Test against PostgreSQL 15 in CI. ([\matrix-org#14394](matrix-org#14394))
- Remove unreachable code. ([\matrix-org#14410](matrix-org#14410))
- Clean-up event persistence code. ([\matrix-org#14411](matrix-org#14411))
- Update docstring to clarify that `get_partial_state_events_batch` does not just give you completely arbitrary partial-state events. ([\matrix-org#14417](matrix-org#14417))
- Fix mypy errors introduced by bumping the locked version of `attrs` and `gitpython`. ([\matrix-org#14433](matrix-org#14433))
- Make Dependabot only bump Rust deps in the lock file. ([\matrix-org#14434](matrix-org#14434))
- Fix an incorrect stub return type for `PushRuleEvaluator.run`. ([\matrix-org#14451](matrix-org#14451))
- Improve performance of `/context` in large rooms. ([\matrix-org#14461](matrix-org#14461))
H-Shay pushed a commit that referenced this pull request Dec 13, 2022
)

As part of the database migration to support threaded receipts, there is
a possible window in between
`73/08thread_receipts_non_null.sql.postgres` removing the original
unique constraints on `receipts_linearized` and `receipts_graph` and the
`reeipts_linearized_unique_index` and `receipts_graph_unique_index`
background updates from `72/08thread_receipts.sql` completing where
the unique constraints on `receipts_linearized` and `receipts_graph` are
missing. Any emulated upserts on these tables must therefore be
performed with a lock held, otherwise duplicate rows can end up in the
tables when there are concurrent emulated upserts. Fix the missing lock.

Note that emulated upserts no longer happen by default on sqlite, since
the minimum supported version of sqlite supports native upserts by
default now.

Finally, clean up any duplicate receipts that may have crept in before
trying to create the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes.

Signed-off-by: Sean Quah <seanq@matrix.org>
@clokep
Copy link
Member

clokep commented Jan 25, 2023

See #14915 for a follow-up to this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants