Add new sync method to support #617 using txid poller to fix #656 #811

Merged 10 commits into main on Jul 13, 2023

@mxsasha (Collaborator) commented Jul 11, 2023

This is a sync mechanism specific to authoritative deployments, for providing standby and query-only secondaries, particularly when mixing the two.

With #617, PostgreSQL may hold a lot of data that will not be included in NRTM. NRTM was always a poor solution for standby instances, with nrtm_access_list_unfiltered as a hack on top, and with potential loss of journal data and any suppressed objects. PostgreSQL replication fixes many of these issues, and even retains serials. However, preloaded data will go out of date on standby instances. Redis replication is not a full fix, as some of the preloaded data is held in memory in worker processes.

PostgreSQL does support triggers on a replica, as noted in #656, but not when using WAL streaming. WAL streaming is the only reasonable option here, as logical replication breaks sequences and seems to have issues with upserts. Monitoring for records where updated is later than the last check is insufficient, because it does not catch row deletions or certain suppression state changes.
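To illustrate why a check on the updated column misses deletions, here is a minimal sketch. The table is modelled as a plain dict, and the names `rows` and `changed_since` are hypothetical, not IRRd code.

```python
from datetime import datetime, timedelta, timezone


def changed_since(rows, last_check):
    """Return the keys of rows whose 'updated' timestamp is later than last_check."""
    return sorted(pk for pk, row in rows.items() if row["updated"] > last_check)


t0 = datetime(2023, 7, 11, 12, 0, tzinfo=timezone.utc)
rows = {
    "AS65000": {"updated": t0},
    "192.0.2.0/24AS65000": {"updated": t0},
}
last_check = t0 + timedelta(minutes=1)

# An update bumps 'updated', so the watermark check sees it.
rows["AS65000"]["updated"] = t0 + timedelta(minutes=2)
# A deletion removes the row entirely, leaving nothing for the check to match.
del rows["192.0.2.0/24AS65000"]

detected = changed_since(rows, last_check)
```

Only the updated row is reported. The deleted route never shows up, so a preload store driven by this check would keep serving it.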

Best option: select timestamp from pg_last_committed_xact(). This does not allow us to filter by object type. However, considering that route(6) and soon as-set are preloaded, many transactions will require preload store updates anyway. There will therefore be some additional overhead, but with the benefit of always being sure that preloaded data gets updated. Caveat: the timestamp is not database-specific, so this does not work well when running multiple databases on the same PostgreSQL instance. But this seems like an acceptable cost for people who need to run hot standby servers, and it is a fairly solid fix for the long-standing difficulties in these setups.
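The polling idea can be sketched roughly as follows. This is not the code in this PR: `CommitTimestampPoller`, `fetch_last_commit` and `on_change` are hypothetical names, and the database access is left to the caller so that the sketch runs without PostgreSQL. In a real setup `fetch_last_commit` would presumably execute the query above on the standby and return the timestamp column.

```python
import time
from typing import Any, Callable, Optional

# Query quoted from the description; it needs track_commit_timestamp enabled.
LAST_COMMIT_QUERY = "SELECT timestamp FROM pg_last_committed_xact()"


class CommitTimestampPoller:
    """Call on_change whenever the last commit timestamp differs from the previous poll."""

    def __init__(self, fetch_last_commit: Callable[[], Any], on_change: Callable[[], None]):
        self._fetch = fetch_last_commit  # hypothetical: returns the timestamp, or None
        self._on_change = on_change      # hypothetical: e.g. signal a preload refresh
        self._last_seen: Any = None

    def poll_once(self) -> bool:
        """Poll once; return True if a change was seen and on_change was called."""
        current = self._fetch()
        # None means no commit timestamp is available yet; nothing to compare.
        if current is None or current == self._last_seen:
            return False
        self._last_seen = current
        self._on_change()
        return True

    def run(self, interval: float = 1.0, iterations: Optional[int] = None) -> None:
        """Poll every `interval` seconds, forever or for a fixed number of iterations."""
        count = 0
        while iterations is None or count < iterations:
            self.poll_once()
            count += 1
            time.sleep(interval)
```

In this sketch the first timestamp observed also triggers `on_change`, on the assumption that a freshly started poller cannot know whether the preloaded data is current. Because the timestamp covers the whole PostgreSQL instance, commits in unrelated databases would trigger refreshes too, which is the caveat noted above.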

The suppression status will be replicated as well, i.e. the !f queries will also work and suppressed objects are not lost during a switchover. However, the local config state shown in !J may not be consistent. This needs to be reflected in the docs. Also, during a switchover, which requires a restart anyway, PGP keys have to be reimported into the local keychain. Doing this while running as standby is hard and not worthwhile.

  • Check if suppression config can be enabled even though it has no direct effect on the standby.
  • Add poller process
  • New docs
  • Deprecate unfiltered NRTM
  • track_commit_timestamp must be enabled
  • How to handle if track_commit_timestamp is not set
  • Document and verify new standby setting
  • Test in real setup
  • hot_standby_feedback
  • Anything in the query resolver that checks suppression status?
  • doc: IP access lists external failover
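For the checklist item on track_commit_timestamp not being set, one possible approach is to check the setting at startup and fail with a clear message. The sketch below assumes a DB-API 2.0 cursor connected to PostgreSQL; the function names are hypothetical and this is not necessarily how the PR handles it.

```python
def commit_timestamps_enabled(cursor) -> bool:
    """Return True if PostgreSQL reports track_commit_timestamp as 'on'.

    `cursor` is assumed to be any DB-API 2.0 cursor connected to PostgreSQL.
    """
    cursor.execute("SHOW track_commit_timestamp")
    row = cursor.fetchone()
    return row is not None and row[0] == "on"


def require_commit_timestamps(cursor) -> None:
    """Raise a descriptive error instead of polling without commit timestamps."""
    if not commit_timestamps_enabled(cursor):
        raise RuntimeError(
            "track_commit_timestamp is not enabled in PostgreSQL; "
            "the commit timestamp poller cannot detect changes without it"
        )
```

Failing early like this avoids a standby that appears healthy while its preloaded data silently goes stale.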

@mxsasha mxsasha added this to the IRRdv4 phase 3 milestone Jul 11, 2023
@mxsasha mxsasha self-assigned this Jul 11, 2023
@mxsasha mxsasha force-pushed the txpoll branch 12 times, most recently from 3fc3a93 to f89a743 Compare July 11, 2023 15:05
@mxsasha mxsasha marked this pull request as ready for review July 12, 2023 13:49
@mxsasha mxsasha merged commit ca8f412 into main Jul 13, 2023