Skip to content

Conversation

marceloneppel
Copy link
Member

@marceloneppel marceloneppel commented Apr 10, 2023

Issue

Jira ticket: DPE-1456
Sync mode is not enabled, which can cause data loss.

Solution

Enable sync mode in the Patroni configuration and set the synchronous node code to the cluster minority.

In src/charm.py:

  • Added a health check to avoid multiple primaries when freezing the PostgreSQL DB process (I could't figure out how to make it work with TLS, so I create a ticket to investigate that and file a bug if needed).
  • Added a check for all members ready before restarting PostgreSQL due to TLS settings change (without the check, some failures happen).

In templates/patroni.yml.j2:

  • Update Sync mode configuration in the Patroni configuration.
  • Also redirected the logs to a file in the logs directory to avoid losing logs (pebble logs are limited to 100kb).

In tests/integration/ha_tests/application-charm/*:

  • Updated the tester charm with some improvements from the VM charm to make it more stable.

In tests/integration/helpers.py:

  • Added a new service to be checked in the tests.
  • Fixed a helper function that was not correctly returning whether the primary changed (sometimes, while failing over, there was no primary, and the helper function returned that the primary changed to a new one).

In tests/integration/test_tls.py:

  • Fixed the test to force pg_rewind to run again after having the new role: sync_standby.
  • Changed the test code to read whether pg_rewind is using TLS from the log file instead of the pebble logs.

@marceloneppel marceloneppel marked this pull request as ready for review April 10, 2023 18:48
PATRONI_PROCESS = "/usr/local/bin/patroni"
APP_NAME = METADATA["name"]
PATRONI_PROCESS = "/usr/bin/patroni"
POSTGRESQL_PROCESS = "postgres"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are likely hitting the same pkill issues as the VM charm here, though we may want to deal with that in a separate PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll check it and update the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated on https://github.com/canonical/postgresql-k8s-operator/pull/134/files/b79aa19b4a9f197c575d25345c780b47bb9f2d0d..c29e5cd37540872c7e19973598d8c77576d5388a. I used the same logic that we have in the VM charm (SIGINT). I'm going to open another PR to reenable the PG process freeze db test.

return resp.json()


async def change_master_start_timeout(ops_test: OpsTest, seconds: Optional[int]) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should switch this to primary since we are now on patroni 3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed on 84d810d, as the master_start_timeout is not being used. DPE-1407 is in the backlog to fix the wrong wording in the other places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted it due to changes needed to address the above comment.

@delgod delgod merged commit 8b9d456 into main Apr 12, 2023
@delgod delgod deleted the dpe-1456-sync-mode branch April 12, 2023 06:35
BON4 pushed a commit to BON4/postgresql-k8s-operator that referenced this pull request May 20, 2024
https://warthogs.atlassian.net/browse/DPE-1456

---------

Co-authored-by: Dragomir Penev <dragomir.penev@canonical.com>
github-actions bot added a commit to canonical/test-runners-2-is-x64-postgresql-k8s-operator that referenced this pull request Aug 4, 2024
github-actions bot added a commit to canonical/test-runners-2-is-arm64-postgresql-k8s-operator that referenced this pull request Aug 5, 2024
github-actions bot added a commit to canonical/test-runners-2-github-x64-postgresql-k8s-operator that referenced this pull request Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants