Skip to content

Conversation

marceloneppel
Copy link
Member

@marceloneppel marceloneppel commented Sep 12, 2022

Issue

  • Jira issue: DPE-513
  • To enable backups later in the PostgreSQL charm, a tool responsible for backups must first be installed.
  • PostgreSQL needs some initial settings for backups and rewind operations (synchronisation of the data between a primary and the replicas after a replica is promoted to primary).
  • We also need to ensure that a rewind operations (the ones using the pg_rewind tool) are done using connections encrypted by TLS.

Solution

  • Install pgBackRest.

  • Add the following settings for backup operations:

    • achieve_mode=on

    • archive_command=/bin/true

    • wal_level = logical

  • Add the following settings for rewind operations:

    • remove_data_directory_on_rewind_failure = true

    • remove_data_directory_on_diverged_timelines = true

Add a user for pg_rewind and test that it uses TLS when connecting to other PostgreSQL instance.

Context

  • This PR is almost a copy of Add initial backup and rewind settings postgresql-k8s-operator#45.

  • The main differences are:

    • A call to reinitialise the replica was added to src/charm.py (because at some points in some tests the replica became too far behind the timeline of the primary and this is the recommended way to solve that is to reinitialise the replica data directory).

    • On src/cluster.py a call to the Patroni API to retrieve the lag information. Also, timeouts were added to the API calls to avoid long requests that make the cluster slow, specially in the freeze db process test (that was failing sometimes).

Testing

  • Existing unit tests on tests/unit/test_patroni.py were updated to match the new settings.
  • The extra backup and rewind settings are now added to the integration test that checks the current instance configuration.
  • An additional check was added to ensure that pg_rewind is using TLS.

Release Notes

  • Install pgBackRest.
  • Add initial settings for backup and rewind operations.
  • Add user for pg_rewind.
  • Add test to ensure pg_rewind is using TLS.

@marceloneppel marceloneppel changed the title Install pgbackrest Install pgBackRest Sep 13, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2022

Codecov Report

Merging #32 (5adae46) into main (9014cef) will increase coverage by 0.83%.
The diff coverage is 87.87%.

@@            Coverage Diff             @@
##             main      #32      +/-   ##
==========================================
+ Coverage   65.35%   66.19%   +0.83%     
==========================================
  Files           6        6              
  Lines         814      840      +26     
  Branches      115      121       +6     
==========================================
+ Hits          532      556      +24     
- Misses        258      260       +2     
  Partials       24       24              
Impacted Files Coverage Δ
src/charm.py 57.55% <71.42%> (+0.12%) ⬆️
src/cluster.py 71.08% <90.47%> (+3.08%) ⬆️
src/constants.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@marceloneppel marceloneppel changed the title Install pgBackRest Install pgBackRest and add initial backup settings Jan 13, 2023
@marceloneppel marceloneppel marked this pull request as ready for review January 13, 2023 01:35
@marceloneppel marceloneppel changed the title Install pgBackRest and add initial backup settings Install pgBackRest and add initial backup and rewind settings Jan 13, 2023
Copy link
Contributor

@WRFitch WRFitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of nits, but looks great!

src/cluster.py Outdated
for attempt in Retrying(stop=stop_after_delay(60), wait=wait_fixed(3)):
with attempt:
cluster_status = requests.get(
f"{self._patroni_url}/cluster",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these URL magic strings are similar - can you extract them into variables please?

Copy link
Member Author

@marceloneppel marceloneppel Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a constant on 5adae46.

apps=[application_name],
status="active",
timeout=1000,
timeout=2000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really relevant to this ticket, but should adding a relation really take half an hour? It might be worth looking into why this takes so much time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I think this change was made just for a local test and I forgot to remove it. Reverted on 0bb1414.

Copy link
Contributor

@taurus-forever taurus-forever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! LGTM in general.
Q: should we really commit "archive_command: /bin/true" in Patroni template? Why not pgbackrest?

I would ask Mykola to review this as I am still new to postresql charm and he is a main feature requester.

@taurus-forever taurus-forever requested a review from delgod January 13, 2023 15:44
@delgod
Copy link
Member

delgod commented Jan 16, 2023

Cool! LGTM in general. Q: should we really commit "archive_command: /bin/true" in Patroni template? Why not pgbackrest?

it should be switched to pgBackRest only after initialization of the pgBackRest stanza (in other words after storage initialization).

@marceloneppel marceloneppel merged commit 29b8098 into main Jan 18, 2023
@marceloneppel marceloneppel deleted the pgbackrest-install branch January 18, 2023 11:17
BON4 pushed a commit to BON4/postgresql-operator that referenced this pull request Apr 23, 2024
…cal#32)

* Add pgbackrest

* Add initial backup settings

* Add additional backup settings

* Remove settings

* Rename user

* Add test for TLS being used on pg_rewind connections

* Remove table creation

* Readd write to the database

* Change bootstrap contraints

* Change the way the service is stopped

* Add one more call to service stop

* Change the way the service is stopped

* Change systemd unit

* Increase test timeout

* Remove test code

* Read code

* Readd code

* Change WAL trigger mechanism

* Fix check

* Improve code

* Remove instance promotion

* Readd instance promotion

* Change test retry logic

* Remove debug calls

* Add replica reinitialization

* Change checks order

* Add reinitialize call

* Improve reinitialize call

* Remove unused code

* Pin OS on release workflow

* Change whitelist_externals to allowlist_externals

* Change whitelist_externals to allowlist_externals

* Add API request timeout

* Add unit tests

* Remove log

* Revert timeout

* Extract endpoint from URL to constant
github-actions bot added a commit to canonical/test-runners-2-github-x64-postgresql-operator that referenced this pull request May 20, 2024
github-actions bot added a commit to canonical/test-runners-2-is-x64-postgresql-operator that referenced this pull request May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants