Install pgBackRest and add initial backup and rewind settings #32
Codecov Report
@@            Coverage Diff             @@
##             main      #32      +/-   ##
==========================================
+ Coverage   65.35%   66.19%   +0.83%
==========================================
  Files           6        6
  Lines         814      840      +26
  Branches      115      121       +6
==========================================
+ Hits          532      556      +24
- Misses        258      260       +2
  Partials       24       24
A couple of nits, but looks great!
src/cluster.py
Outdated
for attempt in Retrying(stop=stop_after_delay(60), wait=wait_fixed(3)):
    with attempt:
        cluster_status = requests.get(
            f"{self._patroni_url}/cluster",
A lot of these URL magic strings are similar - can you extract them into variables please?
Changed to a constant on 5adae46.
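For readers following this thread, here is a minimal sketch of the pattern being discussed: the endpoint lives in one constant, and the request is retried for up to 60 seconds with a 3 second wait, matching the values in the snippet above. This is not the code from the commit. The constant name, the helper names and the 10 second request timeout are assumptions, and the sketch uses only the standard library as a stand-in for requests and tenacity.

```python
import json
import time
import urllib.request

# Hypothetical constant name; the actual name used in the commit is not shown here.
PATRONI_CLUSTER_STATUS_ENDPOINT = "cluster"


def build_url(patroni_url, endpoint):
    """Join the Patroni base URL and an endpoint without doubling slashes."""
    return f"{patroni_url.rstrip('/')}/{endpoint}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body; the timeout value is an assumption."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def get_with_retry(fetch, url, stop_after=60.0, wait=3.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) until it succeeds or stop_after seconds have elapsed."""
    deadline = clock() + stop_after
    while True:
        try:
            return fetch(url)
        except OSError:  # urllib.error.URLError and socket timeouts are OSError subclasses
            if clock() >= deadline:
                raise
            sleep(wait)
```

A caller would pass `fetch_json` and `build_url(patroni_url, PATRONI_CLUSTER_STATUS_ENDPOINT)`. Injecting `fetch`, `sleep` and `clock` keeps the retry loop easy to unit test without a running Patroni.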
tests/integration/helpers.py
Outdated
    apps=[application_name],
    status="active",
-   timeout=1000,
+   timeout=2000,
This isn't really relevant to this ticket, but should adding a relation really take half an hour? It might be worth looking into why this takes so much time
Good catch! I think this change was made just for a local test and I forgot to remove it. Reverted on 0bb1414.
Cool! LGTM in general.
Q: should we really commit "archive_command: /bin/true" in the Patroni template? Why not pgbackrest?
I would ask Mykola to review this, as I am still new to the PostgreSQL charm and he is the main feature requester.
It should be switched to pgBackRest only after initialization of the pgBackRest stanza (in other words, after storage initialization).
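A small sketch of the switch described in this reply, assuming the charm tracks whether the stanza has been initialised. The function name and the stanza name "main" are hypothetical. The `archive-push %p` form is the one used in the pgBackRest user guide, and this PR itself only ships the `/bin/true` placeholder.

```python
def archive_command(stanza_initialised, stanza="main"):
    """Pick the archive_command value to render into the Patroni template.

    Before the pgBackRest stanza exists, /bin/true lets PostgreSQL treat every
    WAL segment as archived; afterwards WAL is pushed through pgBackRest.
    """
    if not stanza_initialised:
        return "/bin/true"
    # "main" is a hypothetical stanza name used only for illustration.
    return f"pgbackrest --stanza={stanza} archive-push %p"
```

With this shape, the template keeps a single `archive_command` entry and only its value changes once storage is ready.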
…cal#32)
* Add pgbackrest
* Add initial backup settings
* Add additional backup settings
* Remove settings
* Rename user
* Add test for TLS being used on pg_rewind connections
* Remove table creation
* Readd write to the database
* Change bootstrap contraints
* Change the way the service is stopped
* Add one more call to service stop
* Change the way the service is stopped
* Change systemd unit
* Increase test timeout
* Remove test code
* Read code
* Readd code
* Change WAL trigger mechanism
* Fix check
* Improve code
* Remove instance promotion
* Readd instance promotion
* Change test retry logic
* Remove debug calls
* Add replica reinitialization
* Change checks order
* Add reinitialize call
* Improve reinitialize call
* Remove unused code
* Pin OS on release workflow
* Change whitelist_externals to allowlist_externals
* Change whitelist_externals to allowlist_externals
* Add API request timeout
* Add unit tests
* Remove log
* Revert timeout
* Extract endpoint from URL to constant
Issue
Solution
Install pgBackRest.
Add the following settings for backup operations:
archive_mode=on
archive_command=/bin/true
wal_level=logical
Add the following settings for rewind operations:
remove_data_directory_on_rewind_failure = true
remove_data_directory_on_diverged_timelines = true
Add a user for pg_rewind and test that it uses TLS when connecting to other PostgreSQL instances.
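To make the settings above easier to picture, here is a sketch of how they could be grouped, written as a Python dict that mirrors the `postgresql` section of a Patroni configuration. The exact placement in this charm's template is an assumption, and the variable and helper names are hypothetical.

```python
# Assumed layout: the rewind options sit directly under "postgresql" and the
# server parameters under "postgresql.parameters", as in Patroni's YAML config.
PATRONI_POSTGRESQL_SETTINGS = {
    "remove_data_directory_on_rewind_failure": True,
    "remove_data_directory_on_diverged_timelines": True,
    "parameters": {
        "archive_mode": "on",
        "archive_command": "/bin/true",
        "wal_level": "logical",
    },
}


def archiving_prerequisites_met(parameters):
    """WAL archiving needs archive_mode enabled and wal_level of replica or higher."""
    return (
        parameters.get("archive_mode") in ("on", "always")
        and parameters.get("wal_level") in ("replica", "logical")
    )
```

A check like `archiving_prerequisites_met` is only an illustration of why `archive_mode` and `wal_level` are set together; it is not part of the PR.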
Context
This PR is almost a copy of "Add initial backup and rewind settings" (postgresql-k8s-operator#45).
The main differences are:
A call to reinitialise the replica was added to src/charm.py, because at some points in some tests the replica fell too far behind the timeline of the primary, and the recommended way to solve that is to reinitialise the replica data directory.
In src/cluster.py, a call to the Patroni API was added to retrieve the lag information. Also, timeouts were added to the API calls to avoid long requests that make the cluster slow, especially in the freeze db process test (which was failing sometimes).
Testing
The unit tests in tests/unit/test_patroni.py were updated to match the new settings.
A test was added to check that pg_rewind is using TLS.
Release Notes
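As an illustration of the lag check described under Context, the sketch below picks out members whose reported lag exceeds a threshold, given the JSON returned by Patroni's cluster endpoint. It assumes each replica entry carries a `name` and an integer `lag` in bytes. The function name and the threshold are hypothetical, and how the charm actually decides when to reinitialise is not shown in this PR description.

```python
from typing import List


def members_needing_reinit(cluster_status: dict, max_lag_bytes: int) -> List[str]:
    """Return names of members whose reported lag is above max_lag_bytes.

    Members without an integer lag (for example the leader, or a replica
    reporting an unknown lag) are skipped rather than guessed at.
    """
    names = []
    for member in cluster_status.get("members", []):
        lag = member.get("lag")
        if isinstance(lag, int) and lag > max_lag_bytes:
            names.append(member["name"])
    return names
```

For each returned member, a caller could then issue a POST to that member's Patroni `/reinitialize` endpoint, with a request timeout as in the other API calls.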