Skip to content

Conversation

dragomirp
Copy link
Contributor

@dragomirp dragomirp commented Jun 18, 2025

  • Generate a hash of relation user/databases mapping and use it to add in advance pg_hba rules in patroni's configuration
  • Tweak test idleness
  • Remove the trigger updating pg_hba
  • Port pg_database observer from VM

Pgbouncer WIP PR accommodating the changes: canonical/pgbouncer-k8s-operator#602

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

@dragomirp dragomirp added the not bug or enhancement PR is not 'bug' or 'enhancement'. For release notes label Jun 18, 2025
Copy link

codecov bot commented Jun 18, 2025

Codecov Report

❌ Patch coverage is 60.29412% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.46%. Comparing base (ac55896) to head (d59b9ff).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/charm.py 40.90% 11 Missing and 2 partials ⚠️
scripts/authorisation_rules_observer.py 80.00% 6 Missing ⚠️
src/authorisation_rules_observer.py 40.00% 6 Missing ⚠️
src/relations/postgresql_provider.py 66.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1007      +/-   ##
==========================================
+ Coverage   73.15%   73.46%   +0.31%     
==========================================
  Files          15       15              
  Lines        3866     3904      +38     
  Branches      566      572       +6     
==========================================
+ Hits         2828     2868      +40     
+ Misses        827      823       -4     
- Partials      211      213       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dragomirp dragomirp changed the title [MISC] More aggressive idle checks [MISC] Sync up pg_hba changes Jun 25, 2025
@dragomirp dragomirp changed the title [MISC] Sync up pg_hba changes [DPE-7594] Sync up pg_hba changes Jun 25, 2025
Comment on lines +92 to +103
for key in self.charm._peers.data:
# We skip the leader so we don't have to wait on the defer
if (
key != self.charm.app
and key != self.charm.unit
and self.charm._peers.data[key].get("user_hash", "")
!= self.charm.generate_user_hash
):
logger.debug("Not all units have synced configuration")
event.defer()
return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for all units to update Patroni's hba definition.

event.defer()
return

self.charm.update_config()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will update the hash on the leader and trigger relation change event on any follower units.

)

# Copy relations users directly instead of waiting for them to be created
for relation in self.model.relations[self.postgresql_client_relation.relation_name]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also copy the legacy users as well here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal, but I don't know if apps connecting through the legacy relation are using replicas. I think we can keep only the new relation users for now, so we keep the same code for 16/edge.

):
user = f"relation_id_{relation.id}"
user_db_pairs[user] = database
return shake_128(str(user_db_pairs).encode()).hexdigest(16)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dict key order should be deterministic for units on the same platform.

return user_database_map

@property
def generate_user_hash(self) -> str:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to hash the relations_user_databases_map property, but that never idled in the new relation integration test.


await ops_test.model.wait_for_idle(
apps=[DATA_INTEGRATOR_APP_NAME, DATABASE_APP_NAME], status="active"
apps=[DATA_INTEGRATOR_APP_NAME, DATABASE_APP_NAME], status="active", timeout=1000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same timeout as in other tests.

@dragomirp dragomirp requested review from a team, marceloneppel and taurus-forever and removed request for a team July 22, 2025 11:10
@dragomirp dragomirp marked this pull request as ready for review July 22, 2025 20:02
Copy link
Member

@marceloneppel marceloneppel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work, @dragomirp!

)

# Copy relations users directly instead of waiting for them to be created
for relation in self.model.relations[self.postgresql_client_relation.relation_name]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal, but I don't know if apps connecting through the legacy relation are using replicas. I think we can keep only the new relation users for now, so we keep the same code for 16/edge.

@dragomirp dragomirp changed the title [DPE-7594] Sync up pg_hba changes [DPE-7594] Sync up pg_hba changes and remove trigger Jul 28, 2025
@dragomirp dragomirp force-pushed the fix-self-healing branch 5 times, most recently from 6887fa7 to 4702891 Compare July 28, 2025 23:47
@dragomirp dragomirp merged commit e8aa568 into main Jul 29, 2025
469 of 494 checks passed
@dragomirp dragomirp deleted the fix-self-healing branch July 29, 2025 16:40
dragomirp added a commit that referenced this pull request Jul 30, 2025
* Sync docs from Discourse (#864)

Co-authored-by: GitHub Actions <41898282+github-actions[bot]@users.noreply.github.com>

* Refactor v14 documentation for Sphinx (#981)

* add starter pack and sync tutorial with VM
* update deployment guides
* rename how-to-guides to how-to
* sync how-to guides with vm
* sync references with vm
* sync explanation pages with vm
* add .readthedocs.yaml and reduce .gitignore scope for requirements.txt file
* fix some formatting issues
* sync backup guides with vm
* fix misc. build errors and sync tutorial with vm
* add doc ci checks
* remove discourse sync workflow
* polish tutorial and deploy guide
* specify channel on all deploy commands
* sync misc. pages with vm
* minor README update with new documentation link
* ignore docs folder in development workflows
* remove sphinx python dependency check workflow
* update home page

* [DPE-7510] Fix the auth username pattern (#987)

* Fix auth username pattern

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix unit test

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix another ocurrence of the pattern

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Update ghcr.io/canonical/charmed-postgresql Docker tag to v14.18 (#983)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Add redirects for all charmhub pages (14) (#994)

* chore: rename interfaces-endpoints for consistency with VM

* update references to interfaces-and-endpoints

* add rediraffe sphinx extension and redirect mapping list

* fix: add forward slash to redirect paths (#998)

* Fix broken URLs and spelling errors (#993)

* add starter pack and sync tutorial with VM

* update deployment guides

* rename how-to-guides to how-to

* sync how-to guides with vm

* sync references with vm

* sync explanation pages with vm

* add .readthedocs.yaml and reduce .gitignore scope for requirements.txt file

* fix some formatting issues

* sync backup guides with vm

* fix misc. build errors and sync tutorial with vm

* add doc ci checks

* remove discourse sync workflow

* polish tutorial and deploy guide

* specify channel on all deploy commands

* sync misc. pages with vm

* minor README update with new documentation link

* ignore docs folder in development workflows

* remove sphinx python dependency check workflow

* update home page

* fix some broken links

* fix broken links

* fix spelling errors in docs

* fix spelling errors in other markdown files

* small spelling fix

* Renovate team (#999)

* [MISC] Hold database created hook for pg_hba changes (#1001)

* Hold database created hook for pg_hba changes

* Restore the old blocking hook

* 16/edge lib changes

* Set to expected patch version

* fix: internal reference typo (#1006)

* Lock file maintenance Python dependencies (#962)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* DPE-6662 Fix pgbackrest logs names on logs rotation (#996)

The pgBackRest activity (backup creation, stanza initialisation, etc) could take minutes,
as a result every minute logs rotation could move the current log A.log to A.log_$date.log
and will be moved further as logrorate rule operates all *.log files in the folder:

> /var/snap/charmed-postgresql/common/var/log/pgbackrest/*.log { ...

It results in:

-rw------- 1 postgres postgres     0 Feb 18 09:25 all-server.log
-rw------- 1 postgres postgres     0 Feb 18 09:19 all-server.log-20250218_09:18.log
-rw------- 1 postgres postgres     0 Feb 18 09:20 all-server.log-20250218_09:18.log-20250218_09:19.log
-rw------- 1 postgres postgres     0 Feb 18 09:21 all-server.log-20250218_09:18.log-20250218_09:19.log-20250218_09:20.log
-rw------- 1 postgres postgres     0 Feb 18 09:22 all-server.log-20250218_09:18.log-20250218_09:19.log-20250218_09:20.log-20250218_09:21.log
-rw------- 1 postgres postgres     0 Feb 18 09:23 all-server.log-20250218_09:18.log-20250218_09:19.log-20250218_09:20.log-20250218_09:21.log-20250218_09:22.log
-rw------- 1 postgres postgres  1793 Feb 18 09:24 all-server.log-20250218_09:18.log-20250218_09:19.log-20250218_09:20.log-20250218_09:21.log-20250218_09:22.log-20250218_09:23.log

The poposed fix:
* move the log file to the new name without .log suffix.
* use datetime format matching MySQL charms

Example:

-rw------- 1 postgres postgres    12 Feb 18 09:28 all-server.log
-rw------- 1 postgres postgres   322 Feb 18 09:29 all-server.log-20250218_0918
-rw------- 1 postgres postgres  7344 Feb 18 09:30 all-server.log-20250218_0918

* Update canonical/data-platform-workflows action to v32 (main) (#961)

* Update canonical/data-platform-workflows action to v32

* Switch release flow

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Dragomir Penev <dragomir.penev@canonical.com>

* update links to ops tutorial and charm (#1015)

* [MISC] Update libs and remove warning (#1014)

* Update libs

* Workflow tweaks

* Remove from_environ warning

* Lock file maintenance Python dependencies (#1012)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Delete runner password (#1023)

* remove markdown-linter action (#1028)

* [DPE-7594] Sync up pg_hba changes and remove trigger (#1007)

* More aggressive idle checks

* Explicit idle

* Idle period when relating to the test app

* Remove second start

* Remove log warning

* Hold create db hook for longer

* Bump the pg_hba checker timeout

* Don't update config

* Bump timeout

* Try to just append to pg_hba

* Sync hba changes before creating db resources

* Force regenerate hash and config on leader

* Use current host to check hba

* Update libs

* Compare to local hash

* Cla check for 16/edge

* Don't defer peer change before init

* Add back app check

* Revert back to just updating peer data

* Only sync hba once initially set

* Bump timeout

* Don't filter appends to pg_hba

* Append the rel users directly to the user map

* Add idle timeout

* Remove trigger

* Sleep longer

* Set extra user roles

* Always update hash

* Bump sleep period

* Revert the trigger

* Move generate_user_hash to charm

* Conditional hash update

* Try to sort keys

* Revert to relation user hash

* Try to reduce the amount of ifs

* Remove trigger

* Blocked test app

* Ignore blocked

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Andreia <andreia.velasco@canonical.com>
Co-authored-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Lutay <1928266+taurus-forever@users.noreply.github.com>
Co-authored-by: Dave Wilding <tech@dpw.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Libraries: Out of sync not bug or enhancement PR is not 'bug' or 'enhancement'. For release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants