Skip to content

Conversation

marceloneppel
Copy link
Member

@marceloneppel marceloneppel commented Jun 11, 2025

Issue

A GRANT SQL command allows a user to access a database. However, the pg_hba.conf file rules are not updated on time to reflect that.

We encounter failures related to this when attempting to sync the latest changes from the main branch (which contains the pg_hba.conf hardening implementation) to the 16/edge branch (which includes the predefined roles implementation) through #920.

Solution

Add a new check for ACL changes in the cluster topology observer to update the pg_hba.conf file (in the worst case) after 30 seconds.

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

dragomirp and others added 30 commits March 25, 2025 04:48
* Use latest stable lxd

* Test tweaks

* Test tweaks
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Poll all members in the cluster topology script

* Dual branch config

* Unit tests and bugfixes

* Add peers when starting the observer

* Retry sync up checks
* Add wal_keep_size config option

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Remove parameter addition

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Reset durability_wal_keep_size value to PG default

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
* Refactor headings for syntax best practice

* Update the Security section
Co-authored-by: GitHub Actions <41898282+github-actions[bot]@users.noreply.github.com>
* Bump boto

* Conditional checksum calculation
* Create tiobe_scan.yaml

* Remove push trigger
* Set series for ubuntu-advantage test and disable the landscape test

* Revert to LTS LXD
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Disable network cut tests on arm

* Back to LXD 5
* disable pgaudit during extensions changes

* Bump libs
* Lock file maintenance Python dependencies

* Fix linting

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Dragomir Penev <dragomir.penev@canonical.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Make username mandatory

* Second get password method

* Default in get-password
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: GitHub Actions <41898282+github-actions[bot]@users.noreply.github.com>
dragomirp and others added 13 commits June 6, 2025 18:38
* Fix auth username pattern

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix another ocurrence of the pattern

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Copy link

codecov bot commented Jun 11, 2025

Codecov Report

Attention: Patch coverage is 67.64706% with 33 lines in your changes missing coverage. Please review.

Project coverage is 70.13%. Comparing base (fed4393) to head (8b3a70b).
Report is 3 commits behind head on 16/edge.

Files with missing lines Patch % Lines
src/charm.py 37.03% 15 Missing and 2 partials ⚠️
src/cluster.py 11.11% 7 Missing and 1 partial ⚠️
scripts/cluster_topology_observer.py 88.46% 4 Missing and 2 partials ⚠️
src/relations/postgresql_provider.py 80.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           16/edge     #960      +/-   ##
===========================================
- Coverage    70.23%   70.13%   -0.11%     
===========================================
  Files           16       16              
  Lines         3723     3817      +94     
  Branches       541      554      +13     
===========================================
+ Hits          2615     2677      +62     
- Misses         974     1002      +28     
- Partials       134      138       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
connection.cursor() as cursor,
):
connection.close()
for attempt in Retrying(stop=stop_after_delay(60), wait=wait_fixed(3)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry until the observer updates the pg_hba.conf file.

database="charmed_dml_database",
)
with connection.cursor() as cursor:
connection.autocommit = True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing autocommit, which previously may have led to data not being available in the next query.

primary = await get_primary(ops_test, f"{DATABASE_APP_NAME}/0")
host = get_unit_address(ops_test, primary)
connection = None
for attempt in Retrying(stop=stop_after_delay(60), wait=wait_fixed(3)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry until the observer updates the pg_hba.conf file.

Comment on lines +445 to +447
self.update_config()
logger.debug("databases changed")
self._on_authorisation_rules_change(None)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the pg_hba.conf in this unit and tell the other units to do the same.

conf_file_path,
"query",
"-c",
"SELECT datname,datacl FROM pg_database;",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lightweight query to check whether the ACLs of the databases have been changed (i.e. a GRANT SQL command was issued).

Comment on lines +73 to +77
conf_file_path = "/var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml"
with open(conf_file_path) as conf_file:
conf_file_contents = yaml.safe_load(conf_file)
username = conf_file_contents["postgresql"]["authentication"]["superuser"]["username"]
password = conf_file_contents["postgresql"]["authentication"]["superuser"]["password"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get credentials from the Patroni config file (we'll need to handle the security of this file in a future ticket).

@marceloneppel marceloneppel marked this pull request as ready for review June 12, 2025 11:36
dragomirp and others added 2 commits June 12, 2025 18:35
* Handle PostgreSQLListUsersError

* Try to trigger the pg_hba update on db requested

* Try to hold db requested until pg_hba is up to date

* Increase timeouts

* Scale in parallel

* Fix param passing

* Increase timeout

* Try to scale without ffwd

* Try not to defer rel changed

* Remove extra hook

* Check if patroni is running before calling the health endpoint

* Revert timeout

* Pass the timeout param
…efined-roles-compatibility

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@dragomirp
Copy link
Contributor

We can most probably change the base branch and merge to 16/edge directly, since CI is green and it should have all the changes necessary.

@marceloneppel marceloneppel changed the base branch from sync14to16 to 16/edge June 13, 2025 11:31
@marceloneppel marceloneppel added the enhancement New feature, UI change, or workload upgrade label Jun 13, 2025
@marceloneppel marceloneppel merged commit 0523e6c into 16/edge Jun 13, 2025
435 of 449 checks passed
@marceloneppel marceloneppel deleted the sync14to16-predefined-roles-compatibility branch June 13, 2025 11:32
taurus-forever added a commit to taurus-forever/postgresql-operator that referenced this pull request Jun 25, 2025
* [MISC] Use latest/stable lxd (canonical#804)

* Use latest stable lxd

* Test tweaks

* Test tweaks

* Update canonical/data-platform-workflows action to v31.0.1 (canonical#805)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* [DPE-6874] Poll all members in the cluster topology script (canonical#810)

* Poll all members in the cluster topology script

* Dual branch config

* Unit tests and bugfixes

* Add peers when starting the observer

* Retry sync up checks

* [DPE-6572] Add wal_keep_size config option (canonical#799)

* Add wal_keep_size config option

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Remove parameter addition

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Reset durability_wal_keep_size value to PG default

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Create pull_request_template.md (canonical#814)

* Create SECURITY.md (canonical#822)

* Update README file's security section (canonical#827)

* Refactor headings for syntax best practice

* Update the Security section

* Sync docs from Discourse (canonical#796)

Co-authored-by: GitHub Actions <41898282+github-actions[bot]@users.noreply.github.com>

* [MISC] Conditional checksum calculation (canonical#812)

* Bump boto

* Conditional checksum calculation

* [DPE-6218] Static code analysis (canonical#828)

* Create tiobe_scan.yaml

* Remove push trigger

* [MISC] Disable landscape subordinate test lxd (canonical#831)

* Set series for ubuntu-advantage test and disable the landscape test

* Revert to LTS LXD

* Update charmcraft.yaml build tools (canonical#815)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* [MISC] Update snapped PostgreSQL (canonical#832)

* [DPE-6345] LDAP I: Create access groups (canonical#823)

* [DPE-6345] LDAP II: Include charm libs (canonical#824)

* [DPE-6345] LDAP III: Define config and handlers (canonical#825)

* [DPE-6345] LDAP IV: Define snap service (canonical#838)

* [DPE-6345] LDAP V: Define mapping option (canonical#849)

* [MISC] Disable network cut tests on arm (canonical#844)

* Disable network cut tests on arm

* Back to LXD 5

* [DPE-6815] disable pgaudit during extensions changes (canonical#842)

* disable pgaudit during extensions changes

* Bump libs

* Lock file maintenance Python dependencies (main) (canonical#816)

* Lock file maintenance Python dependencies

* Fix linting

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Dragomir Penev <dragomir.penev@canonical.com>

* Update dependency uv to v0.6.16 (canonical#847)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* [DPE-6664] Make username mandatory in set-password (canonical#846)

* Make username mandatory

* Second get password method

* Default in get-password

* Add conditional expose directive (canonical#853)

* Lock file maintenance Python dependencies (canonical#854)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Mandatory scope for promote action (canonical#856)

* Update charmcraft.yaml build tools (canonical#860)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Lock file maintenance Python dependencies (canonical#861)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Sync docs from Discourse (canonical#850)

Co-authored-by: GitHub Actions <41898282+github-actions[bot]@users.noreply.github.com>

* [MISC] Extend relation-user listing syntax (canonical#868)

* Move _update_member_ip call to correctly remove Raft cluster member when network is cut

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix coverage

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Sync libs (canonical#884)

* Update refresh tests to modify charm to ensure refresh off edge or stable

* Fix lint warnings

* Store temporary charms in /tmp for upgrade_from_stable tests

* Use force-refresh-start instead of forcing refresh by updating versions

* Remove runner password (canonical#913)

* [DPE-6898] User->databases pg_hba rules (canonical#885)

* Restrict each user to their allowed databases

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix unit tests

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix sync users on replicas

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix unit test

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Add default landscape user permission

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Increase sleep time in pg_hba test, fix user->database mapping for upgrade from stable and skip event trigger function code when not a superuser

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Improve users list check

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix raft reinitialisation in tests

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Decrease the amount of API calls by one

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Check users list directly

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Tweak test fast interval

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Improvements to avoid replica restart while syncing from primary

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix linting

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Handle same snap revision situation in upgrade tests

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Merge

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Improvement to avoid replica restart while syncing from primary

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Run stop-continuous-writes action only once

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Increase sleep time in Juju spaces test

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Update charmcraft.yaml build tools (canonical#871)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* [MISC] Remove JujuVersion warning in 14/edge (canonical#933)

* Remove JujuVersion warning

* Update libs

* Refactor v14 documentation for Sphinx (canonical#919)

* initial starter pack transfer
* update conf.py
* import processed discourse pages
* fix internal references and some broken URLs
* fix some style errors
* organize navigation
* remove h1 heading anchors
* add images to repository
* edit home page
* update .readthedocs.yaml
* remove docs/requirements.txt from .gitignore scope
* fix incorrect paths
* remove shell syntax from code blocks
* remove juju 2 banners
* fix dropdown formatting
* fix and polish admonitions, collapsible, and misc formatting
* remove v16 docs
* remove reference to nonexistant page
* Join all tutorial pages
* rename how-to-guides to how-to
* polish cloud deployment guides and rename leftover how-to-guide references
* polish and sync how-to guides with k8s
* remove discourse sync workflow
* specify channel on all deploy commands
* misc polishing, add version to side nav
* add pg 16 admonitions
* ignore docs folder in charm workflows
* sync misc. pages with k8s
* Minor README update with new documentation link
* add new section to CLI-helpers reference
* pin commit for v16 tag on markdown lint workflow for added security
* Update README.md
* remove sphinx python dependency check workflow
* Update index.md: add link to roles.md (canonical#928)

---------

Co-authored-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Co-authored-by: Alex Lutay <1928266+taurus-forever@users.noreply.github.com>

* [DPE-7511] Fix the auth username pattern (canonical#941)

* Fix auth username pattern

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix another ocurrence of the pattern

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Add temp tablespace create

* Temp tblspace outside of transaction

* Reset role in test_pg_hba setup

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Detect when databases and their ACLs change

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix predefined catalog roles test

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Fix unit test

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Add unit tests

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Add missing autocommit

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* Remove _hash suffix from variables names

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>

* [MISC] Fix timeouts in 14 to 16 merge (canonical#959)

* Handle PostgreSQLListUsersError

* Try to trigger the pg_hba update on db requested

* Try to hold db requested until pg_hba is up to date

* Increase timeouts

* Scale in parallel

* Fix param passing

* Increase timeout

* Try to scale without ffwd

* Try not to defer rel changed

* Remove extra hook

* Check if patroni is running before calling the health endpoint

* Revert timeout

* Pass the timeout param

---------

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Co-authored-by: Dragomir Penev <6687393+dragomirp@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Andreia <andreia.velasco@canonical.com>
Co-authored-by: Vladimir Izmalkov <48120135+izmalk@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sinclert Pérez <sinclert.perez@canonical.com>
Co-authored-by: Dragomir Penev <dragomir.penev@canonical.com>
Co-authored-by: swetha1654 <swetha.swaminathan@canonical.com>
Co-authored-by: Shayan Patel <shayan.patel@canonical.com>
Co-authored-by: Alex Lutay <1928266+taurus-forever@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature, UI change, or workload upgrade
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants