[DPE-2904] Reenable backup tests and revert to reusable workflow #301
Merged
Conversation
Force-pushed from 2f9fbf8 to 5a58dae
Contributor (Author)
If I manage to get backups to work by EOD, I'll switch this PR back to the self-hosted workflow; if not, we should merge this to have all tests running.
marceloneppel approved these changes on Nov 30, 2023
marceloneppel (Member) left a comment:
LGTM for now, in case we need more investigation or help from IS.
taurus-forever approved these changes on Nov 30, 2023
BON4 pushed a commit to BON4/postgresql-operator that referenced this pull request on Apr 23, 2024:
…onical#301)
* Move dashboard legends to the bottom of the graph
* Missed test marks
taurus-forever added a commit that referenced this pull request on Aug 20, 2025:
It backports data-platform-workflow commit f1f8d27 to the local integration test:

> patch(integration_test_charm.yaml): Increase disk space step timeout (#301)

Otherwise:

> Disk usage before cleanup
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/root        72G   46G   27G  64% /
> tmpfs           7.9G   84K  7.9G   1% /dev/shm
> tmpfs           3.2G  1.1M  3.2G   1% /run
> tmpfs           5.0M     0  5.0M   0% /run/lock
> /dev/sdb16      881M   60M  760M   8% /boot
> /dev/sdb15      105M  6.2M   99M   6% /boot/efi
> /dev/sda1        74G  4.1G   66G   6% /mnt
> tmpfs           1.6G   12K  1.6G   1% /run/user/1001
> Error: The action 'Free up disk space' has timed out after 1 minutes.
taurus-forever added a commit that referenced this pull request on Aug 21, 2025:
…elect from pg_settings) (#1049)

* DPE-7726 Use Patroni API for is_restart_pending()

  The previous is_restart_pending() waited for a long time due to Patroni's loop_wait default value (10 seconds), which tells Patroni how long to wait before checking the configuration file again to reload it. Instead of checking PostgreSQL's pending_restart from pg_settings, let's check the Patroni API's pending_restart=True flag.

* DPE-7726 Avoid pending_restart=True flag flickering

  The current Patroni 3.2.2 has weird/flickering behaviour: it temporarily flags pending_restart=True on many changes made through the REST API; the flag is gone within a second, but that is long enough to be caught by the charm. Sleeping a bit is a necessary evil until the Patroni 3.3.0 upgrade. The previous code slept for 15 seconds waiting for the pg_settings update.

  Also, unnecessary restarts could be triggered by a mismatch between the Patroni config file and in-memory changes coming from the REST API, e.g. the slots were undefined in the YAML file but set as an empty JSON {} => None. Updated the default template to match the default API PATCHes and avoid restarts.

* DPE-7726 Fix Primary status removal on topology observer events

  On a topology observer event, the primary unit used to lose its Primary label.

* DPE-7726 Add Patroni API logging

  Also:
  * use a common logger everywhere
  * add several useful log messages (e.g. DB connection)
  * remove the no-longer-necessary debug message 'Init class PostgreSQL'
  * align the Patroni API request style everywhere
  * add the Patroni API call duration to debug logs

* DPE-7726 Avoid unnecessary Patroni reloads

  The list of IPs was randomly sorted, causing unnecessary Patroni configuration re-generation followed by a Patroni restart/reload.

* DPE-7726 Remove the unnecessary properties app_units() and scoped_peer_data()

  Housekeeping cleanup.

* DPE-7726 Stop deferring for non-joined peers on on_start/on_config_changed

  Those defers are necessary to support scale-up/scale-down during a refresh, but they significantly slow down the PostgreSQL 16 bootstrap (and other daily maintenance tasks, like re-scaling, full node reboot/recovery, etc.). Muting them for now, with a proper documentation record forbidding rescaling during a refresh, until we minimise the number of defers in PG16. A warning is thrown for us to recall this promise.

* DPE-7726 Start the observer on non-Primary Patroni start to speed up re-join

  The current PG16 logic relies on Juju update-status or on_topology_change observer events, while in some cases we start Patroni without the observer, causing a long wait until the next update-status arrives.

* DPE-7726 Log Patroni start/stop/restart (to understand charm behaviour)

* DPE-7726 Log unit status changes to notice Primary label loss

  It is hard (impossible?) to catch Juju Primary label manipulations from juju debug-log. Logging them simplifies troubleshooting.

* DPE-7726 Fixup: log polishing

* DPE-7726 Decrease the DB connection wait timeout

  We had to wait 30 seconds when a connection could not be established, which is unnecessarily long. Also, add details on the reason for a failed connection (Retry/CannotConnect).

* DPE-7726 Stop propagating primary_endpoint=None for single-unit apps

  It speeds up single-unit app deployments.
* DPE-7726 Handle a "get primary cluster" RetryError in get_partner_addresses()

  Otherwise the update-status event fails:

  > unit-postgresql-0: relations.async_replication:Partner addresses: []
  > unit-postgresql-0: cluster:Unable to get the state of the cluster
  > Traceback (most recent call last):
  >   File "/var/lib/juju/agents/unit-postgresql-0/charm/src/cluster.py", line 619, in online_cluster_members
  >     cluster_status = self.cluster_status()
  >                      ^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-0/charm/lib/charms/tempo_coordinator_k8s/v0/charm_tracing.py", line 1116, in wrapped_function
  >     return callable(*args, **kwargs)  # type: ignore
  >            ^^^^^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-0/charm/src/cluster.py", line 279, in cluster_status
  >     raise RetryError(
  > tenacity.RetryError: RetryError[<Future at 0xffddafe01160 state=finished raised Exception>]

* DPE-7726 Fix an exception on update-status: PostgreSQLUndefinedHostError: Host not set

  Exception:

  > 2025-08-19 20:49:40 DEBUG unit.postgresql/2.juju-log server.go:406 cluster:API get_patroni_health: <Response [200]> (0.057417)
  > 2025-08-19 20:49:40 DEBUG unit.postgresql/2.juju-log server.go:406 cluster:API cluster_status: [{'name': 'postgresql-0', 'role': 'leader', 'state': 'running', 'api_url': 'https://10.182.246.123:8008/patroni', 'host': '10.182.246.123', 'port': 5432, 'timeline': 1}, {'name': 'postgresql-1', 'role': 'sync_standby', 'state': 'running', 'api_url': 'https://10.182.246.163:8008/patroni', 'host': '10.182.246.163', 'port': 5432, 'timeline': 1, 'lag': 0}, {'name': 'postgresql-2', 'role': 'sync_standby', 'state': 'running', 'api_url': 'https://10.182.246.246:8008/patroni', 'host': '10.182.246.246', 'port': 5432, 'timeline': 1, 'lag': 0}]
  > 2025-08-19 20:49:40 DEBUG unit.postgresql/2.juju-log server.go:406 __main__:Early exit primary_endpoint: Primary IP not in cached peer list
  > 2025-08-19 20:49:40 ERROR unit.postgresql/2.juju-log server.go:406 root:Uncaught exception while in charm code:
  > Traceback (most recent call last):
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/src/charm.py", line 2736, in <module>
  >     main(PostgresqlOperatorCharm)
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/venv/lib/python3.12/site-packages/ops/__init__.py", line 356, in __call__
  >     return _main.main(charm_class=charm_class, use_juju_for_storage=use_juju_for_storage)
  >            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/venv/lib/python3.12/site-packages/ops/_main.py", line 502, in main
  >     manager.run()
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/venv/lib/python3.12/site-packages/ops/_main.py", line 486, in run
  >     self._emit()
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/venv/lib/python3.12/site-packages/ops/_main.py", line 421, in _emit
  >     self._emit_charm_event(self.dispatcher.event_name)
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/venv/lib/python3.12/site-packages/ops/_main.py", line 465, in _emit_charm_event
  >     event_to_emit.emit(*args, **kwargs)
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/venv/lib/python3.12/site-packages/ops/framework.py", line 351, in emit
  >     framework._emit(event)
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/venv/lib/python3.12/site-packages/ops/framework.py", line 924, in _emit
  >     self._reemit(event_path)
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/venv/lib/python3.12/site-packages/ops/framework.py", line 1030, in _reemit
  >     custom_handler(event)
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/lib/charms/tempo_coordinator_k8s/v0/charm_tracing.py", line 1116, in wrapped_function
  >     return callable(*args, **kwargs)  # type: ignore
  >            ^^^^^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/src/charm.py", line 1942, in _on_update_status
  >     self.postgresql_client_relation.oversee_users()
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/lib/charms/tempo_coordinator_k8s/v0/charm_tracing.py", line 1116, in wrapped_function
  >     return callable(*args, **kwargs)  # type: ignore
  >            ^^^^^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/src/relations/postgresql_provider.py", line 172, in oversee_users
  >     user for user in self.charm.postgresql.list_users() if user.startswith("relation-")
  >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/lib/charms/tempo_coordinator_k8s/v0/charm_tracing.py", line 1116, in wrapped_function
  >     return callable(*args, **kwargs)  # type: ignore
  >            ^^^^^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/lib/charms/postgresql_k8s/v1/postgresql.py", line 959, in list_users
  >     with self._connect_to_database(
  >          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/lib/charms/tempo_coordinator_k8s/v0/charm_tracing.py", line 1116, in wrapped_function
  >     return callable(*args, **kwargs)  # type: ignore
  >            ^^^^^^^^^^^^^^^^^^^^^^^^^
  >   File "/var/lib/juju/agents/unit-postgresql-2/charm/lib/charms/postgresql_k8s/v1/postgresql.py", line 273, in _connect_to_database
  >     raise PostgreSQLUndefinedHostError("Host not set")
  > charms.postgresql_k8s.v1.postgresql.PostgreSQLUndefinedHostError: Host not set
  > 2025-08-19 20:49:40 ERROR juju.worker.uniter.operation runhook.go:180 hook "update-status" (via hook dispatching script: dispatch) failed: exit status 1

* DPE-7726 Adopt the unit tests for the new code

  Thanks to dragomir.penev@ for the unit test fixes here!

* DPE-7726 Increase the free disk space cleanup timeout (1 -> 3 minutes)

  It backports data-platform-workflow commit f1f8d27 to the local integration test:

  > patch(integration_test_charm.yaml): Increase disk space step timeout (#301)

  Otherwise:

  > Disk usage before cleanup
  > Filesystem      Size  Used Avail Use% Mounted on
  > /dev/root        72G   46G   27G  64% /
  > tmpfs           7.9G   84K  7.9G   1% /dev/shm
  > tmpfs           3.2G  1.1M  3.2G   1% /run
  > tmpfs           5.0M     0  5.0M   0% /run/lock
  > /dev/sdb16      881M   60M  760M   8% /boot
  > /dev/sdb15      105M  6.2M   99M   6% /boot/efi
  > /dev/sda1        74G  4.1G   66G   6% /mnt
  > tmpfs           1.6G   12K  1.6G   1% /run/user/1001
  > Error: The action 'Free up disk space' has timed out after 1 minutes.
Backup tests were missed when switching to self-hosted runners and are failing, likely due to a proxy issue.
This PR re-enables all the missed tests and switches back to the reusable workflow.