[DPE-8426] Port Jubilant upgrade and async replication tests #1206
base: 16/edge
Conversation
Codecov Report
❌ Your project check has failed because the head coverage (66.83%) is below the target coverage (70.00%). You can increase the head coverage or adjust the target coverage.

@@           Coverage Diff            @@
##           16/edge    #1206    +/-  ##
===========================================
- Coverage    66.94%   66.83%   -0.11%
===========================================
  Files           17       17
  Lines         4347     4360      +13
  Branches       671      675       +4
===========================================
+ Hits          2910     2914       +4
- Misses        1242     1249       +7
- Partials       195      197       +2

☔ View full report in Codecov by Sentry.
Force-pushed from 98cb060 to 85ae99f
Force-pushed from 85ae99f to 76fe647
Force-pushed from eff6861 to 744dd63
Force-pushed from 744dd63 to 1eb046e
Force-pushed from 9fa9066 to c35a605
Force-pushed from 8eaec8a to 731f8b9
import subprocess
from collections.abc import Callable

import jubilant
Using regular jubilant, since PG 16 doesn't need juju 2 support.
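Since these tests only target PostgreSQL 16 (and therefore juju 3), the module can import the stock library directly. A minimal sketch of what that looks like, assuming a module-scoped `juju` fixture (the fixture name and scope are illustrative, not taken from the PR):

```python
import jubilant
import pytest


@pytest.fixture(scope="module")
def juju() -> jubilant.Juju:
    # jubilant shells out to the juju 3 CLI and targets the current model,
    # so no juju 2 compatibility layer is needed here.
    return jubilant.Juju()
```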
logging.info("checking the number of switchovers")
final_number_of_switchovers = count_switchovers(juju, DB_APP_NAME)
assert (final_number_of_switchovers - initial_number_of_switchovers) <= 2, (
    "Number of switchovers is greater than 2"
)
MySQL checks that the DB primary moves to the Juju leader; here we check the number of switchovers instead.
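For context, a hedged sketch of what a `count_switchovers` helper could look like; the Patroni `/history` endpoint, default port, and the unit-address lookup through jubilant's status are assumptions, not necessarily how the test helper is actually implemented:

```python
import jubilant
import requests


def count_switchovers(juju: jubilant.Juju, app_name: str) -> int:
    """Count timeline switches recorded by Patroni for the given application."""
    status = juju.status()
    unit = next(iter(status.apps[app_name].units.values()))
    # Patroni's /history endpoint returns one entry per timeline switch
    # (assumption: the REST API listens on port 8008 without TLS or auth).
    history = requests.get(f"http://{unit.public_address}:8008/history", timeout=10).json()
    return len(history)
```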
Force-pushed from 908f94e to 0d7bc5f
Force-pushed from 93ad21d to be9df49
Force-pushed from 105a1f5 to c9f5501
Force-pushed from fda46d2 to 239f7f3
base_patch = {}
if primary_endpoint := self.async_replication.get_primary_cluster_endpoint():
    base_patch["standby_cluster"] = {"host": primary_endpoint}
standby_cluster is a DCS value, so it is not updated by just changing the config. Should I also add this to the async relation changed hook?
I do not have a strict opinion here. Up to Marcelo.
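For illustration, a minimal sketch of updating the DCS value directly through Patroni's `/config` REST endpoint rather than only rewriting the local configuration file; the URL handling, lack of auth/TLS, and payload shape are assumptions:

```python
import requests


def patch_standby_cluster(patroni_url: str, primary_endpoint: str | None) -> None:
    """Set or clear the standby_cluster entry in the DCS via Patroni's config API."""
    if primary_endpoint:
        payload = {"standby_cluster": {"host": primary_endpoint}}
    else:
        # Patroni removes a DCS key when it is patched to null.
        payload = {"standby_cluster": None}
    requests.patch(f"{patroni_url}/config", json=payload, timeout=10).raise_for_status()
```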
with attempt:
    current_primary = (
        self.get_primary() if not async_cluster else self.get_standby_leader()
    )
In the async cluster we have a standby leader.
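As a hedged sketch, the standby leader can be identified from Patroni's `/cluster` endpoint, which reports the `standby_leader` role in a standby cluster; the URL handling and error handling here are assumptions:

```python
import requests


def get_standby_leader(patroni_url: str) -> str | None:
    """Return the member holding the standby_leader role, if any."""
    cluster = requests.get(f"{patroni_url}/cluster", timeout=10).json()
    for member in cluster.get("members", []):
        if member.get("role") == "standby_leader":
            return member["name"]
    return None
```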
for attempt in Retrying(stop=stop_after_attempt(2), wait=wait_fixed(1), reraise=True):
    with attempt:
        if not self._charm._patroni.are_all_members_ready():
            raise charm_refresh.PrecheckFailed("PostgreSQL is not running on 1+ units")
When moving the standby leader, the member status briefly becomes stopped, so the action would fail if the standby leader is not already the lowest unit.
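For reference, a hedged sketch of the kind of readiness check the precheck retries around; the accepted member states and the endpoint are assumptions:

```python
import requests

# States in which a member counts as ready (assumption).
READY_STATES = {"running", "streaming"}


def are_all_members_ready(patroni_url: str) -> bool:
    """Return True only if every Patroni member reports a ready state."""
    cluster = requests.get(f"{patroni_url}/cluster", timeout=10).json()
    members = cluster.get("members", [])
    return bool(members) and all(m.get("state") in READY_STATES for m in members)
```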
if current_primary == candidate:
    logger.info("Candidate and leader are the same")
    return
Got conflicts in the standby cluster.
A Frankenstein of the existing PostgreSQL test and the MySQL one.
@pytest.fixture(scope="module")
def first_model(juju: Juju, request: pytest.FixtureRequest) -> Generator:
    """Create and return the first model."""
    yield juju.model
This fails to clean up when running locally on a Jubilant temp model.
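A minimal sketch of the companion fixture that owns its own model and can clean up explicitly, assuming jubilant's `add_model`/`destroy_model` calls and a `--keep-models` option; the fixture and model names are illustrative:

```python
from collections.abc import Generator

import jubilant
import pytest


@pytest.fixture(scope="module")
def second_model(juju: jubilant.Juju, request: pytest.FixtureRequest) -> Generator[str, None, None]:
    """Create a second model for the standby cluster and destroy it on teardown."""
    model_name = f"{juju.model}-other"  # assumes juju.model is set for the first model
    juju.add_model(model_name)
    yield model_name
    if not request.config.getoption("--keep-models", default=False):
        juju.destroy_model(model_name, destroy_storage=True, force=True)
```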
@pytest.fixture()
def first_model_continuous_writes(first_model: str) -> Generator:
Not sure how pytest resolves fixtures, so I used a different name than the one in conftest.
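A hedged sketch of what the renamed fixture does: start continuous writes before the test and clear them afterwards. The `start-continuous-writes`/`clear-continuous-writes` action names, the target unit, and the app-name constant are assumptions following the postgresql-test-app convention:

```python
from collections.abc import Generator

import jubilant
import pytest

DB_TEST_APP_NAME = "postgresql-test-app"  # assumption; the test module defines its own constant


@pytest.fixture()
def first_model_continuous_writes(first_model: str) -> Generator[None, None, None]:
    """Start continuous writes in the first model and clear them after the test."""
    model = jubilant.Juju(model=first_model)
    model.run(f"{DB_TEST_APP_NAME}/0", "start-continuous-writes")
    yield
    model.run(f"{DB_TEST_APP_NAME}/0", "clear-continuous-writes")
```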
model_2.deploy(
    charm=DB_TEST_APP_NAME,
    app=DB_TEST_APP_2,
    base="ubuntu@22.04",
    channel="latest/edge",
    num_units=1,
    constraints=constraints,
)
PGB can't switch over between clusters, so we use two test apps, as in the previous test.
for attempt in Retrying(stop=stop_after_attempt(10), wait=wait_fixed(3), reraise=True):
    with attempt:
We are waiting for the config to be patched and for the events to fire so that replication resumes.
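A minimal sketch of the wait pattern being described, retrying until the patched `standby_cluster` host is visible in the DCS config; the endpoint and helper shape are assumptions:

```python
import requests
from tenacity import Retrying, stop_after_attempt, wait_fixed


def wait_for_standby_config(patroni_url: str, primary_endpoint: str) -> None:
    """Retry until Patroni's DCS config shows the expected standby_cluster host."""
    for attempt in Retrying(stop=stop_after_attempt(10), wait=wait_fixed(3), reraise=True):
        with attempt:
            config = requests.get(f"{patroni_url}/config", timeout=10).json()
            assert (config.get("standby_cluster") or {}).get("host") == primary_endpoint
```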
def get_db_max_written_values(
    first_model: str, second_model: str, test_model: str, test_app: str
We need to know which of the models the test app is running in so we can stop it.
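A hedged sketch of the helper's shape: stop the writer in whichever model hosts the test app, then read the highest written value from each cluster. The `stop-continuous-writes` action and the `get_max_written_value` placeholder are hypothetical stand-ins for the real helpers:

```python
import jubilant


def get_max_written_value(model_name: str) -> int:
    """Hypothetical helper: query the cluster in `model_name` for its max written value."""
    raise NotImplementedError


def get_db_max_written_values(
    first_model: str, second_model: str, test_model: str, test_app: str
) -> list[int]:
    """Stop the writer in test_model, then collect the max written value per cluster."""
    # The test app only runs in one of the models, so the stop action must target it.
    jubilant.Juju(model=test_model).run(f"{test_app}/0", "stop-continuous-writes")
    return [get_max_written_value(model) for model in (first_model, second_model)]
```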
)
model_2 = Juju(model=second_model)
model_2.deploy(
    charm=charm,
The pre-refresh check will be run on the old code, so it will fail on edge (leader vs. standby leader).
Nice. This is massive!
Checklist