[DPE-4533] Pause Patroni in the TLS test #588
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #588      +/-   ##
==========================================
+ Coverage   68.60%   68.77%   +0.17%
==========================================
  Files          10       10
  Lines        2978     2985       +7
  Branches      564      565       +1
==========================================
+ Hits         2043     2053      +10
+ Misses        816      814       -2
+ Partials      119      118       -1
Nice!
Excellent!
@pytest.mark.group(1)
async def test_remove_tls(ops_test: OpsTest) -> None:
    async with ops_test.fast_forward():
Splitting off TLS removal as well.
tests/integration/test_tls.py
- await ops_test.model.wait_for_idle(apps=[DATABASE_APP_NAME], status="active", timeout=1000)
+ await ops_test.model.wait_for_idle(apps=[DATABASE_APP_NAME], status="active", timeout=1500)
Timed out 3 times in a row on Juju 2.
Tnx!
try:
    container.restart(self._postgresql_service)
except ChangeError:
    logger.exception("Failed to restart patroni")
If there's a dangling PostgreSQL process, Patroni will be unable to start.
try:
    await run_command_on_unit(ops_test, primary, "/charm/bin/pebble stop postgresql")
except Exception as e:
    # pebble stop on juju 2 errors out and leaves dangling PG processes
    if juju_major_version > 2:
        raise e
    await run_command_on_unit(ops_test, primary, "pkill --signal SIGTERM -x postgres")
On Juju 2, if the pebble stop command fails, the Patroni process is terminated, but a dangling PostgreSQL process remains and prevents Patroni from starting again.
Alternatively, we could do this check and termination in the charm code, but I would prefer not to add Juju 2-specific live code. Thoughts?
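To make the Juju 2 fallback easier to reason about, and to unit-test without a model, its control flow can be sketched with the command runner injected as a plain async callable. The Runner type and make_fake_runner helper are hypothetical stand-ins for run_command_on_unit(ops_test, primary, ...); the two commands are the ones used in the test. A bare raise is used, which re-raises the original exception unchanged.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical stand-in for run_command_on_unit(ops_test, primary, cmd).
Runner = Callable[[str], Awaitable[str]]

PEBBLE_STOP = "/charm/bin/pebble stop postgresql"
KILL_POSTGRES = "pkill --signal SIGTERM -x postgres"


async def stop_postgresql(run: Runner, juju_major_version: int) -> None:
    """Stop the postgresql Pebble service, cleaning up after a Juju 2 failure."""
    try:
        await run(PEBBLE_STOP)
    except Exception:
        # On Juju 3+ a failed pebble stop is a real error, so propagate it.
        if juju_major_version > 2:
            raise
        # On Juju 2 pebble stop errors out and leaves dangling postgres
        # processes behind, so terminate them explicitly.
        await run(KILL_POSTGRES)


def make_fake_runner(fail_on: str, calls: List[str]) -> Runner:
    """Hypothetical test double: record every command, fail the chosen one."""

    async def run(cmd: str) -> str:
        calls.append(cmd)
        if cmd == fail_on:
            raise RuntimeError(f"command failed: {cmd}")
        return ""

    return run


if __name__ == "__main__":
    recorded: List[str] = []
    asyncio.run(stop_postgresql(make_fake_runner(PEBBLE_STOP, recorded), juju_major_version=2))
    print(recorded)  # pebble stop was attempted, then pkill ran
```

Injecting the runner keeps the Juju-version branching separate from the transport, so the same function could wrap the real helper in the integration test.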
On 2.9, if patroni stop failed: kill -SIGTERM $postgresql_pid?
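If the kill were done by PID as suggested, one way to obtain $postgresql_pid is the postmaster.pid file that PostgreSQL writes in its data directory, whose first line is the postmaster's PID. A minimal sketch, assuming the data directory path is known (it is a parameter here, not the charm's actual path): it only reads the PID and checks liveness with signal 0, deliberately leaving the SIGTERM out. A stale postmaster.pid can point at a PID that has since been reused, so the result should be treated as a hint.

```python
import os
from pathlib import Path
from typing import Optional


def read_postmaster_pid(data_dir: str) -> Optional[int]:
    """Return the PID recorded in <data_dir>/postmaster.pid, or None if unavailable."""
    pid_file = Path(data_dir) / "postmaster.pid"
    try:
        # The first line of postmaster.pid holds the postmaster's PID.
        first_line = pid_file.read_text().splitlines()[0]
        return int(first_line.strip())
    except (FileNotFoundError, IndexError, ValueError):
        # Missing, empty, or malformed file: no usable PID.
        return None


def process_alive(pid: int) -> bool:
    """Check whether a process exists; signal 0 performs the check without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True
```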
> On 2.9, if patroni stop failed: kill -SIGTERM $postgresql_pid?
Pretty much. We could do this in the ChangeError handler of the restart check. The question is whether we want to add that code just for Juju 2's sake.
I prefer that we keep this SIGTERM call in the test code and not in the charm code. Even if it is a SIGTERM, I'd prefer either to wait for the database process to gracefully finish by itself or analyze the situation manually before doing any action to avoid any data corruption.
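The graceful alternative described here, waiting for the database process to finish on its own before anyone intervenes, amounts to a bounded poll. A sketch, assuming a hypothetical is_running callable (in the integration test it could wrap something like pgrep -x postgres executed on the unit); sleep and clock are injectable only so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_exit(
    is_running: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_running() until it returns False or timeout seconds elapse.

    Returns True if the process went away on its own and False on timeout,
    leaving any further action (manual analysis, or a SIGTERM) to the caller.
    """
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

Returning a boolean instead of killing on timeout keeps the decision about the dangling process with the operator, in line with the concern about data corruption.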
The TLS test succeeded 5 times in a row on Juju 2: https://github.com/canonical/postgresql-k8s-operator/actions/runs/10212523498/job/28261962481?pr=588
LGTM, one educational question.
  primary_name = await get_primary(ops_test, app)
  unit_ip = await get_unit_address(ops_test, primary_name)
- configuration_info = requests.get(f"http://{unit_ip}:8008/config")
+ configuration_info = requests.get(f"{schema}://{unit_ip}:8008/config", verify=not tls)
Could you share the reason for verify=not tls?
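The question is not answered in this thread. One plausible reason, assumed here rather than confirmed, is that with TLS enabled the Patroni REST API serves a certificate the test client cannot validate, so verification is switched off only in that case. The sketch below mirrors the schema switch and verify=not tls with the standard library, and also builds the kind of request that pauses Patroni: PATCH /config with {"pause": true}, which to our knowledge is what patronictl pause sends. The function names are hypothetical, and whether this PR pauses Patroni this way is not shown here.

```python
import json
import ssl
import urllib.request
from typing import Optional

PATRONI_PORT = 8008  # port used by the test's /config call


def patroni_url(unit_ip: str, path: str, tls: bool) -> str:
    """Build a Patroni REST API URL, switching the schema when TLS is enabled."""
    schema = "https" if tls else "http"
    return f"{schema}://{unit_ip}:{PATRONI_PORT}/{path.lstrip('/')}"


def client_ssl_context(tls: bool) -> Optional[ssl.SSLContext]:
    """Standard-library analogue of requests' verify=not tls.

    Assumption: with TLS on, the test client cannot validate the server
    certificate, so verification is skipped. Only acceptable in tests.
    """
    if not tls:
        return None  # plain HTTP needs no TLS context
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def pause_request(unit_ip: str, tls: bool) -> urllib.request.Request:
    """Build (but do not send) a PATCH /config request that sets pause: true."""
    body = json.dumps({"pause": True}).encode()
    return urllib.request.Request(
        patroni_url(unit_ip, "config", tls),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

The request would be sent with urllib.request.urlopen(req, context=client_ssl_context(tls)); keeping construction separate from sending makes the URL and verification logic checkable without a running cluster.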
LGTM for the changes, but why so many install hook failures on CI? :(
The test app is borked due to the charmcraft 3 release.
Mostly a port of canonical/postgresql-operator#534