SST test #43

marceloneppel · 2022-10-18T12:44:49Z

Issue

Jira issues: DPE-546
The PostgreSQL charm should self-heal the workload when the DB process is restarted without data or transaction logs (SST test).

Solution

Add a test that deletes the PostgreSQL data directory files (including transaction logs), restarts the DB process, and then checks that the workload recovers by itself from that situation.
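As a rough local illustration of the "delete the data directory files" step, the sketch below empties a directory while keeping the directory itself. It is not the script used in this PR (the PR copies `clean-data-dir.sh` to the unit, and its contents are not shown here); `wipe_directory_contents` and the demo file layout are hypothetical names chosen for the example.

```python
import shutil
import tempfile
from pathlib import Path


def wipe_directory_contents(data_dir: Path) -> int:
    """Delete everything inside data_dir but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    for entry in data_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and all of its content
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed


# Demo on a throwaway directory that loosely mimics a data directory
# (the file names and contents here are assumptions for illustration only).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "pg_wal").mkdir()
(demo_dir / "pg_wal" / "000000010000000000000001").write_text("wal")
(demo_dir / "PG_VERSION").write_text("14\n")

removed_count = wipe_directory_contents(demo_dir)
remaining = list(demo_dir.iterdir())
```

Keeping the directory in place while removing its content leaves ownership and permissions of the directory untouched, which is one reason to prefer this over deleting and recreating it.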

Context

This test can also be run on an existing cluster.

Testing

The test was added to tests/integration/ha_tests/test_self_healing.py.

Release Notes

Add SST test.


codecov · 2022-10-18T12:46:35Z

Codecov Report

Merging #43 (3497cf3) into main (855dab5) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #43   +/-   ##
=======================================
  Coverage   64.05%   64.05%           
=======================================
  Files           6        6           
  Lines         818      818           
  Branches      121      121           
=======================================
  Hits          524      524           
  Misses        264      264           
  Partials       30       30



shayancanonical · 2022-11-17T19:23:29Z

tests/integration/ha_tests/test_self_healing.py

+    # Copy data dir content removal script.
+    await ops_test.juju(
+        "scp", "tests/integration/ha_tests/clean-data-dir.sh", f"{primary_name}:/tmp"
+    )


I believe that for this test we need to remove the data directory while the postgres process is down (we need to extend the systemd restart timeout, like MongoDB does here).

An excerpt of a message from mykola:

For the SST test (in MySQL):
1) stop pebble/systemd on one member and remove all files in /var/lib/mysql [data directory] (simulating an HDD failure)
2) write data to the new primary
3) rotate the binlog and remove the rotated binlog ON ALL alive members, literally removing the data written in step 2 (this simulates a long period of downtime)
4) run mysql on the member from step 1
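The reasoning behind these steps can be sketched as a small decision function: a member that comes back with no data, or that needs a log segment no live member still has, cannot catch up incrementally and has to receive a full copy. This is a simplified model for illustration only, not code from the charm or from Patroni; `Member`, `needs_full_copy`, and the integer segment numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Simplified view of a returning cluster member (hypothetical model)."""

    has_data: bool  # False if the data directory was wiped (step 1)
    next_needed_segment: int  # first log segment the member still has to replay


def needs_full_copy(member: Member, oldest_available_segment: int) -> bool:
    """Return True if the member cannot catch up from the retained logs."""
    if not member.has_data:
        # Nothing to replay on top of: the whole data set must be copied.
        return True
    # The segment the member needs was already removed everywhere (step 3).
    return member.next_needed_segment < oldest_available_segment


wiped = Member(has_data=False, next_needed_segment=5)
up_to_date = Member(has_data=True, next_needed_segment=5)
too_far_behind = Member(has_data=True, next_needed_segment=2)
```

With the oldest retained segment set to 3, only `up_to_date` can rejoin by replaying logs; the other two are the cases the SST test is meant to exercise.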

Thanks for the additional details, Shayan! I'll update the PR to have a similar approach on PostgreSQL.

Hey @shayancanonical! I updated the code to have the right steps that simulate the needed scenario.

I haven't changed the systemd service restart timeout because, after the service was stopped, it was not restarted until I requested it to start again.

I also added a check to ensure the WAL files (the PostgreSQL equivalent of the MySQL binlog) are correctly rotated, i.e. that a new one is created. In fact, more than one new WAL file is kept, due to some settings that enable the old ones to be removed.
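A check of this kind can be sketched by comparing two directory listings taken before and after the rotation. The helper names below are hypothetical and this is not the code from the PR; the only PostgreSQL detail relied on is that WAL segment files are named with 24 uppercase hexadecimal characters, so other entries in the WAL directory are ignored.

```python
import re

# WAL segment file names consist of 24 uppercase hexadecimal characters.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def wal_segments(file_names):
    """Keep only the names that look like WAL segment files."""
    return {name for name in file_names if WAL_SEGMENT_NAME.match(name)}


def new_wal_segments(before, after):
    """Return the sorted WAL segments present in `after` but not in `before`."""
    return sorted(wal_segments(after) - wal_segments(before))


# Example listings (made-up names, for illustration only).
listing_before = [
    "000000010000000000000001",
    "000000010000000000000002",
    "archive_status",
]
listing_after = [
    "000000010000000000000002",
    "000000010000000000000003",
    "000000010000000000000004",
    "archive_status",
]

created = new_wal_segments(listing_before, listing_after)
rotated = len(created) > 0
```

Asserting only that at least one new segment exists, rather than exactly one, matches the observation above that more than one new WAL file may be present after rotation.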

tests/integration/ha_tests/test_self_healing.py

shayancanonical

Looks great!

* Add SST test
* Enable previous tests
* fix early tls deployment by only reloading patroni config if it's already running
* Improve code
* Remove duplicate check
* Remove unused import
* added unit test for reloading patroni
* lint
* removing postgres restart check
* Pin Juju agent version on CI
* adding series flags to test apps
* adding series flags to test apps
* made series into a list
* Update test_new_relations.py
* Add retrying
* updating test to better emulate bundle deploymen
* Remove unused code
* Change processes list
* Add logic for ensuring all units down
* Change delay to only one unit
* Add WAL switch
* Updates related to WAL removal
* Small improvements
* Add comments
* Change the way service is stopped
* Remove slot removal
* Small fixes
* Remove unussed parameter

Co-authored-by: WRFitch <will.fitch@canonical.com>
Co-authored-by: Will Fitch <WRFitch@outlook.com>

marceloneppel and others added 6 commits October 17, 2022 20:25

Add SST test

04223a8

Enable previous tests

ebc4814

fix early tls deployment by only reloading patroni config if it's already running

84fca0c

Improve code

d5d0ad8

Remove duplicate check

1e64d31

Remove unused import

4688a95

WRFitch and others added 12 commits October 18, 2022 13:52

added unit test for reloading patroni

2b76d2e

lint

edcfb0c

removing postgres restart check

51cee09

Pin Juju agent version on CI

f05a540

adding series flags to test apps

5a97730

adding series flags to test apps

072252a

made series into a list

f340ccf

Update test_new_relations.py

b537e82

Add retrying

6c0db4b

updating test to better emulate bundle deploymen

60bc8e2

Merge branch 'fix-early-tls-deployment' of https://github.com/canonical/postgresql-operator into fix-early-tls-deployment

dc3dd8f

Merge remote-tracking branch 'origin/fix-early-tls-deployment' into sst-test

6772702

marceloneppel force-pushed the sst-test branch from da3656a to 6772702 Compare November 16, 2022 19:02

marceloneppel added 3 commits November 16, 2022 16:03

Merge branch 'main' into sst-test

3348c2b

Remove unused code

f73aefc

Change processes list

c539f78

marceloneppel marked this pull request as ready for review November 17, 2022 14:04

marceloneppel requested review from MiaAltieri, marcoppenheimer, paulomach, shayancanonical, taurus-forever and zmraul November 17, 2022 14:05

marceloneppel requested review from Mehdi-Bendriss, WRFitch, averma-canonical, carlcsaposs-canonical, dragomirp and welpaolo November 17, 2022 14:05

shayancanonical reviewed Nov 17, 2022


marceloneppel added 3 commits November 18, 2022 11:21

Add logic for ensuring all units down

ed987ed

Change delay to only one unit

b1c8c2c

Add WAL switch

8e507a0

WRFitch reviewed Nov 21, 2022


tests/integration/ha_tests/test_self_healing.py (outdated, resolved)

marceloneppel added 8 commits November 28, 2022 14:03

Merge branch 'main' into sst-test

6fcdebd

Updates related to WAL removal

107e24c

Small improvements

50721e1

Add comments

addf716

Change the way service is stopped

deb8d23

Remove slot removal

d8d58ac

Small fixes

730e9d4

Remove unussed parameter

3497cf3

WRFitch approved these changes Nov 29, 2022


marceloneppel requested a review from shayancanonical November 30, 2022 11:28

shayancanonical approved these changes Dec 1, 2022


marceloneppel merged commit d94460d into main Dec 1, 2022

marceloneppel deleted the sst-test branch December 1, 2022 09:20

github-actions bot added a commit to canonical/test-runners-2-github-x64-postgresql-operator that referenced this pull request May 22, 2024

Allure report canonical#43

2c80a2e

github-actions bot added a commit to canonical/test-runners-2-azure-arm64-postgresql-operator that referenced this pull request May 23, 2024

Allure report canonical#43

c96e711

github-actions bot added a commit to canonical/test-runners-2-is-x64-postgresql-operator that referenced this pull request May 23, 2024

Allure report canonical#43

a2a1825
