
Commit c7a3c43

[DPE-7949] Refresh v3 documentation (#1087)
* Add refresh home page and upgrade guide * separate glossary and add rollback guide * update releases page * fix broken link * update releases page and fix glossary heading * remove extra heading level * fix typo * add some small clarifications * prioritize refresh terminology * fix toctree * (wip) apply some feedback * fix errors * Add refresh home page and upgrade guide * separate glossary and add rollback guide * update releases page * update releases page and fix glossary heading * remove extra heading level * fix typo * add some small clarifications * prioritize refresh terminology * fix toctree * (wip) apply some feedback * fix errors * Update landing page * Fix reference * Fix pause-after-unit-refresh * Fix rst link * Add snap revision * remove duplicate page and undefined terms * draft * update link * update link * punctuation * Remove refresh glossary entries * Use rst inline links as workaround * feedback * fix link * feedback * feedback * separate halt section (#1179) * revert changes to releases.md to fix merge conflicts * feedback --------- Co-authored-by: Carl Csaposs <carl.csaposs@canonical.com>
1 parent 798e91f commit c7a3c43

3 files changed

+248
-15
lines changed

docs/how-to/index.md

Lines changed: 5 additions & 6 deletions
@@ -43,8 +43,8 @@ Other deployment scenarios and configurations:
4343
* [Enable tracing] with Tempo
4444
* [Enable profiling] with Parca
4545

46-
## Upgrades
47-
* [How to upgrade]
46+
## Refresh (upgrade)
47+
* [How to refresh]
4848

4949
## Cross-regional (cluster-cluster) async replication
5050

@@ -99,7 +99,7 @@ This section is for charm developers looking to support PostgreSQL integrations
9999
[Enable tracing]: /how-to/monitoring-cos/enable-tracing
100100
[Enable profiling]: /how-to/monitoring-cos/enable-profiling
101101

102-
[How to upgrade]: /how-to/upgrade/index
102+
[How to refresh]: /how-to/refresh
103103

104104
[Cross-regional async replication]: /how-to/cross-regional-async-replication/index
105105
[Set up clusters]: /how-to/cross-regional-async-replication/set-up-clusters
@@ -119,7 +119,6 @@ This section is for charm developers looking to support PostgreSQL integrations
119119
```{toctree}
120120
:titlesonly:
121121
:maxdepth: 2
122-
:glob:
123122
:hidden:
124123
125124
Deploy <deploy/index>
@@ -133,8 +132,8 @@ Enable LDAP <enable-ldap>
133132
Enable plugins/extensions <enable-plugins-extensions/index>
134133
Back up and restore <back-up-and-restore/index>
135134
Monitoring (COS) <monitoring-cos/index>
136-
Upgrade <upgrade/index>
135+
Refresh (upgrade) <refresh>
137136
Cross-regional async replication <cross-regional-async-replication/index>
138137
Logical replication <logical-replication/index>
139138
Development <development/index>
140-
139+
```

docs/how-to/refresh.md

Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@
1+
# Refresh (upgrade)
2+
3+
```{admonition} Emergency stop button
4+
:class: attention
5+
Use `juju config <app name> pause-after-unit-refresh=all` to halt an in-progress refresh.
6+
Then, consider [rolling back](#roll-back).
7+
```
8+
9+
Charmed PostgreSQL supports minor version in-place refresh via the [`juju refresh`](https://documentation.ubuntu.com/juju/3.6/reference/juju-cli/list-of-juju-cli-commands/refresh/#details) command.
10+
11+
## Determine which version to refresh to
12+
13+
Get the current charm revision of the application with [`juju status`](https://documentation.ubuntu.com/juju/3.6/reference/juju-cli/list-of-juju-cli-commands/status/).
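As a rough sketch, the revision can also be read from the machine-readable status output. The helper name `current_charm_rev` is hypothetical, and the sketch assumes that `juju status --format=yaml` reports a `charm-rev:` field for the application; it simply takes the first such field in the filtered output.

```shell
# Hypothetical helper (not part of the charm or of Juju).
# Prints the first "charm-rev:" value found in the YAML status output
# filtered to one application. Assumes that field name is present.
current_charm_rev() {
    app="${1:-postgresql}"
    juju status "$app" --format=yaml |
        awk '$1 == "charm-rev:" { print $2; exit }'
}
```

For example, `current_charm_rev postgresql` prints a revision number that can be compared against the "From" columns in the tables below.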
14+
15+
### Recommended refreshes
16+
17+
These refreshes are well-tested and should be preferred.
18+
19+
```{eval-rst}
20+
+--------------+------------+----------+--------------+------------+----------+-----------------------------------------------------------------------------------------------+
21+
| .. centered:: From | .. centered:: To | Charm release notes to review |
22+
+--------------+------------+----------+--------------+------------+----------+ |
23+
| Charm | PostgreSQL | Snap | Charm | PostgreSQL | Snap | |
24+
| revision | Version | revision | revision | Version | revision | |
25+
+==============+============+==========+==============+============+==========+===============================================================================================+
26+
| 843 (amd64) | 16.9 | 201, 202 | TODO (amd64) | 16.9 | TODO | `TODO, TODO <https://github.com/canonical/postgresql-operator/releases/tag/v16%2F1.60.0>`__ |
27+
+--------------+ | +--------------+ | | `TODO, TODO <https://github.com/canonical/postgresql-operator/releases/tag/v16%2F1.61.0>`__ |
28+
| 844 (arm64) | | | TODO (arm64) | | | |
29+
+--------------+------------+----------+--------------+------------+----------+-----------------------------------------------------------------------------------------------+
30+
```
31+
32+
### Supported refreshes
33+
34+
If possible, use a [recommended refresh](#recommended-refreshes) instead.
35+
36+
```{eval-rst}
37+
+------------+------------+----------+------------+------------+----------+
38+
| .. centered:: From | .. centered:: To |
39+
+------------+------------+----------+------------+------------+----------+
40+
| Charm | PostgreSQL | Snap | Charm | PostgreSQL | Snap |
41+
| revision | Version | revision | revision | Version | revision |
42+
+============+============+==========+============+============+==========+
43+
| 843, 844 | 16.9 | 201, 202 | TODO | 16.9 | TODO |
44+
| | | +------------+------------+----------+
45+
| | | | TODO | 16.10 | TODO |
46+
+------------+------------+----------+------------+------------+----------+
47+
```
48+
49+
### Unsupported refreshes
50+
51+
These are examples of refreshes that are not supported in-place.
52+
In some of these cases, it may be possible to perform an out-of-place upgrade or downgrade.
53+
54+
* Minor in-place downgrade from PostgreSQL 16.10 to 16.9
55+
* Major in-place upgrade from PostgreSQL 14 to 16
56+
* Major in-place downgrade from PostgreSQL 16 to 14
57+
* Any refresh from or to a non-stable version (e.g. 16/edge)
58+
59+
## Create a backup
60+
61+
See [](/how-to/back-up-and-restore/create-a-backup).
62+
63+
### Verify the backup
64+
65+
Verify the integrity of the backup by performing a test [restore on another application](/how-to/back-up-and-restore/migrate-a-cluster).
66+
Check the restored data by ensuring that:
67+
* recent data is present
68+
* the data size is correct
69+
* the data matches what you expected in the backup
70+
71+
## Read the rollback instructions
72+
73+
In the event that something goes wrong (e.g. the refresh fails, the new version of PostgreSQL is not performant enough, a database client is incompatible with the new version), you may want to quickly roll back.
74+
75+
Prepare for this possibility by reading through the entire refresh documentation—with special attention to the [](#halt-the-refresh) and [](#roll-back) sections—before starting the refresh.
76+
77+
## Review release notes
78+
79+
Review the release notes for every charm version after the one you are refreshing from, up to and including the one you are refreshing to. Use them to understand what changed and whether any action is required from you before, during, or after the refresh.
80+
81+
For [recommended refreshes](#recommended-refreshes), refer to the rightmost column of the table.
82+
83+
If the PostgreSQL versions that you are refreshing from and to are different, refer to the [upstream PostgreSQL release notes](https://www.postgresql.org/docs/release/) to understand what changed and if any action is required from you.
84+
85+
## Test in a staging environment
86+
87+
We recommend testing the entire refresh procedure in a staging environment before refreshing your production environment.
88+
89+
In a staging environment, we also encourage you to simulate failure of the refresh and to practice recovery by restoring from [the backup](#create-a-backup).
90+
91+
## Check that clients are compatible
92+
93+
Ensure that your clients are compatible with the PostgreSQL version that you're refreshing to.
94+
It may be necessary to refresh your clients before refreshing PostgreSQL.
95+
96+
## Inform users and schedule a maintenance window
97+
98+
Tell your users when you will perform the refresh and remain in contact with them so that you are aware of any issues.
99+
100+
If possible, schedule a maintenance window during a period of low traffic.
101+
The duration of the refresh may depend on the size of your data and volume of traffic.
102+
To estimate the duration, we recommend [testing on a staging environment](#test-in-a-staging-environment).
103+
104+
## Consider scaling up
105+
106+
During the refresh of the application, units will be restarted one by one.
107+
While a unit is restarting, the performance of the cluster will be degraded.
108+
109+
To ensure that the cluster can handle all traffic during the refresh, consider scaling up the application by 1 unit.
110+
111+
```{note}
112+
The PostgreSQL charm does not currently support scaling up while a refresh is in progress.
113+
114+
If you anticipate that the refresh will be in progress for an extended duration (e.g. days, weeks), scale up the application before the refresh so that it can handle the maximum load during that period.
115+
```
116+
117+
## Pre-refresh check
118+
119+
Run the `pre-refresh-check` action on the leader unit to prepare the application for refresh.
120+
121+
```shell
122+
juju run postgresql/leader pre-refresh-check
123+
```
124+
125+
If the action does not succeed, do not refresh.
126+
127+
If the action succeeds, copy down the rollback command.
128+
Keep the command available in case you need to [roll back](#roll-back).
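The rule above can be sketched as a small gate for a script. The function name `pre_refresh_gate` is hypothetical, and the sketch assumes that `juju run` exits with a non-zero status when the action does not succeed.

```shell
# Hypothetical helper: run the pre-refresh check and stop if it fails.
# Assumption: `juju run` returns non-zero when the action fails.
pre_refresh_gate() {
    app="${1:-postgresql}"
    if juju run "${app}/leader" pre-refresh-check; then
        echo "pre-refresh-check succeeded: copy down the rollback command from the output above"
    else
        echo "pre-refresh-check failed: do not refresh" >&2
        return 1
    fi
}
```

A script could then call `pre_refresh_gate postgresql || exit 1` before any `juju refresh` step, so the refresh is never started after a failed check.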
129+
130+
## Configure `pause-after-unit-refresh`
131+
132+
After each unit is refreshed, the charm will perform automatic health checks.
133+
We recommend supplementing the automatic checks with manual checks.
134+
135+
Examples of manual checks:
136+
* Database clients are healthy and can connect to the refreshed units
137+
* Transactions per second and resource consumption (CPU, memory, disk) are similar on refreshed and non-refreshed units
138+
* Leaving the application in a partially-refreshed state (only some units refreshed) for several weeks and monitoring that the new version is stable in your environment
139+
140+
To facilitate your manual checks, the application can be configured to pause the refresh and wait for your confirmation.
141+
142+
Set the `pause-after-unit-refresh` config option to:
143+
* `all` to wait for your confirmation after each unit refreshes
144+
* `first` (default) to wait for your confirmation once, after the first unit refreshes
145+
* `none` to never wait for your confirmation
146+
147+
For example:
148+
```shell
149+
juju config postgresql pause-after-unit-refresh=all
150+
```
151+
152+
```{note}
153+
If the charm's automatic health checks fail, the refresh will be paused (until those health checks succeed) regardless of the value of the `pause-after-unit-refresh` config option.
154+
```
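If this option is set from a script, a minimal sketch like the following can catch a mistyped value before it reaches Juju. The function name `set_pause_mode` is hypothetical; the three accepted values are the ones listed above.

```shell
# Hypothetical helper: accept only the documented values
# (all, first, none) before calling `juju config`.
set_pause_mode() {
    app="$1"
    mode="$2"
    case "$mode" in
        all|first|none) ;;
        *)
            echo "invalid value: $mode (expected all, first, or none)" >&2
            return 2
            ;;
    esac
    juju config "$app" "pause-after-unit-refresh=$mode"
}
```

For example, `set_pause_mode postgresql all`. Running `juju config postgresql pause-after-unit-refresh` without a value prints the current setting.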
155+
156+
## Avoid operations while a refresh is in progress
157+
158+
While a refresh is in progress, the application is in a vulnerable state.
159+
160+
These operations are not supported while a refresh is in progress:
161+
* Scaling up the application
162+
* Scaling down the application—unless it is necessary for recovery
163+
* Creating or removing relations
164+
* Creating or restoring a backup (on the Juju application)
165+
* Changes to config values (except `pause-after-unit-refresh`)
166+
167+
## Start the refresh
168+
169+
Use `juju refresh` and specify the charm revision that you are refreshing to.
170+
171+
```shell
172+
juju refresh postgresql --revision 0
173+
```
174+
175+
## Halt the refresh
176+
177+
If something goes wrong, halt the refresh by running:
178+
179+
```shell
180+
juju config postgresql pause-after-unit-refresh=all
181+
```
182+
183+
In the command above, replace `postgresql` with the name of the Juju application.
184+
185+
Next, assess the situation and plan the recovery.
186+
Often, the safest recovery path is to [roll back](#roll-back).
187+
Consider [contacting us](/reference/contacts).
188+
189+
## Roll back
190+
191+
If something went wrong, the safest recovery path is often to roll back to the original version.
192+
193+
First, [halt the refresh](#halt-the-refresh).
194+
195+
Run the rollback command [you copied down earlier](#pre-refresh-check).
196+
In most cases, the rollback command is also displayed in the application's status message in `juju status`.
197+
198+
### Resume the rollback
199+
200+
If more than one unit was refreshed before the rollback was started and `pause-after-unit-refresh` is set to `all` or `first`, your manual confirmation will be needed to complete the rollback.
201+
The procedure for the rollback is the same as described in [](#monitor-the-refresh).
202+
203+
### Reflect
204+
205+
After the application has been rolled back and you have confirmed that service has been fully restored, investigate what went wrong.
206+
207+
If applicable, please file a [bug report](/reference/contacts).
208+
209+
Once you understand what went wrong and have tested that it has been fixed, the refresh can be attempted again.
210+
211+
## Monitor the refresh
212+
213+
Use `juju status` to monitor the progress of the refresh.
214+
215+
In some cases, it may take a few minutes for the statuses to update after the refresh has started.
216+
217+
If the application status or any of the unit statuses are `blocked`, your action is required.
218+
Follow the instructions in the status messages.
219+
220+
If the application status or any of the unit statuses are `error`, your action may be required.
221+
Monitor `juju debug-log`.
222+
The error may have been a temporary issue.
223+
If the error persists, your action is required—consider [rolling back](#roll-back).
224+
225+
Monitor the refresh until it successfully finishes.
226+
When the refresh completes, the application status will go from a message beginning with "Refreshing" to an `active` status with no message.
227+
228+
### Resume refresh
229+
230+
If `pause-after-unit-refresh` is set to `all` or `first` (default), your confirmation will be needed during the refresh.
231+
232+
When your confirmation is needed, the application status in `juju status` will instruct you to run the `resume-refresh` action.
233+
234+
Before running the `resume-refresh` action:
235+
* Wait until all of the application's unit agent statuses are `idle`
236+
* Wait until all of the refreshed units' workload statuses are `active`
237+
* Perform [manual checks](#configure-pause-after-unit-refresh) to ensure that everything is healthy
238+
239+
Example of running the `resume-refresh` action on unit 1:
240+
241+
```shell
242+
juju run postgresql/1 resume-refresh
243+
```
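The first precondition above (all unit agents `idle`) can be checked roughly as follows. This is a sketch under assumptions, not an official tool: the function name `all_agents_idle` is hypothetical, and it assumes that `juju status --format=yaml` reports each unit agent as a `juju-status:` key followed by a `current:` line under the top-level `applications:` section. It does not check workload statuses and does not replace the manual checks.

```shell
# Hypothetical helper: succeeds only if every unit agent listed under
# "applications:" is idle. Machine agents (listed before that section)
# are ignored. The YAML layout is an assumption.
all_agents_idle() {
    app="${1:-postgresql}"
    juju status "$app" --format=yaml | awk '
        /^applications:/                 { in_apps = 1 }
        in_apps && $1 == "juju-status:"  { in_agent = 1; next }
        in_agent && $1 == "current:"     { if ($2 != "idle") busy++; seen++; in_agent = 0 }
        END { if (seen == 0 || busy > 0) exit 1 }
    '
}
```

For example, `all_agents_idle postgresql && juju run postgresql/1 resume-refresh` runs the action only when no unit agent is still executing.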

docs/how-to/upgrade/index.md

Lines changed: 0 additions & 9 deletions
This file was deleted.
