canonical · taurus-forever · May 5, 2025 · May 5, 2025
diff --git a/docs/explanation.md b/docs/explanation.md
@@ -9,9 +9,10 @@ This section contains pages with more detailed explanations that provide additio
 * [Legacy charm]
 
 ## Operational concepts
-* [Connection pooling]
+* [Units]
 * [Users]
 * [Logs]
+* [Connection pooling]
 
 ## Security and hardening
 * [Security hardening guide][Security]
@@ -22,6 +23,7 @@ This section contains pages with more detailed explanations that provide additio
 
 [Architecture]: /t/11857
 [Interfaces and endpoints]: /t/10251
+[Units]: /t/17525
 [Users]: /t/10798
 [Logs]: /t/12099
 [Juju]: /t/11985

diff --git a/docs/explanation/e-units.md b/docs/explanation/e-units.md
@@ -0,0 +1,93 @@
+# PostgreSQL units
+
+Each [HA](https://en.wikipedia.org/wiki/High_availability)/[DR](https://en.wikipedia.org/wiki/IT_disaster_recovery) implementation has a primary site and one or more secondary (standby) sites.
+A Charmed PostgreSQL cluster can be [easily scaled](/t/11863) from 0 to 10 units ([contact us](/t/11863) for clusters of more than 10 units). A cluster of 3 or more units is recommended in production, due to [Raft consensus](https://en.wikipedia.org/wiki/Raft_(algorithm)) requirements. Each unit has one of the following types:
+  * **Primary**: the unit that accepts all writes and guarantees [no split brain](https://en.wikipedia.org/wiki/Split-brain_(computing)).
+  * **Sync Standby** (synchronous copy): designed for fast automatic failover. Used for read-only queries and guarantees that the latest transaction is available.
+  * **Replica** (asynchronous copy): designed for long-running and resource-consuming queries without affecting Primary performance. Used for read-only queries, with no guarantee that the latest transaction is available.
+
+> **Warning**: all SQL transactions have to be confirmed by all Sync Standby unit(s) before the Primary unit commits the transaction to the client. Therefore, balancing high performance against high availability is a trade-off between the number of "Sync Standby" and "Replica" units in the cluster.
+
+> **Note**: starting from revision 561, all Charmed PostgreSQL units are configured as Sync Standby members. This provides better guarantees of data survival when two of three units are lost simultaneously. Users can reconfigure the number of synchronous units using the Juju config option '[synchronous_node_count](https://charmhub.io/postgresql/configurations?channel=14/edge#synchronous_node_count)'.
+
+![PostgreSQL Units types|690x253, 100%](upload://pY5kzxO9ELJGEqEe1F1RQjOG6SS.png)
+
+## Primary
+
+The simplest way to find the Primary unit is to run `juju status`. Be aware that this information can be outdated, as it is only updated on the [Juju event 'update-status'](https://documentation.ubuntu.com/juju/3.6/reference/hook/#update-status):
+```shell
+ubuntu@juju360:~$ juju status postgresql
+Model       Controller  Cloud/Region         Version  SLA          Timestamp
+postgresql  lxd         localhost/localhost  3.6.5    unsupported  13:04:15+02:00
+
+App         Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
+postgresql  14.15    active      3  postgresql  14/stable  553  no       
+
+Unit           Workload  Agent  Machine  Public address  Ports     Message
+postgresql/0*  active    idle   0        10.189.210.53   5432/tcp  Primary <<<<<<<<<<<<<<
+postgresql/1   active    idle   1        10.189.210.166  5432/tcp  
+postgresql/2   active    idle   2        10.189.210.188  5432/tcp  
+
+Machine  State    Address         Inst id        Base          AZ  Message
+0        started  10.189.210.53   juju-422c1a-0  ubuntu@22.04      Running
+1        started  10.189.210.166  juju-422c1a-1  ubuntu@22.04      Running
+2        started  10.189.210.188  juju-422c1a-2  ubuntu@22.04      Running
+```
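For scripting, the Primary unit can be picked out of the plain-text `juju status` output. The sketch below assumes the layout shown above (a unit table headed by `Unit`, with `Primary` as the last word of the `Message` column); the `find_primary_unit` name is hypothetical, and the same staleness caveat about 'update-status' applies.

```shell
# Hypothetical helper: print the unit whose `juju status` message is "Primary".
# Usage: juju status postgresql | find_primary_unit
find_primary_unit() {
  awk '
    # The unit table starts with a header line beginning with "Unit".
    /^Unit / { in_units = 1; next }
    # An empty line ends the unit table.
    in_units && NF == 0 { in_units = 0 }
    # Inside the unit table, the Message column is the last field.
    in_units && $NF == "Primary" {
      unit = $1
      sub(/\*$/, "", unit)   # strip the Juju leader marker "*"
      print unit
    }
  '
}
```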
+
+The up-to-date Primary unit can be retrieved using the Juju action `get-primary`:
+```shell
+> juju run postgresql/leader get-primary
+...
+primary: postgresql/0
+```
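When the result is needed in a script, the `primary:` line can be extracted from the action output. This is a minimal sketch assuming the output contains a `primary: <unit>` line as shown above; the `parse_get_primary` name is hypothetical.

```shell
# Hypothetical helper: print the unit name from `get-primary` action output.
# Usage: juju run postgresql/leader get-primary | parse_get_primary
parse_get_primary() {
  awk '/^[ \t]*primary:/ {
    sub(/^[ \t]*primary:[ \t]*/, "")   # drop the key, keep the unit name
    print
  }'
}
```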
+
+It is also possible to retrieve this information using [patronictl](/t/17406#p-37204-patronictl-3) and the [Patroni REST API](/t/17406#p-37204-patroni-rest-api-8).
+
+## Standby / Replica
+
+At the moment, this information can only be retrieved using [patronictl](/t/17406#p-37204-patronictl-3) and the [Patroni REST API](/t/17406#p-37204-patroni-rest-api-8) (see the linked documentation for access details). Example:
+```shell
+> ... patronictl ... list
++ Cluster: postgresql (7499430436963402504) ---+-----------+----+-----------+
+| Member       | Host           | Role         | State     | TL | Lag in MB |
++--------------+----------------+--------------+-----------+----+-----------+
+| postgresql-0 | 10.189.210.53  | Leader       | running   |  1 |           |
+| postgresql-1 | 10.189.210.166 | Sync Standby | streaming |  1 |         0 |
+| postgresql-2 | 10.189.210.188 | Replica      | streaming |  1 |         0 |
++--------------+----------------+--------------+-----------+----+-----------+
+```
+In the example above:
+* `postgresql-0` is the PostgreSQL Primary unit (Patroni Leader), which accepts all writes.
+* `postgresql-1` is a PostgreSQL/Patroni Sync Standby unit, which can be safely promoted to new Primary using a manual switchover.
+* `postgresql-2` is a PostgreSQL/Patroni Replica unit, which can NOT be promoted directly to new Primary using a manual switchover. It must first be automatically promoted from Replica to Sync Standby, which guarantees that the latest SQL transactions are available on it, before it can be promoted to Primary. Alternatively, a manual failover to the Replica unit can be performed, accepting the risk of losing the last transaction(s) that lagged behind the Primary.
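To find the members that are safe switchover targets, the `patronictl list` table can be filtered on the `Role` column. This sketch assumes the table layout shown above (pipe-separated data rows, `Role` as the third column); the `list_sync_standbys` name is hypothetical.

```shell
# Hypothetical helper: print members whose Patroni role is "Sync Standby",
# i.e. candidates for a safe manual switchover.
# Usage: ... patronictl ... list | list_sync_standbys
list_sync_standbys() {
  awk -F'|' '
    # Data and header rows start with "|"; border rows start with "+".
    /^\|/ {
      member = $2; role = $4
      gsub(/^ +| +$/, "", member)   # trim cell padding
      gsub(/^ +| +$/, "", role)
      if (role == "Sync Standby") print member
    }
  '
}
```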
+
+## Replica lag distance
+
+At the moment, this information can only be retrieved using [patronictl](/t/17406#p-37204-patronictl-3) and the [Patroni REST API](/t/17406#p-37204-patroni-rest-api-8) (see the linked documentation for access details). Example:
+```shell
+> ... patronictl ... list
++ Cluster: postgresql (7499430436963402504) ---+-----------+----+-----------+
+| Member       | Host           | Role         | State     | TL | Lag in MB |
++--------------+----------------+--------------+-----------+----+-----------+
+| postgresql-0 | 10.189.210.53  | Leader       | running   |  1 |           |
+| ...
+| postgresql-2 | 10.189.210.188 | Replica      | streaming |  1 |        42 |  <<<<<
++--------------+----------------+--------------+-----------+----+-----------+
+
+> curl ... x.x.x.x:8008/cluster | jq
+  "members": [
+    {
+      "name": "postgresql-0",
+      "role": "leader",
+      "state": "running",
+      ...
+    },
+...
+    {
+      "name": "postgresql-2",
+      "role": "replica",
+      "state": "streaming",
+      ...
+      "lag": 42 <<<<<<<<<<<< Lag in MB
+    }
+```
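For simple alerting, members lagging behind the Primary by more than a chosen threshold can be filtered from the `patronictl list` table. This is a minimal sketch assuming the layout shown above, with `Lag in MB` as the last column; the `lagging_members` name and the threshold value are hypothetical.

```shell
# Hypothetical helper: print members whose "Lag in MB" exceeds a threshold.
# Usage: ... patronictl ... list | lagging_members 10
lagging_members() {
  awk -F'|' -v max="$1" '
    /^\|/ {
      member = $2; lag = $7
      gsub(/^ +| +$/, "", member)   # trim cell padding
      gsub(/^ +| +$/, "", lag)
      # Skip the header row and the Leader, whose lag cell is empty.
      if (lag ~ /^[0-9]+$/ && lag + 0 > max + 0) print member ": " lag " MB"
    }
  '
}
```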
diff --git a/docs/explanation/e-users.md b/docs/explanation/e-users.md
@@ -1,4 +1,4 @@
-# Charm Users explanations
+# Users
 
 There are three types of users in PostgreSQL:
 * Internal users (used by charm operator)

diff --git a/docs/how-to.md b/docs/how-to.md
@@ -13,10 +13,12 @@ Installation of different cloud services with Juju:
 * [Azure]
 * [Multi-availability zones (AZ)][Multi-AZ]
 
-Specific deployment scenarios and architectures:
-* [Terraform]
-* [Air-gapped]
+Other deployment scenarios and configurations:
 * [TLS VIP access]
+* [Juju spaces]
+* [Air-gapped]
+* [Terraform]
+* [Juju storage]
 
 ## Usage and maintenance
 
@@ -25,6 +27,7 @@ Specific deployment scenarios and architectures:
 * [Scale replicas]
 * [Enable TLS]
 * [Enable plugins/extensions]
+* [Switchover/failover]
 
 ## Backup and restore
 * [Configure S3 AWS]
@@ -36,9 +39,10 @@ Specific deployment scenarios and architectures:
 
 ## Monitoring (COS)
 
-* [Enable monitoring]
-* [Enable alert rules]
-* [Enable tracing]
+* [Enable monitoring] with Grafana
+* [Enable alert rules] with Prometheus
+* [Enable tracing] with Tempo
+* [Enable profiling] with Parca
 
 ## Minor upgrades
 * [Perform a minor upgrade]
@@ -69,13 +73,17 @@ This section is for charm developers looking to support PostgreSQL integrations
 [GCE]: /t/15722
 [Azure]: /t/15733
 [Multi-AZ]: /t/15749
+[TLS VIP access]: /t/16576
+[Juju spaces]: /t/17416
 [Terraform]: /t/14916
 [Air-gapped]: /t/15746
-[TLS VIP access]: /t/16576
+[Juju storage]: /t/17529
+
 [Integrate with another application]: /t/9687
 [External access]: /t/15802
 [Scale replicas]: /t/9689
 [Enable TLS]: /t/9685
+[Switchover/failover]: /t/17523
 
 [Configure S3 AWS]: /t/9681
 [Configure S3 RadosGW]: /t/10313
@@ -87,7 +95,8 @@ This section is for charm developers looking to support PostgreSQL integrations
 [Enable monitoring]: /t/10600
 [Enable alert rules]: /t/13084
 [Enable tracing]: /t/14521
-
+[Enable profiling]: /t/17172
+
 [Perform a minor upgrade]: /t/12089
 [Perform a minor rollback]: /t/12090
 

diff --git a/docs/how-to/h-async-set-up.md b/docs/how-to/h-async-set-up.md
@@ -62,7 +62,7 @@ juju run -m rome db1/leader create-replication
 To switchover and use `lisbon` as the primary instead, run
 
 ```shell
-juju run -m lisbon db2/leader promote-to-primary
+juju run -m lisbon db2/leader promote-to-primary scope=cluster
 ```
 
 ## Scale a cluster

diff --git a/docs/how-to/h-deploy-juju-spaces.md b/docs/how-to/h-deploy-juju-spaces.md
@@ -0,0 +1,65 @@
+# Deploy on Juju spaces
+
+The Charmed PostgreSQL operator supports [Juju spaces](https://documentation.ubuntu.com/juju/latest/reference/space/index.html) to separate network traffic for:
+- **Client** - PostgreSQL instance to client data
+- **Instance-replication** - cluster instances replication data
+- **Cluster-replication** - cluster to cluster replication data
+- **Backup** - backup and restore data
+
+## Prerequisites
+
+* **Charmed PostgreSQL 16**
+* Configured network spaces
+  * See [Juju | How to manage network spaces](https://documentation.ubuntu.com/juju/latest/reference/juju-cli/list-of-juju-cli-commands/add-space/)
+
+## Deploy
+
+When deploying the application, constraints are required to ensure that the unit(s) have address(es) on the specified network space(s), and endpoint binding(s) are required to attach the endpoints to those space(s).
+
+For example, with spaces configured for instance replication and client traffic:
+```shell
+❯ juju spaces
+Name      Space ID  Subnets
+alpha     0         10.163.154.0/24
+client    1         10.0.0.0/24
+peers     2         10.10.10.0/24
+```
+
+The space `alpha` is the default space and cannot be removed. To deploy the Charmed PostgreSQL operator using these spaces:
+```shell
+juju deploy postgresql --channel 16/edge \
+  --constraints spaces=client,peers \
+  --bind "database-peers=peers database=client"
+```
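Since bindings only work when the units have addresses in the bound spaces, a quick pre-flight check can compare the `--bind` value against the `--constraints spaces=` list before deploying. This is a hypothetical helper (`check_bindings`), not part of Juju or the charm; it only compares space names in the two strings.

```shell
# Hypothetical pre-flight check (not a Juju command): report any space used in
# --bind that is missing from --constraints spaces=..., since units need an
# address in every bound space.
# Usage: check_bindings "client,peers" "database-peers=peers database=client"
check_bindings() {
  local constraint_spaces=",$1," bindings=$2 rc=0 binding endpoint space
  for binding in $bindings; do
    case $binding in
      *=*) endpoint=${binding%%=*}; space=${binding#*=} ;;
      *)   endpoint="(default binding)"; space=$binding ;;  # entry without "="
    esac
    case $constraint_spaces in
      *",$space,"*) ;;  # space is covered by the constraints
      *) echo "endpoint $endpoint: space '$space' missing from constraints" >&2
         rc=1 ;;
    esac
  done
  return "$rc"
}
```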
+
+[note type=caution]
+Currently, the `juju bind` command is not supported. Network space bindings must be defined at deploy time.
+[/note]
+
+Consequently, a client application must use the `client` space in the model, or a space for the same subnet in another model, for example:
+```shell
+juju deploy client-app \
+  --constraints spaces=client \
+  --bind database=client
+```
+
+The two applications can then be integrated using:
+```shell
+juju integrate postgresql:database client-app:database
+```
+
+The client application will receive network endpoints on the `10.0.0.0/24` subnet.
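To confirm that an address handed to the client application lies on the `client` subnet, the sketch below tests an IPv4 address against a CIDR block using plain shell arithmetic. The helper names are hypothetical; how the address is obtained from the client application is not covered here.

```shell
# Hypothetical helpers: convert a dotted IPv4 address to an integer and test
# whether it falls inside a CIDR block.
# Usage: ip_in_cidr 10.0.0.12 10.0.0.0/24 && echo "on the client subnet"
ip_to_int() {
  local IFS=.
  set -- $1   # split "a.b.c.d" into four positional parameters
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/} mask
  # Network mask with the top $bits bits set, limited to 32 bits.
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```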
+
+The Charmed PostgreSQL operator endpoints are:
+
+| Endpoint                       | Traffic              |
+| ------------------------------ | -------------------- |
+| database                       | Client               |
+| database-peers                 | Instance-replication |
+| replication-offer, replication | Cluster-replication  |
+| s3-parameters                  | Backup               |
+
+
+[note]
+If using a network space for the backup traffic, the user is responsible for ensuring that the target object storage URL traffic is routed via the specified network space.
+[/note]