matrix-org · richvdh · May 11, 2020 · May 6, 2020 · May 6, 2020 · May 6, 2020
diff --git a/changelog.d/7446.feature b/changelog.d/7446.feature
@@ -0,0 +1 @@
+Add support for running replication over Redis when using workers.
diff --git a/docs/workers.md b/docs/workers.md
@@ -1,23 +1,31 @@
 # Scaling synapse via workers
 
-Synapse has experimental support for splitting out functionality into
-multiple separate python processes, helping greatly with scalability.  These
+For small instances it recommended to run Synapse in monolith mode (the
+default). For larger instances where performance is a concern it can be helpful
+to split out functionality into multiple separate python processes. These
 processes are called 'workers', and are (eventually) intended to scale
 horizontally independently.
 
-All of the below is highly experimental and subject to change as Synapse evolves,
-but documenting it here to help folks needing highly scalable Synapses similar
-to the one running matrix.org!
+Synapse's worker support is under active development and subject to change as
+we attempt to rapidly scale ever larger Synapse instances. However we are
+documenting it here to help admins needing a highly scalable Synapse instance
+similar to the one running `matrix.org`.
 
-All processes continue to share the same database instance, and as such, workers
-only work with postgres based synapse deployments (sharing a single sqlite
-across multiple processes is a recipe for disaster, plus you should be using
-postgres anyway if you care about scalability).
+All processes continue to share the same database instance, and as such,
+workers only work with PostgreSQL-based Synapse deployments. SQLite should only
+be used for demo purposes and any admin considering workers should already be
+running PostgreSQL.
 
-The workers communicate with the master synapse process via a synapse-specific
-TCP protocol called 'replication' - analogous to MySQL or Postgres style
-database replication; feeding a stream of relevant data to the workers so they
-can be kept in sync with the main synapse process and database state.
+## Master/worker communication
+
+The workers communicate with the master process via a Synapse-specific protocol
+called 'replication' (analogous to MySQL- or Postgres-style database
+replication) which feeds a stream of relevant data from the master to the
+workers so they can be kept in sync with the master process and database state.
+
+Additionally, workers may make HTTP requests to the master, to send information
+in the other direction. Typically this is used for operations which need to
+wait for a reply - such as sending an event.
 
 ## Configuration
 
@@ -27,50 +35,43 @@ the correct worker, or to the main synapse instance. Note that this includes
 requests made to the federation port. See [reverse_proxy.md](reverse_proxy.md)
 for information on setting up a reverse proxy.
 
-To enable workers, you need to add two replication listeners to the master
-synapse, e.g.:
-
-    listeners:
-      # The TCP replication port
-      - port: 9092
-        bind_address: '127.0.0.1'
-        type: replication
-      # The HTTP replication port
-      - port: 9093
-        bind_address: '127.0.0.1'
-        type: http
-        resources:
-         - names: [replication]
+To enable workers, you need to add *two* replication listeners to the
+main Synapse configuration file (`homeserver.yaml`). For example:
 
-Under **no circumstances** should these replication API listeners be exposed to
-the public internet; it currently implements no authentication whatsoever and is
-unencrypted.
-
-(Roughly, the TCP port is used for streaming data from the master to the
-workers, and the HTTP port for the workers to send data to the main
-synapse process.)
+```yaml
+listeners:
+  # The TCP replication port
+  - port: 9092
+    bind_address: '127.0.0.1'
+    type: replication
+
+  # The HTTP replication port
+  - port: 9093
+    bind_address: '127.0.0.1'
+    type: http
+    resources:
+     - names: [replication]
+```
 
-You then create a set of configs for the various worker processes.  These
-should be worker configuration files, and should be stored in a dedicated
-subdirectory, to allow synctl to manipulate them.
+Under **no circumstances** should these replication API listeners be exposed to
+the public internet; they have no authentication and are unencrypted.
 
-Each worker configuration file inherits the configuration of the main homeserver
-configuration file.  You can then override configuration specific to that worker,
-e.g. the HTTP listener that it provides (if any); logging configuration; etc.
-You should minimise the number of overrides though to maintain a usable config.
+You should then create a set of configs for the various worker processes.  Each
+worker configuration file inherits the configuration of the main homeserver
+configuration file.  You can then override configuration specific to that
+worker, e.g. the HTTP listener that it provides (if any); logging
+configuration; etc.  You should minimise the number of overrides though to
+maintain a usable config.
 
 In the config file for each worker, you must specify the type of worker
 application (`worker_app`). The currently available worker applications are
-listed below. You must also specify the replication endpoints that it's talking
-to on the main synapse process.  `worker_replication_host` should specify the
-host of the main synapse, `worker_replication_port` should point to the TCP
+listed below. You must also specify the replication endpoints that it should
+talk to on the main synapse process.  `worker_replication_host` should specify
+the host of the main synapse, `worker_replication_port` should point to the TCP
 replication listener port and `worker_replication_http_port` should point to
 the HTTP replication port.
 
-Currently, the `event_creator` and `federation_reader` workers require specifying
-`worker_replication_http_port`.
-
-For instance:
+For example:
 
     worker_app: synapse.app.synchrotron
 
@@ -101,6 +102,50 @@ recommend the use of `systemd` where available: for information on setting up
 `systemd` to start synapse workers, see
 [systemd-with-workers](systemd-with-workers). To use `synctl`, see below.
 
+### **Experimental** support for replication over redis
+
+As of Synapse v1.13.0, it is possible to configure Synapse to send replication
+via a [Redis pub/sub channel](https://redis.io/topics/pubsub). This is an
+alternative to direct TCP connections to the master: rather than all the
+workers connecting to the master, all the workers and the master connect to
+Redis, which relays replication commands between processes. This can give a
+significant cpu saving on the master and will be a prerequisite for upcoming
+performance improvements.
+
+Note that this support is currently experimental; you may experience lost
+messages and similar problems! It is strongly recommended that admins setting
+up workers for the first time use direct TCP replication as above.
+
+To configure Synapse to use Redis:
+
+1. Install Redis following the normal procedure for your distribution - for
+   example, on Debian, `apt install redis-server`. (It is safe to use an
+   existing Redis deployment if you have one: we use a pub/sub stream named
+   according to the `server_name` of your synapse server.)
+2. Check Redis is running and accessible: you should be able to `echo PING | nc -q1
+   localhost 6379` and get a response of `+PONG`.
+3. Install the python prerequisites. If you installed synapse into a
+   virtualenv, this can be done with:
+   ```sh
+   pip install matrix-synapse[redis]
+   ```
+   The debian packages from matrix.org already include the required
+   dependencies.
+4. Add config to the shared configuration (`homeserver.yaml`):
+    ```yaml
+    redis:
+      enabled: true
+    ```
+    Optional parameters which can go alongside `enabled` are `host`, `port`,
+    `password`. Normally none of these are required.
+5. Restart master and all workers.
+
+Once redis replication is in use, `worker_replication_port` is redundant and
+can be removed from the worker configuration files. Similarly, the
+configuration for the `listener` for the TCP replication port can be removed
+from the main configuration file. Note that the HTTP replication port is
+still required.
+
 ### Using synctl
 
 If you want to use `synctl` to manage your synapse processes, you will need to