Deploying a Docker based Geoserver cluster solution

Here I will document the steps I took along with any issues encountered during deployment of a Geoserver cluster with Docker.

At minimum, our Docker-based Geoserver deployment needs 3 main elements:

A disk volume for the data and relevant scripts for daily updates.
A Postgis server, either Docker based or standalone. At present our AWS deployment uses a dedicated RDB instance.
A Docker container running Geoserver ( 2.24.4 at the time of this writing).

Single node instance:

Make an EBS volume and copy all current data onto it.
Create a VM and install docker.io.
Mount this volume on /mnt/cephfs (the mount point is a holdover for when we used this setup on the Plume cluster at the Firelab).
Deploy a Postgres server with Postgis extension (via container or RDB instance).
Load the wfas database onto the Postgis server.
Deploy a single node Geoserver instance via docker container. Example Docker file, using an image built with WFAS extensons:

version: "3.7"
 
services:
  geoserver_adm:
    image: wfas/wfas-repo:geoserver-2.24.4-alpine
    depends_on:
      - broker
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
    ports:
      - target: 8080
        published: 8081
        protocol: tcp
        mode: host 
    volumes:
      - type: bind
        source: /mnt/cephfs/geoserver/data_dir
        target: /opt/geoserver/data_dir
      - type: bind
        source: /mnt/cephfs
        target: /mnt/cephfs
    environment:
      - CLUSTER_CONFIG_DIR=/opt/geoserver/data_dir/cluster/master
      - GEOSERVER_LOG_LOCATION=/opt/geoserver/logs/geoserver.log
      - COOKIE=JSESSIONID prefix
      - GWC_DISKQUOTA_DISABLED="true"
      - GWC_METASTORE_DISABLED="true"
      - instanceName=master
      - POSTGRES_ADDR=
      - POSTGRES_PORT=
      - DBNAME=
      - PROXY_NAME=
      - PROXY_HTTPS_PORT=  
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname==]
      update_config:
        delay: 2s

Initialize a Docker swarm and add nodes

A Docker swarm is based on a manager/worker node structure, where the nodes are either physical machines or virtual instances.

I created 3 instances on the Plume Openstack cluster, using the Ubuntu 19.04 Disco image.

Next, I installed docker on each of the nodes:

sudo apt update

sudo apt install docker.io

Then initialized the swarm:

sudo docker swarm init --advertise-addr <MANAGER-IP>

Added the worker nodes:

sudo docker swarm join --token sometoken <MANAGER-IP/MANAGER PORT>

Created a local image repository:

sudo docker service create --name registry --publish 5000:5000 registry:2

Build images and deploy the stack

I based this stack on the activemq Dockerfile from G Roldan's 2019 FOSS4G presentation, and the kartoza/geoserver docker image.

Some customizations were necessary, hence the need for the local registry in Step 5 above. Most notably, these customizations were:

Modifying the path for the GEOSERVER_DATA_DIR environment variable to match the path specified in the kartoza/geoserver image:

ENV GEOSERVER_DATA_DIR /opt/geoserver/data_dir

Adding the jms-clustering plugin to the list of extensions in the setup.sh script for the kartoza/geoserver image:

geoserver-${GS_VERSION:0:4}-SNAPSHOT-jms-cluster-plugin.zip

There were a few other changes, such as geoserver download URL since the public server rebuild, and creation of directories. I should include a file with my changes.

Build the Docker images:

sudo docker build --tag 'localhost:5000/geoserver:2.15.1' .

sudo docker build --tag 'localhost:5000/activemqbroker:2.15.1' .

Following a successful build, these iamges should now be listed with sudo docker image ls

Push these images to the local registry:

sudo docker push localhost:5000/geoserver:2.15.1

sudo docker push localhost:5000/activemqbroker:2.15.1

Finally, deploy the stack:

sudo docker stack deploy -c docker-geoserver.stack.yml gs_stack

Inspect the status of the deployment

sudo docker stack ps gs_stack

sudo docker service ls

Clustering configuration: Inspect configuration of the cluster.properties file. From this thread: the option "Slave connection" in the "Cluster configuration" panel should be set to "enabled". It is important to note that "Slave connection" corresponds to "connection=enabled" in cluster.properties. The "URL of the broker to connect" field (on the very top) should be set to the broker:tcp://broker:61666, where broker is the name of the service where the activemq appication is running. This had me stumped for days. Working options in the cluster.properties file:

in cluster/master:

CLUSTER_CONFIG_DIR=/opt/geoserver/data_dir/cluster/master
instanceName=master
readOnly=disabled
brokerURL=tcp\://broker\:61666
durable=true
embeddedBroker=disabled
toggleMaster=true
connection.retry=10
xbeanURL=./broker.xml
embeddedBrokerProperties=embedded-broker.properties
topicName=VirtualTopic.geoserver
connection=enabled
toggleSlave=false
connection.maxwait=500
group=geoserver-cluster

in cluster/slave1:

CLUSTER_CONFIG_DIR=/opt/geoserver/data_dir/cluster/slave1
instanceName=slave1
readOnly=enabled
brokerURL=tcp\://broker\:61666
durable=true
embeddedBroker=disabled
toggleMaster=false
connection.retry=10
xbeanURL=./broker.xml
embeddedBrokerProperties=embedded-broker.properties
topicName=VirtualTopic.geoserver
connection=enabled
toggleSlave=true
connection.maxwait=500
group=geoserver-cluster

TODO:

How to load a common configuration file for slave instances, but have some options customized? For example, instanceName for each client.

Materials I used

Gabriel Roldan's 2019 FOSS4G presentation on Geoserver clustering (in Spanish):

https://groldan.github.io/2019_foss4g-ar_taller_geoserver/

The lengthy tutorial pages on this topic from Geosolutions:

https://geoserver.geo-solutions.it/edu/en/clustering/clustering/active/index.html

Gabriel Roldan's thread on the Geoserver mailing list

http://osgeo-org.1560.x6.nabble.com/Geoserver-and-JMS-clustering-td5403395.html

GeoServer Clustering Revisited: Getting Your Docker On

https://vimeo.com/246199046

Trial and error

Some peculiarities

I haven't been able to leverage Docker's built-in routing mesh and load balancing capabilities so far. In order to get clustering working, I resorted to using the "host" networking mode. Maybe I need to change the connection type to the broker on all cluster instances.

Front end proxy/load ballancer.

HAProxy (in my case, deployed on a separate VM) requires manual config, which is not optimal.

Traeffic seems to be the currently widely acceppted and actively developed solution. It is pretty complicated to configure. But, if any services are scaled, it is dynamically reconfigured. Also, I had problems with having a session persist on the Master Geoserver instance.

dockercloud/haproxy is also capable of dynamic configuration. However, it is not being developed actively anymore.

WFAS Web Processing Services:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly