
Clustering

Miro Kubicek edited this page Feb 5, 2021 · 10 revisions

Note: clustering support has been available since 1.0.0-alfa.1

Titanoboa Cluster is Masterless & Can Scale to Hundreds of Nodes

Since Titanoboa is based on functional principles such as immutability, it is only natural that its cluster is masterless. Thanks to this design choice you can easily add or remove nodes at runtime with no configuration change and zero downtime, and you can scale to hundreds of nodes very easily.

Also, since the Titanoboa server is basically just a jar, you can run the cluster nodes anywhere: on bare metal, VMs, Docker or K8s.

See, for example, how I added 20 extra nodes in a few seconds during my recent talk at the re:Clojure conference:

Use Titanoboa to spawn and destroy clusters and/or nodes

In the video above you can see that it is possible to alter the cluster just by running a Titanoboa workflow within that cluster. This way you can add nodes - see e.g. here. In a similar way you can even build a separate cluster using a Titanoboa workflow.

Titanoboa Cluster relies on a Message Broker

In a clustered environment, titanoboa nodes use a message broker to communicate. Instead of re-inventing the wheel, titanoboa simply relies on the broker's ability to deliver messages - so different types of brokers, and different broker settings, may be the best fit for different use cases: use non-persisted in-memory queues for best performance, or use persisted queues on a highly available broker to ensure failover and high availability. Likewise, if you use a broker with unlimited scalability that spans multiple availability zones (such as AWS SQS or Kafka), your titanoboa cluster can then scale without limits and span multiple availability zones as well.
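As a sketch of what this means in practice: which broker a cluster uses is purely a configuration choice - the system definitions in the server config point at broker-specific system vars. The RabbitMQ vars below are taken from the sample config at the bottom of this page; the commented-out in-memory alternative is a hypothetical illustration only (that namespace name does not come from this page):

```clojure
;; RabbitMQ-backed systems, as used in the sample config below:
{:systems-catalogue
 {:core {:system-def #'titanoboa.system.rabbitmq/distributed-core-system
         :worker-def #'titanoboa.system.rabbitmq/distributed-worker-system}}}

;; Hypothetical in-memory alternative - namespace shown for illustration only:
;; {:systems-catalogue
;;  {:core {:system-def #'titanoboa.system.local/in-memory-core-system}}}
```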

(diagram: a Titanoboa cluster of nodes communicating through a message broker)

Shared File System

Using a shared file system is not mandatory, though it makes a Titanoboa Cluster easier to configure and control. One or more of the following can reside on the shared file system (whether it is a physical drive, S3, EFS, HDFS etc.):

  • workflow repository
  • job folders (each workflow job can optionally create a unique job folder to make sharing files/data between workflow steps easier)
  • server configuration file
  • external dependencies file
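For illustration, if the shared drive were mounted at /mnt/shared on every node (a hypothetical example path), the relevant keys from the sample server config below would simply point there:

```clojure
;; All nodes point their repositories and job folders at the shared mount.
;; /mnt/shared is a hypothetical example path, not taken from this page.
{:jobs-repo-path  "/mnt/shared/repo/"
 :steps-repo-path "/mnt/shared/step-repo/"
 :job-folder-path "/mnt/shared/job-folders/"}
```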

Titanoboa's Clustering Concepts

Since I always love to steal ideas from sci-fi movies and various areas of physics, T-boa's Cluster is based on the following two principles:


  1. In Space No One Can Hear You Scream
  2. When You Are Not Looking The Universe Does Not Exist

Cluster State Broadcast (In Space No One Can Hear You Scream)

For each Titanoboa Cluster, there is a "heartbeat" exchange set up in the underlying message broker. Each node periodically broadcasts its internal state to this exchange. These broadcasts may be received by a consumer - or they may very well not be. There is no guarantee.
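In terms of configuration, the broadcast is governed by a handful of keys; the values below are taken verbatim from the sample RabbitMQ config at the bottom of this page (the comments are my reading of the key names):

```clojure
{:enable-cluster true
 :heartbeat-exchange-name "heartbeat" ;; the "heartbeat" exchange on the broker
 ;; systems that publish and (on cluster-aware nodes) consume the broadcasts:
 :cluster-state-broadcast #'titanoboa.system.rabbitmq/cluster-broadcast-system
 :cluster-state-subs #'titanoboa.system.rabbitmq/cluster-subscription-system
 ;; function applied to each received broadcast:
 :cluster-state-fn #'titanoboa.cluster/process-broadcast}
```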

Cluster Aware Nodes (When You Are Not Looking The Universe Does Not Exist)

Any cluster node can become "Cluster Aware" and subscribe to the State Broadcast Exchange. It will then start receiving the state broadcasts from the other nodes, and based on these snapshots (or puzzle pieces) of state it can start building its own picture of what is happening in the entire cluster.

By default, nodes are not cluster aware, but any node can become one - e.g. when you open the GUI of a particular node in the browser, that node becomes cluster aware and you can use it as a "master" node.

Each node also contains a simple proxy module, so you can use that node's API to send commands to other nodes as well.

The Speed Of Light Limit Applies

As with the Hubble telescope: when you look at stars, you see only ancient snapshots of them, since their light may have been traveling for billions of years. The same applies when you open the GUI and try to make sense of what is happening in the cluster.

Since titanoboa servers do not serialize over a single database, but instead leverage a particular message broker (e.g. RabbitMQ, Kafka, JMS or SQS), they are built for massive scale. You can easily run hundreds of nodes. The underlying MQ broker is transactional, so you can rest assured that workflow jobs will get processed, failovers will get handled etc.

But the scale always comes with a trade-off: your view of the cluster's state at any given point in time (e.g. the jobs running and their state) will always be limited - quite literally - by the speed of light (i.e. latency), and also by the fact that nodes broadcast their states only periodically.

So when watching workflow jobs in your GUI, always be mindful that there may be latency in play.
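Relatedly, the sample config below contains two eviction keys which, going by their names, appear to control how long a cluster-aware node keeps a peer's last state snapshot before dropping it as stale (values are in milliseconds, as in the sample config):

```clojure
{:cluster-eviction-interval (* 1000 30) ;; check for stale node states every 30 s
 :cluster-eviction-age (* 1000 60)}     ;; drop a node's state once it is over 60 s old
```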

Sample Cluster Configuration (using RabbitMQ)

Server Config File

```clojure
(in-ns 'titanoboa.server)
(log/info "Hello, I am RMQ server-config and I am being loaded...")
;;RMQ setup:
(alter-var-root #'server-config
                (constantly {:systems-catalogue {:core  {:system-def #'titanoboa.system.rabbitmq/distributed-core-system
                                                         :worker-def #'titanoboa.system.rabbitmq/distributed-worker-system
                                                         :autostart  true
                                                         :worker-count 2
                                                         :restart-workers-on-error true}
                                                 :archival-system {:system-def #'titanoboa.system.rabbitmq/archival-system
                                                                   :autostart  true}}
                             :jobs-repo-path "/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/repo/"
                             :steps-repo-path "/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/step-repo/"
                             :job-folder-path "/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/job-folders/"
                             :log-file-path   "titanoboa.log"
                             :enable-cluster   true
                             :heartbeat-exchange-name "heartbeat"
                             :cmd-exchange-name "command"
                             :jobs-cmd-exchange-name "jobs-command"
                             :cluster-eviction-interval (* 1000 30)
                             :cluster-eviction-age (* 1000 60)
                             :cluster-state-broadcast #'titanoboa.system.rabbitmq/cluster-broadcast-system
                             :cluster-state-subs #'titanoboa.system.rabbitmq/cluster-subscription-system
                             :cluster-state-fn #'titanoboa.cluster/process-broadcast
                             :archive-ds-ks [:archival-system :system :db-pool]
                             :systems-config {:core {:new-jobs-queue "titanoboa-bus-queue-0"
                                                     :jobs-queue "titanoboa-bus-queue-1"
                                                     :archival-queue "titanoboa-bus-archive"
                                                     :eviction-age (* 1000 60 5)
                                                     :eviction-interval (* 1000 30)}
                                              :archival-system {:jdbc-url "jdbc:postgresql://localhost:5432/mydb?currentSchema=titanoboa"
                                                                :archival-queue "titanoboa-bus-archive"}}}))
```

Ext Dependencies File

```clojure
{:coordinates [[org.postgresql/postgresql "9.4.1208"]
               [com.novemberain/langohr
                "3.5.0"
                :exclusions
                [cheshire]]],
 :require [[titanoboa.system.rabbitmq]
           [titanoboa.database.postgres]
           [titanoboa.cluster]],
 :repositories {"clojars" "https://clojars.org/repo",
                "central" "https://repo1.maven.org/maven2/"},
 :import [[org.postgresql.Driver]]}
```

Start Script

```bash
#!/bin/bash
java -Dboa.server.port=3001 \
     -Dboa.server.config.path=/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/boa_server_config_rmq.clj \
     -Dboa.server.dependencies.path=/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/ext-dependencies.clj \
     -cp "./build/titanoboa.jar:./lib/*" titanoboa.server
```

Testing Cluster Locally

Note the -Dboa.server.port property in the script above - this way you can easily test the cluster setup locally, changing the port number for each instance.
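A minimal sketch of such a local setup, starting two nodes that differ only in their port (using the same example paths as in the start script above):

```bash
#!/bin/bash
# Start two local cluster nodes; both share one config file,
# one ext-dependencies file and one message broker.
CFG_DIR=/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test
for PORT in 3001 3002; do
  java -Dboa.server.port="$PORT" \
       -Dboa.server.config.path="$CFG_DIR/boa_server_config_rmq.clj" \
       -Dboa.server.dependencies.path="$CFG_DIR/ext-dependencies.clj" \
       -cp "./build/titanoboa.jar:./lib/*" titanoboa.server &
done
wait
```

Both nodes will then report to the same "heartbeat" exchange, so whichever one you open the GUI on can become cluster aware and show you both.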