Docker
Note: this page is out of date, as it refers to the now-retired Ruby version of Synthea. The Java version has not been "Docker-ized", nor are there plans to do so at this time. This information is left for reference.
Synthea instances can be run in parallel using Docker, an open-source software containerization platform. Docker wraps Synthea in a complete, lightweight filesystem that contains everything needed to run Synthea: code, system tools, libraries, etc. - everything that we install in our own production Synthea environments.
- Install and run the Docker Engine. Variants exist for OS X, Linux, and Windows.
- If you're new to Docker, check out Docker's helpful Getting Started Guide.
- Pull the latest Synthea images from The Docker Hub.
- Clone this repository and move to the directory you cloned it into.
There are several variants of Synthea available on the Docker Hub:
Variant | When to use it |
---|---|
synthetichealth/synthea:latest | Always. This version uses Matz's standard C Ruby and is fast and efficient when running on Docker. Allocating additional CPUs and memory to your Docker Engine will speed things up even more. |
synthetichealth/synthea:jruby | For posterity's sake. When multithreaded, Synthea runs 30% faster on bare metal by leveraging the concurrent power of the JVM. However, in Docker we've observed the opposite: Synthea is orders of magnitude slower on JRuby. ¯\_(ツ)_/¯ |
To pull a Synthea image from the Docker Hub:
$ docker pull <synthea_variant>
For example:
$ docker pull synthetichealth/synthea:latest
This is important: before running Synthea with Docker, check the following settings in synthea.yml:
docker:
  dockerized: true
  location: '/mnt/synthea'
This configures Synthea to run in a Docker environment. Specifically, this does the following:
- Instructs Mongoid (the mongodb driver used with the health-data-standards gem) to connect to a mongo database running at mongo:27017 instead of using a database at localhost:27017.
- Instructs the exporter to export all records (ccda, fhir, html) to a persistent storage container with a mounted volume at /mnt/synthea.
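As a quick sanity check before a run, the two settings can be verified from the shell. The following is a minimal sketch rather than anything shipped with Synthea: the function name check_docker_settings is hypothetical, and it assumes each setting sits on its own line as shown above. It does not confirm that the keys are nested under docker:, so treat a passing result as a hint, not a guarantee.

```shell
# Hypothetical helper (not part of Synthea): check that a config file
# contains the Docker settings described above.
# Usage: check_docker_settings path/to/synthea.yml
check_docker_settings() {
  config_file="$1"

  # dockerized must be set to true
  grep -Eq '^[[:space:]]*dockerized:[[:space:]]*true[[:space:]]*$' "$config_file" || {
    echo "docker:dockerized is not set to true in $config_file" >&2
    return 1
  }

  # location must point at the mounted volume path (quotes and a trailing slash are optional)
  grep -Eq "^[[:space:]]*location:[[:space:]]*['\"]?/mnt/synthea/?['\"]?[[:space:]]*\$" "$config_file" || {
    echo "docker:location is not /mnt/synthea in $config_file" >&2
    return 1
  }

  echo "Docker settings look correct in $config_file"
}
```

After defining the function, it could be called as check_docker_settings synthea.yml from the repository root; a non-zero exit status indicates that one of the settings needs attention.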
Synthea runs in a multi-container environment managed using docker-compose. Synthea uses the following containers:
Container | Purpose |
---|---|
synthea | The main container, which holds Synthea's source and configuration and executes a Synthea run. |
mongo | A standard mongodb container used by the CCDA exporter and health-data-standards gem. The synthea container can connect to the database(s) in this container at mongo:27017. |
synthea_output | A persistent storage volume mounted to a standard Ubuntu container. Synthea writes its output to this persistent volume at /mnt/synthea. |
inspector | A standard Ubuntu instance that can be used to view and manipulate the output in /mnt/synthea. The persistent storage volume can be attached to this container using --volumes-from. |
See Configuration. There are important settings that must be checked before attempting to run Synthea in Docker.
$ docker create -v /mnt/synthea --name synthea_output ubuntu
This creates a data volume container using a standard Ubuntu image and names it "synthea_output" for easy reference. The path to the volume (/mnt/synthea/) is the same in both the synthea_output container and the synthea container. This location should also match docker:location in synthea.yml.
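Because docker create refuses to reuse a container name that is already taken, repeating the command above on a machine where synthea_output exists will fail. A minimal sketch of a guard is shown below; the function name ensure_output_volume is hypothetical and not part of Synthea.

```shell
# Hypothetical helper: create the synthea_output data volume container
# only if a container with that name does not already exist.
ensure_output_volume() {
  if docker container inspect synthea_output >/dev/null 2>&1; then
    # inspect succeeded, so the container is already there
    echo "synthea_output already exists"
  else
    docker create -v /mnt/synthea --name synthea_output ubuntu
  fi
}
```

Calling ensure_output_volume before each run should make the setup step safe to repeat.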
$ docker-compose -f docker/ruby/docker-compose.yml run synthea bundle exec rake synthea:sequential
This tells Docker to:
- Create the mongo and synthea services specified in the docker-compose.yml file
- Link them together in an isolated network
- Then pass the command bundle exec rake synthea:sequential to the running synthea container
The first time you call docker-compose run, Docker will pull the required images from the Docker Hub and build them locally. The setup required for this first run will take a little while to execute, so sit tight. Subsequent runs will reuse these images and will be fast.
After building, Synthea will automatically run and write all output to the persistent storage volume at /mnt/synthea.
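The compose file path in the run command is relative, so the command only works from the directory the repository was cloned into. The sketch below wraps the same command with a check for that; the function name run_synthea and its optional argument are hypothetical conveniences, not part of Synthea.

```shell
# Hypothetical wrapper around the run command shown above. The default
# compose file path is the one used in this guide, relative to the
# repository root; a different path may be passed as the first argument.
run_synthea() {
  compose_file="${1:-docker/ruby/docker-compose.yml}"

  # Fail early with a clear message when run from the wrong directory
  if [ ! -f "$compose_file" ]; then
    echo "Cannot find $compose_file; run this from the cloned repository" >&2
    return 1
  fi

  docker-compose -f "$compose_file" run synthea bundle exec rake synthea:sequential
}
```

With the function defined, run_synthea with no arguments is equivalent to the command above when called from the repository root.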
Create and run the standalone inspector container to mount and view the synthea_output volume:
$ docker run -it --name inspector --volumes-from synthea_output ubuntu
This creates a regular Ubuntu instance from the base ubuntu image and mounts the volume. The -it arguments run the Docker container in interactive mode (-i) using a pseudo-TTY terminal (-t).
You should then be able to view the output from the latest Synthea run:
$ cd /mnt/synthea && ls
CCDA fhir html
Since you only need to create the inspector container once, you can reuse it to inspect the output volume:
$ docker start -i inspector
$ cd /mnt/synthea && ls
Docker already knows to connect the volume and use TTY. The -i tells Docker to use interactive mode.
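When an interactive shell is not needed, the output can also be listed with a throwaway container. This is a sketch under the same assumptions as the inspector steps (a synthea_output container with its volume at /mnt/synthea); the function name list_synthea_output is hypothetical.

```shell
# Hypothetical helper: list the output directory in a throwaway container.
# --rm removes the container once ls exits; --volumes-from attaches the
# volume from the synthea_output container. An optional first argument
# names a subdirectory of /mnt/synthea to list instead.
list_synthea_output() {
  docker run --rm --volumes-from synthea_output ubuntu ls "/mnt/synthea/${1:-}"
}
```

For example, list_synthea_output fhir would list only the fhir directory, assuming that directory exists after a run.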
Docker images are tagged with :tags to indicate the specific characteristics and configuration of the image. The :latest build of Synthea on the Docker Hub uses the latest source code and the defaults in synthea.yml. However, docker:dockerized should always be set to true since we're using Synthea in a Docker environment.
After making changes to Synthea's source code or configuration, use docker-compose build to rebuild the Synthea image:
$ docker-compose -f docker/ruby/docker-compose.yml build synthea
After successfully building the new image, you can optionally push it to the Docker Hub:
$ docker push synthetichealth/synthea:latest
This process should look familiar to git users. You will need an account on the Docker Hub and permission to push images to the SyntheticHealth organization.
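The build and push steps can be chained so that a failed build is never pushed. The sketch below assumes, as the steps above imply, that the compose build produces the image tagged synthetichealth/synthea:latest, and that docker login has already been run; the function name build_and_push is hypothetical.

```shell
# Hypothetical helper: rebuild the synthea image and push it only if the
# build succeeds. Assumes you are in the repository root and already
# logged in to the Docker Hub with permission to push.
build_and_push() {
  docker-compose -f docker/ruby/docker-compose.yml build synthea || {
    echo "Build failed; not pushing" >&2
    return 1
  }
  docker push synthetichealth/synthea:latest
}
```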
Synthea is only as fast as the Docker environment it runs in and that environment's host. However, within its container Synthea will take advantage of all of the resources that are available. By default, Docker typically allocates 2 CPUs and 2GB of RAM to running containers. If you need faster performance, try adjusting your Docker preferences to allocate more CPUs and additional memory to running containers.
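To see what the Docker Engine currently has to work with, the CPU count and total memory can be read from docker info. This is a small sketch; the function name show_docker_resources is hypothetical, while NCPU and MemTotal are fields reported by docker info (memory is given in bytes).

```shell
# Hypothetical helper: print the CPU count and total memory visible to
# the Docker Engine, using a Go template over the `docker info` output.
show_docker_resources() {
  docker info --format 'CPUs: {{.NCPU}}, memory: {{.MemTotal}} bytes'
}
```

Running show_docker_resources before and after changing your Docker preferences is one way to confirm that the new allocation took effect.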