DockerDoop provides a solution for running HDP on Docker. It was designed to streamline HDP training, administrative testing, and certain development tasks.
This solution is an intermediate step between the HDP Sandbox and multi-machine HDP installations for dev/test workloads. With DockerDoop, multi-node HDP clusters can be installed quickly and easily on a single machine with minimal resource requirements.
8GB of RAM and 50GB of disk are recommended for the multi-node sample configuration. 6GB of RAM or less may be viable for smaller clusters.
- In a local VM
- Sandboxed, "Cluster in a box"
- On a shared machine
- Collaborative clusters
- CentOS 7 (other Linux distributions should work as well)
- Docker 1.9+: https://docs.docker.com/engine/installation/linux/centos/
- Configure for External Network Access to Nodes
- Add multiple IPs to Host OS (N+1 for N nodes)
- Use VMWare/VirtualBox/etc. to add network adaptors to the VM
- For example, the threeNode-sample configuration requires 4 IPs: 1 for host, 3 for the cluster.
- Limit SSH to a single IP. By default, sshd listens on 0.0.0.0 (all addresses), which includes the IPs intended for the cluster nodes.
- Edit sshd_config
vi /etc/ssh/sshd_config
- Add the following line with the IP address for the host OS:
ListenAddress <IP Address>
- Restart sshd
service sshd restart
- Enable IPv4 forwarding
sysctl -w net.ipv4.ip_forward=1
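As a possible alternative to adding adapters through the hypervisor, the extra node addresses could be attached to an existing interface from the shell. The sketch below only prints the `ip addr add` commands so they can be reviewed first. The interface name `eth0`, the `/24` prefix, and the sample addresses are assumptions, and `print_ip_commands` is a hypothetical helper that is not part of DockerDoop.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands that would add each
# node IP as a secondary address on one interface.
# Usage: print_ip_commands <interface> <prefix-length> <ip>...
print_ip_commands() {
    iface="$1"
    prefix="$2"
    shift 2
    for ip in "$@"; do
        echo "ip addr add ${ip}/${prefix} dev ${iface}"
    done
}

# Example values are assumptions; replace them with addresses from your network.
print_ip_commands eth0 24 172.16.96.137 172.16.96.138 172.16.96.139
```

After review, the printed lines can be run as root. Addresses added this way, like the `sysctl -w` setting above, do not survive a reboot unless they are also written to the system's network and sysctl configuration.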
- Expand Node Disk Size Beyond 10GB Default
- Create Docker configuration file
mkdir /etc/systemd/system/docker.service.d
vi /etc/systemd/system/docker.service.d/docker.conf
- Add the following content to the file. 20GB is recommended, but not required.
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon --storage-opt dm.basesize=20G
- Reload and restart Docker Daemon
systemctl daemon-reload
systemctl restart docker
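To check that the daemon picked up the new base size, the value reported by `docker info` can be inspected. The sketch below assumes the devicemapper storage driver, whose `docker info` output includes a `Base Device Size` line. `base_device_size` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` output on stdin and print the
# value of the "Base Device Size" line (devicemapper storage driver only).
base_device_size() {
    sed -n 's/^[[:space:]]*Base Device Size:[[:space:]]*//p'
}
```

Usage: `docker info 2>/dev/null | base_device_size`. An empty result suggests a different storage driver is in use, in which case the `dm.basesize` option does not apply.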
An .ini file is required to define hostnames and a cluster name. An external IP list can be defined to allow external access to nodes.
HDP can be installed manually or through Ambari Blueprints. Example blueprint files are provided in the blueprints folder.
- clusterName (required)
- ambariServerHostName (required)
- hostNames (required)
- externalIPs (required for external access to nodes)
- blueprintName (required to use blueprint script)
- blueprintFile (required to use blueprint script)
- blueprintHostMappingFile (required to use blueprint script)
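A quick check for the three always-required fields can catch a typo before any containers are created. This is a minimal sketch: it assumes each setting sits on its own line as `key=value` or `key: value`, which the project's sample .ini files should be consulted to confirm, and `missing_keys` is a hypothetical helper that is not part of DockerDoop.

```shell
#!/bin/sh
# Hypothetical helper: print every required key that is absent from the
# given .ini file. Assumes "key=value" or "key: value" lines; check the
# project's sample .ini for the actual syntax.
missing_keys() {
    ini="$1"
    for key in clusterName ambariServerHostName hostNames; do
        grep -Eq "^[[:space:]]*${key}[[:space:]]*[=:]" "$ini" || echo "$key"
    done
}
```

Usage: `missing_keys threeNode-sample.ini` prints nothing when all three keys are present. The blueprint fields could be added to the loop when the blueprint install script is going to be used.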
Three Docker images need to be built:
- Parent Image: This image does the basic preparation needed on all HDP cluster nodes: installing utility packages, setting environment variables, etc.
- Ambari Server Image: This image installs and runs the Ambari Server and an Ambari Agent. It also builds a local yum repository mirror for the HDP packages (base CentOS packages are not currently mirrored), so creating this image will take some time.
- Ambari Agent Image: This image runs an Ambari Agent process, but no Ambari Server. For multi-node cluster deployments, all nodes except the node designated as the Ambari Server node will be based on this image.
./scripts/build_images.sh
Once the Docker images are built, the cluster nodes can be created and Ambari can be started.
./scripts/createCluster.sh threeNode-sample.ini
Once a cluster is set up, the Ambari UI or blueprints can be used to install the cluster.
To install from a blueprint with the included script, add the blueprint fields (blueprintName, blueprintFile, blueprintHostMappingFile) to the .ini file. Sample blueprints and host mapping files are provided in the blueprints directory.
./scripts/installCluster.sh threeNode-sample.ini
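The build, create, and install steps can be chained so that a failure stops the sequence early. The following is only a sketch: it assumes each project script exits non-zero on failure, which is not confirmed here, and the `run` helper, the `DRY_RUN` variable, and `deploy_cluster` are illustrative names that are not part of DockerDoop.

```shell
#!/bin/sh
# Sketch: run build, create, and blueprint install in order, stopping at
# the first failing step. Assumes the scripts return non-zero on failure.
set -e

# With DRY_RUN=1, print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: deploy_cluster <cluster .ini file>
deploy_cluster() {
    ini="$1"
    run ./scripts/build_images.sh
    run ./scripts/createCluster.sh "$ini"
    run ./scripts/installCluster.sh "$ini"
}
```

Setting `DRY_RUN=1` before calling `deploy_cluster threeNode-sample.ini` previews the three commands without running them. Depending on how quickly Ambari starts inside the new containers, a wait between the create and install steps may be needed.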
This project includes several additional scripts:
- Install Status: Monitor the status of a blueprint install. Note: the Ambari UI can also be used.
./scripts/installStatus.sh threeNode-sample.ini
- Stats: Monitor the resource utilization of one cluster, or of all clusters on a machine, using the built-in docker stats command.
./scripts/stats.sh threeNode-sample.ini
./scripts/stats.sh
- Create Node: Create a new node. Note: this does not install services.
./scripts/createNode.sh HWorker4 HMaster 172.16.96.140 HCluster
- Export Blueprint: Export the blueprint from Ambari.
./scripts/exportBlueprint.sh threeNode-sample.ini
- Start/Stop Cluster: Start and stop the Docker containers running the cluster, similar to powering machines on and off.
./scripts/stopCluster.sh threeNode-sample.ini
./scripts/startCluster.sh threeNode-sample.ini
- Destroy Cluster: Completely remove all nodes from the cluster. Not reversible!
./scripts/destroyCluster.sh threeNode-sample.ini
- The Ambari Server node hosts a local repository to accelerate the install process.
- Multiple clusters can reside on the same machine as long as the cluster names (and external IPs) are unique. The docker container names have ".{clusterName}" appended.
- Use the stop and start functionality to keep multiple cluster versions and/or configurations.
- The containers are configured to restart automatically unless they were stopped manually. Run this command so that the Docker service itself starts at boot:
chkconfig docker on
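Because container names carry the ".{clusterName}" suffix, the containers of a single cluster can be picked out with a name filter. The sketch below assumes the part before the suffix is the node's hostname, which is not stated explicitly above. Both function names are hypothetical.

```shell
#!/bin/sh
# Build the expected container name for a node, following the
# "<name>.<clusterName>" pattern. Using the node's hostname as the
# base name is an assumption.
container_name() {
    printf '%s.%s\n' "$1" "$2"
}

# List all containers (running or stopped) whose name contains
# ".<clusterName>". The name filter matches partial names, so clusters
# with similar names (HCluster, HCluster2) would both be listed.
list_cluster_containers() {
    docker ps -a --filter "name=.$1" --format '{{.Names}}'
}
```

For example, `container_name HWorker4 HCluster` prints `HWorker4.HCluster`, and `list_cluster_containers HCluster` shows which of that cluster's containers currently exist on the host.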
- Additional sample blueprints
- HA
- Kerberos
- Standalone use cases (streaming, data science, batch, etc.)
- Optimize settings to reduce footprint and improve performance