Home
storm-deploy makes it dead-simple to launch Storm clusters on AWS. It is built on top of jclouds and pallet. After you follow the instructions in this tutorial, you will be able to provision, configure, and install a fully functional Storm cluster with just one command:
lein deploy-storm --start --name mycluster --branch 0.8.3
You can then stop a cluster like this:
lein deploy-storm --stop --name mycluster
The deploy also installs Ganglia which provides fine-grained metrics about resource usage on the cluster.
If you run into any issues, please post to the mailing list.
-
Install Leiningen (version 2 only). All you have to do is download this script, place it on your PATH, and make it executable.
-
Clone storm-deploy using git:
git clone https://github.com/nathanmarz/storm-deploy.git
-
Run
lein deps
-
Create a
~/.pallet/config.clj
file that looks like the following (and fill in the blanks). This provides the deploy with the credentials necessary to launch and configure instances on AWS.
(defpallet
  :services
  {
   :default {
             :blobstore-provider "aws-s3"
             :provider "aws-ec2"
             :environment {:user {:username "storm" ; this must be "storm"
                                  :private-key-path "$YOUR_PRIVATE_KEY_PATH$"
                                  :public-key-path "$YOUR_PUBLIC_KEY_PATH$"}
                           :aws-user-id "$YOUR_USER_ID$"}
             :identity "$YOUR_AWS_ACCESS_KEY$"
             :credential "$YOUR_AWS_ACCESS_KEY_SECRET$"
             :jclouds.regions "$YOUR_AWS_REGION$"
             }
   })
The deploy needs:
-
Public and private key paths for setting up ssh on the nodes. The public key path must be the private key path + ".pub" (this seems to be a bug in pallet). On Linux, the keys should have an empty passphrase.
a. If you are running an ssh agent (e.g. you are on Mac OS X), then you must ensure that your key is available to the agent. You can make this change permanently using:
ssh-add -K $YOUR_PRIVATE_KEY_PATH$
-
AWS user id: You can find this on your account management page. It's a number with hyphens in it. You can optionally remove the hyphens when you put it in the config.
-
Identity: Your AWS access key
-
Credential: Your AWS access key secret
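The key path requirement above can be checked before you deploy. The following is a minimal sketch, assuming a POSIX shell; check_key_pair is a hypothetical helper and is not part of storm-deploy. It only confirms that a public key exists at the private key path plus ".pub".

```shell
# Hypothetical helper (not part of storm-deploy): verifies that the public key
# sits next to the private key with a ".pub" suffix, as the deploy expects.
check_key_pair() {
  private_key="$1"
  if [ ! -f "$private_key" ]; then
    echo "missing private key: $private_key" >&2
    return 1
  fi
  if [ ! -f "$private_key.pub" ]; then
    echo "missing public key: $private_key.pub" >&2
    return 1
  fi
  echo "ok: $private_key.pub"
}
```

For example, check_key_pair "$YOUR_PRIVATE_KEY_PATH$" returns a non-zero status if either file is missing, so it can gate the deploy command in a script.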
-
Configure your cluster by editing conf/clusters.yaml. You can change the number of zookeeper nodes or supervisor nodes by editing zookeeper.count or supervisor.count, respectively. You can launch spot instances for supervisor nodes by setting supervisor.spot.price. The other properties should be self-explanatory.
-
Regions: The region where the security groups will be defined. This should be the same region in which the new instances will be started, for example "us-east-1".
-
(optional) Add any custom configuration for your Storm cluster to conf/storm.yaml. For example, you may change timeouts, register custom serializations, or put in other configurations you want available to your topologies.
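Before launching, it can help to confirm the node counts set in conf/clusters.yaml. The sketch below assumes each setting sits on its own line as "key: value", which is an assumption about the file layout; cluster_setting is a hypothetical helper, not part of storm-deploy.

```shell
# Hypothetical helper: prints the value of a top-level "key: value" entry.
# Assumes conf/clusters.yaml keeps each setting on its own line in that form;
# commented-out lines (starting with "#") are not matched.
cluster_setting() {
  key="$1"
  file="${2:-conf/clusters.yaml}"
  awk -F': *' -v k="$key" '$1 == k { print $2; exit }' "$file"
}
```

Running cluster_setting supervisor.count from the storm-deploy directory would print the configured number of supervisor nodes, if the file follows the assumed layout.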
To launch a cluster, run this command:
lein deploy-storm --start --name mycluster [--branch {branch}] [--commit {commit tag-or-sha1}]
The --name parameter names your cluster so that you can attach to it or stop it later. If you omit --name, it will default to "dev". The --branch parameter indicates which branch of Storm to install. If you omit --branch, it will install Storm from the master branch. The --commit parameter accepts a release tag or a commit SHA1. If you omit --commit, you will get the latest commit from the branch you are using. Together these let you pick an exact version of Storm. For example, to check out the tagged release 0.9.0-rc2 from master, pass the branch and tag name:
lein deploy-storm --start --name mycluster --branch master --commit 0.9.0-rc2
Or execute the following, passing the branch and SHA1:
lein deploy-storm --start --name mycluster --branch master --commit 32098d5b2694434ea43d430a4703fbe51bab268f
If you want the latest version from a specific branch (say 0.8.3), you can execute:
lein deploy-storm --start --name mycluster --branch 0.8.3
And if you want the latest commit from master, execute the following:
lein deploy-storm --start --name mycluster
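The flag combinations above can be summarized in a small wrapper. This is a sketch only: deploy_storm_cmd is a hypothetical helper that prints the command rather than running it, and it uses only the flags described on this page.

```shell
# Hypothetical helper: prints the deploy command for a cluster name, an
# optional branch, and an optional commit (release tag or SHA1). Omitted
# values are left off so that lein deploy-storm falls back to its defaults.
deploy_storm_cmd() {
  name="${1:-}"
  branch="${2:-}"
  commit="${3:-}"
  cmd="lein deploy-storm --start"
  if [ -n "$name" ]; then cmd="$cmd --name $name"; fi
  if [ -n "$branch" ]; then cmd="$cmd --branch $branch"; fi
  if [ -n "$commit" ]; then cmd="$cmd --commit $commit"; fi
  echo "$cmd"
}
```

Calling deploy_storm_cmd mycluster master 0.9.0-rc2 prints the tagged-release command shown earlier, and deploy_storm_cmd mycluster prints the command for the latest commit on master.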
The deploy sets up Zookeeper, sets up Nimbus, launches the Storm UI on port 8080 on Nimbus, launches a DRPC server on port 3772 on Nimbus, sets up the Supervisors, sets configurations appropriately, sets the appropriate permissions for the security groups, and attaches your machine to the cluster (see below for more information on attaching).
To stop a cluster, simply run:
lein deploy-storm --stop --name mycluster
This will shut down Nimbus, the Supervisors, and the Zookeeper nodes.
Attaching to a cluster configures your storm client to talk to that particular cluster and authorizes your computer to view the Storm UI. The storm client is used to start and stop topologies and is described here.
To attach to a cluster, run the following command:
lein deploy-storm --attach --name mycluster
Attaching does the following:
- Writes the location of Nimbus in ~/.storm/storm.yaml so that the storm client knows which cluster to talk to
- Authorizes your computer to access the Nimbus daemon's Thrift port (which is used for submitting topologies)
- Authorizes your computer to access the Storm UI on port 8080 on Nimbus
- Authorizes your computer to access Ganglia on port 80 on Nimbus
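After attaching, you can check which cluster your storm client points at by reading the file that attach writes. The sketch below assumes the Nimbus address is stored under a top-level nimbus.host key, which this page does not state; nimbus_host is a hypothetical helper.

```shell
# Hypothetical helper: prints the Nimbus address recorded in a storm.yaml file.
# Assumes the address is stored as a top-level "nimbus.host" entry, optionally
# quoted; that key name is an assumption, not something this page specifies.
nimbus_host() {
  config="${1:-$HOME/.storm/storm.yaml}"
  sed -n 's/^nimbus\.host:[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$config"
}
```

With no argument, nimbus_host reads ~/.storm/storm.yaml; it prints nothing if the assumed key is absent.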
To get the IP addresses of the cluster nodes, run the following:
lein deploy-storm --ips --name mycluster
You can access Ganglia by navigating to the following address in your web browser:
http://{nimbus ip}/ganglia/index.php
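Once you know the Nimbus IP, the web addresses mentioned on this page can be built from it. A minimal sketch; ganglia_url and storm_ui_url are hypothetical helpers, and the IP is passed in by hand because the output format of --ips is not described here.

```shell
# Hypothetical helpers: build the web URLs for a Nimbus node from its IP.
# Ganglia is served on port 80 and the Storm UI on port 8080, as described above.
ganglia_url() {
  echo "http://$1/ganglia/index.php"
}

storm_ui_url() {
  echo "http://$1:8080"
}
```

For example, ganglia_url 10.0.0.1 prints http://10.0.0.1/ganglia/index.php. Your machine must be attached to the cluster for either address to be reachable.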