Running Faunus on Amazon EC2

okram edited this page Jul 20, 2012 · 34 revisions

Amazon EC2 and Whirr make it easy to set up a Hadoop compute cluster that can then be utilized by Faunus. This section of documentation will explain how to set up a Hadoop cluster on Amazon EC2 and execute Faunus scripts.

Setting Up Whirr

Apache Whirr is a set of libraries for running cloud services. Whirr provides a cloud-neutral way to run services (you don’t have to worry about the idiosyncrasies of each provider), a common service API (the details of provisioning are particular to the service), and smart defaults for services (you can get a properly configured system running quickly, while still being able to override settings as needed). You can also use Whirr as a command line tool for deploying clusters. — The Apache Whirr Homepage

Faunus provides a Whirr recipe for launching a Hadoop cluster whose version matches the Hadoop version currently used by Faunus. This recipe is reproduced below. Please see the Whirr Quick Start for more information on the parameters and on how to set up an Amazon EC2 account.

whirr.cluster-name=faunuscluster
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
whirr.hadoop.version=1.0.3

Once your Amazon EC2 keys and ssh key files have been properly set up, a Hadoop cluster can be created. The recipe above creates a 4-node cluster: 1 node running the jobtracker and namenode, and 3 nodes each running a datanode and tasktracker.
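Before launching, it can help to confirm that the inputs the recipe reads are actually present. The following is a minimal sketch, not part of Faunus or Whirr: the function name check_whirr_prereqs is hypothetical, and it assumes the key pair lives at the id_rsa and id_rsa.pub paths named in the recipe.

```shell
# Hypothetical helper (not shipped with Faunus or Whirr): checks the
# environment variables and key files that the recipe above refers to.
check_whirr_prereqs() {
    # Optional argument: directory holding the key pair (assumed ~/.ssh).
    key_dir="${1:-$HOME/.ssh}"
    rc=0

    # whirr.identity and whirr.credential are read from the environment.
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
        echo "AWS_ACCESS_KEY_ID is not set" >&2
        rc=1
    fi
    if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_SECRET_ACCESS_KEY is not set" >&2
        rc=1
    fi

    # whirr.private-key-file and whirr.public-key-file point at these files.
    for key in "$key_dir/id_rsa" "$key_dir/id_rsa.pub"; do
        if [ ! -r "$key" ]; then
            echo "missing or unreadable key file: $key" >&2
            rc=1
        fi
    done

    return "$rc"
}
```

Calling check_whirr_prereqs with no arguments checks ~/.ssh; a non-zero return status means at least one input is missing, and the messages on standard error say which.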

faunus$ whirr launch-cluster --config bin/whirr.properties
Bootstrapping cluster
Configuring template
Configuring template
Starting 3 node(s) with roles [hadoop-datanode, hadoop-tasktracker]
Starting 1 node(s) with roles [hadoop-namenode, hadoop-jobtracker]
...

Once the cluster has launched, its machines are visible in the Amazon EC2 Console. After running the Hadoop proxy (. ~/.whirr/faunuscluster/hadoop-proxy.sh), the Hadoop cluster is ready to accept jobs. A simple check that the cluster is working is to see whether HDFS is available.

faunus$ export HADOOP_CONF_DIR=~/.whirr/faunuscluster
faunus$ hadoop fs -ls
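The configuration directory used above follows the cluster name set in the recipe. As a minimal sketch, assuming Whirr keeps a cluster's files under ~/.whirr/ followed by the cluster name (as the paths on this page suggest), the directory can be derived from the properties file instead of being typed by hand. The function name whirr_conf_dir is hypothetical.

```shell
# Hypothetical helper: prints the Whirr directory for the cluster named
# in a properties file, assuming the ~/.whirr/<cluster-name> layout.
whirr_conf_dir() {
    # Optional argument: path to the recipe (assumed bin/whirr.properties).
    props="${1:-bin/whirr.properties}"

    # Take the value of the first whirr.cluster-name line.
    name=$(sed -n 's/^whirr\.cluster-name=//p' "$props" | head -n 1)

    # Fail if the property is absent or empty.
    [ -n "$name" ] || return 1

    printf '%s\n' "$HOME/.whirr/$name"
}

# Example use (requires a running cluster and the proxy):
#   export HADOOP_CONF_DIR="$(whirr_conf_dir bin/whirr.properties)"
#   hadoop fs -ls
```

Deriving the path this way keeps HADOOP_CONF_DIR consistent with the recipe if whirr.cluster-name is ever changed.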