diff --git a/.gitignore b/.gitignore
index 7a53ab1..9c9e19d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,3 +2,5 @@
.settings
_site*
+docs/0.4/img/Thumbs.db
+docs/0.5/img/Thumbs.db
diff --git a/docs/0.5/img/AmazonHome.PNG b/docs/0.5/img/AmazonHome.PNG
new file mode 100644
index 0000000..92aca20
Binary files /dev/null and b/docs/0.5/img/AmazonHome.PNG differ
diff --git a/docs/0.5/img/EMRNew.PNG b/docs/0.5/img/EMRNew.PNG
new file mode 100644
index 0000000..2fd6207
Binary files /dev/null and b/docs/0.5/img/EMRNew.PNG differ
diff --git a/docs/0.5/img/HardwareConfigurationMedium.PNG b/docs/0.5/img/HardwareConfigurationMedium.PNG
new file mode 100644
index 0000000..fcc675e
Binary files /dev/null and b/docs/0.5/img/HardwareConfigurationMedium.PNG differ
diff --git a/docs/0.5/img/IrelandSecurityCredentialsMenue.PNG b/docs/0.5/img/IrelandSecurityCredentialsMenue.PNG
new file mode 100644
index 0000000..de54277
Binary files /dev/null and b/docs/0.5/img/IrelandSecurityCredentialsMenue.PNG differ
diff --git a/docs/0.5/img/KeyPairs.PNG b/docs/0.5/img/KeyPairs.PNG
new file mode 100644
index 0000000..968544c
Binary files /dev/null and b/docs/0.5/img/KeyPairs.PNG differ
diff --git a/docs/0.5/img/SecurityAndAccess.PNG b/docs/0.5/img/SecurityAndAccess.PNG
new file mode 100644
index 0000000..b0b5c95
Binary files /dev/null and b/docs/0.5/img/SecurityAndAccess.PNG differ
diff --git a/docs/0.5/img/SecurityCredentials.PNG b/docs/0.5/img/SecurityCredentials.PNG
new file mode 100644
index 0000000..4dfd47a
Binary files /dev/null and b/docs/0.5/img/SecurityCredentials.PNG differ
diff --git a/docs/0.5/img/SecurityCredentialsFirst.PNG b/docs/0.5/img/SecurityCredentialsFirst.PNG
new file mode 100644
index 0000000..3707228
Binary files /dev/null and b/docs/0.5/img/SecurityCredentialsFirst.PNG differ
diff --git a/docs/0.5/img/SoftwareConfiguration.PNG b/docs/0.5/img/SoftwareConfiguration.PNG
new file mode 100644
index 0000000..2ccf0e1
Binary files /dev/null and b/docs/0.5/img/SoftwareConfiguration.PNG differ
diff --git a/docs/0.5/img/StepCompleted.PNG b/docs/0.5/img/StepCompleted.PNG
new file mode 100644
index 0000000..4fb4b7e
Binary files /dev/null and b/docs/0.5/img/StepCompleted.PNG differ
diff --git a/docs/0.5/img/StepStartStratosphere.png b/docs/0.5/img/StepStartStratosphere.png
new file mode 100644
index 0000000..80c908c
Binary files /dev/null and b/docs/0.5/img/StepStartStratosphere.png differ
diff --git a/docs/0.5/img/StratosphereInterfaceProxy.png b/docs/0.5/img/StratosphereInterfaceProxy.png
new file mode 100644
index 0000000..501e10b
Binary files /dev/null and b/docs/0.5/img/StratosphereInterfaceProxy.png differ
diff --git a/docs/0.5/img/TerminatedWithErrors.PNG b/docs/0.5/img/TerminatedWithErrors.PNG
new file mode 100644
index 0000000..8a03e2a
Binary files /dev/null and b/docs/0.5/img/TerminatedWithErrors.PNG differ
diff --git a/docs/0.5/img/YarnApplication.png b/docs/0.5/img/YarnApplication.png
new file mode 100644
index 0000000..f036463
Binary files /dev/null and b/docs/0.5/img/YarnApplication.png differ
diff --git a/docs/0.5/index.markdown b/docs/0.5/index.markdown
index bb25ff6..5fca442 100644
--- a/docs/0.5/index.markdown
+++ b/docs/0.5/index.markdown
@@ -5,6 +5,7 @@ links:
 - { anchor: "jdbc", title: "JDBC Input/Output Format" }
 - { anchor: "collection_data_source", title: "CollectionDataSource" }
 - { anchor: "broadcast_variables", title: "Broadcast Variables" }
+ - { anchor: "emr_tutorial", title: "Stratosphere in EMR" }
---
@@ -64,3 +65,100 @@ public class MyMapper extends MapFunction {
An example of how to use Broadcast Variables in practice can be found in the K-Means example.
+
+
+#### First steps: New to Amazon?
+[](img/AmazonHome.PNG)
+#### Create an EC2 key pair.
+
+ 1. Click on [EC2](https://console.aws.amazon.com/ec2/v2/home#KeyPairs:)
+ 2. Create a new EC2 key pair.
+ [](img/KeyPairs.PNG)
+ 3. Save it locally.
+
+ * Key pairs are used to SSH into your instances. Read more about [how to access your instances](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstances.html).
+ * The access key grants access to your entire account. Keep it safe! You do not need the access key file itself, but EMR does not work unless an access key has been created.
+
+#### Create an access key
+ 1. Click on Security Credentials in the account management tab in the upper right corner.
+[](img/IrelandSecurityCredentialsMenue.PNG)
+ 2. Continue to your security credentials.
+[](img/SecurityCredentialsFirst.PNG)
+ 3. Click on Access Keys and create a new one.
+[](img/SecurityCredentials.PNG)
+ 4. Save the access key locally if you need to use it later. Note: the EMR service uses the access key internally, so it does not need to be downloaded.
+
+
+#### Creating an Elastic MapReduce Cluster
+1. Click on [Elastic MapReduce](https://console.aws.amazon.com/elasticmapreduce/vnext/home)
+ [](img/AmazonHome.PNG)
+2. Click on create cluster [](img/EMRNew.PNG)
+
+
+#### Step 'Set up cluster':
+1. Choose a name.
+2. Choose an AMI version with at least Hadoop 2.2.0, for example AMI 3.0.3 (Hadoop 2.2.0).
+3. Remove all additional applications that would otherwise be installed.
+ * Stratosphere does not need any additional applications installed. It runs on top of Hadoop YARN and the Hadoop Distributed File System (HDFS).
+
+[](img/SoftwareConfiguration.PNG)
+4. Choose the number and type of instances.
+ * The Stratosphere JobManager (Stratosphere master) runs on the master instance.
+ * Stratosphere TaskManagers (Stratosphere workers/slaves) run on core instances.
+
+[](img/HardwareConfigurationMedium.PNG)
+5. Choose Amazon EC2 key pair for SSH access
+[](img/SecurityAndAccess.PNG)
+
+
+#### Step 'Create step to run Stratosphere'
+1. Select step 'Custom jar'.
+2. Click 'Configure and add'.
+3. Copy 's3://elasticmapreduce/libs/script-runner/script-runner.jar' into Jar S3 Location.
+   Copy 's3n://stratosphere-bootstrap/installStart-stratosphere-yarn.sh -n 1 -j 1024 -t 1024' into Arguments.
+ * -n is the number of TaskManagers. It should be at most the number of core instances.
+ * -j is the memory (heap space) for the JobManager.
+ * -t is the memory for the TaskManagers.
+
+[](img/StepStartStratosphere.png)
+4. Save step.
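The argument string from step 3 can also be assembled programmatically. The following is only a minimal sketch: `build_step_arguments` and its parameter names are hypothetical helpers introduced here for illustration, and the check simply encodes the rule above that -n should not exceed the number of core instances.

```python
# Script location taken from the tutorial text above.
SCRIPT = "s3n://stratosphere-bootstrap/installStart-stratosphere-yarn.sh"


def build_step_arguments(task_managers, jobmanager_memory, taskmanager_memory, core_instances):
    """Return the step argument string (hypothetical helper, not part of Stratosphere)."""
    if task_managers < 1:
        raise ValueError("at least one TaskManager is required")
    if task_managers > core_instances:
        # -n should be the core instance count or less.
        raise ValueError("more TaskManagers requested than core instances available")
    return f"{SCRIPT} -n {task_managers} -j {jobmanager_memory} -t {taskmanager_memory}"


if __name__ == "__main__":
    # Reproduces the example arguments used in this tutorial.
    print(build_step_arguments(1, 1024, 1024, core_instances=1))
```

Validating the values before pasting them into the console avoids starting a cluster whose step requests more TaskManagers than there are core instances.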
+
+
+#### Create cluster and reuse it
+* Click 'Create cluster' to start the Amazon instances and install Stratosphere on them.
+* It takes some time until Stratosphere is started; the completed installation step indicates that Stratosphere is running.
+* Use {Master public DNS}:9026 to access the YARN interface. For this to work, the EMR master security group (under EC2 -> NETWORK & SECURITY -> Security Groups, normally called 'ElasticMapReduce-master') must allow access on port 9026. For more information on how to allow access to your instances, [read here](http://docs.aws.amazon.com/gettingstarted/latest/wah/getting-started-security-group.html).
+* The settings can be copied by cloning this cluster, so it can be reused as a template for a configured and running Stratosphere cluster.
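Whether the security group actually lets traffic through on port 9026 can be checked from your own machine. This is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper name and is not part of Stratosphere or the EMR tooling.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `is_port_open(master_dns, 9026)` with your master public DNS should return `True` once the security group rule is in place. A `False` result usually points to a missing rule or to a cluster that is not running yet.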
+
+[](img/StepCompleted.PNG)
+
+
+#### Accessing Your Stratosphere Interface
+* You need to allow access to your master on TCP ports 9026 and 9046 from outside.
+* First access your YARN interface by typing '{Master public DNS}:9026' into your browser. Replace {Master public DNS} with the actual master public DNS or public IP. You can find it at the top of your cluster summary.
+* Copy the ID of the running Stratosphere application.
+
+[](img/YarnApplication.png)
+
+* Type '{Master public DNS}:9046/proxy/{Stratosphere Application ID}/index.html' into your browser. Replace {Stratosphere Application ID} with the application ID, which can be found on the YARN web interface.
+* You now see the Stratosphere Dashboard.
+
+[](img/StratosphereInterfaceProxy.png)
+
+
+* Stratosphere is running!
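The two addresses used in this section follow a fixed pattern, so they can be generated instead of typed by hand. The sketch below is illustrative only: the function names are hypothetical, the `http://` scheme is an assumption (the tutorial gives the addresses without a scheme), and the host name and application ID in the demo are made-up placeholders.

```python
def yarn_interface_url(master_public_dns):
    """Address of the YARN interface on the EMR master (port 9026)."""
    return f"http://{master_public_dns}:9026"


def stratosphere_dashboard_url(master_public_dns, application_id):
    """Address of the Stratosphere dashboard behind the YARN proxy (port 9046)."""
    return f"http://{master_public_dns}:9046/proxy/{application_id}/index.html"


if __name__ == "__main__":
    # Placeholder values; substitute your own master DNS and application ID.
    master = "ec2-203-0-113-10.eu-west-1.compute.amazonaws.com"
    app_id = "application_1400000000000_0001"
    print(yarn_interface_url(master))
    print(stratosphere_dashboard_url(master, app_id))
```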
+
+
+#### Troubleshooting: What to do when something goes wrong
+
+##### Terminated with errors: 'No active keys found for user account' - create an AWS access key
+[](img/TerminatedWithErrors.PNG)
+1. Create an access key, as described in the "First steps: New to Amazon?" section.
+