Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
721dcb7
Getting a basic interpreter going. Next step is to hook in the Scaldi…
sriramkrishnan Dec 17, 2015
35fc032
Initial version of a ScaldingInterpreter running in local mode. Need …
sriramkrishnan Dec 17, 2015
e13576f
Now seem to be getting the console out, but only for last line. Will …
sriramkrishnan Dec 17, 2015
b19fda4
More cleanup - flushing output stream. Still can't seem to get the Sc…
sriramkrishnan Dec 18, 2015
1ffbb3b
Fixing output of stdout from console
sriramkrishnan Dec 18, 2015
d3916b7
Adding modified version of ScaldingILoop for grabbing Console output …
sriramkrishnan Dec 18, 2015
36a2dac
Added a link to the scalding code where the ILoop was lifted from.
sriramkrishnan Dec 18, 2015
8944b0c
Merge remote-tracking branch 'upstream/master' into scalding
sriramkrishnan Dec 21, 2015
c27ec48
Formatting, license
sriramkrishnan Dec 21, 2015
368dc04
Cleaning up imports, comments, etc
sriramkrishnan Dec 21, 2015
7ec2941
More code cleanup
sriramkrishnan Dec 21, 2015
7a9ceeb
Adding some tests for the Scalding interpreter
sriramkrishnan Dec 22, 2015
5fd1ae4
Address comments on PR. Merge remote-tracking branch 'upstream/master…
sriramkrishnan Dec 25, 2015
91b0692
adding ScaldingInterpreter
sriramkrishnan Dec 25, 2015
8004b39
Fixing a typo
sriramkrishnan Dec 25, 2015
d4cf308
Adding docs for the Scalding interpreter
sriramkrishnan Dec 25, 2015
5c8056c
Making the Scalding scala jars same as the Spark ones for consistency
sriramkrishnan Dec 28, 2015
083f059
Trimming deps down from hadoop-client to just hadoop-common. Scalding…
sriramkrishnan Dec 28, 2015
dd8a4c8
Adding Scalding licenses
sriramkrishnan Dec 28, 2015
460658a
Adding Cascading dependencies
sriramkrishnan Dec 28, 2015
6019cc8
More licenses. Only remaining ones are the dependencies of hadoop-com…
sriramkrishnan Dec 28, 2015
8be8d22
Went thru and added all licenses I could find
sriramkrishnan Dec 28, 2015
aaae5d1
Moving licenses to the right location
sriramkrishnan Dec 28, 2015
dd0bb9a
Changing licenses to text format
sriramkrishnan Dec 28, 2015
9a7d733
Moved tukanni license to a separate section and added license
sriramkrishnan Dec 28, 2015
b30725f
Making the Scalding interpreter optional as part of a new -Pscalding …
sriramkrishnan Dec 31, 2015
bc31d1e
Getting rid of added licenses
sriramkrishnan Dec 31, 2015
006500d
Reverting all commits to LICENSE to be back to master
sriramkrishnan Dec 31, 2015
8eec3c2
Updating docs to include the -Pscalding profile for Scalding
sriramkrishnan Dec 31, 2015
1ad405b
Adding newline to remove redundant change in PR
sriramkrishnan Dec 31, 2015
ffa698b
Whitespace cleanup
sriramkrishnan Dec 31, 2015
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,11 @@ mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests
mvn clean package -Dignite.version=1.1.0-incubating -DskipTests
```

#### Scalding Interpreter

```
mvn clean package -Pscalding -DskipTests
```

### Configure
If you wish to configure Zeppelin option (like port number), configure the following files:
Expand Down
2 changes: 1 addition & 1 deletion conf/zeppelin-site.xml.template
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@

<property>
<name>zeppelin.interpreters</name>
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter</value>
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter,org.apache.zeppelin.scalding.ScaldingInterpreter</value>
<description>Comma separated interpreter configurations. First interpreter become a default</description>
</property>

Expand Down
1 change: 1 addition & 0 deletions docs/_includes/themes/zeppelin/_navigation.html
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@
<li><a href="{{BASE_PATH}}/interpreter/lens.html">Lens</a></li>
<li><a href="{{BASE_PATH}}/interpreter/markdown.html">Markdown</a></li>
<li><a href="{{BASE_PATH}}/interpreter/postgresql.html">Postgresql, hawq</a></li>
<li><a href="{{BASE_PATH}}/interpreter/scalding.html">Scalding</a></li>
<li><a href="{{BASE_PATH}}/pleasecontribute.html">Shell</a></li>
<li><a href="{{BASE_PATH}}/interpreter/spark.html">Spark</a></li>
<li><a href="{{BASE_PATH}}/pleasecontribute.html">Tajo</a></li>
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions docs/docs.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ limitations under the License.
* [lens](./interpreter/lens.html)
* [md](./interpreter/markdown.html)
* [postgresql, hawq](./interpreter/postgresql.html)
* [scalding](./interpreter/scalding.html)
* [sh](./pleasecontribute.html)
* [spark](./interpreter/spark.html)
* [tajo](./pleasecontribute.html)
Expand Down
78 changes: 78 additions & 0 deletions docs/interpreter/scalding.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
---
layout: page
title: "Scalding Interpreter"
description: ""
group: manual
---
{% include JB/setup %}


## Scalding Interpreter for Apache Zeppelin
[Scalding](https://github.com/twitter/scalding) is an open source Scala library for writing MapReduce jobs.

### Building the Scalding Interpreter
You have to first build the Scalding interpreter by enable the **scalding** profile as follows:

```
mvn clean package -Pscalding -DskipTests
```

### Enabling the Scalding Interpreter

In a notebook, to enable the **Scalding** interpreter, click on the **Gear** icon,select **Scalding**, and hit **Save**.

<center>
![Interpreter Binding](../assets/themes/zeppelin/img/docs-img/scalding-InterpreterBinding.png)

![Interpreter Selection](../assets/themes/zeppelin/img/docs-img/scalding-InterpreterSelection.png)
</center>

### Configuring the Interpreter
Zeppelin comes with a pre-configured Scalding interpreter in local mode, so you do not need to install anything.

### Testing the Interpreter

In example, by using the [Alice in Wonderland](https://gist.github.com/johnynek/a47699caa62f4f38a3e2) tutorial, we will count words (of course!), and plot a graph of the top 10 words in the book.

```
%scalding

import scala.io.Source

// Get the Alice in Wonderland book from gutenberg.org:
val alice = Source.fromURL("http://www.gutenberg.org/files/11/11.txt").getLines
val aliceLineNum = alice.zipWithIndex.toList
val alicePipe = TypedPipe.from(aliceLineNum)

// Now get a list of words for the book:
val aliceWords = alicePipe.flatMap { case (text, _) => text.split("\\s+").toList }

// Now lets add a count for each word:
val aliceWithCount = aliceWords.filterNot(_.equals("")).map { word => (word, 1L) }

// let's sum them for each word:
val wordCount = aliceWithCount.group.sum

print ("Here are the top 10 words\n")
val top10 = wordCount
.groupAll
.sortBy { case (word, count) => -count }
.take(10)
top10.dump

```
```
%scalding

val table = "words\t count\n" + top10.toIterator.map{case (k, (word, count)) => s"$word\t$count"}.mkString("\n")
print("%table " + table)

```

If you click on the icon for the pie chart, you should be able to see a chart like this:
![Scalding - Pie - Chart](../assets/themes/zeppelin/img/docs-img/scalding-pie.png)

### Current Status & Future Work
The current implementation of the Scalding interpreter does not support canceling jobs, or fine-grained progress updates.

The pre-configured Scalding interpreter only supports Scalding in local mode. Hadoop mode for Scalding is currently unsupported, and will be future work (contributions welcome!).
7 changes: 7 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -627,6 +627,13 @@
</modules>
</profile>

<profile>
<id>scalding</id>
<modules>
<module>scalding</module>
</modules>
</profile>

<profile>
<id>build-distr</id>
<activation>
Expand Down
202 changes: 202 additions & 0 deletions scalding/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,202 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<artifactId>zeppelin</artifactId>
<groupId>org.apache.zeppelin</groupId>
<version>0.6.0-incubating-SNAPSHOT</version>
<relativePath>..</relativePath>
</parent>

<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-scalding</artifactId>
<packaging>jar</packaging>
<version>0.6.0-incubating-SNAPSHOT</version>
<name>Zeppelin: Scalding interpreter</name>
<url>http://zeppelin.incubator.apache.org</url>

<properties>
<scala.version>2.10.4</scala.version>
<hadoop.version>2.3.0</hadoop.version>
<scalding.version>0.15.1-RC13</scalding.version>
</properties>

<repositories>
<repository>
<id>conjars</id>
<name>Concurrent Maven Repo</name>
<url>http://conjars.org/repo</url>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to avoid, or at least disable this maven repo by default?
We need to avoid publish official binary package built with 3rd party maven repository.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the Cascading jars are actually available from http://conjars.org/. I am not sure how I can disable or avoid this.

</repository>
</repositories>

<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>zeppelin-interpreter</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-exec</artifactId>
<version>1.3</version>
</dependency>

<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>

<dependency>
<groupId>com.twitter</groupId>
<artifactId>scalding-core_2.10</artifactId>
<version>${scalding.version}</version>
</dependency>

<dependency>
<groupId>com.twitter</groupId>
<artifactId>scalding-repl_2.10</artifactId>
<version>${scalding.version}</version>
</dependency>

<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>

<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
<version>${scala.version}</version>
</dependency>

<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-reflect</artifactId>
<version>${scala.version}</version>
</dependency>

<!-- Scalding REPL needs org.apache.hadoop.conf.Configuration even in local mode -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.7</version>
<configuration>
<skip>true</skip>
</configuration>
</plugin>

<plugin>
<artifactId>maven-enforcer-plugin</artifactId>
<version>1.3.1</version>
<executions>
<execution>
<id>enforce</id>
<phase>none</phase>
</execution>
</executions>
</plugin>

<plugin>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.8</version>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/../../interpreter/scalding</outputDirectory>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer>
<includeScope>runtime</includeScope>
</configuration>
</execution>
<execution>
<id>copy-artifact</id>
<phase>package</phase>
<goals>
<goal>copy</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/../../interpreter/scalding</outputDirectory>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer>
<includeScope>runtime</includeScope>
<artifactItems>
<artifactItem>
<groupId>${project.groupId}</groupId>
<artifactId>${project.artifactId}</artifactId>
<version>${project.version}</version>
<type>${project.packaging}</type>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
<!-- Plugin to compile Scala code -->
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<id>compile</id>
<goals>
<goal>compile</goal>
</goals>
<phase>compile</phase>
</execution>
<execution>
<id>test-compile</id>
<goals>
<goal>testCompile</goal>
</goals>
<phase>test-compile</phase>
</execution>
<execution>
<phase>process-resources</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>
Loading