Add a (local mode) Scalding Interpreter to Zeppelin #561
Closed
31 commits
721dcb7
Getting a basic interpreter going. Next step is to hook in the Scaldi…
sriramkrishnan 35fc032
Initial version of a ScaldingInterpreter running in local mode. Need …
sriramkrishnan e13576f
Now seem to be getting the console out, but only for last line. Will …
sriramkrishnan b19fda4
More cleanup - flushing output stream. Still can't seem to get the Sc…
sriramkrishnan 1ffbb3b
Fixing output of stdout from console
sriramkrishnan d3916b7
Adding modified version of ScaldingILoop for grabbing Console output …
sriramkrishnan 36a2dac
Added a link to the scalding code where the ILoop was lifted from.
sriramkrishnan 8944b0c
Merge remote-tracking branch 'upstream/master' into scalding
sriramkrishnan c27ec48
Formatting, license
sriramkrishnan 368dc04
Cleaning up imports, comments, etc
sriramkrishnan 7ec2941
More code cleanup
sriramkrishnan 7a9ceeb
Adding some tests for the Scalding interpreter
sriramkrishnan 5fd1ae4
Address comments on PR. Merge remote-tracking branch 'upstream/master…
sriramkrishnan 91b0692
adding ScaldingInterpreter
sriramkrishnan 8004b39
Fixing a typo
sriramkrishnan d4cf308
Adding docs for the Scalding interpreter
sriramkrishnan 5c8056c
Making the Scalding scala jars same as the Spark ones for consistency
sriramkrishnan 083f059
Trimming deps down from hadoop-client to just hadoop-common. Scalding…
sriramkrishnan dd8a4c8
Adding Scalding licenses
sriramkrishnan 460658a
Adding Cascading dependencies
sriramkrishnan 6019cc8
More licenses. Only remaining ones are the dependencies of hadoop-com…
sriramkrishnan 8be8d22
Went thru and added all licenses I could find
sriramkrishnan aaae5d1
Moving licenses to the right location
sriramkrishnan dd0bb9a
Changing licenses to text format
sriramkrishnan 9a7d733
Moved tukanni license to a separate section and added license
sriramkrishnan b30725f
Making the Scalding interpreter optional as part of a new -Pscalding …
sriramkrishnan bc31d1e
Getting rid of added licenses
sriramkrishnan 006500d
Reverting all commits to LICENSE to be back to master
sriramkrishnan 8eec3c2
Updating docs to include the -Pscalding profile for Scalding
sriramkrishnan 1ad405b
Adding newline to remove redundant change in PR
sriramkrishnan ffa698b
Whitespace cleanup
sriramkrishnan
Binary file added
BIN
+12.7 KB
docs/assets/themes/zeppelin/img/docs-img/scalding-InterpreterBinding.png
Binary file added
BIN
+24.3 KB
docs/assets/themes/zeppelin/img/docs-img/scalding-InterpreterSelection.png
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| --- | ||
| layout: page | ||
| title: "Scalding Interpreter" | ||
| description: "" | ||
| group: manual | ||
| --- | ||
| {% include JB/setup %} | ||
|
|
||
|
|
||
| ## Scalding Interpreter for Apache Zeppelin | ||
| [Scalding](https://github.com/twitter/scalding) is an open source Scala library for writing MapReduce jobs. | ||
|
|
||
| ### Building the Scalding Interpreter | ||
| You first have to build the Scalding interpreter by enabling the **scalding** profile as follows: | ||
|
|
||
| ``` | ||
| mvn clean package -Pscalding -DskipTests | ||
| ``` | ||
|
|
||
| ### Enabling the Scalding Interpreter | ||
|
|
||
| In a notebook, to enable the **Scalding** interpreter, click on the **Gear** icon, select **Scalding**, and hit **Save**. | ||
|
|
||
| <center> | ||
|  | ||
|
|
||
|  | ||
| </center> | ||
|
|
||
| ### Configuring the Interpreter | ||
| Zeppelin comes with a pre-configured Scalding interpreter in local mode, so you do not need to install anything. | ||
|
|
||
| ### Testing the Interpreter | ||
|
|
||
| As an example, using the [Alice in Wonderland](https://gist.github.com/johnynek/a47699caa62f4f38a3e2) tutorial, we will count words (of course!) and plot a graph of the top 10 words in the book. | ||
|
|
||
| ``` | ||
| %scalding | ||
|
|
||
| import scala.io.Source | ||
|
|
||
| // Get the Alice in Wonderland book from gutenberg.org: | ||
| val alice = Source.fromURL("http://www.gutenberg.org/files/11/11.txt").getLines | ||
| val aliceLineNum = alice.zipWithIndex.toList | ||
| val alicePipe = TypedPipe.from(aliceLineNum) | ||
|
|
||
| // Now get a list of words for the book: | ||
| val aliceWords = alicePipe.flatMap { case (text, _) => text.split("\\s+").toList } | ||
|
|
||
| // Now let's add a count for each word: | ||
| val aliceWithCount = aliceWords.filterNot(_.equals("")).map { word => (word, 1L) } | ||
|
|
||
| // let's sum them for each word: | ||
| val wordCount = aliceWithCount.group.sum | ||
|
|
||
| print ("Here are the top 10 words\n") | ||
| val top10 = wordCount | ||
| .groupAll | ||
| .sortBy { case (word, count) => -count } | ||
| .take(10) | ||
| top10.dump | ||
|
|
||
| ``` | ||
| ``` | ||
| %scalding | ||
|
|
||
| val table = "words\t count\n" + top10.toIterator.map{case (k, (word, count)) => s"$word\t$count"}.mkString("\n") | ||
| print("%table " + table) | ||
|
|
||
| ``` | ||
|
|
||
| If you click on the icon for the pie chart, you should be able to see a chart like this: | ||
|  | ||
|
|
||
| ### Current Status & Future Work | ||
| The current implementation of the Scalding interpreter does not support canceling jobs or fine-grained progress updates. | ||
|
|
||
| The pre-configured Scalding interpreter only supports Scalding in local mode. Hadoop mode for Scalding is currently unsupported, and will be future work (contributions welcome!). |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,202 @@ | ||
| <?xml version="1.0" encoding="UTF-8"?> | ||
| <!-- | ||
| ~ Licensed to the Apache Software Foundation (ASF) under one or more | ||
| ~ contributor license agreements. See the NOTICE file distributed with | ||
| ~ this work for additional information regarding copyright ownership. | ||
| ~ The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| ~ (the "License"); you may not use this file except in compliance with | ||
| ~ the License. You may obtain a copy of the License at | ||
| ~ | ||
| ~ http://www.apache.org/licenses/LICENSE-2.0 | ||
| ~ | ||
| ~ Unless required by applicable law or agreed to in writing, software | ||
| ~ distributed under the License is distributed on an "AS IS" BASIS, | ||
| ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| ~ See the License for the specific language governing permissions and | ||
| ~ limitations under the License. | ||
| --> | ||
|
|
||
| <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> | ||
| <modelVersion>4.0.0</modelVersion> | ||
|
|
||
| <parent> | ||
| <artifactId>zeppelin</artifactId> | ||
| <groupId>org.apache.zeppelin</groupId> | ||
| <version>0.6.0-incubating-SNAPSHOT</version> | ||
| <relativePath>..</relativePath> | ||
| </parent> | ||
|
|
||
| <groupId>org.apache.zeppelin</groupId> | ||
| <artifactId>zeppelin-scalding</artifactId> | ||
| <packaging>jar</packaging> | ||
| <version>0.6.0-incubating-SNAPSHOT</version> | ||
| <name>Zeppelin: Scalding interpreter</name> | ||
| <url>http://zeppelin.incubator.apache.org</url> | ||
|
|
||
| <properties> | ||
| <scala.version>2.10.4</scala.version> | ||
| <hadoop.version>2.3.0</hadoop.version> | ||
| <scalding.version>0.15.1-RC13</scalding.version> | ||
| </properties> | ||
|
|
||
| <repositories> | ||
| <repository> | ||
| <id>conjars</id> | ||
| <name>Concurrent Maven Repo</name> | ||
| <url>http://conjars.org/repo</url> | ||
| </repository> | ||
| </repositories> | ||
|
|
||
| <dependencies> | ||
| <dependency> | ||
| <groupId>${project.groupId}</groupId> | ||
| <artifactId>zeppelin-interpreter</artifactId> | ||
| <version>${project.version}</version> | ||
| <scope>provided</scope> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-exec</artifactId> | ||
| <version>1.3</version> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>junit</groupId> | ||
| <artifactId>junit</artifactId> | ||
| <scope>test</scope> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>com.twitter</groupId> | ||
| <artifactId>scalding-core_2.10</artifactId> | ||
| <version>${scalding.version}</version> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>com.twitter</groupId> | ||
| <artifactId>scalding-repl_2.10</artifactId> | ||
| <version>${scalding.version}</version> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.scala-lang</groupId> | ||
| <artifactId>scala-library</artifactId> | ||
| <version>${scala.version}</version> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.scala-lang</groupId> | ||
| <artifactId>scala-compiler</artifactId> | ||
| <version>${scala.version}</version> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.scala-lang</groupId> | ||
| <artifactId>scala-reflect</artifactId> | ||
| <version>${scala.version}</version> | ||
| </dependency> | ||
|
|
||
| <!-- Scalding REPL needs org.apache.hadoop.conf.Configuration even in local mode --> | ||
| <dependency> | ||
| <groupId>org.apache.hadoop</groupId> | ||
| <artifactId>hadoop-common</artifactId> | ||
| <version>${hadoop.version}</version> | ||
| </dependency> | ||
| </dependencies> | ||
|
|
||
| <build> | ||
| <plugins> | ||
| <plugin> | ||
| <groupId>org.apache.maven.plugins</groupId> | ||
| <artifactId>maven-deploy-plugin</artifactId> | ||
| <version>2.7</version> | ||
| <configuration> | ||
| <skip>true</skip> | ||
| </configuration> | ||
| </plugin> | ||
|
|
||
| <plugin> | ||
| <artifactId>maven-enforcer-plugin</artifactId> | ||
| <version>1.3.1</version> | ||
| <executions> | ||
| <execution> | ||
| <id>enforce</id> | ||
| <phase>none</phase> | ||
| </execution> | ||
| </executions> | ||
| </plugin> | ||
|
|
||
| <plugin> | ||
| <artifactId>maven-dependency-plugin</artifactId> | ||
| <version>2.8</version> | ||
| <executions> | ||
| <execution> | ||
| <id>copy-dependencies</id> | ||
| <phase>package</phase> | ||
| <goals> | ||
| <goal>copy-dependencies</goal> | ||
| </goals> | ||
| <configuration> | ||
| <outputDirectory>${project.build.directory}/../../interpreter/scalding</outputDirectory> | ||
| <overWriteReleases>false</overWriteReleases> | ||
| <overWriteSnapshots>false</overWriteSnapshots> | ||
| <overWriteIfNewer>true</overWriteIfNewer> | ||
| <includeScope>runtime</includeScope> | ||
| </configuration> | ||
| </execution> | ||
| <execution> | ||
| <id>copy-artifact</id> | ||
| <phase>package</phase> | ||
| <goals> | ||
| <goal>copy</goal> | ||
| </goals> | ||
| <configuration> | ||
| <outputDirectory>${project.build.directory}/../../interpreter/scalding</outputDirectory> | ||
| <overWriteReleases>false</overWriteReleases> | ||
| <overWriteSnapshots>false</overWriteSnapshots> | ||
| <overWriteIfNewer>true</overWriteIfNewer> | ||
| <includeScope>runtime</includeScope> | ||
| <artifactItems> | ||
| <artifactItem> | ||
| <groupId>${project.groupId}</groupId> | ||
| <artifactId>${project.artifactId}</artifactId> | ||
| <version>${project.version}</version> | ||
| <type>${project.packaging}</type> | ||
| </artifactItem> | ||
| </artifactItems> | ||
| </configuration> | ||
| </execution> | ||
| </executions> | ||
| </plugin> | ||
| <!-- Plugin to compile Scala code --> | ||
| <plugin> | ||
| <groupId>org.scala-tools</groupId> | ||
| <artifactId>maven-scala-plugin</artifactId> | ||
| <executions> | ||
| <execution> | ||
| <id>compile</id> | ||
| <goals> | ||
| <goal>compile</goal> | ||
| </goals> | ||
| <phase>compile</phase> | ||
| </execution> | ||
| <execution> | ||
| <id>test-compile</id> | ||
| <goals> | ||
| <goal>testCompile</goal> | ||
| </goals> | ||
| <phase>test-compile</phase> | ||
| </execution> | ||
| <execution> | ||
| <phase>process-resources</phase> | ||
| <goals> | ||
| <goal>compile</goal> | ||
| </goals> | ||
| </execution> | ||
| </executions> | ||
| </plugin> | ||
| </plugins> | ||
| </build> | ||
|
|
||
| </project> | ||
Is it possible to avoid, or at least disable, this Maven repo by default?
We need to avoid publishing an official binary package built with a third-party Maven repository.
All the Cascading jars are actually available from http://conjars.org/. I am not sure how I can disable or avoid this.
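
For reference, a minimal sketch of one way to keep the conjars repository out of the default build: gate the whole module behind an opt-in profile in the parent `pom.xml`, so that the repository declared in the Scalding module's POM should only be consulted when that profile is active. The profile id `scalding` matches the `-Pscalding` flag used in the docs above and in the later commit that makes the interpreter optional. The module directory name `scalding` is an assumption, and this fragment is not taken from the PR's diff.

```xml
<!-- Sketch for the parent pom.xml; not taken from this PR's diff. -->
<profiles>
  <profile>
    <!-- Activated explicitly with: mvn clean package -Pscalding -DskipTests -->
    <id>scalding</id>
    <modules>
      <!-- Assumed module directory name; adjust to the actual layout -->
      <module>scalding</module>
    </modules>
  </profile>
</profiles>
```

With a layout like this, a default `mvn clean package` would skip the Scalding module entirely, so a binary built without the profile would not resolve anything from conjars.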