
Conversation

@Leemoonsoo (Contributor)

#308 implements/fixes runtime dependency library loading, but the feature is unreliable: some libraries load correctly and some do not.

Since a reliable solution for runtime library loading looks hard to find, this PR loads libraries before SparkIMain is created, so they do not need to be loaded dynamically at runtime but are simply included on the classpath.

To do this, this PR adds a new interpreter, "DepInterpreter".
It provides a separate Scala interpreter and an API for loading dependencies. It fetches the necessary libraries from a Maven repository and keeps the list of resolved files. Then, when SparkInterpreter is initializing, that file list is passed to SparkInterpreter, which adds the files to its classpath instead of trying to load them at runtime. A sketch of this handoff follows the list below.

  • DepInterpreter implementation
  • Warning message when DepInterpreter is used after SparkInterpreter has initialized.
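
The handoff, in a minimal sketch. The names below are hypothetical stand-ins, not the PR's actual code, and the Maven resolution step that the real DependencyContext performs is skipped:

import java.io.File
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for com.nflabs.zeppelin.spark.dep.DependencyContext:
// %dep collects resolved dependency files here, before SparkIMain exists.
class DependencyContext {
  private val collectedFiles = ListBuffer[File]()

  // A local jar; Maven coordinates would be resolved to files at this point.
  def load(path: String): DependencyContext = {
    collectedFiles += new File(path)
    this // returned for method chaining, as in the real API
  }

  def getFiles: List[File] = collectedFiles.toList
}

// Inside SparkInterpreter initialization: turn the file list into a
// classpath string instead of loading anything dynamically at runtime.
def classpathFrom(dep: DependencyContext): String =
  dep.getFiles.map(_.getAbsolutePath).mkString(File.pathSeparator)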

Usage: DepInterpreter is used with %dep, which exposes an instance of com.nflabs.zeppelin.spark.dep.DependencyContext as the variable z.

Here's the API:

z.reset() // clean up previously added artifacts and repositories

// add maven repository
z.addRepo("RepoName").url("RepoURL")

// add maven snapshot repository
z.addRepo("RepoName").url("RepoURL").snapshot()

// add artifact from filesystem
z.load("/path/to.jar")

// add artifact from maven repository
z.load("groupId:artifactId:version")

// add artifact recursively (with all its dependencies)
z.load("groupId:artifactId:version").recursive()

// add artifact recursively, except the comma-separated groupId:artifactId list
z.load("groupId:artifactId:version").recursive().exclude("groupId:artifactId,groupId:artifactId, ...")

// add artifact recursively and distribute them to spark workers (sc.addJar())
z.load("groupId:artifactId:version").recursive().dist()

Example of use: [screenshot]
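
For a concrete illustration, a %dep paragraph using the API above might look like this (the coordinate is mine and only illustrative, assuming it resolves from Maven Central):

%dep
z.reset()

// fetch the artifact with its transitive dependencies and ship them
// to the spark workers via sc.addJar()
z.load("org.json4s:json4s-native_2.10:3.2.11").recursive().dist()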

Leemoonsoo mentioned this pull request Feb 5, 2015
@Leemoonsoo (Contributor, Author)

It's ready. Could someone please review this PR?

@swkimme (Contributor) commented Feb 5, 2015

It works in a local environment; let me test more in a cluster environment. Great job!!

Comments and questions:

  1. Does the %dep interpreter keep previously added libraries, given that it can only be used before the Spark interpreter has initialized?
  2. How can I reload dependencies? My thought was that it should work after restarting SparkInterpreter, but "Must be used before SparkInterpreter (%spark) initialized" still comes up after I restart SparkInterpreter.

@Leemoonsoo (Contributor, Author)

@swkimme

  1. Unless you a) restart the interpreter or b) call z.reset(), the %dep interpreter keeps previously added libraries.
  2. That's right. Restart SparkInterpreter and run %dep before %spark; that's the way to reload dependencies. It works for me, but let me try again.

@swkimme (Contributor) commented Feb 6, 2015

z.load("org.apache.james:apache-mime4j:0.7.2")
org.sonatype.aether.resolution.DependencyResolutionException: Could not find artifact org.apache.james:apache-mime4j:jar:0.7.2 in central (http://repo1.maven.org/maven2/)

It loads with build.sbt,
"org.apache.james" % "apache-mime4j" % "0.7.2",
but it fails in the %dep interpreter.

@Leemoonsoo (Contributor, Author)

@swkimme

That was because apache-mime4j is a pom-type artifact. I pushed a fix, and it can now be loaded by specifying the extension between the artifactId and version, like:

%dep
z.load("org.apache.james:apache-mime4j:pom:0.7.2")

@swkimme (Contributor) commented Feb 7, 2015

For the restart issue in 2), I found it was related to #309.

@Leemoonsoo (Contributor, Author)

Made some improvements:

  • infer Scala version using '::':
    z.load("eu.unicredit::hbase-rdd:0.4.0-SNAPSHOT") is now equivalent to z.load("eu.unicredit:hbase-rdd_2.10:0.4.0-SNAPSHOT") (a sketch of the inference follows this list)
  • recursive loading is now the default, so recursive() is removed from the API and excludeAll() is added instead
  • exclusion is now possible with a pattern (wildcard '*')
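
Here is a minimal sketch of how the '::' inference could work. It is my illustration, not the PR's actual implementation, and it assumes a fixed Scala binary version of 2.10:

// Rewrite "group::artifact:version" into "group:artifact_2.10:version".
def inferScalaVersion(coordinate: String): String = {
  val scalaBinaryVersion = "2.10" // assumption: the build's Scala binary version
  coordinate.split("::") match {
    case Array(groupId, rest) =>
      val Array(artifactId, version) = rest.split(":", 2)
      s"$groupId:${artifactId}_$scalaBinaryVersion:$version"
    case _ => coordinate // no '::', leave the coordinate as-is
  }
}

inferScalaVersion("eu.unicredit::hbase-rdd:0.4.0-SNAPSHOT")
// => "eu.unicredit:hbase-rdd_2.10:0.4.0-SNAPSHOT"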

Here's the updated API:

z.reset() // clean up previously added artifacts and repositories

// add maven repository
z.addRepo("RepoName").url("RepoURL")

// add maven snapshot repository
z.addRepo("RepoName").url("RepoURL").snapshot()

// add artifact from filesystem
z.load("/path/to.jar")

// add artifact from maven repository, without transitive dependencies
z.load("groupId:artifactId:version").excludeAll()

// add artifact recursively
z.load("groupId:artifactId:version")

// add artifact recursively, except the comma-separated groupId:artifactId list
z.load("groupId:artifactId:version").exclude("groupId:artifactId,groupId:artifactId, ...")

// exclude with pattern
z.load("groupId:artifactId:version").exclude("*")
z.load("groupId:artifactId:version").exclude("groupId:artifactId:*")
z.load("groupId:artifactId:version").exclude("groupId:*")

// add artifact recursively and distribute them to spark workers (sc.addJar())
z.load("groupId:artifactId:version").dist()

@swkimme (Contributor) commented Feb 8, 2015

OMG, AWESOME job!

I have one discussion point: shouldn't .dist() be the default?
I would guess that in the usual case the libraries should be available on the cluster.

@Leemoonsoo (Contributor, Author)

@swkimme

Updated to make 'dist' the default. 'dist()' is removed from the API and 'local()' is added, for the case where you do not want to add an artifact to the Spark cluster. Here's the updated API:

z.reset() // clean up previously added artifacts and repositories

// add maven repository
z.addRepo("RepoName").url("RepoURL")

// add maven snapshot repository
z.addRepo("RepoName").url("RepoURL").snapshot()

// add artifact from filesystem
z.load("/path/to.jar")

// add artifact from maven repository, without transitive dependencies
z.load("groupId:artifactId:version").excludeAll()

// add artifact recursively
z.load("groupId:artifactId:version")

// add artifact recursively, except the comma-separated groupId:artifactId list
z.load("groupId:artifactId:version").exclude("groupId:artifactId,groupId:artifactId, ...")

// exclude with pattern
z.load("groupId:artifactId:version").exclude("*")
z.load("groupId:artifactId:version").exclude("groupId:artifactId:*")
z.load("groupId:artifactId:version").exclude("groupId:*")

// local() skips adding the artifact to the spark cluster (skips sc.addJar())
z.load("groupId:artifactId:version").local()

@Leemoonsoo (Contributor, Author)

Ready to be merged! #319 -> #308 -> master

@swkimme (Contributor) commented Feb 11, 2015

LGTM!


Leemoonsoo added a commit that referenced this pull request Feb 12, 2015: "Reliable dependency loading mechanism"
Leemoonsoo merged commit ab20344 into improve/libload Feb 12, 2015
Leemoonsoo deleted the new/depinterpreter branch February 12, 2015 03:04
asfgit pushed a commit to apache/zeppelin that referenced this pull request Mar 30, 2015
From ZEPL/zeppelin#388.

Update description of dependency loader to reflect ZEPL/zeppelin#319.
To do this, the document structure is changed.
* docs/zeppelincontext -> removed
* interpreter/spark -> added (includes description about zeppelincontext and dependencyloader)

Ready to merge.

Author: Lee moon soo <leemoonsoo@gmail.com>

Closes #7 from Leemoonsoo/gh-pages_update_changes and squashes the following commits:

a3894cf [Lee moon soo] Add interpreter/spark.md instead of docs/zeppelincontext.md update description about dependency loader
