
Conversation

@Leemoonsoo
Contributor

Zeppelin has ability to load dependency library dynamically from local filesystem or remote maven repository. (see dependency loader)

However, it is buggy and does not work correctly. This PR fixes the problem and improves the API.

  • Fix library loading bug
  • Improve API
  • Ability to add maven repository


Maybe it would be cleaner to:

import static scala.collection.JavaConversions.*;
...
dep.load(artifact, asJavaCollection(excludes), true, false);

@bzz

bzz commented Jan 28, 2015

Great improvement! Does it now have both a Java and a Scala API?

@Leemoonsoo
Contributor Author

@bzz
There is no Scala API yet, and the current Java API is not very carefully designed. It would be better to have a well-designed Java API, and of course a Scala API too, in the future.

@Leemoonsoo
Contributor Author

Summary of the changes

  • Fixed dependency loading bug
  • API to exclude some transitive dependencies
  • API to add local/remote maven repository

Here are some examples:

a. Load the elasticsearch-hadoop library and distribute it to the Spark cluster (sc.addJar()), but exclude some transitive dependencies, such as cascading-local, pig, etc.

z.loadAndDist("org.elasticsearch:elasticsearch-hadoop:2.0.2", 
    Seq("cascading:cascading-local",
        "cascading:cascading-hadoop",
        "org.apache.pig:pig",
        "joda-time:joda-time",
        "org.apache.hive:hive-service"))

b. Add the local .m2 repository and load a library from it

z.addMavenRepo("local", "file:///Users/moon/.m2/repository")
z.load("com.nflabs.zeppelin:zeppelin-markdown:0.5.0-SNAPSHOT")

c. Add a remote Maven repository (snapshot, https protocol) and load a library from it

z.addMavenRepo("snapshot", "https://oss.sonatype.org/content/repositories/snapshots/", true)
z.load("com.nflabs.zeppelin:markdown:0.5.0-SNAPSHOT")
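All of these calls take Maven coordinates in the `group:artifact:version` form. As a reference, here is a minimal sketch illustrating that string shape (a hypothetical helper written for this explanation, not part of the Zeppelin API):

```scala
// Hypothetical helper illustrating the "group:artifact:version" coordinate
// format that z.load() and z.loadAndDist() accept. Shown only to document
// the expected string shape; not part of Zeppelin.
object Coordinate {
  def parse(coord: String): (String, String, String) =
    coord.split(':') match {
      case Array(group, artifact, version) => (group, artifact, version)
      case _ => sys.error(s"expected group:artifact:version, got: $coord")
    }
}
```

For example, `Coordinate.parse("com.nflabs.zeppelin:zeppelin-markdown:0.5.0-SNAPSHOT")` splits into the group `com.nflabs.zeppelin`, the artifact `zeppelin-markdown`, and the version `0.5.0-SNAPSHOT`.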

Ready to merge!

@anthonycorbacho
Contributor

Any tests to show that it works?

@Leemoonsoo
Contributor Author

@anthonycorbacho I've added a basic test. Oops, the test is failing...

@bzz

bzz commented Jan 28, 2015

Tested it locally - works like a charm.

I could not find a way to list all loaded artifacts. Because of that, I guess nothing can be done if a user loads two different versions of the same library/dependency, so it's up to the user not to do that.

@Leemoonsoo
Contributor Author

@anthonycorbacho Added a basic test.

@bzz I modified the API to return the list of loaded dependencies. It still doesn't give the list of all loaded artifacts, but it does give the list of transitive dependencies loaded by a particular z.load() call. Here's a screenshot; I think it can be a little helpful.
(screenshot)

@bzz

bzz commented Jan 29, 2015

Thank you, this looks great to me.

@swkimme
Contributor

swkimme commented Jan 29, 2015

What if we add an addLocalMavenRepo API?

It would look like:
val homeDir = System.getProperty("user.home")
addMavenRepo("local", s"file://$homeDir/.m2/repository")

@swkimme
Contributor

swkimme commented Jan 29, 2015

I'm testing it. It's a really sweet feature, but it doesn't work very well for me.

  1. The library loads well, but calling sc.textFile("") produces an error:
    error: error while loading Partition, class file '/home/zeppelin/zeppelin/interpreter/spark/spark-core_2.10-1.1.1.jar(org/apache/spark/Partition.class)' has location not matching its contents: contains class Partition
    error: error while loading StorageLevel, class file '/home/zeppelin/zeppelin/interpreter/spark/spark-core_2.10-1.1.1.jar(org/apache/spark/storage/StorageLevel.class)' has location not matching its contents: contains class StorageLevel
    error: error while loading Partitioner, class file '/home/zeppelin/zeppelin/interpreter/spark/spark-core_2.10-1.1.1.jar(org/apache/spark/Partitioner.class)' has location not matching its contents: contains class Partitioner
    error: error while loading BoundedDouble, class file '/home/zeppelin/zeppelin/interpreter/spark/spark-core_2.10-1.1.1.jar(org/apache/spark/partial/BoundedDouble.class)' has location not matching its contents: contains class BoundedDouble
  2. sqlc might not be loaded:
    <console>:54: error: value registerTempTable is not a member of org.apache.spark.rdd.RDD[DailyStatsChart]
    dailyStats.registerTempTable("daily_stats")

@Leemoonsoo
Contributor Author

@swkimme What I found so far is that Spark 1.2 has the same problem. Can you confirm the same error happens on Spark 1.2 with the :cp command in bin/spark-shell?

The code for loading libraries in Zeppelin basically came from Spark 1.2. Therefore, to solve the problem we need a completely different approach.


@swkimme
Contributor

swkimme commented Feb 4, 2015

I got the same error in the Spark 1.2 shell.

@swkimme
Contributor

swkimme commented Feb 4, 2015

Also: does the addMavenRepo feature work?

When I run

z.addMavenRepo("snapshots", "https://oss.sonatype.org/content/repositories/snapshots/", true)
z.load("eu.unicredit:hbase-rdd:0.4.0-SNAPSHOT")

Result:

org.sonatype.aether.resolution.ArtifactResolutionException: Could not find artifact eu.unicredit:hbase-rdd:jar:0.4.0-SNAPSHOT in central (http://repo1.maven.org/maven2/)
at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:537)
at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:216)
at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolveArtifact(DefaultArtifactResolver.java:193)
at org.sonatype.aether.impl.internal.DefaultRepositorySystem.resolveArtifact(DefaultRepositorySystem.java:286)
at com.nflabs.zeppelin.spark.dep.DependencyResolver.getArtifact(DependencyResolver.java:269)
at com.nflabs.zeppelin.spark.dep.DependencyResolver.loadFromMvn(DependencyResolver.java:218)
at com.nflabs.zeppelin.spark.dep.DependencyResolver.load(DependencyResolver.java:185)
at com.nflabs.zeppelin.spark.dep.DependencyResolver.load(DependencyResolver.java:174)
at com.nflabs.zeppelin.spark.ZeppelinContext.load(ZeppelinContext.java:63)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC.<init>(<console>:31)
at $iwC.<init>(<console>:33)
at <init>(<console>:35)
at .<init>(<console>:39)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
at com.nflabs.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:369)
at com.nflabs.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:345)
at com.nflabs.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:339)
at com.nflabs.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:54)
at com.nflabs.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:77)
at com.nflabs.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:184)
at com.nflabs.zeppelin.scheduler.Job.run(Job.java:147)
at com.nflabs.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.sonatype.aether.transfer.ArtifactNotFoundException: Could not find artifact eu.unicredit:hbase-rdd:jar:0.4.0-SNAPSHOT in central (http://repo1.maven.org/maven2/)
at org.sonatype.aether.connector.wagon.WagonRepositoryConnector$4.wrap(WagonRepositoryConnector.java:971)
at org.sonatype.aether.connector.wagon.WagonRepositoryConnector$4.wrap(WagonRepositoryConnector.java:966)
at org.sonatype.aether.connector.wagon.WagonRepositoryConnector$GetTask.flush(WagonRepositoryConnector.java:707)
at org.sonatype.aether.connector.wagon.WagonRepositoryConnector$GetTask.flush(WagonRepositoryConnector.java:701)
at org.sonatype.aether.connector.wagon.WagonRepositoryConnector.get(WagonRepositoryConnector.java:452)
at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:456)

@Leemoonsoo
Contributor Author

@swkimme
I think it should be (with _2.10 after hbase-rdd):

z.load("eu.unicredit:hbase-rdd_2.10:0.4.0-SNAPSHOT")

By the way, I've implemented a completely new way of loading libraries. It doesn't load libraries at runtime, so it is more reliable. Could you review #319 and check whether it works for your cases?

@swkimme
Contributor

swkimme commented Feb 6, 2015

@Leemoonsoo
Oh, that was my mistake.
In SBT, _2.10 is not necessary if you write it like this (%% instead of %):
"eu.unicredit" %% "hbase-rdd" % "0.4.0-SNAPSHOT"
So I forgot about it.

Wouldn't it be great if we adopted SBT's Scala-version inference mechanism?

@Leemoonsoo
Contributor Author

@swkimme
Sure. I think SBT also does artifact type inference.
Do you have any idea how SBT does such inference?
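As far as the %% part goes, the inference is fairly mechanical: sbt appends the Scala binary version (the first two version segments, e.g. "2.10" for Scala 2.10.4) to the artifact name with an underscore before resolving the dependency. A rough sketch of the idea, with hypothetical helper names (not sbt's actual implementation):

```scala
// Rough sketch of sbt's %% cross-version behavior: the artifact name gets
// the Scala *binary* version appended with an underscore before dependency
// resolution. Hypothetical helper for illustration, not sbt's real code.
object CrossVersion {
  // "2.10.4" -> "2.10": releases within a 2.x series share binary compatibility.
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // Mimics what  "eu.unicredit" %% "hbase-rdd" % "0.4.0-SNAPSHOT"  resolves to
  def crossArtifact(name: String, scalaVersion: String): String =
    s"${name}_${binaryVersion(scalaVersion)}"
}
```

So under Scala 2.10.x, `crossArtifact("hbase-rdd", "2.10.4")` yields `hbase-rdd_2.10`, which matches the coordinate Leemoonsoo suggested above.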

Commits referenced in this pull request:
  * Make recursive default
  * Exclude by pattern
  * Reliable dependency loading mechanism
    Conflicts:
    	spark/src/main/java/com/nflabs/zeppelin/spark/SparkInterpreter.java
Leemoonsoo added a commit that referenced this pull request Feb 12, 2015
Fix/Improve Dependency loader
Leemoonsoo merged commit 5c160ab into master Feb 12, 2015
Leemoonsoo deleted the improve/libload branch February 12, 2015 04:19