Make zeppelin work with CDH5.7.0 #868
Conversation
24ea584: method classServerUri not available in cdh5.7.0 Spark version. Only set config if variable is filled
146b524: Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported
Thanks for sorting this out and contributing a PR.
Hi Felix, no, I'm not aware that this change is specific to CDH5.7.0. I ran into the issue when installing the latest CDH on my laptop and wanted to experiment with Zeppelin a bit. If you need my help in any other way, feel free to contact me. Cheers, Kris
I'm having a similar issue using CDH 5.7.0 with Zeppelin 0.6.0. This blog link seems to have evidence of some back-porting that was done from Spark 2.0. I hope that there's an easy patch to fix this.
bin/interpreter.sh (Outdated)

if [[ -n "${SPARK_SUBMIT}" ]]; then
    ${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}" --driver-java-options "${JAVA_INTP_OPTS}" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &
    #${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}" --driver-java-options "${JAVA_INTP_OPTS}" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &
Could you elaborate on why we need to change this command line?
Removed the --driver-class-path because it was conflicting with setting the SPARK_CLASSPATH env var in my local environment.
I'm setting the following classpath in spark-env.sh to make Spark from CDH5.7.0 work locally with HDFS (in local and pseudo-distributed mode):
SPARK_CLASSPATH=$(hadoop classpath)
SPARK_CLASSPATH=$SPARK_CLASSPATH:$HIVE_HOME/lib/*
SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_PREFIX/share/hadoop/tools/lib/*
Having both SPARK_CLASSPATH set and using --driver-class-path is not supported.
Feel free to mark this specific change as only needed for me locally and leave it out of the merge.
I think that's exactly what ZEPPELIN_CLASSPATH_OVERRIDES is for?
In your case you just need to change it to:
ZEPPELIN_CLASSPATH_OVERRIDES=$(hadoop classpath)
ZEPPELIN_CLASSPATH_OVERRIDES=$ZEPPELIN_CLASSPATH_OVERRIDES:$HIVE_HOME/lib/*
ZEPPELIN_CLASSPATH_OVERRIDES=$ZEPPELIN_CLASSPATH_OVERRIDES:$HADOOP_PREFIX/share/hadoop/tools/lib/*
I guess you are saying you need SPARK_CLASSPATH set for spark-shell or spark-submit?
I think the best way to handle that is to check whether SPARK_CLASSPATH is set, and if so add it to --driver-class-path here and unset SPARK_CLASSPATH.
In either case, it might be good to separate that into a new JIRA issue.
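A minimal sketch of that approach, reusing the variables from the bin/interpreter.sh snippet above (an illustration only, not part of this PR):

```bash
# If the user set SPARK_CLASSPATH in spark-env.sh, fold it into the
# driver classpath and unset it so spark-submit never sees both at once.
if [[ -n "${SPARK_CLASSPATH}" ]]; then
  ZEPPELIN_CLASSPATH_OVERRIDES="${ZEPPELIN_CLASSPATH_OVERRIDES}:${SPARK_CLASSPATH}"
  unset SPARK_CLASSPATH
fi

${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} \
  --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}" \
  --driver-java-options "${JAVA_INTP_OPTS}" \
  ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &
```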
Sure - sounds like we should push this PR.
Revert "Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported". This reverts commit 146b524.
Hi Felix, I have reverted that specific commit in the PR. Cheers, Kris
LGTM
@krisgeus thanks - it looks like tests are failing fairly consistently, but it's not clear what's going on. Would you have some time to take a look, or run them locally to investigate?
Thank you @krisgeus for the workaround; however, it is not sufficient for what I'm trying to do, since I have an entire cluster on CDH 5.7. When do you think this issue can be solved?
Has this been merged to master, ready for a clean build? Thanks,
I will try to look into the test error in a couple of days.
@felixcheung not much to elaborate, I would need Zeppelin to work with an external Spark installation with CDH 5.7 :)
Does the fix in this PR make it work for your case?
@felixcheung I've made some progress in the testing area. It appears that spark-core-1.6.0-cdh5.7.0 brings in the older akka version 2.2.3 instead of the 2.3.11 mentioned as the akka.version property in the spark-1.6 profile. I've temporarily fixed this by adding an exclusion to this profile. Not the way it should end up, but a temporary solution that lets me run more tests successfully. The diff:
Still a few tests failing, but getting closer.
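For anyone hitting the same conflict, one way to confirm which akka version the CDH artifact pulls in (standard Maven tooling; the interpreter module name "spark" is assumed here, and this is not the author's actual diff):

```bash
# List the akka artifacts Maven resolves for the spark interpreter
# module under the CDH version; output depends on the active profiles.
mvn dependency:tree -Pspark-1.6 -Pvendor-repo \
    -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 \
    -pl spark | grep -i akka
```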
Thanks!
I have the exact same issue with CDH 5.7.0 and Zeppelin 0.5.6. Is there a workaround available? Thanks.
@secsubs unfortunately no, it seems to be a breaking API change in CDH 5.7 (also in Spark 2.0).
Same problem here, merge it please! @felixcheung
OK, with the same code changes everything passes except for Selenium, which is a known test issue. Merging if there are no more comments.
+1 for merge
Please do…
### What is this PR for?
The downloadable zeppelin install wasn't working with CDH 5.7.0. I had to change a few things to make them work nicely together.

### What type of PR is it?
Bug Fix

### Todos

### What is the Jira issue?
https://www.mail-archive.com/users@zeppelin.incubator.apache.org/msg03471.html

### How should this be tested?
Install local cdh5.7.0 (see https://github.com/krisgeus/ansible_local_cdh_hadoop) and build zeppelin (with this patch) with the following options:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo

### Screenshots (if appropriate)

### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? Didn't verify
* Does this need documentation? Don't think so

Author: Kris Geusebroek <kgeusebroek@KgMBP2015.local>

Closes apache#868 from krisgeus/master and squashes the following commits:

e33d520 [Kris Geusebroek] Revert "Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported"
488cce6 [Kris Geusebroek] Added logging and comments to clarify reason not throwing exception
146b524 [Kris Geusebroek] Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported
24ea584 [Kris Geusebroek] method classServerUri not available in cdh5.7.0 Spark version. Only set config if variable is filled
50717dd [Kris Geusebroek] Use slf4j instead of parquet bundled one since parquet doesn't bundle it anymore
Hi all, I downloaded the latest master clone from GitHub and built it using this command:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo
I got this error running the Zeppelin Tutorial example:
java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
So, I rebuilt using this command:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo -Pyarn
Now, I get this after a long while:
java.net.ConnectException: Connection refused
Can someone provide some advice on getting this to work with CDH 5.7.0? Thanks,
Can you check the log file?
I get this in the Zeppelin log file:
ERROR [2016-06-03 16:49:20,929]({qtp519821334-82} NotebookServer.java[onMessage]:210) - Can't handle message
And in the Spark interpreter log file:
INFO [2016-06-03 17:26:57,773]({pool-2-thread-3} SparkInterpreter.java[createSparkContext]:225) - ------ Create new SparkContext yarn-client -------
Please instead use:
Hope this helps. Thanks,
I saw that the contents of the config file zeppelin-env.sh are different than in previous versions. It uses "set" instead of "export". So, I tried the old config file from Zeppelin 0.5.6 to see if this would change anything, and it turns out that Spark Scala code was then able to run on YARN. This would mean that the new config file is not being read correctly. Now, I am getting another issue: I cannot run Spark SQL "%sql" queries. This is what I get:
ERROR [2016-06-03 20:18:58,230]({SparkListenerBus} Logging.scala[logError]:95) - Listener SQLListener threw an exception
Do you know what this means? Thanks,
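On the set vs. export point: a generic bash illustration (not the actual Zeppelin template) of why only exported variables reach child processes such as the launched interpreter:

```bash
# A plain assignment is visible to the current shell only; child
# processes do not inherit it unless the variable is exported.
FOO=bar
bash -c 'echo "plain:    ${FOO:-unset}"'   # prints "plain:    unset"

export FOO=bar
bash -c 'echo "exported: ${FOO:-unset}"'   # prints "exported: bar"
```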
@bbuild11 do you have an update and can you share your "-conf" file?
Hi Luca, here are the zeppelin-env.sh file contents. I hope this is what you mean?
export JAVA_HOME=/usr/java/latest
export ZEPPELIN_NOTEBOOK_S3_BUCKET=zeppelin-dps2
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip
export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.S3NotebookRepo, com.nflabs.zeppelinhub.notebook.repo.ZeppelinHubRepo"
The zeppelin-site.xml has just the S3 settings enabled, set like above. Thanks,
Are folks still having problems with this?
Felix, I will attempt CDH 5.7.0 again this week. I will try Zeppelin also. If I run into any issues, I will let you know. Thanks,
I managed to build successfully against CDH 5.7.0 right after the fix was released. Does anyone need any specific artifacts? I am using the Spark and Drill interpreters.
@secsubs can you publish a short tutorial on how to build the jar? |
|
@kovalenko-boris I used this tutorial: That said, if there is any specific aspect that you need me to zoom in on, I can take a look.
That was MapR, though; you probably want something like this for CDH:
Sorry, my bad, this tutorial from the Cloudera blog still works:
Felix, I tried again with the latest. I was able to compile, but I found that there was a Jackson incompatibility. So, I removed the Jackson jars from zeppelin-server/target/lib and zeppelin-zengine/target/lib and replaced them with symlinks to the Jackson jars in /opt/cloudera/parcels/CDH/jars, in order to use the CDH 5.7.1 versions. Now, I'm getting the error below.
java.lang.ClassNotFoundException:
Is this happening because we are on jdk1.8.0_60 (Java 8)? Thanks,
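A rough sketch of that jar swap, assuming the paths from the comment above; exact jar names and versions will differ per install:

```bash
# Replace Zeppelin's bundled Jackson jars with symlinks to the CDH ones.
# Back up the originals first; jar names depend on your versions.
cd zeppelin-server/target/lib
mkdir -p ../lib-backup
mv jackson-*.jar ../lib-backup/
ln -s /opt/cloudera/parcels/CDH/jars/jackson-core-*.jar .
ln -s /opt/cloudera/parcels/CDH/jars/jackson-databind-*.jar .
ln -s /opt/cloudera/parcels/CDH/jars/jackson-annotations-*.jar .
# Repeat the same steps for zeppelin-zengine/target/lib.
```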
To add: we have a Livy server up and running in our cluster. I tried to use it so we could use Spark independently of Zeppelin. In this way, Zeppelin can interface with any version of Spark, maintained separately. Unfortunately, the Livy interpreter does not work. This would have solved things too. Cheers,
Hey Ben, are you building Zeppelin with JDK 8, though? I think it has to match your running environment, ideally; I thought we were still having problems with that in the project. As for Livy, what's the problem you're having? You could send an email to dev@zeppelin to see if someone can help you.
Hi Felix, all our environments are on JDK 8. We no longer support JDK 7 because Oracle end-of-life'd it, so we moved to the Cloudera-recommended version of JDK 8. I will submit my Livy issue on the other mailing list. Basically, I could not add the Livy interpreter because it just doesn't save. Thanks,
Felix, I forgot to add: Zeppelin 0.5.6 seems to work fine with JDK 8. So, I'm thinking that it might be something else specific to CDH 5.7, but I'm not sure. Thanks,
From the email discussions it seems like you are through with the error above. This ClassNotFoundException would seem like some sort of mismatch on the Spark side.
What is this PR for?
The downloadable zeppelin install wasn't working with CDH 5.7.0. I had to change a few things to make them work nicely together.
What type of PR is it?
Bug Fix
Todos
What is the Jira issue?
https://www.mail-archive.com/users@zeppelin.incubator.apache.org/msg03471.html
How should this be tested?
Install local cdh5.7.0 (see https://github.com/krisgeus/ansible_local_cdh_hadoop)
build zeppelin (with this patch) with the following options:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo
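Once built, a quick way to smoke-test the result (standard Zeppelin daemon commands; assumes the default port 8080):

```bash
# Start the daemon from the build/distribution root, then open
# http://localhost:8080 and run the Zeppelin Tutorial note.
./bin/zeppelin-daemon.sh start
# ...verify the tutorial runs...
./bin/zeppelin-daemon.sh stop
```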
Screenshots (if appropriate)
Questions: