Conversation

@krisgeus

What is this PR for?

The downloadable Zeppelin install wasn't working with CDH 5.7.0. I had to change a few things to make them work nicely together.

What type of PR is it?

Bug Fix

Todos

What is the Jira issue?

https://www.mail-archive.com/users@zeppelin.incubator.apache.org/msg03471.html

How should this be tested?

Install a local CDH 5.7.0 (see https://github.com/krisgeus/ansible_local_cdh_hadoop)
Build Zeppelin (with this patch) with the following options:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo

Screenshots (if appropriate)

Questions:

  • Do the license files need updating? No
  • Are there breaking changes for older versions? Didn't verify
  • Does this need documentation? Don't think so

@felixcheung
Member

Thanks for sorting this out and contributing a PR.
Do you know if this change specific to CDH 5.7.0 is intentional? Is there a JIRA, a change list entry, or a Cloudera support article about this? It seems odd that the Apache Spark 1.6.0 release works but Spark 1.6 in CDH 5.7.0 does not (though that has happened before).

@krisgeus
Author

krisgeus commented May 4, 2016

Hi Felix,

No, I'm not aware that this change is specific to CDH 5.7.0. I ran into the issue when installing the latest CDH on my laptop, since I wanted to experiment with Zeppelin a bit.
I have seen small differences between the Apache Hadoop ecosystem and CDH on other occasions.
I know Cloudera sometimes adds patches to their versions that only land in future Apache releases, and sometimes changes things to make them work better with their distribution.

If you need my help in any other way, feel free to contact me.

Cheers Kris


@b3nbk1m70

I'm having a similar issue using CDH 5.7.0 with Zeppelin 0.6.0. This link seems to show evidence of some back-porting from Spark 2.0.

https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/java-lang-NoSuchMethodException-exception-with-CDH-5-7-0-Apache/m-p/40427

I hope that there's an easy patch to fix this.


if [[ -n "${SPARK_SUBMIT}" ]]; then
${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-java-options "${JAVA_INTP_OPTS}" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &
#${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}" --driver-java-options "${JAVA_INTP_OPTS}" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &
Member

could you elaborate why we need to change this command line?

Author

I removed the --driver-class-path because it was conflicting with the SPARK_CLASSPATH env var set in my local environment.
I'm setting the following classpath in spark-env.sh to make Spark from CDH 5.7.0 work locally with HDFS (in local and pseudo-distributed mode):
SPARK_CLASSPATH=$(hadoop classpath)
SPARK_CLASSPATH=$SPARK_CLASSPATH:$HIVE_HOME/lib/*
SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_PREFIX/share/hadoop/tools/lib/*

Having both SPARK_CLASSPATH set and --driver-class-path passed is not supported.

Feel free to mark this specific change as only needed for my local setup and leave it out of the merge.

Member

I think that's exactly what ZEPPELIN_CLASSPATH_OVERRIDES is for?
In your case you just need to change it to:

ZEPPELIN_CLASSPATH_OVERRIDES=$(hadoop classpath)
ZEPPELIN_CLASSPATH_OVERRIDES=$ZEPPELIN_CLASSPATH_OVERRIDES:$HIVE_HOME/lib/*
ZEPPELIN_CLASSPATH_OVERRIDES=$ZEPPELIN_CLASSPATH_OVERRIDES:$HADOOP_PREFIX/share/hadoop/tools/lib/*

Member

I guess you are saying you need SPARK_CLASSPATH set for spark-shell or spark-submit?
I think the best way to handle that is to check whether SPARK_CLASSPATH is set, add it to --driver-class-path here, and then unset SPARK_CLASSPATH.
In either case, it might be good to separate that into a new JIRA issue.
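
A rough sketch of the check being suggested here (illustrative only, not the merged change; it reuses the variables from the launch-script snippet quoted above):

# If the user set SPARK_CLASSPATH in spark-env.sh, fold it into the
# driver classpath and unset it, since spark-submit rejects having both.
if [[ -n "${SPARK_CLASSPATH}" ]]; then
  ZEPPELIN_CLASSPATH_OVERRIDES="${ZEPPELIN_CLASSPATH_OVERRIDES}:${SPARK_CLASSPATH}"
  unset SPARK_CLASSPATH
fi
${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}" --driver-java-options "${JAVA_INTP_OPTS}" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &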

@felixcheung
Member

Sure - sounds like we should push this PR.

@felixcheung
Member

Hi @krisgeus could you separate the change referenced in here so we could merge the fix?

Revert "Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported"

This reverts commit 146b524.
@krisgeus
Author

Hi Felix,

I have reverted that specific commit in the PR.
Feel free to go ahead with merging.

Cheers Kris


@felixcheung
Member

LGTM

@felixcheung
Member

@krisgeus thanks - it looks like tests are failing fairly consistently, but it's not clear what's going on. Would you have some time to take a look, or to run them locally to investigate?
https://s3.amazonaws.com/archive.travis-ci.org/jobs/129069198/log.txt

@meniluca
Contributor

meniluca commented May 16, 2016

Thank you @krisgeus for the workaround; however, it is not sufficient for what I'm trying to do, since I have an entire cluster on CDH 5.7. When do you think this issue can be solved?

@b3nbk1m70

Has this been merged to master ready for a clean build?

Thanks,
Ben


@felixcheung
Member

I will try to look into the test error in a couple of days.
@H4ml3t Not sure I understand - could you elaborate?

@meniluca
Contributor

@felixcheung not much to elaborate: I need Zeppelin to work with an external Spark installation on CDH 5.7 :)
Cheers,
Luca

@felixcheung
Member

Does the fix in this PR make it work for your case?

@krisgeus
Author

@felixcheung I've made some progress in the testing area. It appears that spark-core-1.6.0-cdh5.7.0 brings in the older Akka version 2.2.3 instead of the 2.3.11 set as the akka.version property in the spark-1.6 profile.

I've temporarily fixed this by adding an exclusion to this profile. It's not the way it should end up, but it's a temporary solution that lets me run more tests successfully.

The diff:
diff --git a/spark-dependencies/pom.xml b/spark-dependencies/pom.xml
index 8e23f22..58af3f0 100644
--- a/spark-dependencies/pom.xml
+++ b/spark-dependencies/pom.xml
@@ -518,6 +518,23 @@

   <dependencies>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-core_2.10</artifactId>
+      <version>${spark.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-client</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.spark-project.akka</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>

A few tests are still failing, but I'm getting closer.
I thought you'd want to know before you start running tests on your end.

@felixcheung
Member

Thanks! 
CDH is known to have an older Akka and normally we build with the profile -Pvendor-repo to get that to work.

@secsubs

secsubs commented May 18, 2016

I have the exact same issue with CDH 5.7.0 and Zeppelin 0.5.6. Is there a workaround available? Thanks.

@felixcheung
Member

@secsubs unfortunately no, it seems to be a breaking API change in CDH 5.7 (also present in Spark 2.0).
Let's get Travis tests to pass and merge this ASAP. Thanks!

@kovalenko-boris

kovalenko-boris commented May 24, 2016

same problem here, merge it please!!! @felixcheung

@felixcheung
Member

felixcheung commented May 25, 2016

ok, with the same code changes everything passes except for selenium, which is a known test issue.
https://travis-ci.org/apache/incubator-zeppelin/builds/132738602

Merging if there are no more comments.

@Leemoonsoo
Member

+1 for merge

@b3nbk1m70

Please do…


@asfgit asfgit closed this in 78c7b55 May 26, 2016
shijinkui pushed a commit to shijinkui/incubator-zeppelin that referenced this pull request May 26, 2016
Author: Kris Geusebroek <kgeusebroek@KgMBP2015.local>

Closes apache#868 from krisgeus/master and squashes the following commits:

e33d520 [Kris Geusebroek] Revert "Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported"
488cce6 [Kris Geusebroek] Added logging and comments to clarify reason not throwing exception
146b524 [Kris Geusebroek] Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported
24ea584 [Kris Geusebroek] method classServerUri not available in cdh5.7.0 Spark version. Only set config if variable is filled
50717dd [Kris Geusebroek] Use slf4j instead of parquet bundled one since parquet doesn't bundle it anymore
@b3nbk1m70

Hi all,

I downloaded the latest master from GitHub and built it using this command:

mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo

I got this error running the Zeppelin Tutorial example:

java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

So, I rebuilt using this command:

mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo -Pyarn

Now, I get this after a long while:

java.net.ConnectException: Connection refused

Can someone provide some advice on getting this to work with CDH 5.7.0?

Thanks,
Ben


@felixcheung
Member

Can you check the log file?

@b3nbk1m70

I get this in the Zeppelin log file:

ERROR [2016-06-03 16:49:20,929]({qtp519821334-82} NotebookServer.java[onMessage]:210) - Can't handle message
java.lang.NullPointerException
at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:128)
at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:56)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
INFO [2016-06-03 16:54:20,934]({qtp519821334-86} NotebookServer.java[onClose]:216) - Closed connection to 10.4.67.203 : 56658. (1001) Idle Timeout
ERROR [2016-06-03 17:06:45,825]({Thread-21} JobProgressPoller.java[run]:54) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:361)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:110)
at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:226)
at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:51)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:279)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:264)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:358)
... 3 more
ERROR [2016-06-03 17:26:57,357]({Thread-40} JobProgressPoller.java[run]:54) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:361)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:110)
at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:226)
at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:51)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:279)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:264)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:358)
... 3 more

And in the Interpreter for Spark log file:

INFO [2016-06-03 17:26:57,773]({pool-2-thread-3} SparkInterpreter.java[createSparkContext]:225) - ------ Create new SparkContext yarn-client -------
WARN [2016-06-03 17:26:57,773]({pool-2-thread-3} SparkInterpreter.java[createSparkContext]:249) - Spark method classServerUri not available due to: [org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.classServerUri()]
WARN [2016-06-03 17:26:57,774]({pool-2-thread-3} Logging.scala[logWarning]:70) - Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.(SparkContext.scala:83)
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:330)
org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:118)
org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:499)
org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:408)
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1492)
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1477)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
INFO [2016-06-03 17:26:57,775]({pool-2-thread-3} Logging.scala[logInfo]:58) - Running Spark version 1.6.0
WARN [2016-06-03 17:26:57,776]({pool-2-thread-3} Logging.scala[logWarning]:70) -
SPARK_CLASSPATH was detected (set to ':/home/zeppelin/incubator-zeppelin/interpreter/spark/dep/:/home/zeppelin/incubator-zeppelin/interpreter/spark/:/home/zeppelin/incubator-zeppelin/zeppelin-interpreter/target/lib/*::/home/zeppelin/incubator-zeppelin/conf:/home/zeppelin/incubator-zeppelin/conf:/home/zeppelin/incubator-zeppelin/zeppelin-interpreter/target/classes').
This is deprecated in Spark 1.0+.

Please instead use:

  • ./spark-submit with --driver-class-path to augment the driver classpath

  • spark.executor.extraClassPath to augment the executor classpath

WARN [2016-06-03 17:26:57,776]({pool-2-thread-3} Logging.scala[logWarning]:70) - Setting 'spark.executor.extraClassPath' to ':/home/zeppelin/incubator-zeppelin/interpreter/spark/dep/:/home/zeppelin/incubator-zeppelin/interpreter/spark/:/home/zeppelin/incubator-zeppelin/zeppelin-interpreter/target/lib/::/home/zeppelin/incubator-zeppelin/conf:/home/zeppelin/incubator-zeppelin/conf:/home/zeppelin/incubator-zeppelin/zeppelin-interpreter/target/classes' as a work-around.
WARN [2016-06-03 17:26:57,776]({pool-2-thread-3} Logging.scala[logWarning]:70) - Setting 'spark.driver.extraClassPath' to ':/home/zeppelin/incubator-zeppelin/interpreter/spark/dep/:/home/zeppelin/incubator-zeppelin/interpreter/spark/:/home/zeppelin/incubator-zeppelin/zeppelin-interpreter/target/lib/::/home/zeppelin/incubator-zeppelin/conf:/home/zeppelin/incubator-zeppelin/conf:/home/zeppelin/incubator-zeppelin/zeppelin-interpreter/target/classes' as a work-around.
INFO [2016-06-03 17:26:57,776]({pool-2-thread-3} Logging.scala[logInfo]:58) - Changing view acls to: zeppelin
INFO [2016-06-03 17:26:57,776]({pool-2-thread-3} Logging.scala[logInfo]:58) - Changing modify acls to: zeppelin
INFO [2016-06-03 17:26:57,776]({pool-2-thread-3} Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zeppelin); users with modify permissions: Set(zeppelin)
INFO [2016-06-03 17:26:57,784]({pool-2-thread-3} Logging.scala[logInfo]:58) - Successfully started service 'sparkDriver' on port 51559.
INFO [2016-06-03 17:26:57,843]({sparkDriverActorSystem-akka.actor.default-dispatcher-2} Slf4jLogger.scala[applyOrElse]:80) - Slf4jLogger started
INFO [2016-06-03 17:26:57,846]({sparkDriverActorSystem-akka.actor.default-dispatcher-3} Slf4jLogger.scala[apply$mcV$sp]:74) - Starting remoting
INFO [2016-06-03 17:26:57,853]({sparkDriverActorSystem-akka.actor.default-dispatcher-3} Slf4jLogger.scala[apply$mcV$sp]:74) - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.28.22.133:34296]
INFO [2016-06-03 17:26:57,853]({sparkDriverActorSystem-akka.actor.default-dispatcher-3} Slf4jLogger.scala[apply$mcV$sp]:74) - Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@172.28.22.133:34296]
INFO [2016-06-03 17:26:57,854]({pool-2-thread-3} Logging.scala[logInfo]:58) - Successfully started service 'sparkDriverActorSystem' on port 34296.
INFO [2016-06-03 17:26:57,854]({pool-2-thread-3} Logging.scala[logInfo]:58) - Registering MapOutputTracker
INFO [2016-06-03 17:26:57,855]({pool-2-thread-3} Logging.scala[logInfo]:58) - Registering BlockManagerMaster
INFO [2016-06-03 17:26:57,856]({pool-2-thread-3} Logging.scala[logInfo]:58) - Created local directory at /tmp/blockmgr-7aec5561-fd7c-4023-84cf-5ddd3aef88a4
INFO [2016-06-03 17:26:57,857]({pool-2-thread-3} Logging.scala[logInfo]:58) - MemoryStore started with capacity 528.1 MB
INFO [2016-06-03 17:26:57,890]({pool-2-thread-3} Logging.scala[logInfo]:58) - Registering OutputCommitCoordinator
INFO [2016-06-03 17:26:57,897]({pool-2-thread-3} Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
INFO [2016-06-03 17:26:57,902]({pool-2-thread-3} AbstractConnector.java[doStart]:338) - Started SelectChannelConnector@0.0.0.0:4040
INFO [2016-06-03 17:26:57,903]({pool-2-thread-3} Logging.scala[logInfo]:58) - Successfully started service 'SparkUI' on port 4040.
INFO [2016-06-03 17:26:57,904]({pool-2-thread-3} Logging.scala[logInfo]:58) - Started SparkUI at http://172.28.22.133:4040
INFO [2016-06-03 17:26:57,924]({pool-2-thread-3} Logging.scala[logInfo]:58) - Created default pool default, schedulingMode: FIFO, minShare: 0, weight: 1
INFO [2016-06-03 17:26:57,955]({pool-2-thread-3} RMProxy.java[createRMProxy]:98) - Connecting to ResourceManager at /0.0.0.0:8032
INFO [2016-06-03 17:26:58,958]({pool-2-thread-3} Client.java[handleConnectionFailure]:867) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO [2016-06-03 17:26:59,959]({pool-2-thread-3} Client.java[handleConnectionFailure]:867) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO [2016-06-03 17:27:00,961]({pool-2-thread-3} Client.java[handleConnectionFailure]:867) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Hope this helps.

Thanks,
Ben

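An aside on the log excerpt above: the repeated "Connecting to ResourceManager at /0.0.0.0:8032" retries usually mean the YARN client configuration is not visible to the Spark driver. Pointing Zeppelin at the cluster configuration is typically two lines in conf/zeppelin-env.sh (the paths are illustrative; Ben's own config further down uses the same CDH defaults):

# Make the cluster's YARN/HDFS client configs visible to the Spark driver
export HADOOP_CONF_DIR=/etc/hadoop/conf
# Run the Spark interpreter against YARN
export MASTER=yarn-client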

@b3nbk1m70

I saw that the zeppelin-env.sh config file contents are different from previous versions: it uses “set” instead of “export”. So, I tried the old config file from Zeppelin 0.5.6 to see if this would change anything, and it turns out that Spark Scala code was then able to run on YARN. This suggests the new config file is not being read correctly. Now, I am getting another issue: I cannot run Spark SQL “%sql” queries. This is what I get.

ERROR [2016-06-03 20:18:58,230]({SparkListenerBus} Logging.scala[logError]:95) - Listener SQLListener threw an exception
java.lang.NullPointerException
at org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)

Do you know what this means?

Thanks,
Ben


@meniluca
Contributor

@bbuild11 do you have an update and can you share your "-conf" file?
Thank you in advance,
Luca

@b3nbk1m70

Hi Luca,

Here are the zeppelin-env.sh file contents. I hope this is what you mean.

export JAVA_HOME=/usr/java/latest
export MASTER=yarn-client
export ZEPPELIN_JAVA_OPTS="-Dspark.yarn.queue=root.bi"

export ZEPPELIN_NOTEBOOK_S3_BUCKET=zeppelin-dps2
export ZEPPELIN_NOTEBOOK_S3_USER=gxetl

export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

export HADOOP_CONF_DIR=/etc/hadoop/conf

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip
export SPARK_YARN_USER_ENV="PYTHONPATH=$PYTHONPATH"

export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.S3NotebookRepo, com.nflabs.zeppelinhub.notebook.repo.ZeppelinHubRepo"
export ZEPPELINHUB_API_ADDRESS="https://www.zeppelinhub.com"
export ZEPPELINHUB_API_TOKEN=""

The zeppelin-site.xml just has the S3 settings enabled, set as above.

Thanks,
Ben


@felixcheung
Member

Are folks still having problems with this?

@b3nbk1m70

Felix,

I will attempt CDH 5.7.0 again this week. I will try Zeppelin also. If I get any issues, I will let you know.

Thanks,
Ben


@secsubs

secsubs commented Jun 13, 2016

I managed to build successfully against CDH 5.7.0 right after the fix was released. Anyone need any specific artifacts? I am using Spark and Drill interpreters.

@kovalenko-boris

@secsubs can you publish a short tutorial on how to build the jar?

@secsubs

secsubs commented Jun 13, 2016

@kovalenko-boris I used this tutorial:
https://community.mapr.com/docs/DOC-1493

That said, if there is any specific aspect that you need me to zoom in on then I can look.

@felixcheung
Member

That was MapR, though; you probably want something like this for CDH:

mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo

@secsubs

secsubs commented Jun 13, 2016

Sorry, my bad; this tutorial from the Cloudera blog still works:
http://blog.cloudera.com/blog/2015/07/how-to-install-apache-zeppelin-on-cdh/

@b3nbk1m70

Felix,

I tried again with the latest. I was able to compile, but I found that there was a Jackson incompatibility. So, I removed the Jackson jars from zeppelin-server/target/lib and zeppelin-zengine/target/lib and replaced them with symlinks to the Jackson jars in /opt/cloudera/parcels/CDH/jars, to use the CDH 5.7.1 versions. Now, I'm getting the error below.

java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Is this happening because we are on jdk1.8.0_60 (Java 8)?

Thanks,
Ben

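For reference, the jar swap described above amounts to something like the following (a sketch only; the exact Jackson artifacts and paths depend on the CDH parcel and build layout):

# Replace Zeppelin's bundled Jackson jars with symlinks to the CDH-shipped ones
cd incubator-zeppelin/zeppelin-server/target/lib
rm -f jackson-*.jar
ln -s /opt/cloudera/parcels/CDH/jars/jackson-*.jar .
# repeat the same swap for zeppelin-zengine/target/lib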

@b3nbk1m70

To add…

We have a Livy server up and running in our cluster. I tried to use it so we could run Spark independently of Zeppelin; that way, Zeppelin can interface with any version of Spark maintained separately. Unfortunately, the Livy interpreter does not work. This would have solved things too.

Cheers,
Ben


@felixcheung
Member

Hey Ben, are you building Zeppelin with JDK 8, though? I think the build has to match your running environment, ideally. I thought we were still having problems with it in the project.

As for Livy, what's the problem you're having? You could send an email to dev@zeppelin to see if someone can help you.

@b3nbk1m70

Hi Felix,

All our environments are on jdk 8. We no longer support jdk 7 because Oracle end-of-life’d it. So, we moved to the Cloudera recommended version of jdk 8.

I will submit my livy issue on the other mailing list. Basically, I could not add the livy interpreter because it just doesn’t save.

Thanks,
Ben

On Jun 25, 2016, at 7:18 PM, Felix Cheung notifications@github.com wrote:

Hey Ben, are you building Zeppelin with jdk 8 though, I think they have to match your running environment, ideally. I thought we are still having problem with it in the project.

As for Livy, what's the problem you have, you can send email to dev@zeppelin to see someone can help you?


You are receiving this because you commented.
Reply to this email directly, view it on GitHub #868 (comment), or mute the thread https://github.com/notifications/unsubscribe/ABxu9JoDrBqfPT23EHZR-uxmNk4EjqI5ks5qPeGCgaJpZM4IS7A7.

@b3nbk1m70

Felix,

I forgot to add: Zeppelin 0.5.6 seems to work fine with JDK 8. So, I'm thinking that it might be something else specific to CDH 5.7. But I'm not sure.

Thanks,
Ben


@felixcheung
Member

From the email discussions it seems like you've gotten past the error above. This "ClassNotFoundException" looks like some sort of mismatch on the Spark side.
Unfortunately I no longer have a way to test CDH, so I'm hoping someone else can test this out.
