Make zeppelin work with CDH5.7.0 #868
Conversation
24ea584: method classServerUri not available in cdh5.7.0 Spark version. Only set config if variable is filled
146b524: Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported
Thanks for sorting this out and contributing a PR.
Hi Felix, no, I'm not aware that this change is specific to CDH5.7.0. I ran into the issue when installing the latest CDH on my laptop and wanted to experiment with Zeppelin a bit. If you need my help in any other way, feel free to contact me. Cheers, Kris
I'm having a similar issue using CDH 5.7.0 with Zeppelin 0.6.0. This blog link seems to have evidence of some back-porting that was done from Spark 2.0. I hope that there's an easy patch to fix this.
bin/interpreter.sh (Outdated)

if [[ -n "${SPARK_SUBMIT}" ]]; then
    ${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}" --driver-java-options "${JAVA_INTP_OPTS}" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &
    #${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}" --driver-java-options "${JAVA_INTP_OPTS}" ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &
Could you elaborate on why we need to change this command line?
Removed the --driver-class-path because it was conflicting with setting the SPARK_CLASSPATH env var in my local environment.
I'm setting the following classpath in spark-env.sh to make Spark from CDH5.7.0 work locally with HDFS (in local and pseudo-distributed mode):
SPARK_CLASSPATH=$(hadoop classpath)
SPARK_CLASSPATH=$SPARK_CLASSPATH:$HIVE_HOME/lib/*
SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_PREFIX/share/hadoop/tools/lib/*
Having both SPARK_CLASSPATH set and using --driver-class-path is not supported.
Feel free to mark this specific change as only needed for me locally and leave it out of the merge.
I think that's exactly what ZEPPELIN_CLASSPATH_OVERRIDES is for?
In your case you just need to change it to:
ZEPPELIN_CLASSPATH_OVERRIDES=$(hadoop classpath)
ZEPPELIN_CLASSPATH_OVERRIDES=$ZEPPELIN_CLASSPATH_OVERRIDES:$HIVE_HOME/lib/*
ZEPPELIN_CLASSPATH_OVERRIDES=$ZEPPELIN_CLASSPATH_OVERRIDES:$HADOOP_PREFIX/share/hadoop/tools/lib/*
I guess you are saying you need SPARK_CLASSPATH set for spark-shell or spark-submit?
I think the best way to handle that is to check whether SPARK_CLASSPATH is set, and if so add it to --driver-class-path here and unset SPARK_CLASSPATH.
In either case, it might be good to separate that into a new JIRA issue.
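A minimal sketch of that approach, reusing the variables from the bin/interpreter.sh snippet above (an illustration only, not part of this PR):

```bash
# If the user set SPARK_CLASSPATH in spark-env.sh, fold it into the
# driver classpath and unset it so spark-submit never sees both at once.
if [[ -n "${SPARK_CLASSPATH}" ]]; then
  ZEPPELIN_CLASSPATH_OVERRIDES="${ZEPPELIN_CLASSPATH_OVERRIDES}:${SPARK_CLASSPATH}"
  unset SPARK_CLASSPATH
fi

${SPARK_SUBMIT} --class ${ZEPPELIN_SERVER} \
  --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}" \
  --driver-java-options "${JAVA_INTP_OPTS}" \
  ${SPARK_SUBMIT_OPTIONS} ${SPARK_APP_JAR} ${PORT} &
```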
Sure - sounds like we should push this PR.
Revert "Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported". This reverts commit 146b524.
Hi Felix, I have reverted that specific commit in the PR. Cheers, Kris
LGTM
@krisgeus thanks - it looks like tests are failing fairly consistently, but it's not clear what's going on. Would you have some time to take a look, or run them locally to investigate?
Thank you @krisgeus for the workaround; however, it is not sufficient for what I'm trying to do, since I have an entire cluster on CDH 5.7. When do you think this issue can be solved?
Has this been merged to master, ready for a clean build? Thanks,
I will try to look into the test error in a couple of days.
@felixcheung not much to elaborate, I would need Zeppelin to work with an external Spark installation with CDH 5.7 :)
Does the fix in this PR make it work for your case?
@felixcheung I've made some progress in the testing area. It appears that spark-core-1.6.0-cdh5.7.0 brings in the older akka version 2.2.3 instead of the 2.3.11 mentioned as the akka.version property in the spark-1.6 profile. I've temporarily fixed this by adding an exclusion to this profile. Not the way it should end up, but a temporary solution that lets me run more tests successfully. The diff:
Still a few tests failing, but getting closer.
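For anyone hitting the same conflict, one way to confirm which akka version the CDH artifact pulls in (standard Maven tooling; the interpreter module name "spark" is assumed here, and this is not the author's actual diff):

```bash
# List the akka artifacts Maven resolves for the spark interpreter
# module under the CDH version; output depends on the active profiles.
mvn dependency:tree -Pspark-1.6 -Pvendor-repo \
    -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 \
    -pl spark | grep -i akka
```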
Thanks!
I have the exact same issue with CDH 5.7.0 and Zeppelin 0.5.6. Is there a workaround available? Thanks.
@secsubs unfortunately no, it seems to be a breaking API change in CDH 5.7 (also in Spark 2.0).
Same problem here, merge it please! @felixcheung
OK, with the same code changes everything passes except for Selenium, which is a known test issue. Merging if there are no more comments.
+1 for merge
Please do…
### What is this PR for?
The downloadable zeppelin install wasn't working with CDH 5.7.0. I had to change a few things to make them work nicely together.

### What type of PR is it?
Bug Fix

### Todos

### What is the Jira issue?
https://www.mail-archive.com/users@zeppelin.incubator.apache.org/msg03471.html

### How should this be tested?
Install local cdh5.7.0 (see https://github.com/krisgeus/ansible_local_cdh_hadoop) and build zeppelin (with this patch) with the following options:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo

### Screenshots (if appropriate)

### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? Didn't verify
* Does this need documentation? Don't think so

Author: Kris Geusebroek <kgeusebroek@KgMBP2015.local>

Closes apache#868 from krisgeus/master and squashes the following commits:

e33d520 [Kris Geusebroek] Revert "Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported"
488cce6 [Kris Geusebroek] Added logging and comments to clarify reason not throwing exception
146b524 [Kris Geusebroek] Don't use extra driver classpath option since I use SPARK_CLASSPATH in spark-env.sh and using both is not supported
24ea584 [Kris Geusebroek] method classServerUri not available in cdh5.7.0 Spark version. Only set config if variable is filled
50717dd [Kris Geusebroek] Use slf4j instead of parquet bundled one since parquet doesn't bundle it anymore
Hi all, I downloaded the latest master clone from GitHub and built it using this command:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo
I got this error running the Zeppelin Tutorial example:
java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
So, I rebuilt using this command:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo -Pyarn
Now, I get this after a long while:
java.net.ConnectException: Connection refused
Can someone provide some advice on getting this to work with CDH 5.7.0? Thanks,
Can you check the log file?
I get this in the Zeppelin log file:
ERROR [2016-06-03 16:49:20,929]({qtp519821334-82} NotebookServer.java[onMessage]:210) - Can't handle message
And in the Spark interpreter log file:
INFO [2016-06-03 17:26:57,773]({pool-2-thread-3} SparkInterpreter.java[createSparkContext]:225) - ------ Create new SparkContext yarn-client -------
Please instead use:
Hope this helps. Thanks,
I saw that the contents of the config file zeppelin-env.sh are different than in previous versions. It uses "set" instead of "export". So, I tried the old config file from Zeppelin 0.5.6 to see if this would change anything, and it turns out that Spark Scala code was then able to run on YARN. This would mean that the new config file is not being read correctly. Now, I am getting another issue: I cannot run Spark SQL "%sql" queries. This is what I get:
ERROR [2016-06-03 20:18:58,230]({SparkListenerBus} Logging.scala[logError]:95) - Listener SQLListener threw an exception
Do you know what this means? Thanks,
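On the set vs. export point: a generic bash illustration (not the actual Zeppelin template) of why only exported variables reach child processes such as the launched interpreter:

```bash
# A plain assignment is visible to the current shell only; child
# processes do not inherit it unless the variable is exported.
FOO=bar
bash -c 'echo "plain:    ${FOO:-unset}"'   # prints "plain:    unset"

export FOO=bar
bash -c 'echo "exported: ${FOO:-unset}"'   # prints "exported: bar"
```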
@bbuild11 do you have an update and can you share your "-conf" file?
Hi Luca, here are the zeppelin-env.sh file contents. I hope this is what you mean?
export JAVA_HOME=/usr/java/latest
export ZEPPELIN_NOTEBOOK_S3_BUCKET=zeppelin-dps2
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip
export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.S3NotebookRepo, com.nflabs.zeppelinhub.notebook.repo.ZeppelinHubRepo"
The zeppelin-site.xml has just the S3 settings enabled, set like above. Thanks,
Are folks still having problems with this?
Felix, I will attempt CDH 5.7.0 again this week. I will try Zeppelin also. If I run into any issues, I will let you know. Thanks,
I managed to build successfully against CDH 5.7.0 right after the fix was released. Does anyone need any specific artifacts? I am using the Spark and Drill interpreters.
@secsubs can you publish a short tutorial on how to build the jar? |
|
@kovalenko-boris I used this tutorial: That said, if there is any specific aspect that you need me to zoom in on, I can take a look.
That was MapR, though; you probably want something like this for CDH:
Sorry, my bad, this tutorial from the Cloudera blog still works:
Felix, I tried again with the latest. I was able to compile, but I found that there was a Jackson incompatibility. So, I removed the Jackson jars from zeppelin-server/target/lib and zeppelin-zengine/target/lib and replaced them with symlinks to the Jackson jars in /opt/cloudera/parcels/CDH/jars, in order to use the CDH 5.7.1 versions. Now, I'm getting the error below.
java.lang.ClassNotFoundException:
Is this happening because we are on jdk1.8.0_60 (Java 8)? Thanks,
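A rough sketch of that jar swap, assuming the paths from the comment above; exact jar names and versions will differ per install:

```bash
# Replace Zeppelin's bundled Jackson jars with symlinks to the CDH ones.
# Back up the originals first; jar names depend on your versions.
cd zeppelin-server/target/lib
mkdir -p ../lib-backup
mv jackson-*.jar ../lib-backup/
ln -s /opt/cloudera/parcels/CDH/jars/jackson-core-*.jar .
ln -s /opt/cloudera/parcels/CDH/jars/jackson-databind-*.jar .
ln -s /opt/cloudera/parcels/CDH/jars/jackson-annotations-*.jar .
# Repeat the same steps for zeppelin-zengine/target/lib.
```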
To add: we have a Livy server up and running in our cluster. I tried to use it so we could use Spark independently of Zeppelin. In this way, Zeppelin can interface with any version of Spark, maintained separately. Unfortunately, the Livy interpreter does not work. This would have solved things too. Cheers,
Hey Ben, are you building Zeppelin with JDK 8, though? I think it has to match your running environment, ideally; I thought we were still having problems with that in the project. As for Livy, what's the problem you're having? You could send an email to dev@zeppelin to see if someone can help you.
Hi Felix, all our environments are on JDK 8. We no longer support JDK 7 because Oracle end-of-life'd it, so we moved to the Cloudera-recommended version of JDK 8. I will submit my Livy issue on the other mailing list. Basically, I could not add the Livy interpreter because it just doesn't save. Thanks,
Felix, I forgot to add: Zeppelin 0.5.6 seems to work fine with JDK 8. So, I'm thinking that it might be something else specific to CDH 5.7, but I'm not sure. Thanks,
From the email discussions it seems like you are through with the error above. This ClassNotFoundException would seem like some sort of mismatch on the Spark side.
What is this PR for?
The downloadable zeppelin install wasn't working with CDH 5.7.0. I had to change a few things to make them work nicely together.
What type of PR is it?
Bug Fix
Todos
What is the Jira issue?
https://www.mail-archive.com/users@zeppelin.incubator.apache.org/msg03471.html
How should this be tested?
Install local cdh5.7.0 (see https://github.com/krisgeus/ansible_local_cdh_hadoop)
build zeppelin (with this patch) with the following options:
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Pvendor-repo
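Once built, a quick way to smoke-test the result (standard Zeppelin daemon commands; assumes the default port 8080):

```bash
# Start the daemon from the build/distribution root, then open
# http://localhost:8080 and run the Zeppelin Tutorial note.
./bin/zeppelin-daemon.sh start
# ...verify the tutorial runs...
./bin/zeppelin-daemon.sh stop
```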
Screenshots (if appropriate)
Questions: