spark-yarn package doesn't exist in CDH repo #18

Closed
pliguori opened this issue Mar 17, 2015 · 5 comments

Comments

@pliguori

This is the definition for the cdh532 build:
<spark.version>1.2.0-cdh5.3.2</spark.version>

However, that version of spark-yarn doesn't exist in the Cloudera Artifactory repository.

The same applies for the cdh530 or cdh520 builds.

@chrisbennight

Is there a separate 'spark-yarn' package? I think the regular 'spark-core' package should work (at a quick glance, it seemed to be built with YARN support).

http://archive-primary.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.3/RPMS/noarch/spark-core-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch.rpm

@pliguori
Author

Sorry, maybe I wasn't clear enough. I'm talking about the Maven artifacts defined in the mrgeo pom file.
In the cdh532 profile, the pom tries to download the following spark-yarn dependency:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-yarn_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>

where spark.version is <spark.version>1.2.0-cdh5.3.2</spark.version>.
However, this version does not exist in the Cloudera Artifactory, as you can see here:

https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-yarn_2.10/
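For context, Maven resolves a dependency by mapping its coordinates onto the standard repository layout: the dots in the groupId become path separators, followed by the artifactId, the version, and a file named artifactId-version.jar. The short Python sketch below builds that URL for the coordinates above, which is the path the build tries to fetch when the 1.2.0-cdh5.3.2 directory is missing from the listing. The helper name `artifact_url` is hypothetical and not part of Maven or mrgeo.

```python
def artifact_url(repo: str, group_id: str, artifact_id: str, version: str,
                 packaging: str = "jar") -> str:
    """Build the URL of an artifact in a Maven 2 layout repository (hypothetical helper)."""
    group_path = group_id.replace(".", "/")           # org.apache.spark -> org/apache/spark
    file_name = f"{artifact_id}-{version}.{packaging}"
    return "/".join([repo.rstrip("/"), group_path, artifact_id, version, file_name])


if __name__ == "__main__":
    # Coordinates from the cdh532 profile discussed above.
    print(artifact_url(
        "https://repository.cloudera.com/artifactory/cloudera-repos",
        "org.apache.spark",
        "spark-yarn_2.10",
        "1.2.0-cdh5.3.2",
    ))
```

Run as a script, it prints https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-yarn_2.10/1.2.0-cdh5.3.2/spark-yarn_2.10-1.2.0-cdh5.3.2.jar.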

@ttislerdg
Contributor

For some reason, Cloudera left the spark-yarn_2.10 jar out of its CDH 5.3.x repos, although it does exist for earlier and later versions. To work around the problem, I downloaded the Spark release from Cloudera (located at http://archive.cloudera.com/cdh5/cdh/5/, specifically http://archive.cloudera.com/cdh5/cdh/5/spark-1.2.0-cdh5.3.2.tar.gz for CDH 5.3.2). The unpacked tarball contains all the Spark jars. Then take the spark-yarn_2.10 jar and install it into your local Maven repo with a typical mvn install:install-file command.

Let me know if this works for you.
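The workaround above can be sketched in Python. This is a minimal sketch, assuming the tarball has already been downloaded and unpacked locally and that the jar inside keeps its Maven-style name (spark-yarn_2.10-1.2.0-cdh5.3.2.jar); that file name and the helper names `find_jar`, `install_file_command`, and `install` are assumptions, not taken from the Cloudera release or mrgeo. It only assembles the standard `mvn install:install-file` parameters (`-Dfile`, `-DgroupId`, `-DartifactId`, `-Dversion`, `-Dpackaging`).

```python
import subprocess
from pathlib import Path

GROUP_ID = "org.apache.spark"
ARTIFACT_ID = "spark-yarn_2.10"


def find_jar(spark_home: Path, version: str, artifact_id: str = ARTIFACT_ID) -> Path:
    """Return the <artifactId>-<version>.jar found under an unpacked Spark tarball.

    Assumption: the jar keeps its Maven-style file name inside the tarball.
    """
    matches = sorted(spark_home.rglob(f"{artifact_id}-{version}.jar"))
    if not matches:
        raise FileNotFoundError(f"{artifact_id}-{version}.jar not found under {spark_home}")
    return matches[0]


def install_file_command(jar: Path, version: str,
                         group_id: str = GROUP_ID,
                         artifact_id: str = ARTIFACT_ID) -> list[str]:
    """Build the argument list for `mvn install:install-file` for a single jar."""
    return [
        "mvn", "install:install-file",
        f"-Dfile={jar}",
        f"-DgroupId={group_id}",
        f"-DartifactId={artifact_id}",
        f"-Dversion={version}",  # must match ${spark.version} in the pom profile
        "-Dpackaging=jar",
    ]


def install(spark_home: Path, version: str) -> None:
    """Find the jar and install it into the local Maven repository (needs mvn on PATH)."""
    cmd = install_file_command(find_jar(spark_home, version), version)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical jar location; this only prints the command, it does not run mvn.
    version = "1.2.0-cdh5.3.2"
    jar = Path(f"{ARTIFACT_ID}-{version}.jar")
    print(" ".join(install_file_command(jar, version)))
```

Installing with the same version string as the pom's spark.version property is what lets the cdh532 profile resolve the dependency from the local repository instead of the Cloudera one.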

@pliguori
Author

I will try soon. In the meantime, I spoke to Cloudera support, and they recommended using spark-network-yarn instead of spark-yarn.

@pliguori
Author

I can confirm that your workaround worked. The workaround suggested by Cloudera doesn't work at all, since spark-network-yarn depends on spark-yarn.
However, it seems that Cloudera will redeploy the 1.2 artifact to their repository soon.

3 participants