Bump Spark version to 1.4 #752

Closed
laserson opened this issue Aug 4, 2015 · 23 comments

@laserson (Contributor) commented Aug 4, 2015

This issue will track any progress necessary for bumping ADAM's Spark version to 1.4.

See #750.
See #659.

laserson changed the title from "Bump Spark version so 1.4" to "Bump Spark version to 1.4" on Aug 4, 2015
@laserson (Contributor, Author) commented Aug 4, 2015

It seems spark.kryoserializer.buffer.mb has been deprecated.
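For reference, the Spark 1.4 replacement key is spark.kryoserializer.buffer, which takes a size string with a unit suffix rather than a bare megabyte count. A rough sketch of the rename (the 4 MB value is made up for illustration):

```bash
# Old key, deprecated as of Spark 1.4 (value was a bare megabyte count):
#   spark.kryoserializer.buffer.mb   4
# Spark 1.4 replacement key (value takes a unit suffix):
echo "spark.kryoserializer.buffer 4m" >> "$SPARK_HOME/conf/spark-defaults.conf"
```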

@laserson (Contributor, Author) commented Aug 4, 2015

See #751 for a change that needs to be included.

@laserson (Contributor, Author) commented Aug 4, 2015

Correct

laserson reopened this Aug 5, 2015
@ryan-williams (Member) commented

So, I realized something awkward about this that maybe others have already processed but I hadn't: there's the spark.version property in ADAM's POM, and then there's whatever Spark version the user's $SPARK_HOME points at, and the two are essentially independent.

The former only affects the Spark classes we link against, which AFAIK have not changed in a way we care about since 1.2, so bumping it in isolation shouldn't make any difference to anyone.

The latter determines which scripts (e.g. Spark's bin/utils.sh) are available; it definitely seems like high time to allow $SPARK_HOME to point at Spark >= 1.4, but there's a question of how/whether to also support $SPARK_HOME pointing at Spark versions < 1.4, which is currently being discussed in #754, I guess?
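To make the distinction concrete, a quick sanity check might look like the following (this assumes the POM exposes the Spark dependency via a spark.version property, as described above):

```bash
# Two Spark versions that can diverge: the compile-time dependency in ADAM's
# POM and the runtime Spark under the user's $SPARK_HOME.
grep '<spark.version>' pom.xml              # version ADAM links against
"$SPARK_HOME"/bin/spark-submit --version    # version the wrapper scripts will actually use
```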

@heuermh (Member) commented Aug 5, 2015

Yep, the compile-time dependency shouldn't matter, unless it does. :) E.g. if a binary incompatibility in a 1.x version of Spark or one of its transitive dependencies slips through.

Is #754 backward compatible script-wise with previous versions of Spark? A few simple examples I tried worked for me on Spark 1.3.1. It would be best if we didn't require $SPARK_HOME to be set at all; I think it was only required to find the utils.sh script.

@ryan-williams (Member) commented

Good point about transitive dependencies.

If the net result of #754 ends up being that we don't need $SPARK_HOME set and the ADAM scripts work against arbitrary Spark versions >= 1.2, then that seems like a great step forward. I'll try to keep following along there.

@fnothaft (Member) commented Aug 5, 2015

The transitive-dependency thing is both nasty and common, unfortunately...

@laserson (Contributor, Author) commented Aug 5, 2015

SPARK_HOME is also needed to find the spark-submit, spark-shell, and pyspark scripts.

@heuermh (Member) commented Aug 5, 2015

> SPARK_HOME is also needed to find the spark-submit, spark-shell, and pyspark scripts.

Would it be a fair assumption that those should be on the user's path?

@laserson (Contributor, Author) commented Aug 5, 2015

Probably depends on the user. It never is for me, because I often use different versions of Spark. One option would be to check if SPARK_HOME is set, and if not, simply try whatever is on the path.
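A minimal sketch of that fallback (variable names are illustrative, not the actual adam-submit wiring):

```bash
# Prefer an explicit SPARK_HOME; otherwise fall back to whatever spark-submit
# is already on the PATH.
if [ -n "$SPARK_HOME" ]; then
  SPARK_SUBMIT="$SPARK_HOME/bin/spark-submit"
else
  SPARK_SUBMIT="$(command -v spark-submit)" || {
    echo "Either set SPARK_HOME or put spark-submit on your PATH." >&2
    exit 1
  }
fi
"$SPARK_SUBMIT" --version
```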

@heuermh (Member) commented Aug 5, 2015

> One option would be to check if SPARK_HOME is set, and if not, simply try whatever is on the path.

+1

@ryan-williams (Member) commented

Ah yeah, I don't have spark-{submit,shell} on my $PATH either, since I frequently switch Spark versions.

Checking both sgtm too.

Is this done now that #754 is in?

Should we bump the Spark version in the POM? I can file a separate issue for that if necessary; I was just noting while doing README refactoring in #763 / #764 that we say we continuously build against Spark 1.1.0, our POM says Spark 1.2.0, and we actually support up to Spark 1.4.1 (and likely soon 1.5.0). Is there any cleanup we should do to bring those into line?

@fnothaft (Member) commented

Well, the POM is 1.4.1 now, right? How about we move CI to 1.4.1 as well? That seems like the simplest solution. I'll prep a PR.

@ryan-williams (Member) commented

> Well, the POM is 1.4.1 now, right?

Nope! 1.2.0. Unless I am really out of it.

@fnothaft (Member) commented

You are correct, nevermind!

I am OK with having the Jenkins sanity-test script scripts/jenkins-test check multiple versions of Spark (e.g., 1.2.1, 1.3.1, 1.4.1). That might be a good idea anyway.
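Roughly something like this, as a sketch only (the spark.version property and the exact version list are assumptions about how jenkins-test would be parameterized):

```bash
# Build and unit-test ADAM against several Spark compile-time versions.
for spark_version in 1.2.1 1.3.1 1.4.1; do
  mvn clean package -Dspark.version="$spark_version" || exit 1
done
```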

@ryan-williams (Member) commented

Having Jenkins test a matrix of Spark versions sgtm, @fnothaft.

I was going to just bump the Spark version in the POM via github's web-edit-file flow, but then I remembered your warnings about transitive deps above. Do you have some system for evaluating the danger of such an upgrade?

@fnothaft (Member) commented

Do we want to matrix test Spark at both the build and the executable level? I am OK with either.

> I was going to just bump the Spark version in the POM via github's web-edit-file flow, but then I remembered your warnings about transitive deps above. Do you have some system for evaluating the danger of such an upgrade?

The jenkins-test script, which, alas, currently only tests a single version of Spark... How about I write an enhancement to our Jenkins flow, we merge that, and then we bump the POM?

@ryan-williams (Member) commented

> Do we want to matrix test Spark at both the build and the executable level? I am OK with either.

"Build level": basically run mvn package?
"Executable level": run some ADAM commands, or?

Upgrading jenkins-test then bumping POM and updating docs sgtm.

Other random question: now that SPARK-8057 is in, will we be able to support Hadoop 1 again in Spark 1.5.0? Do we care / want to? I was just noticing Hadoop-1-specific logic in jenkins-test.

@fnothaft (Member) commented

"Build level": basically run mvn package?
"Executable level": run some ADAM commands, or?

Exactly!
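In other words, a rough sketch of the two levels (the transform invocation and file names are placeholders, not the actual jenkins-test contents):

```bash
# "Build level": compile and run the unit tests.
mvn clean package

# "Executable level": exercise a real ADAM command against the Spark that
# $SPARK_HOME (or the PATH) provides; input/output paths are placeholders.
./bin/adam-submit transform sample.sam sample.adam
```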

> Upgrading jenkins-test then bumping POM and updating docs sgtm.

+1

> Other random question: now that SPARK-8057 is in, will we be able to support Hadoop 1 again in Spark 1.5.0? Do we care / want to? I was just noticing Hadoop-1-specific logic in jenkins-test.

Since spark-ec2 is Hadoop 1 centric, I'd like to keep testing hooks in place that ensure people can run ADAM on top of those scripts. I think we would have to waive the Hadoop 1 / Spark 1.4.1 combo, but otherwise we should be OK. That exclusion is straightforward in Jenkins.

@heuermh (Member) commented Aug 10, 2015

"Build level": basically run mvn package?
"Executable level": run some ADAM commands, or?

Exactly!

Right, some of the potential binary-incompatibility issues with transitive dependencies won't show up at build time, and there is a possibility that the classpath in test scope could differ from the runtime classpath.
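One way to at least eyeball that risk before bumping (a suggestion, not something settled in this thread; it again assumes a spark.version property in the POM):

```bash
# Compare the resolved dependency trees under the old and new Spark versions
# to spot transitive-dependency changes that compilation alone won't catch.
mvn dependency:tree -Dspark.version=1.2.0 -DoutputFile=deps-1.2.0.txt
mvn dependency:tree -Dspark.version=1.4.1 -DoutputFile=deps-1.4.1.txt
diff deps-1.2.0.txt deps-1.4.1.txt
```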

@heuermh (Member) commented Aug 19, 2015

If I have everything right:

#750 has been closed.
I believe #659 can be closed.
Pull request #753 was merged and then rolled back.
Additional Jenkins builds (and another transitive dep fix) were added in #765.
New pull request #778 re-applies the change bumping the Spark compile-time dependency to version 1.4.1.

@ryan-williams (Member) commented

That looks right; I just closed #659.

@fnothaft (Member) commented

Closed by 7e8eb05.
