
Spark 2.0 Branch / Support [Enhancement] #78

Closed
javadba opened this issue Jun 8, 2016 · 10 comments

Comments

@javadba
Contributor

javadba commented Jun 8, 2016

I added a PR for Spark 2.0 that uses the SparkSession instead of the SparkContext. In addition, the libraries were moved to Scala 2.11 and Hadoop 2.7.1 to be more in line with the Spark 2.x direction. The tests were run and pass.

#77
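For anyone skimming the diff, the gist of the change is swapping the entry point. A minimal sketch of what that looks like on the caller's side; the `CaffeOnSpark.train` call is hypothetical and stands in for this project's actual API:

```scala
import org.apache.spark.sql.SparkSession

object SparkSessionExample {
  def main(args: Array[String]): Unit = {
    // Spark 2.x: SparkSession is the unified entry point; the old
    // SparkContext remains reachable as spark.sparkContext where needed.
    val spark = SparkSession.builder()
      .appName("CaffeOnSparkExample")
      .getOrCreate()

    // Hypothetical call taking the session instead of a SparkContext:
    // CaffeOnSpark.train(spark, args)

    spark.stop()
  }
}
```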

@javadba javadba changed the title Spark 2.0 Branch / Support Spark 2.0 Branch / Support [Enhancement] Jun 8, 2016
@mriduljain
Contributor

It would be great if you could make this backwards compatible.

@javadba
Contributor Author

javadba commented Jun 8, 2016

To be backwards compatible we would need to retain the SparkContext as the input parameter to all the methods. That is squarely contrary to Spark 2.x: 2.x is a breaking change, so to be consistent it is a breaking change here as well.

Now, if you really want a non-breaking change, I can add it in a separate branch. Keep in mind it will be using deprecated APIs in that case.


@javadba
Contributor Author

javadba commented Jun 8, 2016

FYI, I have created said branch and am looking to see if I can make this happen. I'll update later today.

@javadba
Contributor Author

javadba commented Jun 8, 2016

Backwards compatibility is in place. Two methods are provided: one using SparkSession and one using SparkContext. The updates are in the same branch to simplify our discussion. Feel free to take a look at the latest commit on the previously provided PR.
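For reference, the overloading pattern described above looks roughly like this sketch; the `load` method name is hypothetical, and the real signatures are in the PR:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SparkSession}

object SourceExample {
  // Spark 2.x entry point.
  def load(session: SparkSession, path: String): DataFrame =
    session.read.parquet(path)

  // Spark 1.x-style entry point, kept so existing callers compile:
  // wrap the SparkContext's configuration in a SparkSession and delegate.
  def load(sc: SparkContext, path: String): DataFrame =
    load(SparkSession.builder().config(sc.getConf).getOrCreate(), path)
}
```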

@mriduljain
Contributor

Thanks, will go through and comment soon.


@javadba
Contributor Author

javadba commented Jun 8, 2016

I should clarify: the backwards compatibility applies only to the consuming source code, which does not need to be changed.

The SparkSession class is referenced, and thus this change will not run against Spark 1.x. To truly achieve backwards compatibility we would need to add shell scripts to manipulate the source files, and I do not believe that would be worth it. Instead, maintain this in a separate Spark 2.x branch; at some point you decide to merge it into main, and the Spark 1.x branch goes into maintenance mode.

@javadba
Contributor Author

javadba commented Jun 8, 2016

A new PR has been opened that simplifies the approach and provides full backwards compatibility via a Maven profile: #79
Use

mvn -Dspark2  <actions>

to build against Spark 2.x. The actual versions of Spark, Scala, and Hadoop are specified in that profile.
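A property-activated profile of this sort looks roughly like the sketch below; the profile id and version numbers here are illustrative, and the authoritative definition is the pom.xml in #79.

```xml
<profile>
  <id>spark2</id>
  <activation>
    <!-- Activated whenever -Dspark2 is passed on the command line. -->
    <property>
      <name>spark2</name>
    </property>
  </activation>
  <properties>
    <!-- Illustrative versions; see PR #79 for the real ones. -->
    <spark.version>2.0.0</spark.version>
    <scala.version>2.11.8</scala.version>
    <hadoop.version>2.7.1</hadoop.version>
  </properties>
</profile>
```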

@javadba
Contributor Author

javadba commented Jun 9, 2016

I found a bug in #79 and am looking into it.

java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/DataFrame;
at com.yahoo.ml.caffe.DataFrameTest$$anonfun$1.apply$mcV$sp(DataFrameTest.scala:51)

@javadba
Contributor Author

javadba commented Jun 9, 2016

False alarm! The reason is:

If you first build against Spark 1.x via

mvn package

and then try Spark 2.x via

mvn -Dspark2 package

it WILL fail.

We need to run clean so everything gets recompiled. (In Spark 2.x, DataFrame became a type alias for Dataset[Row], so methods such as SQLContext.createDataFrame have a different return type at the bytecode level; test classes compiled against 1.x then fail at runtime with the NoSuchMethodError above.) So the following is required:

mvn -Dspark2  clean package

The tests are passing both under Spark 1.x / Scala 2.10 and Spark 2.x / Scala 2.11.

@mriduljain
Contributor

I guess this pull request has been merged. Closing this. Thanks so much!
