[SPARK-15052][SQL] Use builder pattern to create SparkSession #12830

rxin · 2016-05-02T03:49:01Z

What changes were proposed in this pull request?

This patch adds a builder for creating a SparkSession. The new code is not used anywhere yet, so it is mostly dead code for now. I'm putting it up here for feedback.

There are a few TODOs that can be done as follow-up pull requests:

- [ ] Update tests to use this
- [ ] Update examples to use this
- [ ] Clean up SQLContext code w.r.t. this one (i.e. SparkSession shouldn't call into SQLContext.getOrCreate; it should be the other way around)
- [ ] Remove SparkSession.withHiveSupport
- [ ] Disable the old constructor (by making it private) so the only way to start a SparkSession is through this builder pattern

How was this patch tested?

A follow-up pull request will clean this up and switch existing tests to use the builder.

rxin · 2016-05-02T03:51:23Z

cc @yhuai @andrewor14 and also cc @dongjoon-hyun since you have been working on the example files

dongjoon-hyun · 2016-05-02T04:12:25Z

Thank you for notifying me. It looks good to me. So the three-line pattern will be replaced by one factory statement, right?

Spark 1.x

val conf = new SparkConf().setMaster("local[4]").setAppName("App")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

Spark 2.0

val spark = SparkSession.builder().master("local").config("spark.some.config.option", "some-value").getOrCreate()

rxin · 2016-05-02T04:15:28Z

Yes. Technically we don't really reduce the line length, but it definitely reduces the number of concepts people need to use if they are just using DataFrame/Dataset.

dongjoon-hyun · 2016-05-02T04:19:32Z

Yes, right. This also removes the import statements for SparkConf and SparkContext for those people. It becomes much simpler. Cool. I will update my PR accordingly.

dongjoon-hyun · 2016-05-02T04:51:34Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala

+   */
+  class Builder {
+
+    private[this] val options = new scala.collection.mutable.HashMap[String, String]


What about using j.u.c.ConcurrentHashMap?

It creates a lot of garbage for something that's not expected to be concurrent.

Oh, what I meant was moving the locking point from the Builder instance to options. I thought only getOrCreate needs to lock on the Builder instance.

But never mind my comment. The Builder is simple, and the current implementation is solid, too.
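The trade-off discussed in this thread can be illustrated with a minimal sketch. It assumes Scala 2 and uses a hypothetical `SketchBuilder` class with a hypothetical `snapshot()` method; it is not the Spark `Builder` itself. It only shows why a plain mutable `HashMap` is enough when every access goes through `synchronized` on the builder instance.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the Builder under review; not the actual Spark class.
class SketchBuilder {
  // A plain mutable map suffices because every access below is guarded
  // by `synchronized` on this builder instance.
  private[this] val options = new mutable.HashMap[String, String]

  // Records one option and returns the builder so calls can be chained.
  def config(key: String, value: String): SketchBuilder = synchronized {
    options(key) = value
    this
  }

  // Hypothetical helper: returns an immutable copy of the collected options.
  // The real getOrCreate() would use the options to build a session instead.
  def snapshot(): Map[String, String] = synchronized {
    options.toMap
  }
}

object SketchBuilderDemo {
  def main(args: Array[String]): Unit = {
    val opts = new SketchBuilder()
      .config("spark.some.config.option", "some-value")
      .config("spark.master", "local")
      .snapshot()
    println(opts)
  }
}
```

Locking on the builder keeps the option map an implementation detail. A `ConcurrentHashMap` would move the synchronization into the map, which mainly pays off when many threads are expected to hit the same builder.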

SparkQA · 2016-05-02T05:14:51Z

Test build #57501 has finished for PR 12830 at commit 8172d91.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2016-05-02T20:20:53Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala

+     * Enables Hive support, including connectivity to a persistent Hive metastore, support for
+     * Hive serdes, and Hive user-defined functions.
+     *
+     * @return 2.0.0


andrewor14 · 2016-05-02T20:25:10Z

LGTM. This is beautiful.

rxin · 2016-05-02T22:26:59Z

Thanks - going to merge this. I added removing the existing withHiveSupport as a TODO in the PR description.

## What changes were proposed in this pull request?

This patch creates a builder pattern for creating SparkSession. The new code is unused and mostly deadcode. I'm putting it up here for feedback.

There are a few TODOs that can be done as follow-up pull requests:

- [ ] Update tests to use this
- [ ] Update examples to use this
- [ ] Clean up SQLContext code w.r.t. this one (i.e. SparkSession shouldn't call into SQLContext.getOrCreate; it should be the other way around)
- [ ] Remove SparkSession.withHiveSupport
- [ ] Disable the old constructor (by making it private) so the only way to start a SparkSession is through this builder pattern

## How was this patch tested?

Part of the future pull request is to clean this up and switch existing tests to use this.

Author: Reynold Xin <rxin@databricks.com>

Closes #12830 from rxin/sparksession-builder.

(cherry picked from commit ca1b219)
Signed-off-by: Reynold Xin <rxin@databricks.com>

tedyu · 2016-05-02T23:20:12Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala

+     *
+     * @since 2.0.0
+     */
+    def config(key: String, value: Long): Builder = synchronized {


What about other primitive types for the value: Int, Float, and Short?

They don't matter as they just map into Long / Double.
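The reply relies on Scala's numeric widening during overload resolution. A minimal sketch, assuming Scala 2 and a hypothetical `WideningSketch` object whose two overloads only mirror the shape of the `config(key, Long)` and `config(key, Double)` methods under review:

```scala
// Hypothetical stand-in; only the overload shapes match the methods under review.
object WideningSketch {
  // Each overload reports which one the compiler selected.
  def config(key: String, value: Long): String = "Long"
  def config(key: String, value: Double): String = "Double"

  def main(args: Array[String]): Unit = {
    val i: Int = 4
    val s: Short = 2
    val f: Float = 0.5f
    // Int and Short widen to Long, which is more specific than Double.
    println(config("int", i))
    println(config("short", s))
    // Float cannot widen to Long, so only the Double overload applies.
    println(config("float", f))
  }
}
```

Under these assumptions, callers passing an `Int`, `Short`, or `Float` compile without extra overloads, which is the point made in the reply.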

SparkQA · 2016-05-03T00:30:42Z

Test build #57566 has finished for PR 12830 at commit 0005a3d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tedyu · 2016-05-04T03:39:34Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala

+  /**
+   * Builder for [[SparkSession]].
+   */
+  class Builder {


How about adding a clear() method so that a Builder instance can be reused?

rxin added 2 commits May 1, 2016 20:46

[SPARK-15052][SQL] Use builder pattern to create SparkSession

0e969ec

Finish doc

8172d91

dongjoon-hyun reviewed May 2, 2016

andrewor14 reviewed May 2, 2016

Fix bug

0005a3d

rxin mentioned this pull request May 2, 2016

[SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python/Java example. #12809

Closed

asfgit closed this in ca1b219 May 2, 2016

tedyu reviewed May 2, 2016

tedyu reviewed May 4, 2016
