-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-15052][SQL] Use builder pattern to create SparkSession #12830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @yhuai @andrewor14 and also cc @dongjoon-hyun since you have been working on the example files |
|
Thank you for notifying me. It looks good to me. Then, the three-line pattern will be replace into one factory statement, right? Spark 1.x Spark 2.0 |
|
Yes. Technically we don't really reduce the line length, but definitely reduces the number of concepts people need to use if they are just using DataFrame/Dataset. |
|
Yes, right. And, this can reduce the |
| */ | ||
| class Builder { | ||
|
|
||
| private[this] val options = new scala.collection.mutable.HashMap[String, String] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using j.u.c.ConcurrentHashMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It creates a lot of garbage for something that's not expected to be concurrent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, what I meant was moving locking point from Building instance into options. I thought only getOrCreate needs locking on Builder instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, forget about my comments. Builder is so simple and current implementation is solid, too.
|
Test build #57501 has finished for PR 12830 at commit
|
| * Enables Hive support, including connectivity to a persistent Hive metastore, support for | ||
| * Hive serdes, and Hive user-defined functions. | ||
| * | ||
| * @return 2.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope!
|
LGTM. This is beautiful. |
|
Thanks - going to merge this. I added removing the existing withHiveSupport as a TODO in the pr description. |
## What changes were proposed in this pull request? This patch creates a builder pattern for creating SparkSession. The new code is unused and mostly deadcode. I'm putting it up here for feedback. There are a few TODOs that can be done as follow-up pull requests: - [ ] Update tests to use this - [ ] Update examples to use this - [ ] Clean up SQLContext code w.r.t. this one (i.e. SparkSession shouldn't call into SQLContext.getOrCreate; it should be the other way around) - [ ] Remove SparkSession.withHiveSupport - [ ] Disable the old constructor (by making it private) so the only way to start a SparkSession is through this builder pattern ## How was this patch tested? Part of the future pull request is to clean this up and switch existing tests to use this. Author: Reynold Xin <rxin@databricks.com> Closes #12830 from rxin/sparksession-builder. (cherry picked from commit ca1b219) Signed-off-by: Reynold Xin <rxin@databricks.com>
| * | ||
| * @since 2.0.0 | ||
| */ | ||
| def config(key: String, value: Long): Builder = synchronized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about other primitive types for the value: Int, Float, Short ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't matter as they just map into Long / Double.
|
Test build #57566 has finished for PR 12830 at commit
|
| /** | ||
| * Builder for [[SparkSession]]. | ||
| */ | ||
| class Builder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a clear() method so that Builder instance can be reused ?
What changes were proposed in this pull request?
This patch creates a builder pattern for creating SparkSession. The new code is unused and mostly deadcode. I'm putting it up here for feedback.
There are a few TODOs that can be done as follow-up pull requests:
How was this patch tested?
Part of the future pull request is to clean this up and switch existing tests to use this.