[SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python/Java example. #12809
Conversation
I thought this was a bug fix, but I think I need to update to use
Are there other examples that we should update?
Oh, sure. May I proceed with this PR for all examples and all test suites together?
If you don't mind, I'd prefer to do it as a single one.
I mean
Test build #57422 has finished for PR 12809 at commit
Test build #57424 has finished for PR 12809 at commit
I'd put all the example changes together, and have a separate PR for the test changes.
I see. No problem. Then I'll use this one for all the example changes (including some fixes like this one).
As you know, for the examples, I need to verify the results manually.
Test build #57438 has finished for PR 12809 at commit
Test build #57440 has finished for PR 12809 at commit
Test build #57441 has finished for PR 12809 at commit
Hi, @rxin. For this issue, I'll add a new constructor for `SparkSession`. Please let me know if there is any problem with this.
Yea, that makes sense. For this one, I think we should also have a SparkSession ctor that takes in a SparkConf, which calls SparkContext.getOrCreate to load a SparkContext. Then users can use this without declaring two things!
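For context, here is a minimal Java sketch of the two declarations the examples needed before SparkSession, with the proposed single-constructor shape shown only as a comment; the app name is illustrative, and the commented-out constructor is just the proposal sketched above, not the API that was eventually merged.

// Before SparkSession: examples declared a context and a SQLContext separately.
SparkConf conf = new SparkConf().setAppName("ExampleApp");  // illustrative app name
JavaSparkContext jsc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(jsc);

// The idea above, sketched: build a SparkSession straight from a SparkConf,
// letting it call SparkContext.getOrCreate(conf) internally (hypothetical shape;
// the builder API discussed later in this thread is what was actually merged).
// SparkSession spark = new SparkSession(conf);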
I updated Java examples with SparkSession(JavaSparkContext). For
I have now addressed all comments so far.
Test build #57453 has finished for PR 12809 at commit
Test build #57454 has finished for PR 12809 at commit
Rebased.
Test build #57462 has finished for PR 12809 at commit
Test build #57465 has finished for PR 12809 at commit
Hi, @rxin.
Hey @dongjoon-hyun, give me a day or two to think about the API for creating SparkSession. It'd be great to finalize that and then update the examples to reflect it. My current thinking is to have a factory method that can be used to instantiate SparkSession, something like:
SparkSession
  .withMaster("local[4]")
  .withConfig("...", "...")
  .getOrCreate()
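For comparison, the factory method the examples ultimately adopt (visible in the review excerpt further down) uses `builder()` with `master`/`config`/`appName` rather than `withMaster`/`withConfig`. A minimal Java sketch; the config key, value, and app name are illustrative only:

SparkSession spark = SparkSession
  .builder()
  .master("local[4]")
  .config("spark.some.config.option", "some-value")
  .appName("ExampleApp")
  .getOrCreate();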
Thank you for the feedback, @rxin. Sure!
I've merged #12830. Do you want to update the examples using that?
Sure! I was waiting for it. :) Thank you!
Rebased to resolve conflicts.
Hi, @rxin and @andrewor14.
Test build #57779 has finished for PR 12809 at commit
SparkConf conf = new SparkConf().setAppName("JavaAFTSurvivalRegressionExample");
JavaSparkContext jsc = new JavaSparkContext(conf);
SQLContext jsql = new SQLContext(jsc);
SparkSession spark = SparkSession.builder().appName("JavaAFTSurvivalRegressionExample").getOrCreate();
Line too long; can you break it into multiple lines (here and in other places)?
Sure! Two or more?
Two lines:
SparkSession spark = SparkSession
  .builder().appName("JavaAFTSurvivalRegressionExample").getOrCreate();
More lines:
SparkSession spark = SparkSession
  .builder()
  .appName("JavaAFTSurvivalRegressionExample")
  .getOrCreate();
Anyway, thank you for the fast review! :)
The more-lines format looks better to me.
Thanks. I'll use that format for all languages.
Thanks @dongjoon-hyun. This is mostly straightforward and it LGTM. I pointed out a few parts that were not totally straightforward, for the benefit of other reviewers.
Thank you for the review, again!
I'm going to merge this one first to avoid conflicts since it's such a big patch.
Merging into master and 2.0.
## What changes were proposed in this pull request?
This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with the newly added `SparkSession`.
- Use the **SparkSession Builder Pattern** in 154 files (Scala 55, Java 52, Python 47).
- Add `getConf` to the Python SparkContext class: `python/pyspark/context.py`
- Replace the **SQLContext Singleton Pattern** with the **SparkSession Singleton Pattern**:
  - `SqlNetworkWordCount.scala`
  - `JavaSqlNetworkWordCount.java`
  - `sql_network_wordcount.py`

Now, `SQLContext` is used only in the R examples and the following two Python examples. The Python examples are untouched in this PR since they already fail with some unknown issue.
- `simple_params_example.py`
- `aft_survival_regression.py`

## How was this patch tested?
Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12809 from dongjoon-hyun/SPARK-15031.
(cherry picked from commit cdce4e6)
Signed-off-by: Andrew Or <andrew@databricks.com>
Oh, thank you, @andrewor14.
This PR removes `sqlContext` in examples. Actual usage was all replaced in #12809, but some references remain in comments.
Tested by manual style checking.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #13006 from HyukjinKwon/minor-docs.
(cherry picked from commit 2992a21)
Signed-off-by: Andrew Or <andrew@databricks.com>
…with SparkSession
## What changes were proposed in this pull request?
It seems most of the Python examples were changed to use SparkSession by #12809. That PR said that both examples below:
- `simple_params_example.py`
- `aft_survival_regression.py`

were not changed because they did not work. It seems `aft_survival_regression.py` was changed by #13050, but `simple_params_example.py` was not yet. This PR corrects that example and makes it use SparkSession.

In more detail, it seems `threshold` was replaced with `thresholds` here and there by 5a23213. However, when it calls `lr.fit(training, paramMap)` this overwrites the values, so `threshold` was 5 and `thresholds` becomes 5.5 (by `1 / (1 + thresholds(0) / thresholds(1))`). According to the comment below, this is not allowed:
https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61
So, in this PR, it sets the equivalent value so that this does not throw an exception.

## How was this patch tested?
Manually (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon <gurwls223@gmail.com>
Closes #13135 from HyukjinKwon/SPARK-15031.
(cherry picked from commit e2ec32d)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
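To make "sets the equivalent value" concrete: given the consistency formula quoted above, `threshold = 1 / (1 + thresholds(0) / thresholds(1))`, one consistent choice for a given `threshold` t is `thresholds = [1 - t, t]` (an illustrative choice, not necessarily the exact numbers used in the example), since `1 + (1 - t) / t = 1 / t` and therefore `1 / (1 + (1 - t) / t) = t`, which satisfies the check and avoids the exception.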
What changes were proposed in this pull request?

This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with the newly added `SparkSession`.
- Use the **SparkSession Builder Pattern** in 154 files (Scala 55, Java 52, Python 47).
- Add `getConf` to the Python SparkContext class: `python/pyspark/context.py`
- Replace the **SQLContext Singleton Pattern** with the **SparkSession Singleton Pattern**:
  - `SqlNetworkWordCount.scala`
  - `JavaSqlNetworkWordCount.java`
  - `sql_network_wordcount.py`

Now, `SQLContext` is used only in the R examples and the following two Python examples. The Python examples are untouched in this PR since they already fail with some unknown issue.
- `simple_params_example.py`
- `aft_survival_regression.py`

How was this patch tested?

Manual.
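To illustrate the SparkSession Singleton Pattern mentioned above, here is a minimal Java sketch in the spirit of the streaming examples; the class name and the helper's exact shape are illustrative of the pattern, not a verbatim excerpt of `JavaSqlNetworkWordCount.java`.

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

// Lazily create one SparkSession per JVM so each streaming micro-batch can
// reuse it instead of constructing a new context every time.
class SparkSessionSingleton {
  private static SparkSession instance = null;

  static SparkSession getInstance(SparkConf sparkConf) {
    if (instance == null) {
      instance = SparkSession
        .builder()
        .config(sparkConf)
        .getOrCreate();
    }
    return instance;
  }
}

Each batch can then obtain the session with something like `SparkSessionSingleton.getInstance(rdd.context().getConf())` (assuming `rdd` is the micro-batch's RDD), mirroring how the SQLContext singleton was used before this change.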