Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Apr 30, 2016

What changes were proposed in this pull request?

This PR aims to update Scala/Python/Java examples by replacing SQLContext with newly added SparkSession.

  • Use SparkSession Builder Pattern in 154(Scala 55, Java 52, Python 47) files.
  • Add getConf in Python SparkContext class: python/pyspark/context.py
  • Replace SQLContext Singleton Pattern with SparkSession Singleton Pattern:
    • SqlNetworkWordCount.scala
    • JavaSqlNetworkWordCount.java
    • sql_network_wordcount.py

Now, SQLContexts are used only in R examples and the following two Python examples. The python examples are untouched in this PR since it already fails some unknown issue.

  • simple_params_example.py
  • aft_survival_regression.py

How was this patch tested?

Manual.

@dongjoon-hyun
Copy link
Member Author

I thought this is a bug fix, but I think I need to update to use SparkSession, too.
I will add a commit very soon.

@rxin
Copy link
Contributor

rxin commented Apr 30, 2016

Are there other examples that we should update?

@dongjoon-hyun
Copy link
Member Author

Oh, sure. May I proceed this PR for all examples and all testsuites together?

@dongjoon-hyun
Copy link
Member Author

If you don't mind, I prefer to do as a single one.
I can update the JIRA and PR description.

@dongjoon-hyun
Copy link
Member Author

I mean SparkSession updating here since I didn't run all examples.

@SparkQA
Copy link

SparkQA commented Apr 30, 2016

Test build #57422 has finished for PR 12809 at commit f2087fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 30, 2016

Test build #57424 has finished for PR 12809 at commit 7fee97a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Apr 30, 2016

I'd put all the example changes together, and have a separate pr for the test changes.

@dongjoon-hyun
Copy link
Member Author

I see. No problem. Then, I'll use this one for all the example changes (including some fix like here).

@dongjoon-hyun
Copy link
Member Author

As you know, for the example, I need to verify the result manually.
I'll proceed testsuite one first because it's automatically verified while this one is marked as [WIP].

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-15031][EXAMPLE] Fix SQL Python example. [WIP][SPARK-15031][EXAMPLE] Fix SQL Python example. Apr 30, 2016
@SparkQA
Copy link

SparkQA commented Apr 30, 2016

Test build #57438 has finished for PR 12809 at commit 94e5be3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [WIP][SPARK-15031][EXAMPLE] Fix SQL Python example. [SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python example. Apr 30, 2016
@SparkQA
Copy link

SparkQA commented Apr 30, 2016

Test build #57440 has finished for PR 12809 at commit 5ebc6a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 30, 2016

Test build #57441 has finished for PR 12809 at commit 7a52177.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Hi, @rxin . For this issue, I'll add new constructor for SparkSession and proceed to Java examples.

  def this(sparkContext: JavaSparkContext) = this(sparkContext.sc)

Please let me know if there is some problem for this.

@rxin
Copy link
Contributor

rxin commented Apr 30, 2016

Yea that makes sense.

For this one, I think we should also have a SparkSession ctor that takes in SparkConf, which calls SparkContext.getOrCreate to load a SparkContext. Then users can use this without declaring two things!

@dongjoon-hyun
Copy link
Member Author

I updated Java examples with SparkSession(JavaSparkContext). For SparkSession(SparkConf), I'll handle soon.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python example. [SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python/Java example. Apr 30, 2016
@dongjoon-hyun
Copy link
Member Author

Now, I addressed all comments so far.
Thank you for fast reviews, @rxin .

@SparkQA
Copy link

SparkQA commented Apr 30, 2016

Test build #57453 has finished for PR 12809 at commit 627f418.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JavaSparkSessionSingleton

@SparkQA
Copy link

SparkQA commented Apr 30, 2016

Test build #57454 has finished for PR 12809 at commit 113d7ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Rebased.

@SparkQA
Copy link

SparkQA commented May 1, 2016

Test build #57462 has finished for PR 12809 at commit c1833bb.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 1, 2016

Test build #57465 has finished for PR 12809 at commit 6ac52c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Hi, @rxin .
Could you merge this PR please?
Or, please let me know if there is something to do more.

@rxin
Copy link
Contributor

rxin commented May 1, 2016

Hey @dongjoon-hyun Give me a day or two to think about the API for creating SparkSession. It'd be great to finalize that and then update the examples to reflect that.

My current thinking is to have a factory method that can be used to instantiate SparkSession, something like

SparkSession
  .withMaster("local[4]")
  .withConfig("...", "...")
  .getOrCreate()

@dongjoon-hyun
Copy link
Member Author

Thank you for feedback, @rxin . Sure!
Also, congratulation for branching 2.0. :)

@rxin
Copy link
Contributor

rxin commented May 2, 2016

I've merged #12830

want to update the example using that?

@dongjoon-hyun
Copy link
Member Author

Sure! I'm waiting for it. :) Thank you!

@dongjoon-hyun
Copy link
Member Author

Rebased to resolve conflicts.

@dongjoon-hyun
Copy link
Member Author

Hi, @rxin and @andrewor14 .
Now, Scala/Java/Python examples use new style.
Could you review this PR?

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57779 has finished for PR 12809 at commit 0346dda.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JavaSparkSessionSingleton

SparkConf conf = new SparkConf().setAppName("JavaAFTSurvivalRegressionExample");
JavaSparkContext jsc = new JavaSparkContext(conf);
SQLContext jsql = new SQLContext(jsc);
SparkSession spark = SparkSession.builder().appName("JavaAFTSurvivalRegressionExample").getOrCreate();
Copy link
Contributor

@andrewor14 andrewor14 May 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long, can you break it into multiple lines (here and other places)

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun May 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Two or more?

Two Lines

SparkSession spark = SparkSession
  .builder().appName("JavaAFTSurvivalRegressionExample").getOrCreate();

More Lines

SparkSession spark = SparkSession
  .builder()
  .appName("JavaAFTSurvivalRegressionExample")
  .getOrCreate();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, thank you for fast review! :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more lines looks better to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll use that format for all languages.

@andrewor14
Copy link
Contributor

Thanks @dongjoon-hyun. This is mostly straightforward and it LGTM. I pointed out a few parts that were not totally straightforward for other reviewers.

@dongjoon-hyun
Copy link
Member Author

Thank you for review, again!

@andrewor14
Copy link
Contributor

I'm going to merge this one first to avoid conflicts since it's such a big patch.

@andrewor14
Copy link
Contributor

Merging into master 2.0

@asfgit asfgit closed this in cdce4e6 May 4, 2016
asfgit pushed a commit that referenced this pull request May 4, 2016
## What changes were proposed in this pull request?

This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with newly added `SparkSession`.

- Use **SparkSession Builder Pattern** in 154(Scala 55, Java 52, Python 47) files.
- Add `getConf` in Python SparkContext class: `python/pyspark/context.py`
- Replace **SQLContext Singleton Pattern** with **SparkSession Singleton Pattern**:
  - `SqlNetworkWordCount.scala`
  - `JavaSqlNetworkWordCount.java`
  - `sql_network_wordcount.py`

Now, `SQLContexts` are used only in R examples and the following two Python examples. The python examples are untouched in this PR since it already fails some unknown issue.
- `simple_params_example.py`
- `aft_survival_regression.py`

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #12809 from dongjoon-hyun/SPARK-15031.

(cherry picked from commit cdce4e6)
Signed-off-by: Andrew Or <andrew@databricks.com>
@dongjoon-hyun
Copy link
Member Author

Oh, thank you, @andrewor14 .

asfgit pushed a commit that referenced this pull request May 9, 2016
This PR removes `sqlContext` in examples. Actual usage was all replaced in #12809 but there are some in comments.

Manual style checking.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13006 from HyukjinKwon/minor-docs.

(cherry picked from commit 2992a21)
Signed-off-by: Andrew Or <andrew@databricks.com>
asfgit pushed a commit that referenced this pull request May 9, 2016
This PR removes `sqlContext` in examples. Actual usage was all replaced in #12809 but there are some in comments.

Manual style checking.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13006 from HyukjinKwon/minor-docs.
@dongjoon-hyun dongjoon-hyun deleted the SPARK-15031 branch May 12, 2016 01:01
asfgit pushed a commit that referenced this pull request May 19, 2016
…with SparkSession

## What changes were proposed in this pull request?

It seems most of Python examples were changed to use SparkSession by #12809. This PR said both examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

are not changed because it dose not work. It seems `aft_survival_regression.py` is changed by #13050 but `simple_params_example.py` is not yet.

This PR corrects the example and make this use SparkSession.

In more detail, it seems `threshold` is replaced to `thresholds` here and there by 5a23213. However, when it calls `lr.fit(training, paramMap)` this overwrites the values. So, `threshold` was 5 and `thresholds` becomes 5.5 (by `1 / (1 + thresholds(0) / thresholds(1)`).

According to the comment below. this is not allowed, https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61.

So, in this PR, it sets the equivalent value so that this does not throw an exception.

## How was this patch tested?

Manully (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13135 from HyukjinKwon/SPARK-15031.

(cherry picked from commit e2ec32d)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
asfgit pushed a commit that referenced this pull request May 19, 2016
…with SparkSession

## What changes were proposed in this pull request?

It seems most of Python examples were changed to use SparkSession by #12809. This PR said both examples below:

- `simple_params_example.py`
- `aft_survival_regression.py`

are not changed because it dose not work. It seems `aft_survival_regression.py` is changed by #13050 but `simple_params_example.py` is not yet.

This PR corrects the example and make this use SparkSession.

In more detail, it seems `threshold` is replaced to `thresholds` here and there by 5a23213. However, when it calls `lr.fit(training, paramMap)` this overwrites the values. So, `threshold` was 5 and `thresholds` becomes 5.5 (by `1 / (1 + thresholds(0) / thresholds(1)`).

According to the comment below. this is not allowed, https://github.com/apache/spark/blob/354f8f11bd4b20fa99bd67a98da3525fd3d75c81/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L58-L61.

So, in this PR, it sets the equivalent value so that this does not throw an exception.

## How was this patch tested?

Manully (`mvn package -DskipTests && spark-submit simple_params_example.py`)

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13135 from HyukjinKwon/SPARK-15031.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants