[SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding #14098

wangmiao1981 · 2016-07-07T23:29:17Z

What changes were proposed in this pull request?

In the current sql-programming-guide.md, the Python examples are hard-coded in the Markdown file.

I updated the guide by adding a separate SparkSQLExample.py, following the approach used for the ML examples.

This file collects all the working hard-coded examples into a self-contained application, except for the Hive examples (for example, spark.refreshTable, which doesn't exist in SparkSession). We can revisit these examples and add them to the self-contained application later.

How was this patch tested?

Manual tests:
./bin/spark-submit examples/src/main/python/SparkSQLExample.py
Built the docs and checked that the generated document includes the correct examples, as the ML documentation does.

wangmiao1981 · 2016-07-07T23:30:39Z

@liancheng Can you review it? Thanks!

SparkQA · 2016-07-07T23:33:29Z

Test build #61939 has finished for PR 14098 at commit 94df090.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-08T00:10:16Z

Test build #61940 has finished for PR 14098 at commit d92d933.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2016-07-13T09:00:20Z

examples/src/main/python/SparkSQLExample.py

@rxin @cloud-fan Shall we deprecate, or at least not recommend, using df['...'] to reference columns since this may lead to potential ambiguity in self-join cases? I'm thinking about replacing it with col('...') in all Python example code.

cc @JoshRosen @davies
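A minimal sketch of the self-join concern raised above, assuming a hypothetical `people` DataFrame with `id`, `name`, and `manager_id` columns (none of these names come from this PR). Because `df['...']` is bound to a single DataFrame, both sides of a self-join end up referring to the same columns; aliasing each side and using `col('alias.column')` keeps them apart. The import sits inside the function only so the snippet can be loaded without PySpark installed.

```python
def manager_names(people):
    """Pair each person with their manager's name via a self-join.

    `people` is assumed to be a PySpark DataFrame with the hypothetical
    columns `id`, `name`, and `manager_id`.
    """
    # Imported lazily; calling this function requires PySpark.
    from pyspark.sql.functions import col

    # Give each side of the self-join its own alias so that column
    # references are unambiguous.
    emp = people.alias("emp")
    mgr = people.alias("mgr")

    return (
        emp.join(mgr, col("emp.manager_id") == col("mgr.id"))
        .select(
            col("emp.name").alias("employee"),
            col("mgr.name").alias("manager"),
        )
    )
```

With an active SparkSession `spark`, this could be called as `manager_names(spark.createDataFrame([(1, "Ann", 2), (2, "Bob", 2)], ["id", "name", "manager_id"]))`.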

liancheng · 2016-07-13T09:13:14Z

Thanks for doing this! Overall it's pretty nice. A few high level comments:

- It might be better to split the whole example file into several methods, as [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programming guide and examples #14119 did. This makes the code easier to follow and avoids variable names like df1, df2, and df3.
- Could you please add the actual output after each .show() call? I think you can simply copy it from #14119.
- Could you please update the example snippet label names to match those used in #14119?

wangmiao1981 · 2016-07-13T16:37:37Z

@liancheng Thanks for your review! I will address your comments ASAP. Currently, I am working on an ML wrapper for R.

wangmiao1981 · 2016-07-16T07:56:37Z

Not complete yet. Please hold off on reviewing.

SparkQA · 2016-07-16T07:58:27Z

Test build #62405 has finished for PR 14098 at commit 8ed04fa.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-16T21:21:09Z

Test build #62413 has finished for PR 14098 at commit 8d94dd3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-16T22:01:47Z

Test build #62415 has finished for PR 14098 at commit 91a2e10.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-16T22:10:01Z

Test build #62416 has finished for PR 14098 at commit 68e65fa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2016-07-18T06:14:51Z

docs/sql-programming-guide.md

The original file name sql.py should be fine. Actually, the new name SparkSQLExample.py doesn't conform to Python convention. (You may check other file names under examples/src/main/python.)

And the file path is wrong. The actual path is python/sql/SparkSqlExample.py.
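As a rough illustration of the naming point above, the check below treats a Python example file name as conventional when it is all lowercase with optional digits and underscores, which is how PEP 8 describes module names. The helper name `follows_module_naming` and the exact pattern are assumptions made for this sketch, not something used by the Spark build.

```python
import re

# Lowercase letters, digits, and underscores, ending in ".py".
_MODULE_FILE = re.compile(r"[a-z][a-z0-9_]*\.py")


def follows_module_naming(file_name):
    """Return True if file_name looks like a PEP 8 style module file name."""
    return _MODULE_FILE.fullmatch(file_name) is not None


original_ok = follows_module_naming("sql.py")
renamed_ok = follows_module_naming("SparkSQLExample.py")
```

Here `original_ok` is `True` and `renamed_ok` is `False`, matching the review comment.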

liancheng · 2016-07-18T06:16:01Z

@wangmiao1981 Is this ready for review now? Also, please update the PR title to:

[SPARK-16380][SQL][EXAMPLE] Update SQL examples and programming guide for Python language binding

liancheng · 2016-07-18T06:26:22Z

@wangmiao1981 I guess it's not ready yet. You may put a [WIP] tag in the PR title when it's in WIP status and remove it when it is ready for review.

rxin · 2016-07-18T06:56:44Z

Please be careful with case sensitivity. It broke the release candidate last time.

liancheng · 2016-07-18T11:58:55Z

Yeah, especially on OSes whose file systems are case-insensitive by default, like Mac and Windows, the doc actually builds successfully even when the case of the example file names doesn't match. I guess that's probably why we missed SPARK-16553.
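One way to catch this kind of mismatch regardless of the host file system is to compare each path component against the actual directory listing instead of relying on `os.path.exists`. The sketch below is only an illustration of that idea; `exact_case_exists` and the sample paths are hypothetical and not part of the Spark doc build.

```python
import os
import tempfile


def exact_case_exists(root, relative_path):
    """Return True only if every component of relative_path exists under
    root with exactly the same letter case as stored on disk."""
    current = root
    for part in relative_path.split("/"):
        # os.listdir reports names as stored on disk, so this comparison
        # is case-sensitive even on a case-insensitive file system.
        if not os.path.isdir(current) or part not in os.listdir(current):
            return False
        current = os.path.join(current, part)
    return True


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "python", "sql"))
    open(os.path.join(root, "python", "sql", "basic.py"), "w").close()

    matches = exact_case_exists(root, "python/sql/basic.py")
    mismatch = exact_case_exists(root, "python/sql/Basic.py")
```

`matches` is `True` and `mismatch` is `False` on both case-sensitive and case-insensitive file systems, which is the property a doc build check would need.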

wangmiao1981 · 2016-07-20T06:18:43Z

@liancheng Sorry for the late reply. I was on vacation for the last few days.

I have addressed most of your comments. Only the .md file is not updated yet.

By the way, I am trying to make the Hive example work, but I still cannot get it to work. Any suggestions? I found that the PySpark SQL example is different from the corresponding Scala Hive example.

Thanks!

Miao

SparkQA · 2016-07-21T02:11:27Z

Test build #62648 has finished for PR 14098 at commit ac47d8d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-21T02:22:45Z

Test build #62649 has finished for PR 14098 at commit 95b16f5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangmiao1981 · 2016-07-21T02:26:21Z

@liancheng I addressed all your comments, except: 1) 2-space indentation: I tried it, but it failed the Python style tests, so I kept 4-space indentation; 2) col('...'): I haven't changed it yet. Do you have a final decision on this part? I have the code ready.

It is ready for review now.

Thanks!

SparkQA · 2016-07-21T02:45:32Z

Test build #62651 has finished for PR 14098 at commit 8563ecb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2016-07-22T12:12:18Z

@wangmiao1981 Thanks for working on this. For the Hive example, I guess you probably forgot to call enableHiveSupport() on the SparkSession builder. And I made a mistake about the indentation: 4-space indentation is OK for Python. Sorry for the trouble...

I just opened PR #14317 based on your work. Sorry that I didn't realize you had finished working on this PR before sending #14317. However, that one also addresses more minor styling issues and adds a separate Hive example file. Would you mind helping review it? I will attribute this work to you in #14317.
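A short sketch of the `enableHiveSupport()` suggestion above. The function name `hive_session` and the application name are illustrative only, and the snippet assumes a Spark build that includes Hive support. The import is kept inside the function so the snippet can be loaded without PySpark installed.

```python
def hive_session(app_name="Python Spark SQL Hive example"):
    """Create (or reuse) a SparkSession with Hive support enabled."""
    # Imported lazily; calling this function requires PySpark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Without this call the session does not connect to the Hive
        # metastore, so Hive tables are not available through it.
        .enableHiveSupport()
        .getOrCreate()
    )
```

A session obtained this way can then run HiveQL through `spark.sql(...)`.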

wangmiao1981 · 2016-07-23T18:39:35Z

@liancheng Thanks! I will review PR #14317.

… Python language binding

This PR is based on PR #14098 authored by wangmiao1981.

## What changes were proposed in this pull request?

This PR replaces the original Python Spark SQL example file with the following three files:

- `sql/basic.py`: Demonstrates basic Spark SQL features.
- `sql/datasource.py`: Demonstrates various Spark SQL data sources.
- `sql/hive.py`: Demonstrates Spark SQL Hive interaction.

This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag.

## How was this patch tested?

Manually tested.

Author: wm624@hotmail.com <wm624@hotmail.com>
Author: Cheng Lian <lian@databricks.com>

Closes #14317 from liancheng/py-examples-update.

(cherry picked from commit 53b2456)
Signed-off-by: Reynold Xin <rxin@databricks.com>

wangmiao1981 · 2016-07-23T18:57:40Z

As #14317 has been merged, I close this PR.

liancheng reviewed Jul 13, 2016

wangmiao1981 force-pushed the sql branch from d92d933 to 8ed04fa on July 16, 2016 07:55

liancheng reviewed Jul 18, 2016

liancheng mentioned this pull request Jul 18, 2016

[SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update #14245

Closed

wangmiao1981 changed the title ~~[SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding~~ [WIP][SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding Jul 20, 2016

wangmiao1981 added 8 commits July 20, 2016 17:50

- e969012 add SQL example in python
- 0919cfc add examples and change md file
- 79c8b94 fix python style error
- 612d8b7 address review comments, part 1, not completed, WIP
- bf7e793 fix python style error
- 2516a5f update examples
- b88258a move SQL example into sql folder
- ac47d8d update md file

wangmiao1981 force-pushed the sql branch from 68e65fa to ac47d8d on July 21, 2016 01:40

- 95b16f5 rename file
- 8563ecb change help message

wangmiao1981 changed the title ~~[WIP][SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding~~ [SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding Jul 21, 2016

liancheng mentioned this pull request Jul 22, 2016

[SPARK-16380][EXAMPLES] Update SQL examples and programming guide for Python language binding #14317

Closed

wangmiao1981 closed this Jul 23, 2016
