@wangmiao1981
Contributor

What changes were proposed in this pull request?

In current sql-programming-guide.md, Python examples are hard coded in the md file.

I updated the guide by adding a separate SparkSQLExample.py, following the approach used for the ML examples.

In this file, I included all working hard-coded examples as a self-contained application, except for the Hive examples. Some examples do not work as written: for example, spark.refreshTable doesn't exist in SparkSession. We can revisit these examples later and put them in the self-contained application.

How was this patch tested?

Manual tests:
./bin/spark-submit examples/src/main/python/SparkSQLExample.py
Built the docs and checked that the generated document includes the correct examples, as the ML document does.

@wangmiao1981
Contributor Author

@liancheng Can you review it? Thanks!

@SparkQA

SparkQA commented Jul 7, 2016

Test build #61939 has finished for PR 14098 at commit 94df090.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 8, 2016

Test build #61940 has finished for PR 14098 at commit d92d933.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@rxin @cloud-fan Shall we deprecate, or at least not recommend, using df['...'] to reference columns since this may lead to potential ambiguity in self-join cases? I'm thinking about replacing it with col('...') in all Python example code.
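
The self-join ambiguity mentioned here can be illustrated with a minimal sketch. It assumes a hypothetical `employees` DataFrame with columns `id`, `name`, and `manager_id` (none of which appear in this PR) and uses aliases together with `col('...')`, so that each column reference names the side of the join it belongs to:

```python
def managers_of(employees):
    """Self-join a hypothetical `employees` DataFrame (columns: id, name, manager_id).

    Returns one row per employee together with the name of their manager.
    """
    # Imported lazily so that defining this function does not require PySpark.
    from pyspark.sql.functions import col

    e = employees.alias("e")  # the "employee" side of the join
    m = employees.alias("m")  # the "manager" side of the join

    # Both sides come from the same DataFrame, so employees["id"] alone cannot
    # say which side is meant. The alias-qualified names below can.
    return (e.join(m, col("e.manager_id") == col("m.id"))
             .select(col("e.name").alias("employee"),
                     col("m.name").alias("manager")))
```

Calling `managers_of(df)` on a DataFrame with those three columns should return a two-column DataFrame. The function and column names are illustrative only.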

@liancheng
Contributor

Thanks for doing this! Overall it's pretty nice. A few high level comments:

  1. It might be better to split the whole example file into several methods, as [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programming guide and examples #14119 did. That way the code is easier to follow, and we can avoid variable names like df1, df2, and df3.
  2. Could you please add actual output after each .show() call? I think you can simply copy them from [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programming guide and examples #14119.
  3. Could you please update example snippet label names to match those used in [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programming guide and examples #14119?
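
A rough sketch of the layout these comments suggest: one function per topic, with each documented snippet wrapped in labeled marker comments. The function name, the `create_df` label, and the input path are assumptions made for illustration, not the names used in #14119:

```python
def basic_df_example(spark, path="examples/src/main/resources/people.json"):
    """Create and display a DataFrame; `spark` is an existing SparkSession.

    The default path is an assumption about where the sample data lives.
    """
    # $example on:create_df$
    df = spark.read.json(path)
    # Display the content of the DataFrame. In the real example file, the
    # actual output would be pasted here as comments, per item 2 above.
    df.show()
    # $example off:create_df$
    return df
```

Because each snippet lives in its own function, every function can reuse the local name `df` instead of numbered names like `df1` and `df2`.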

@wangmiao1981
Contributor Author

@liancheng Thanks for your review! I will address your comments ASAP. Currently, I am working on an ML wrapper for R.

@wangmiao1981
Contributor Author

Not completed. Please hold on for review.

@SparkQA

SparkQA commented Jul 16, 2016

Test build #62405 has finished for PR 14098 at commit 8ed04fa.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2016

Test build #62413 has finished for PR 14098 at commit 8d94dd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2016

Test build #62415 has finished for PR 14098 at commit 91a2e10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2016

Test build #62416 has finished for PR 14098 at commit 68e65fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

The original file name sql.py should be fine. Actually, the new name SparkSQLExample.py doesn't conform to Python convention. (You may check other file names under examples/src/main/python.)

Contributor

And the file path is wrong. The actual path is python/sql/SparkSqlExample.py.

@liancheng
Contributor

@wangmiao1981 Is this ready for review now? Also, please update the PR title to:

[SPARK-16380][SQL][EXAMPLE] Update SQL examples and programming guide for Python language binding

@liancheng
Contributor

@wangmiao1981 I guess it's not ready yet. You may put a [WIP] tag in the PR title when it's in WIP status and remove it when it is ready for review.

@rxin
Contributor

rxin commented Jul 18, 2016

Please be careful with case sensitivity. It broke the release candidate last time.

@liancheng
Contributor

Yeah, especially on case-insensitive OSes like Mac and Windows, the doc actually builds successfully even when the case of the example file names doesn't match. I guess that's probably why we missed SPARK-16553.
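
One way to catch this kind of mismatch on any OS is to compare each path component against the real directory listing instead of relying on the filesystem's own lookup. The helper below is a hypothetical sketch, not part of Spark's build:

```python
import os


def exists_with_exact_case(root, relative_path):
    """Return True only if every component of relative_path matches a
    directory entry under root with exactly the same case."""
    current = root
    for part in relative_path.replace("\\", "/").split("/"):
        if not part:
            continue
        try:
            # os.listdir reports names as stored on disk, so this comparison
            # stays case-sensitive even on a case-insensitive filesystem.
            entries = os.listdir(current)
        except OSError:
            # Missing directory, or an intermediate component is a file.
            return False
        if part not in entries:
            return False
        current = os.path.join(current, part)
    return True
```

For example, `exists_with_exact_case("examples/src/main", "python/sql/basic.py")` would reject a reference spelled `python/sql/Basic.py` even on a Mac or Windows machine where opening that path succeeds.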

@wangmiao1981 wangmiao1981 changed the title from "[SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding" to "[WIP][SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding" on Jul 20, 2016
@wangmiao1981
Contributor Author

@liancheng Sorry for the late reply. I was on vacation for the last few days.

I have addressed most of your comments. Only the .md file is not updated yet.

By the way, I am trying to make the Hive example work, but I still cannot get it to work. Any suggestions? I found that the PySpark SQL behavior differs from the corresponding Scala Hive example.

Thanks!

Miao

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62648 has finished for PR 14098 at commit ac47d8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62649 has finished for PR 14098 at commit 95b16f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981 wangmiao1981 changed the title from "[WIP][SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding" to "[SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding" on Jul 21, 2016
@wangmiao1981
Contributor Author

wangmiao1981 commented Jul 21, 2016

@liancheng I addressed all your comments, with two exceptions: 1) 2-space indentation: I tried it, but it failed the Python style tests, so I kept 4-space indentation; 2) col('...'): I haven't changed it yet. Do you have a final decision on this part? I have the code ready.

It is ready to review now.

Thanks!

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62651 has finished for PR 14098 at commit 8563ecb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

@wangmiao1981 Thanks for working on this. For the Hive example, I guess you probably forgot to call enableHiveSupport() when building the SparkSession object. And I made a mistake about the indentation: 4-space indentation is OK for Python. Sorry for the trouble...
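
For reference, a minimal sketch of what that suggestion might look like in PySpark, where `enableHiveSupport()` is called on the session builder before `getOrCreate()`. The function name and application name are placeholders chosen for illustration:

```python
def build_hive_session(app_name="PythonSQLHiveExample"):
    """Build (or reuse) a SparkSession with Hive support enabled."""
    # Imported lazily so that defining this function does not require PySpark.
    from pyspark.sql import SparkSession

    return (SparkSession.builder
            .appName(app_name)      # placeholder application name
            .enableHiveSupport()    # must come before getOrCreate()
            .getOrCreate())
```

With a session built this way, Hive statements can be issued through `spark.sql(...)`, assuming the Spark build in use includes Hive support.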

I just opened PR #14317 based on your work. Sorry that I didn't realize you had finished working on this PR before I sent #14317. However, that one also addresses more minor styling issues and adds a separate Hive example file. Would you mind helping to review it? I will attribute this work to you in #14317.

@wangmiao1981
Contributor Author

@liancheng Thanks! I will review PR #14317.

asfgit pushed a commit that referenced this pull request Jul 23, 2016
… Python language binding

This PR is based on PR #14098 authored by wangmiao1981.

## What changes were proposed in this pull request?

This PR replaces the original Python Spark SQL example file with the following three files:

- `sql/basic.py`

  Demonstrates basic Spark SQL features.

- `sql/datasource.py`

  Demonstrates various Spark SQL data sources.

- `sql/hive.py`

  Demonstrates Spark SQL Hive interaction.

This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag.

## How was this patch tested?

Manually tested.

Author: wm624@hotmail.com <wm624@hotmail.com>
Author: Cheng Lian <lian@databricks.com>

Closes #14317 from liancheng/py-examples-update.

(cherry picked from commit 53b2456)
Signed-off-by: Reynold Xin <rxin@databricks.com>
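
To make the snippet-extraction idea in the commit message concrete, here is a simplified Python imitation of pulling a labeled region out of an example file. It assumes the regions are delimited by `$example on:<label>$` and `$example off:<label>$` marker comments. It is not the actual `include_example` plugin and ignores anything else that plugin may do:

```python
def extract_example(source, label):
    """Return the lines between '$example on:<label>$' and
    '$example off:<label>$' in source, joined by newlines."""
    on_marker = "$example on:%s$" % label
    off_marker = "$example off:%s$" % label
    collected = []
    inside = False
    for line in source.splitlines():
        if on_marker in line:
            inside = True        # start collecting after the opening marker
        elif off_marker in line:
            inside = False       # stop at the closing marker
        elif inside:
            collected.append(line)
    return "\n".join(collected)
```

Given a file whose text contains `# $example on:create_df$` and the matching `off` marker, `extract_example(text, "create_df")` returns only the code between them, which is what lets the guide stay in sync with a runnable file.
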
@wangmiao1981
Contributor Author

As #14317 has been merged, I am closing this PR.
