[SPARK-16380][SQL][Example]:Update SQL examples and programming guide for Python language binding #14098
|
@liancheng Can you review it? Thanks! |
|
Test build #61939 has finished for PR 14098 at commit
|
|
Test build #61940 has finished for PR 14098 at commit
|
@rxin @cloud-fan Shall we deprecate, or at least not recommend, using df['...'] to reference columns since this may lead to potential ambiguity in self-join cases? I'm thinking about replacing it with col('...') in all Python example code.
|
Thanks for doing this! Overall it's pretty nice. A few high level comments:
|
|
@liancheng Thanks for your review! I will address your comments asap. Currently, I am working on a ML wrapper for R. |
|
Not completed. Please hold on for review. |
|
Test build #62405 has finished for PR 14098 at commit
|
|
Test build #62413 has finished for PR 14098 at commit
|
|
Test build #62415 has finished for PR 14098 at commit
|
|
Test build #62416 has finished for PR 14098 at commit
|
docs/sql-programming-guide.md
Outdated
The original file name sql.py should be fine. Actually, the new name SparkSQLExample.py doesn't conform to Python convention. (You may check other file names under examples/src/main/python.)
And the file path is wrong. The actual path is python/sql/SparkSqlExample.py.
|
@wangmiao1981 Is this ready for review now? Also, please update the PR title to: |
|
@wangmiao1981 I guess it's not ready yet. You may put a |
|
Please be careful with case sensitivity. It broke the release candidate last time. |
|
Yea, especially on case-insensitive OSes like Mac and Windows, the doc actually builds successfully even when the case of the example file names doesn't match. I guess that's probably why we missed SPARK-16553. |
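One way to catch this kind of mismatch regardless of the OS is to compare each path component against the names the directory actually lists, instead of relying on the file system's own lookup. The following is only a minimal sketch of that idea: the helper name `exists_with_exact_case` is hypothetical and is not part of the Spark doc build.

```python
import os


def exists_with_exact_case(root, relative_path):
    """Return True only if every component of relative_path matches an
    on-disk entry name exactly, including letter case.

    Hypothetical helper for illustration; not part of the Spark build.
    """
    current = root
    for part in relative_path.replace("\\", "/").split("/"):
        if not part or part == ".":
            continue  # skip empty components and "." segments
        try:
            # os.listdir returns names as stored on disk, so the comparison
            # below stays case-sensitive even on a case-insensitive OS.
            entries = os.listdir(current)
        except OSError:
            return False  # missing directory, or a file used as a directory
        if part not in entries:
            return False
        current = os.path.join(current, part)
    return True
```

With a file stored as `python/sql/SparkSqlExample.py`, a reference to `python/sql/SparkSQLExample.py` would be rejected by this check on Mac and Windows as well as on Linux. Paths containing `..` are not handled and are simply reported as missing.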
|
@liancheng Sorry for the late reply. I was on vacation for the last few days. I have addressed most of your comments. Only the .md file is not updated yet. By the way, I am trying to make the Hive example work, but I still cannot get it to work. Any suggestions? I found that PySpark SQL is different from the corresponding Scala Hive example. Thanks! Miao |
|
Test build #62648 has finished for PR 14098 at commit
|
|
Test build #62649 has finished for PR 14098 at commit
|
|
@liancheng I addressed all your comments except the 2-space indents: I tried them, but they failed the Python style tests, so I kept the 4-space indents. It is ready for review now. Thanks! |
|
Test build #62651 has finished for PR 14098 at commit
|
|
@wangmiao1981 Thanks for working on this. For the Hive example, I guess you probably forgot to call I just opened PR #14317 based on your work. Sorry that I didn't realize you had finished working on this PR before sending #14317. However, that one also addresses more minor styling issues and adds a separate Hive example file. Would you mind helping to review it? I will attribute this work to you in #14317. |
|
@liancheng Thanks! I will review the PR #14317 |
… Python language binding

This PR is based on PR #14098 authored by wangmiao1981.

## What changes were proposed in this pull request?

This PR replaces the original Python Spark SQL example file with the following three files:

- `sql/basic.py` Demonstrates basic Spark SQL features.
- `sql/datasource.py` Demonstrates various Spark SQL data sources.
- `sql/hive.py` Demonstrates Spark SQL Hive interaction.

This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag.

## How was this patch tested?

Manually tested.

Author: wm624@hotmail.com <wm624@hotmail.com>
Author: Cheng Lian <lian@databricks.com>

Closes #14317 from liancheng/py-examples-update.

(cherry picked from commit 53b2456)
Signed-off-by: Reynold Xin <rxin@databricks.com>
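To illustrate what extracting snippets from an example file means here, the following is a rough Python sketch of the idea. It is not the `include_example` implementation, which is a Liquid tag in the docs build. The marker strings `$example on$` and `$example off$` and the function name `extract_example` are assumptions made for this illustration.

```python
def extract_example(source, on_marker="$example on$", off_marker="$example off$"):
    """Return the lines found between on/off marker comments.

    Illustrative sketch only: the marker strings are assumed, and the
    marker lines themselves are dropped from the result.
    """
    kept = []
    inside = False
    for line in source.splitlines():
        if on_marker in line:
            inside = True       # start copying after this marker line
        elif off_marker in line:
            inside = False      # stop copying at this marker line
        elif inside:
            kept.append(line)
    return "\n".join(kept)
```

Given an example file that wraps only the interesting lines in the two markers, the function returns those lines and leaves out the surrounding setup and teardown code. Several marked regions in one file are concatenated in order.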
|
As #14317 has been merged, I close this PR. |
What changes were proposed in this pull request?
In the current sql-programming-guide.md, the Python examples are hard-coded in the .md file.
I updated the guide by adding a separate SparkSQLExample.py, as is done for the ML examples.
This file includes all working hard-coded examples as a self-contained application, except for the Hive examples. Some examples do not work as written; for example, spark.refreshTable doesn't exist in SparkSession. We can revisit these examples later and add them to the self-contained application.
How was this patch tested?
Manual tests:
./bin/spark-submit examples/src/main/python/SparkSQLExample.py
Built the docs and checked that the generated document includes the correct examples, as in the ML documentation.