@liancheng liancheng commented Jul 22, 2016

This PR is based on PR #14098 authored by @wangmiao1981.

What changes were proposed in this pull request?

This PR replaces the original Python Spark SQL example file with the following three files:

  • sql/basic.py

    Demonstrates basic Spark SQL features.

  • sql/datasource.py

    Demonstrates various Spark SQL data sources.

  • sql/hive.py

    Demonstrates Spark SQL Hive interaction.

This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the include_example Liquid template tag.
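As a rough illustration of how snippet extraction of this kind can work, the sketch below pulls a labelled region out of an example file's text. It assumes the example files delimit regions with `# $example on:<label>$` and `# $example off:<label>$` comment markers; the `extract_example` helper and the `SAMPLE` text are hypothetical and are not the actual implementation of the `include_example` tag.

```python
def extract_example(source, label):
    """Return the lines between the on/off markers for `label`.

    Assumed marker format: `# $example on:<label>$` and
    `# $example off:<label>$`, each on a line of its own.
    """
    on_marker = "# $example on:%s$" % label
    off_marker = "# $example off:%s$" % label
    selected = []
    active = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == on_marker:
            active = True
        elif stripped == off_marker:
            active = False
        elif active:
            # Keep the original line, including its indentation.
            selected.append(line)
    return "\n".join(selected)


# Hypothetical stand-in for the contents of an example file.
# It is only treated as text here and is never executed.
SAMPLE = '''\
from pyspark.sql import SparkSession

# $example on:init_session$
spark = SparkSession.builder.appName("demo").getOrCreate()
# $example off:init_session$

spark.stop()
'''

snippet = extract_example(SAMPLE, "init_session")
print(snippet)
```

With something like this, the guide only needs to name the label and the file, and the code shown to readers stays in sync with the example file.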

How was this patch tested?

Manually tested.

SparkQA commented Jul 22, 2016

Test build #62724 has finished for PR 14317 at commit 5849497.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 22, 2016

Test build #62725 has finished for PR 14317 at commit ba8aae4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor Author

@JoshRosen Would you mind taking a look at this? Thanks!

rxin commented Jul 23, 2016

Merging in master/2.0.

asfgit pushed a commit that referenced this pull request Jul 23, 2016
… Python language binding

This PR is based on PR #14098 authored by wangmiao1981.

## What changes were proposed in this pull request?

This PR replaces the original Python Spark SQL example file with the following three files:

- `sql/basic.py`

  Demonstrates basic Spark SQL features.

- `sql/datasource.py`

  Demonstrates various Spark SQL data sources.

- `sql/hive.py`

  Demonstrates Spark SQL Hive interaction.

This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag.

## How was this patch tested?

Manually tested.

Author: wm624@hotmail.com <wm624@hotmail.com>
Author: Cheng Lian <lian@databricks.com>

Closes #14317 from liancheng/py-examples-update.

(cherry picked from commit 53b2456)
Signed-off-by: Reynold Xin <rxin@databricks.com>
The entry point into all functionality in Spark is the [`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder`:

-{% include_example init_session python/sql.py %}
+{% include_example init_session python/sql/basic.py %}
Contributor

The file name is not consistent with the Scala and Java versions, which are named SparkSQLExample.scala and SparkSQLExample.java. The Hive and data source example file names are not consistent either.

Contributor Author

For Scala and Java, the convention is that the file name matches the main class defined in the file, whereas camel-case file names don't conform to Python naming conventions. You can check other PySpark file names in the repo for reference.

@asfgit asfgit closed this in 53b2456 Jul 23, 2016
# +-------+

# Select everybody, but increment the age by 1
df.select(df['name'], df['age'] + 1).show()
Contributor

Do you want to use col('...')? I have tested it and it works.

Contributor Author

@liancheng liancheng Jul 24, 2016

Yeah, I know I brought up this issue, but it is still an open question. Although df['...'] has potential issues with self-joins, it is the way pandas DataFrames work. Considering we've tried to work around various self-join corner cases within Catalyst, I now tend to keep it as is. Maybe we'll deprecate this syntax later.
