@asl3
Contributor

@asl3 asl3 commented Apr 15, 2025

What changes were proposed in this pull request?

Add the PySpark User Guide to the OSS docs webpage. The following sections are included in the v1 User Guide:

  • DataFrames – A View into Your Structured Data by Amanda Liu @asl3
  • A Tour of PySpark Data Types by Takuya Ueshin @ueshin
  • Function Junction – Data Manipulation with PySpark by Hyukjin Kwon @HyukjinKwon
  • Bug Busting – Debugging PySpark by Haejoon Lee @itholic
  • Unleashing UDFs and UDTFs by Ruifeng Zheng @zhengruifeng
  • Old SQL, New Tricks – Working with SQL in PySpark by Allison Wang @allisonwang-db
  • Load and Behold – Data Loading by Xinrong Meng @xinrong-meng

with guidance from Xiao Li @gatorsmile, DB Tsai @dbtsai, and Jules Damji @dmatrix

Why are the changes needed?

Up-to-date PySpark user guides are currently lacking, especially documentation covering features new in PySpark 4.0. This user guide provides comprehensive PySpark material to improve the PySpark user experience.

Does this PR introduce any user-facing change?

Yes - docs update

How was this patch tested?

Existing docs build

Was this patch authored or co-authored using generative AI tooling?

No

@asl3 asl3 changed the title from "[SPARK-51802] OSS PySpark User Guide Docs" to "[SPARK-51802][PYTHON][DOCS] OSS PySpark User Guide Docs" on Apr 15, 2025
@HyukjinKwon
Member

Merged to master and branch-4.0.

HyukjinKwon pushed a commit that referenced this pull request Apr 29, 2025
### What changes were proposed in this pull request?

Add PySpark User Guide to OSS Docs webpage. The following sections are included in the v1 User Guide:

```
DataFrames - A view into your structured data
A Tour of PySpark Data Types
Function Junction - Data manipulation with PySpark
Bug Busting - Debugging PySpark
Unleashing UDFs and UDTFs
Old SQL, New Tricks
Load and Behold - Data loading
```

### Why are the changes needed?

Currently, there is a lack of up-to-date PySpark user guides - especially documentation highlighting new 4.0 PySpark features. This user guide aims to develop comprehensive PySpark materials to improve the PySpark user experience.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing docs build

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50589 from asl3/pysparkdocs-adduserguide.

Authored-by: Amanda Liu <amanda.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 02faae5)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025

Closes apache#50589 from asl3/pysparkdocs-adduserguide.

Authored-by: Amanda Liu <amanda.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025

Closes apache#50589 from asl3/pysparkdocs-adduserguide.

Authored-by: Amanda Liu <amanda.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 26e478b)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

2 participants