[SPARK-48762][SQL] Introduce clusterBy DataFrameWriter API for Python #47452

zedtang · 2024-07-22T19:55:42Z

What changes were proposed in this pull request?

Introduce clusterBy DataFrameWriter API for Python.

Also fix `listColumns` so that it supports `V1Table`.
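As a rough illustration of why `listColumns` matters here, the sketch below reads clustering columns back from the catalog. The helper name `clustering_columns` is hypothetical, and it assumes each entry returned by `catalog.listColumns` exposes an `isCluster` flag; if that attribute is missing, the column is treated as not clustered.

```python
from typing import Any, List


def clustering_columns(catalog: Any, table: str) -> List[str]:
    """Return the names of columns that the catalog reports as clustering columns.

    `catalog` is expected to behave like `spark.catalog`. The `isCluster`
    attribute is an assumption about the returned column metadata, so it is
    read defensively with a default of False.
    """
    return [
        col.name
        for col in catalog.listColumns(table)
        if getattr(col, "isCluster", False)
    ]
```

With a live session this would be called as `clustering_columns(spark.catalog, "some_table")`, where `some_table` is a placeholder name.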

Why are the changes needed?

Introduce more ways for users to interact with clustered tables.

Does this PR introduce any user-facing change?

Yes, it introduces a new PySpark DataFrame API to specify clustering columns during write operations.
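A minimal sketch of how the new writer method might be used from Python. The helper `write_clustered` and the choice of overwrite mode are illustrative, not part of this PR. It assumes `df` is a PySpark DataFrame from a build that includes this change, and that `clusterBy` accepts column names as varargs in the same way `partitionBy` does.

```python
from typing import Any, Sequence


def write_clustered(df: Any, table: str, cols: Sequence[str]) -> None:
    """Save `df` as `table`, clustered by `cols` (hypothetical helper).

    `df` is duck-typed so the sketch has no import-time dependency on
    pyspark; in practice it would be a pyspark.sql.DataFrame.
    """
    if not cols:
        raise ValueError("at least one clustering column is required")
    # Column names are passed as varargs; overwrite mode is chosen only
    # to keep the example repeatable.
    df.write.clusterBy(*cols).mode("overwrite").saveAsTable(table)
```

With a live session this would look like `write_clustered(spark.range(10), "clustered_tbl", ["id"])`, where `clustered_tbl` is a placeholder. Whether the write succeeds depends on the target data source supporting clustering.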

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

zedtang · 2024-07-22T19:56:01Z

This PR depends on #47451

zedtang · 2024-07-29T21:01:53Z

Hi @cloud-fan, this PR is ready for review, thanks.

cloud-fan · 2024-07-30T03:23:40Z

cc @HyukjinKwon as well

HyukjinKwon · 2024-07-30T15:49:46Z

Merged to master.

What changes were proposed in this pull request?

Introduce clusterBy DataFrameWriter API for Python.

Also fix the issue that `listColumns` doesn't support `V1Table`.

Why are the changes needed?

Introduce more ways for users to interact with clustered tables.

Does this PR introduce any user-facing change?

Yes, it introduces a new PySpark DataFrame API to specify clustering columns during write operations.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47452 from zedtang/clusterby-python-api.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

github-actions bot added SQL BUILD PYTHON R CONNECT labels Jul 22, 2024

zedtang changed the title from "[SPARK-48762] Introduce clusterBy DataFrameWriter API for Python" to "[SPARK-48762][SQL] Introduce clusterBy DataFrameWriter API for Python" Jul 22, 2024

zedtang force-pushed the clusterby-python-api branch 2 times, most recently from d5ad45e to d223870 Compare July 26, 2024 01:56

github-actions bot removed BUILD R labels Jul 26, 2024

zedtang added 2 commits July 29, 2024 00:57

Support clusterBy DataFrame API for Python

10cea70

dev/reformat-python

974ccfd

zedtang force-pushed the clusterby-python-api branch from d223870 to 974ccfd Compare July 29, 2024 07:58

zedtang added 2 commits July 29, 2024 08:47

fix tests

ef58c29

fix tests

5152785

fix tests

62866e0

cloud-fan approved these changes Jul 30, 2024

HyukjinKwon approved these changes Jul 30, 2024

HyukjinKwon closed this in 33e463e Jul 30, 2024

zedtang deleted the clusterby-python-api branch July 30, 2024 17:22

3 participants
