@zedtang
Contributor

@zedtang zedtang commented Jul 22, 2024

What changes were proposed in this pull request?

Introduce the `clusterBy` DataFrameWriter API for Python.

Also fix `listColumns`, which previously did not support `V1Table`.

Why are the changes needed?

Introduce more ways for users to interact with clustered tables.

Does this PR introduce any user-facing change?

Yes, it introduces a new PySpark DataFrame API to specify clustering columns during write operations.
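
For readers who want to see what the new write path might look like from user code, here is a minimal sketch. It assumes that `DataFrameWriter.clusterBy` accepts column names as positional arguments and returns the writer for chaining, and that the target data source supports clustered tables. The helper name `write_clustered` and the table name in the usage note are hypothetical and not part of this PR.

```python
from typing import Any


def write_clustered(df: Any, table_name: str, *cluster_cols: str) -> None:
    """Save `df` as a table clustered by `cluster_cols`.

    `df` is expected to be a pyspark.sql.DataFrame. It is typed as Any so
    this sketch can be imported without PySpark installed.
    """
    if not cluster_cols:
        raise ValueError("at least one clustering column is required")

    # Assumption: clusterBy takes column names as varargs and returns the
    # writer, so it can be chained like partitionBy.
    df.write.clusterBy(*cluster_cols).saveAsTable(table_name)
```

With a live Spark session, a call might look like `write_clustered(df, "events_clustered", "event_date", "user_id")`. Whether the write succeeds depends on the catalog and table format in use.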

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@zedtang
Contributor Author

zedtang commented Jul 22, 2024

This PR depends on #47451

@zedtang zedtang changed the title [SPARK-48762] Introduce clusterBy DataFrameWriter API for Python [SPARK-48762][SQL] Introduce clusterBy DataFrameWriter API for Python Jul 22, 2024
@zedtang zedtang force-pushed the clusterby-python-api branch 2 times, most recently from d5ad45e to d223870 Compare July 26, 2024 01:56
@zedtang zedtang force-pushed the clusterby-python-api branch from d223870 to 974ccfd Compare July 29, 2024 07:58
@zedtang
Contributor Author

zedtang commented Jul 29, 2024

Hi @cloud-fan , this PR is ready for review, thanks.

@cloud-fan
Contributor

cc @HyukjinKwon as well

@HyukjinKwon
Member

Merged to master.

@zedtang zedtang deleted the clusterby-python-api branch July 30, 2024 17:22
lwz9103 pushed a commit to Kyligence/spark that referenced this pull request Mar 27, 2025
Introduce clusterBy DataFrameWriter API for Python.

Also fix the issue that `listColumns` doesn't support `V1Table`.

Introduce more ways for users to interact with clustered tables.

Yes, it introduces a new PySpark DataFrame API to specify clustering columns during write operations.

New unit tests.

No.

Closes apache#47452 from zedtang/clusterby-python-api.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
lwz9103 pushed a commit to Kyligence/spark that referenced this pull request Apr 22, 2025