[SPARK-49734][PYTHON] Add seed argument for function shuffle #48184

Closed
wants to merge 4 commits into from

zhengruifeng
Contributor

What changes were proposed in this pull request?

1. Add a seed argument to the function shuffle.
2. Rewrite and enable the doctest by specifying the seed and controlling the partitioning.
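
As a rough illustration of why a fixed seed makes a shuffle doctest stable, here is a minimal plain-Python sketch. It uses the standard library's `random.Random` rather than Spark, and the helper name `shuffled` is hypothetical; it only shows the general idea that the same seed reproduces the same order, not how Spark implements `shuffle`.

```python
import random
from typing import List, Optional


def shuffled(values: List[int], seed: Optional[int] = None) -> List[int]:
    """Return a shuffled copy of `values` (hypothetical helper, not a Spark API).

    With seed=None the generator is seeded from system entropy, so the order
    varies between runs; with a fixed seed the order is repeatable.
    """
    rng = random.Random(seed)
    out = list(values)  # copy so the caller's list is left untouched
    rng.shuffle(out)
    return out


# Two calls with the same seed yield the same permutation.
first = shuffled([1, 20, 3, 5], seed=123)
second = shuffled([1, 20, 3, 5], seed=123)
print(first == second)
```

In Spark the result can also depend on how rows are distributed across partitions, which is presumably why the description mentions controlling the partitioning in addition to fixing the seed.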

Why are the changes needed?

Feature parity: the seed is already supported on the SQL side.

Does this PR introduce any user-facing change?

Yes, shuffle gains a new optional seed argument.

How was this patch tested?

updated doctest

Was this patch authored or co-authored using generative AI tooling?

no

@@ -17723,7 +17723,7 @@ def array_sort(


@_try_remote_functions
-def shuffle(col: "ColumnOrName") -> Column:
+def shuffle(col: "ColumnOrName", seed: Optional[Union[Column, int]] = None) -> Column:
Contributor

just curious, why do we want to support column type here?

Member

It's actually written at the top, e.g., in case we support the seed as a non-foldable expression in the future.
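
To make the trade-off concrete, here is a minimal sketch of how a signature like `seed: Optional[Union[Column, int]] = None` could normalize its input. Everything in it is a stand-in: `Col`, `lit`, and `shuffle_args` are hypothetical names defined only for this example, not PySpark classes or functions, and this is not a claim about how the PR implements the dispatch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a column expression (not pyspark.sql.Column)."""
    expr: str


def lit(value: int) -> Col:
    """Hypothetical helper that wraps a Python literal as a column expression."""
    return Col(str(value))


def shuffle_args(col: Col, seed: Optional[Union[Col, int]] = None) -> Tuple[Col, ...]:
    """Build the argument tuple a shuffle expression would receive.

    - seed omitted: only the array column is passed.
    - seed given as int: it is wrapped as a literal column.
    - seed given as a column: it is passed through unchanged.
    """
    if seed is None:
        return (col,)
    seed_col = seed if isinstance(seed, Col) else lit(seed)
    return (col, seed_col)


print(shuffle_args(Col("data")))
print(shuffle_args(Col("data"), 42))
print(shuffle_args(Col("data"), Col("seed_col")))
```

Accepting a column as well as an int keeps the Python signature stable if the seed is later allowed to be a non-literal expression, at the cost of a slightly wider type today.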

* @group array_funcs
* @since 4.0.0
*/
def shuffle(e: Column, seed: Long): Column = shuffle(e, lit(seed))
Member

I wouldn't actually add this in scala

Contributor Author

let me remove it

@zhengruifeng
Contributor Author

merged to master

@zhengruifeng zhengruifeng deleted the py_func_shuffle branch September 23, 2024 02:46
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
1. Add a `seed` argument to the function `shuffle`.
2. Rewrite and enable the doctest by specifying the seed and controlling the partitioning.

### Why are the changes needed?
Feature parity: the seed is already supported on the SQL side.

### Does this PR introduce _any_ user-facing change?
yes, new argument

### How was this patch tested?
updated doctest

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#48184 from zhengruifeng/py_func_shuffle.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024