[SPARK-49734][PYTHON] Add `seed` argument for function `shuffle` #48184
@@ -17723,7 +17723,7 @@ def array_sort(
 @_try_remote_functions
-def shuffle(col: "ColumnOrName") -> Column:
+def shuffle(col: "ColumnOrName", seed: Optional[Union[Column, int]] = None) -> Column:
Just curious, why do we want to support the Column type here?
It's actually written at the top, e.g., in case we support seed as a non-foldable expression in the future.
 * @group array_funcs
 * @since 4.0.0
 */
def shuffle(e: Column, seed: Long): Column = shuffle(e, lit(seed))
I wouldn't actually add this in Scala.
let me remove it
e8cee6c to 2c1e130
merged to master
### What changes were proposed in this pull request?
1. Add `seed` argument for function `shuffle`;
2. Rewrite and enable the doctest by specifying the seed and controlling the partitioning.

### Why are the changes needed?
Feature parity: seed is supported on the SQL side.

### Does this PR introduce _any_ user-facing change?
Yes, a new argument.

### How was this patch tested?
Updated doctest.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48184 from zhengruifeng/py_func_shuffle.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
What changes were proposed in this pull request?
1. Add `seed` argument for function `shuffle`;
2. Rewrite and enable the doctest by specifying the seed and controlling the partitioning.
Why are the changes needed?
Feature parity: seed is supported on the SQL side.
Does this PR introduce any user-facing change?
Yes, a new argument.
How was this patch tested?
Updated doctest.
Was this patch authored or co-authored using generative AI tooling?
No.
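For readers wondering why a seed makes the doctest usable: a shuffle with a fixed seed returns the same permutation on every run, so its output can be asserted. The sketch below shows that idea in plain Python with the standard `random` module. The helper `shuffle_with_seed` is a hypothetical name used only for illustration; it is not the PySpark function and does not reflect how Spark implements `shuffle` (Spark's result may also depend on partitioning, which is why the PR controls it in the doctest).

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffle_with_seed(values: Sequence[T], seed: Optional[int] = None) -> List[T]:
    """Return a shuffled copy of `values`.

    Hypothetical helper for illustration only; this is an analogy for the
    `seed` argument, not Spark's implementation.
    """
    # A dedicated generator keeps the global random state untouched.
    # With seed=None the generator is seeded from system entropy,
    # so the result is not reproducible.
    rng = random.Random(seed)
    out = list(values)
    rng.shuffle(out)
    return out


data = [1, 20, 3, 5]

# Same seed, same input: the permutation is identical across calls.
first = shuffle_with_seed(data, seed=123)
second = shuffle_with_seed(data, seed=123)
assert first == second

# Shuffling only reorders elements; it never adds or drops any.
assert sorted(first) == sorted(data)
```

Without a seed, two calls may or may not agree, which is the reason an unseeded example cannot be checked by a doctest.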