
Offer methods for Dataset to cover the most common mechanisms for moving data between partitions #245

Open
karlhigley opened this issue Mar 13, 2023 · 0 comments
Labels: api (Changes or tweaks to the Core API), chore (Maintenance for the repository), clean up

Comments

@karlhigley
Contributor

karlhigley commented Mar 13, 2023

The proposed methods would be shuffle_by_keys, sort_by_keys, and group_by_keys. Right now, we only have shuffle_by_keys.
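The intended semantics of the three methods can be illustrated with a small, pure-Python sketch. Assumption: a "dataset" here is just a list of partitions, each a list of dict rows; the function names mirror the proposal, but the signatures (`keys`, `npartitions`, `agg`) are hypothetical and are not the project's actual `Dataset` API. The point is what each operation guarantees about where rows end up after moving between partitions.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Partitions = List[List[Row]]


def _key(row: Row, keys: Sequence[str]) -> Tuple:
    # Composite key built from the requested columns.
    return tuple(row[k] for k in keys)


def shuffle_by_keys(parts: Partitions, keys: Sequence[str], npartitions: int) -> Partitions:
    """Hash-partition rows so every row sharing a key lands in the same partition."""
    out: Partitions = [[] for _ in range(npartitions)]
    for part in parts:
        for row in part:
            # crc32 of the key's repr gives a deterministic (unsalted) hash.
            dest = zlib.crc32(repr(_key(row, keys)).encode()) % npartitions
            out[dest].append(row)
    return out


def sort_by_keys(parts: Partitions, keys: Sequence[str], npartitions: int) -> Partitions:
    """Globally sort rows, then split them into contiguous, ordered partitions.

    Simplification: everything is gathered in memory; a distributed version
    would instead sample key quantiles to pick partition boundaries.
    """
    rows = sorted((r for p in parts for r in p), key=lambda r: _key(r, keys))
    size = -(-len(rows) // npartitions)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(npartitions)]


def group_by_keys(
    parts: Partitions,
    keys: Sequence[str],
    npartitions: int,
    agg: Callable[[List[Row]], object],
) -> List[Dict[Tuple, object]]:
    """Shuffle by key, then aggregate each group locally within its partition."""
    result = []
    for part in shuffle_by_keys(parts, keys, npartitions):
        groups: Dict[Tuple, List[Row]] = defaultdict(list)
        for row in part:
            groups[_key(row, keys)].append(row)
        result.append({k: agg(rows) for k, rows in groups.items()})
    return result
```

Note that `group_by_keys` is built on top of `shuffle_by_keys`: once all rows for a key are co-located, the grouping itself needs no further inter-partition movement.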

@rjzamora says:

exposing a clear space for documentation is probably the best reason to add it. That documentation should also clarify that these global operations (requiring inter-partition data movement) should be avoided unless absolutely necessary 🙂
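One way to make "avoid unless absolutely necessary" concrete is to measure how much data a shuffle would move, and to skip it when keys are already co-located. The sketch below is an illustration under the same assumptions as a list-of-partitions model; `rows_moved` and `already_partitioned` are hypothetical helper names, not existing project functions.

```python
import zlib
from typing import Dict, List, Sequence

Row = Dict[str, object]
Partitions = List[List[Row]]


def destination(row: Row, keys: Sequence[str], npartitions: int) -> int:
    # Deterministic hash partitioning on the composite key.
    key = tuple(row[k] for k in keys)
    return zlib.crc32(repr(key).encode()) % npartitions


def rows_moved(parts: Partitions, keys: Sequence[str]) -> int:
    """Count rows a hash shuffle would send to a different partition."""
    n = len(parts)
    return sum(
        1
        for i, part in enumerate(parts)
        for row in part
        if destination(row, keys, n) != i
    )


def already_partitioned(parts: Partitions, keys: Sequence[str]) -> bool:
    """True if no key spans more than one partition.

    In that case a key-local operation (e.g. a per-key aggregation) can run
    partition by partition without any inter-partition data movement.
    """
    seen: Dict[tuple, int] = {}
    for i, part in enumerate(parts):
        for row in part:
            k = tuple(row[c] for c in keys)
            if seen.setdefault(k, i) != i:
                return False
    return True
```

Each row counted by `rows_moved` corresponds to data that must cross a partition boundary (typically a worker or network boundary in a distributed setting), which is why these global operations are expensive relative to purely per-partition work.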

@karlhigley karlhigley added clean up chore Maintenance for the repository api Changes or tweaks to the Core API labels Mar 13, 2023
@rjzamora rjzamora self-assigned this Mar 24, 2023