[Data] Make num_blocks in repartition optional #50997

Merged: 11 commits merged into master from fix-data-repartion_args on Mar 7, 2025

Contributor

@gvspraveen gvspraveen commented Feb 28, 2025

Why are these changes needed?

  • Make the num_blocks argument optional, so callers no longer need to pass num_blocks=None when using target_num_rows_per_block.

  • Add a type hint for the None value.

  • Fix formatting on the docs page.
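To illustrate the first two bullets, here is a minimal sketch of what the call site looks like once num_blocks defaults to None. The function below is a hypothetical stand-in that only records its arguments; it is not Ray's `Dataset.repartition` implementation, and the value 1000 is an arbitrary example.

```python
from typing import Optional


def repartition(
    num_blocks: Optional[int] = None,
    target_num_rows_per_block: Optional[int] = None,
) -> dict:
    """Hypothetical stand-in: records which argument the caller supplied."""
    return {
        "num_blocks": num_blocks,
        "target_num_rows_per_block": target_num_rows_per_block,
    }


# Before this change, the placeholder had to be passed explicitly.
before = repartition(num_blocks=None, target_num_rows_per_block=1000)

# With num_blocks defaulting to None, the placeholder can be dropped.
after = repartition(target_num_rows_per_block=1000)

# Passing num_blocks positionally keeps working.
by_count = repartition(8)
```

Both spellings describe the same request, so existing callers that still pass num_blocks=None should be unaffected under this sketch.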


Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Praveen Gorthy <praveeng@anyscale.com>
@gvspraveen gvspraveen requested a review from a team as a code owner February 28, 2025 19:43
Contributor

raulchen commented Mar 4, 2025

lint is failing

gvspraveen and others added 6 commits March 6, 2025 13:24
Co-authored-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Praveen <gorthypraveen@gmail.com>
Signed-off-by: Praveen Gorthy <praveeng@anyscale.com>
Signed-off-by: Praveen Gorthy <praveeng@anyscale.com>
Contributor

@raulchen raulchen left a comment

Please also update the PR title to reflect that num_blocks now defaults to None.

@@ -163,13 +163,13 @@ def test_repartition_target_num_rows_per_block(
 4,
 10,
 False,
-"Either `num_blocks` or `target_num_rows_per_block` must be set, but not both.",
+"Only one of `num_blocks` or `target_num_rows_per_block` must be set, but not both.",
Contributor

Can you also remove num_blocks=None on Line 132?

Contributor Author

good call. done!
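The error message touched in the test diff above implies a mutual-exclusivity check on the two arguments. A rough sketch of such a check follows. `check_repartition_args` and `error_message` are hypothetical helpers, and the wording of the "neither set" message is an assumption; only the "Only one of ..." string comes from the diff.

```python
from typing import Optional

# Message taken from the updated test expectation in the diff.
BOTH_SET_MESSAGE = (
    "Only one of `num_blocks` or `target_num_rows_per_block` must be set, "
    "but not both."
)
# Assumed wording for the case where neither argument is given.
NEITHER_SET_MESSAGE = (
    "Either `num_blocks` or `target_num_rows_per_block` must be set"
)


def check_repartition_args(
    num_blocks: Optional[int] = None,
    target_num_rows_per_block: Optional[int] = None,
) -> None:
    """Hypothetical validator: exactly one of the two arguments must be set."""
    if num_blocks is None and target_num_rows_per_block is None:
        raise ValueError(NEITHER_SET_MESSAGE)
    if num_blocks is not None and target_num_rows_per_block is not None:
        raise ValueError(BOTH_SET_MESSAGE)


def error_message(**kwargs: Optional[int]) -> Optional[str]:
    """Return the validation error text, or None if the arguments are valid."""
    try:
        check_repartition_args(**kwargs)
    except ValueError as exc:
        return str(exc)
    return None


# Both arguments set, as the values 4 and 10 in the test case appear to do.
both = error_message(num_blocks=4, target_num_rows_per_block=10)
# Only the row target set, which is the call style this PR enables.
only_rows = error_message(target_num_rows_per_block=10)
```

Comparing against `None` explicitly, rather than relying on truthiness, keeps a value such as 0 from being mistaken for "not provided".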

Signed-off-by: Praveen Gorthy <praveeng@anyscale.com>
gvspraveen and others added 3 commits March 7, 2025 01:48
Co-authored-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Praveen <gorthypraveen@gmail.com>
Co-authored-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Praveen <gorthypraveen@gmail.com>
@raulchen raulchen changed the title from "[Data] Make num_blocks type annotations more clear" to "[Data] Make num_blocks in repartition optional" Mar 7, 2025
@raulchen raulchen enabled auto-merge (squash) March 7, 2025 18:16
@github-actions github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Mar 7, 2025
@richardliaw richardliaw added the "data" label (Ray Data-related issues) Mar 7, 2025
@raulchen raulchen merged commit 5744bdc into master Mar 7, 2025
7 checks passed
@raulchen raulchen deleted the fix-data-repartion_args branch March 7, 2025 19:06
elimelt pushed a commit to elimelt/ray that referenced this pull request Mar 9, 2025
## Why are these changes needed?

- Make the num_blocks argument optional, so callers no longer need to pass
`num_blocks=None` when using `target_num_rows_per_block`.

- Add a type hint for the `None` value

- Fix formatting on the [docs page](https://docs.ray.io/en/latest/data/api/doc/ray.data.Dataset.repartition.html)


![image](https://github.com/user-attachments/assets/bfe8a845-3c37-4be6-a2dc-ef78d56c80d4)

---------

Signed-off-by: Praveen Gorthy <praveeng@anyscale.com>
Signed-off-by: Praveen <gorthypraveen@gmail.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
Co-authored-by: Alexey Kudinkin <ak@anyscale.com>
Labels: data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)
5 participants