[Data] Update Data progress bars to use row as the iteration unit #46579

Closed
scottjlee opened this issue Jul 11, 2024 · 0 comments · Fixed by #46699 or #46924
Labels
data Ray Data-related issues enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

Comments

@scottjlee
Contributor

Description

As a follow-up to #46432, we want to further improve the clarity of the progress bar output. Currently, we use `bundle` (corresponding to `RefBundle`) and `block` (corresponding to `Block`) as the iteration units; both are internal concepts that some Ray Data users may be unfamiliar with.

With some more involved code changes, we can replace these with `row`, corresponding to rows in the output Dataset. A row is the most atomic unit of a Dataset and one that all users should be familiar with, since rows are a fundamental concept in almost all data processing libraries.
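
To illustrate the idea, here is a minimal sketch of a progress tracker that advances by rows instead of by bundles or blocks. It is not Ray Data's implementation: `FakeBundle` and `RowProgress` are hypothetical names made up for this example, and it assumes each bundle can report how many rows its blocks contain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeBundle:
    """Hypothetical stand-in for a bundle: a group of blocks with known row counts."""

    block_row_counts: List[int]

    def num_rows(self) -> int:
        # Total rows across all blocks in this bundle.
        return sum(self.block_row_counts)


@dataclass
class RowProgress:
    """Hypothetical progress tracker whose iteration unit is the row."""

    total_rows: int
    done_rows: int = 0

    def update(self, bundle: FakeBundle) -> None:
        # Advance by the number of rows produced, not by 1 bundle or N blocks.
        self.done_rows += bundle.num_rows()

    def render(self) -> str:
        return f"{self.done_rows}/{self.total_rows} rows"


if __name__ == "__main__":
    bundles = [FakeBundle([10, 20]), FakeBundle([30])]
    progress = RowProgress(total_rows=sum(b.num_rows() for b in bundles))
    for bundle in bundles:
        progress.update(bundle)
        print(progress.render())
```

With this approach the user sees counts that match the rows of the output Dataset, regardless of how rows happen to be grouped into blocks or bundles internally.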

Code pointers:

Use case

Further improve the clarity of the Ray Data progress bar.

@scottjlee scottjlee added enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks triage Needs triage (eg: priority, bug/not-bug, and owning component) data Ray Data-related issues and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 11, 2024
scottjlee added a commit that referenced this issue Aug 12, 2024
…tion unit (#46924)

## Why are these changes needed?

close #46579
## Related issue number

## Checks

- [√] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [√] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [√] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [√] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: zhilong <zhilong.chen@mail.mcgill.ca>
Signed-off-by: zhilong <121425509+Bye-legumes@users.noreply.github.com>
Co-authored-by: Scott Lee <scottjlee@users.noreply.github.com>