Conversation

@chuang0221
Contributor

Why are these changes needed?

Currently, the type hints for Dataset.map() suggest it supports both direct returns and generators, but in practice generators are not supported and raise errors. This is misleading for users.

This PR updates the type hints to accurately reflect supported types.
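
To make the distinction concrete, here is a minimal sketch of the two UDF shapes in plain Python. It does not import Ray; `apply_row_fn` is a hypothetical stand-in for a one-row-in, one-row-out map and is not Ray's implementation. It only illustrates why a hint that admits generators is misleading when the caller expects a single dict per row.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def double(row: Row) -> Row:
    """Row in, row out: the shape the narrowed hint describes."""
    return {"id": row["id"] * 2}


def double_gen(row: Row) -> Iterator[Row]:
    """Generator UDF: the shape the old hint appeared to allow."""
    yield {"id": row["id"] * 2}


def apply_row_fn(rows: List[Row], fn: Callable[[Row], Row]) -> List[Row]:
    """Hypothetical one-in/one-out map (an assumption, not Ray's code)."""
    out: List[Row] = []
    for row in rows:
        result = fn(row)
        # A generator UDF returns a generator object here, not a dict.
        if not isinstance(result, dict):
            raise TypeError(f"expected a dict row, got {type(result).__name__}")
        out.append(result)
    return out
```

With this sketch, `apply_row_fn(rows, double)` returns the doubled rows, while `apply_row_fn(rows, double_gen)` raises `TypeError`, and a type checker would flag the second call against the `Callable[[Row], Row]` annotation.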

Related issue number

Closes #52279

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: chuang0221 <chuangellow@gmail.com>
@chuang0221 chuang0221 requested a review from a team as a code owner April 19, 2025 13:59
@chuang0221
Contributor Author

During investigation, I found that _CallableClassProtocol with Iterator return type also doesn't work with map(), despite what the internal type hint suggests:

# In block.py
class _CallableClassProtocol(Protocol[T, U]):
    def __call__(self, __arg: T) -> Union[U, Iterator[U]]:  # Iterator not actually supported for map()
        ...

Since this is an internal implementation detail not exposed in the public API, I'm keeping this PR focused on fixing the user-facing documentation. The internal type hint inconsistency could be addressed separately if needed.
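
For readers who want to see what that annotation admits, below is a small self-contained sketch. `CallableClassSketch` is a hypothetical copy of the shape quoted above, and the `TypeVar` variances, `AddOne`, `Explode`, and `is_generator_callable_class` are all assumptions made for illustration; none of this is taken from Ray's source beyond the quoted signature.

```python
import inspect
from typing import Any, Dict, Iterator, Protocol, TypeVar, Union

T = TypeVar("T", contravariant=True)  # assumed variance, for illustration
U = TypeVar("U", covariant=True)      # assumed variance, for illustration

Row = Dict[str, Any]


class CallableClassSketch(Protocol[T, U]):
    """Hypothetical mirror of the quoted protocol shape."""

    def __call__(self, __arg: T) -> Union[U, Iterator[U]]:
        ...


class AddOne:
    """Callable class that returns one row per input row."""

    def __call__(self, row: Row) -> Row:
        return {"id": row["id"] + 1}


class Explode:
    """Callable class whose __call__ is a generator."""

    def __call__(self, row: Row) -> Iterator[Row]:
        yield row
        yield row


def is_generator_callable_class(udf: object) -> bool:
    """Report whether the object's class defines __call__ as a generator."""
    return inspect.isgeneratorfunction(getattr(type(udf), "__call__", None))


# Both annotations are consistent with the Union return type above,
# so the hint alone cannot tell the two cases apart.
a: CallableClassSketch[Row, Row] = AddOne()
b: CallableClassSketch[Row, Row] = Explode()
```

The point of the sketch is that the `Union[U, Iterator[U]]` return type covers both classes, so any difference in how they are handled only shows up at runtime.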

@hainesmichaelc hainesmichaelc added the community-contribution Contributed by the community label Apr 21, 2025
@mascharkh mascharkh added the data Ray Data-related issues label Apr 21, 2025
- fn: UserDefinedFunction[Dict[str, Any], Dict[str, Any]],
+ fn: Union[
+     Callable[[Dict[str, Any]], Dict[str, Any]],
+     "_CallableClassProtocol",
Contributor

hmm, I don't think we should expose _CallableClassProtocol as a public typehint since this is an internal class.

Can we instead use just Callable[[Dict[str, Any]], Dict[str, Any]]?
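
As a general Python typing note, an instance of a class that defines `__call__` already matches a plain `Callable[...]` annotation, so no protocol is needed for that case. The sketch below shows this without Ray; `RowFn`, `AddPrefix`, and `run` are hypothetical names, and whether Ray expects the class itself or an instance is outside what this example shows.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]
RowFn = Callable[[Row], Row]  # the public hint suggested in the comment


class AddPrefix:
    """Callable class: instances behave like a Row -> Row function."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def __call__(self, row: Row) -> Row:
        # Return a new row rather than mutating the input.
        return {**row, "name": self.prefix + row["name"]}


def run(fn: RowFn, row: Row) -> Row:
    """Hypothetical caller annotated only with the plain Callable hint."""
    return fn(row)
```

Here `run(AddPrefix("dr_"), {"name": "who"})` type-checks and returns a new row with the prefixed name.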

Contributor Author

This makes sense to me. I'll change it.

Contributor Author

I'm curious about the original design here - since UserDefinedFunction is used in other type hints, was there a specific reason for including the internal class _CallableClassProtocol rather than just using Callable in the UserDefinedFunction?

Contributor

hmm, i don't have the context there unfortunately.

@richardliaw richardliaw added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Apr 22, 2025
Signed-off-by: chuang0221 <chuangellow@gmail.com>
@richardliaw richardliaw added go add ONLY when ready to merge, run all tests and removed @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. labels May 22, 2025
@github-actions

github-actions bot commented Jun 9, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 9, 2025
@github-actions

This pull request has been automatically closed because there has been no more activity in the 14 days
since being marked stale.

Please feel free to reopen or open a new pull request if you'd still like this to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for your contribution!

@github-actions github-actions bot closed this Jun 23, 2025
@richardliaw richardliaw reopened this Sep 16, 2025
@richardliaw richardliaw removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Sep 16, 2025
@richardliaw richardliaw merged commit 0516aa2 into ray-project:master Sep 16, 2025
5 checks passed
zma2 pushed a commit to zma2/ray that referenced this pull request Sep 23, 2025
…52455)

## Why are these changes needed?

Currently, the type hints for `Dataset.map()` suggest it supports both
direct returns and generators, but in practice generators are not
supported and raise errors. This is misleading for users.

This PR updates the type hints to accurately reflect supported types.

## Related issue number

Closes ray-project#52279

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git
  commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
  https://docs.ray.io/en/master/.
   - [ ] I've added any new APIs to the API Reference. For example, if I
     added a method in Tune, I've added it in `doc/source/tune/api/` under
     the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
  few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: chuang0221 <chuangellow@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Zhiqiang Ma <zhiqiang.ma@intel.com>
ZacAttack pushed a commit to ZacAttack/ray that referenced this pull request Sep 24, 2025
Signed-off-by: zac <zac@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Sep 24, 2025
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
marcostephan pushed a commit to marcostephan/ray that referenced this pull request Sep 24, 2025
Signed-off-by: Marco Stephan <marco@magic.dev>
elliot-barn pushed a commit that referenced this pull request Sep 27, 2025
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Labels

community-contribution (Contributed by the community), data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

Development

Successfully merging this pull request may close these issues.

[Data] [Doc] whether map() supports generator UDF

4 participants