@ArthurBook
Contributor

Why are these changes needed?

The current Generic types in AggregateFnV2 are not tied to the class, so they are not picked up properly by static type checkers such as mypy.

By adding the Generic[] in the class definition, we get full type checking support.
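
A minimal sketch of the difference, using hypothetical stand-in classes rather than Ray's actual `AggregateFnV2` (the class names and the `combine`/`finalize` method names are assumptions for illustration). Without `Generic[...]` in the class definition, each method is generic on its own and the class cannot be parametrized; with it, one `AggType` and one `U` are shared by every method:

```python
from typing import Generic, TypeVar, get_args

AggType = TypeVar("AggType")
U = TypeVar("U")


class UnboundAgg:
    """TypeVars appear only in method signatures, so nothing ties
    the AggType of combine() to the AggType of finalize()."""

    def combine(self, current: AggType, new: AggType) -> AggType:
        raise NotImplementedError

    def finalize(self, accumulator: AggType) -> U:
        raise NotImplementedError


class BoundAgg(Generic[AggType, U]):
    """TypeVars are class-level parameters, shared across methods."""

    def combine(self, current: AggType, new: AggType) -> AggType:
        raise NotImplementedError

    def finalize(self, accumulator: AggType) -> U:
        raise NotImplementedError


class SumThenFormat(BoundAgg[int, str]):
    """Accumulates ints and finalizes to a str."""

    def combine(self, current: int, new: int) -> int:
        return current + new

    def finalize(self, accumulator: int) -> str:
        return f"total={accumulator}"


# The parametrization is visible to tooling and at runtime.
print(get_args(BoundAgg[int, str]))
```

`UnboundAgg[int, str]` raises `TypeError` at runtime because a plain class is not subscriptable, which is one visible symptom of the type variables not being tied to the class.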

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run pre-commit jobs to lint the changes in this PR. (pre-commit setup)
  • [N/A] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [N/A] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ArthurBook ArthurBook requested a review from a team as a code owner October 7, 2025 21:30
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly makes AggregateFnV2 a generic class, which is a great improvement for type safety. By inheriting from Generic[AggType, U], subclasses can now specify concrete types for the accumulator and result, allowing static type checkers like mypy to perform more robust validation.

To fully leverage this change, I recommend specifying the concrete generic types for all subclasses of AggregateFnV2 in ray/data/aggregate.py. For example, Count could be updated to class Count(AggregateFnV2[int, int]):. Applying this to all subclasses (Sum, Min, Max, etc.) would complete this typing enhancement. As this is outside the lines of the current diff, I'm mentioning it here as a suggestion for a follow-up.

The changes in this PR are correct and a good step forward. No further comments on the current changes.
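
The subclass suggestion above could look roughly like the following. This is a sketch against a hypothetical stand-in base class (`AggregateFnSketch`), not Ray's real `AggregateFnV2`; the accumulator shape chosen for `Mean` and the `declared_types` helper are assumptions made only for illustration:

```python
from typing import Generic, Tuple, TypeVar, get_args

AggType = TypeVar("AggType")
U = TypeVar("U")


class AggregateFnSketch(Generic[AggType, U]):
    """Hypothetical stand-in for a generic aggregation base class."""

    def combine(self, current: AggType, new: AggType) -> AggType:
        raise NotImplementedError

    def finalize(self, accumulator: AggType) -> U:
        raise NotImplementedError


class Count(AggregateFnSketch[int, int]):
    """Accumulator and result are both ints."""

    def combine(self, current: int, new: int) -> int:
        return current + new

    def finalize(self, accumulator: int) -> int:
        return accumulator


class Mean(AggregateFnSketch[Tuple[float, int], float]):
    """Accumulator is a (sum, count) pair; the result is a float."""

    def combine(
        self, current: Tuple[float, int], new: Tuple[float, int]
    ) -> Tuple[float, int]:
        return (current[0] + new[0], current[1] + new[1])

    def finalize(self, accumulator: Tuple[float, int]) -> float:
        total, count = accumulator
        return total / count


def declared_types(cls: type) -> tuple:
    """Return the type arguments a subclass passed to its generic base."""
    return get_args(cls.__orig_bases__[0])


print(declared_types(Count))
print(declared_types(Mean))
```

With the concrete types spelled out, a checker can flag an override such as `def finalize(self, accumulator: int) -> str` on `Count` as inconsistent with the declared result type.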

@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch from db34450 to b2e09a6 Compare October 7, 2025 21:37
cursor[bot]

This comment was marked as outdated.

@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch from b2e09a6 to 16b0bc3 Compare October 7, 2025 21:48
@ray-gardener ray-gardener bot added data Ray Data-related issues community-contribution Contributed by the community labels Oct 8, 2025
@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch 4 times, most recently from 9a5517b to 8251e55 Compare October 10, 2025 18:40
@ArthurBook ArthurBook requested a review from a team as a code owner October 10, 2025 18:40
…gType, U])

Signed-off-by: Arthur <atte.book@gmail.com>
@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch from 8251e55 to 3a05b2a Compare October 10, 2025 19:08

 @PublicAPI
-class Min(AggregateFnV2):
+class Min(AggregateFnV2[Union[int, float], Union[int, float]]):
Contributor

For the Min and Max aggregates, the result/accumulator isn't guaranteed to be an int or a float, because it ultimately depends on the underlying pyarrow type of the column.

I recommend leaving this as a generic type and specifying the type on instantiation.
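
One way to read this recommendation, sketched with a hypothetical `MinSketch` class (not Ray's `Min`; the constructor argument `on` and the column names are assumptions): the class keeps a type parameter, and the caller chooses the element type when creating the aggregate.

```python
from datetime import date
from typing import Generic, TypeVar

T = TypeVar("T")


class MinSketch(Generic[T]):
    """Hypothetical stand-in: the class stays generic, the caller picks T."""

    def __init__(self, on: str) -> None:
        self.on = on

    def combine(self, current: T, new: T) -> T:
        # An unbounded T does not promise "<" to a type checker; a real
        # version would need a bound or protocol. Kept minimal here.
        return new if new < current else current  # type: ignore[operator]


# The element type is chosen per instance, matching the column's type.
int_min = MinSketch[int](on="id")
date_min = MinSketch[date](on="created_at")
name_min = MinSketch[str](on="name")

print(int_min.combine(3, 1))
print(date_min.combine(date(2025, 10, 7), date(2025, 10, 28)))
print(name_min.combine("b", "a"))
```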

Contributor Author

It looks like the value must at least be comparable with a float, because the (hard-coded) default factory returns a float.

Contributor

Sorry, which default factory are you referring to here?

Contributor Author

The

            zero_factory=lambda: float("+inf"),

on line 428

Contributor

^ @ArthurBook Can you please update this thread? Are you referring to the zero_factory? We should ideally change that to reflect the actual type of the column.

Contributor Author

Hi @goutamvenkat-anyscale, I can adjust it, but I'm not yet clear on how we want to address this without changing the actual runtime code.

I believe the current implementation would fail for types other than floats and ints, because values are initially compared with a float.

My intention with this PR was only to fix type hints and leave runtime untouched.

Let me know what you think and I'll happily add it in though!
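
The comparison concern can be illustrated in plain Python. This is only an analogy under stated assumptions: `running_min` and `fails_with_type_error` are hypothetical helpers, and Ray's actual `Min` may go through other code paths (for example pyarrow-level aggregation) that behave differently.

```python
from typing import Any, Callable, Iterable


def running_min(
    values: Iterable[Any],
    zero_factory: Callable[[], Any] = lambda: float("+inf"),
) -> Any:
    """Fold values into a minimum, starting from zero_factory()."""
    acc = zero_factory()
    for value in values:
        acc = min(acc, value)
    return acc


def fails_with_type_error(values: Iterable[Any]) -> bool:
    """Report whether folding these values hits a TypeError."""
    try:
        running_min(values)
    except TypeError:
        return True
    return False


print(running_min([3, 1.5, 2]))           # 1.5
print(fails_with_type_error(["b", "a"]))  # True: str cannot be ordered against float
```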

Contributor Author

If we want to support any type, we can move the zero_factory to an __init__ kwarg and then tie the TypeVar to a SupportsRichComparison protocol that we define ourselves.
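
A rough sketch of that idea, under stated assumptions: `MinSketch`, its `aggregate` method, and the simplified `SupportsRichComparison` protocol below are all hypothetical and only illustrate the shape of the proposal, not Ray's implementation.

```python
from datetime import date
from typing import Any, Callable, Generic, Iterable, Protocol, TypeVar


class SupportsRichComparison(Protocol):
    """Self-defined, simplified protocol: anything that supports "<"."""

    def __lt__(self, other: Any, /) -> bool: ...


ComparableT = TypeVar("ComparableT", bound=SupportsRichComparison)


class MinSketch(Generic[ComparableT]):
    """Hypothetical Min whose zero value is supplied by the caller."""

    def __init__(
        self, on: str, *, zero_factory: Callable[[], ComparableT]
    ) -> None:
        self.on = on
        self.zero_factory = zero_factory

    def combine(self, current: ComparableT, new: ComparableT) -> ComparableT:
        return new if new < current else current

    def aggregate(self, values: Iterable[ComparableT]) -> ComparableT:
        acc = self.zero_factory()
        for value in values:
            acc = self.combine(acc, value)
        return acc


# Floats keep the +inf starting point; dates supply their own maximum.
float_min = MinSketch[float]("price", zero_factory=lambda: float("+inf"))
date_min = MinSketch[date]("created_at", zero_factory=lambda: date.max)

print(float_min.aggregate([3.0, 1.5, 2.0]))
print(date_min.aggregate([date(2025, 10, 28), date(2025, 10, 7)]))
```

Because the zero value has the same type as the column values, the first comparison no longer mixes a float with an unrelated type. The trade-off is that this changes the constructor signature, which is a runtime change rather than a pure type-hint fix.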

cursor[bot]

This comment was marked as outdated.

@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch from 8c08234 to 28fd4bf Compare October 10, 2025 21:19
Signed-off-by: Arthur <atte.book@gmail.com>
@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch from 28fd4bf to bf2686b Compare October 10, 2025 21:20
@goutamvenkat-anyscale goutamvenkat-anyscale changed the title style: make AggregateFnV2 generic over accumulator/result (Generic[AggType, U]) [Data] - make AggregateFnV2 generic over accumulator/result (Generic[AggType, U]) Oct 10, 2025
@goutamvenkat-anyscale goutamvenkat-anyscale changed the title [Data] - make AggregateFnV2 generic over accumulator/result (Generic[AggType, U]) [Data] - Make AggregateFnV2 generic over accumulator/result (Generic[AggType, U]) Oct 10, 2025
cursor[bot]

This comment was marked as outdated.

ArthurBook added a commit to ArthurBook/ray that referenced this pull request Oct 10, 2025
Signed-off-by: Arthur <atte.book@gmail.com>
ArthurBook added a commit to ArthurBook/ray that referenced this pull request Oct 10, 2025
Signed-off-by: Arthur <atte.book@gmail.com>
@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch from 9949a02 to e87c298 Compare October 10, 2025 21:28
Signed-off-by: Arthur <atte.book@gmail.com>
@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch from e87c298 to fc00f22 Compare October 10, 2025 21:29
@ArthurBook
Contributor Author

Hey @goutamvenkat-anyscale, I was out on vacation but now back!
LMK if this one is good to go?

cursor[bot]

This comment was marked as outdated.

Signed-off-by: Arthur <atte.book@gmail.com>
@ArthurBook ArthurBook force-pushed the arthurbook/aggregatefnv2-typehints branch from 9600612 to b443078 Compare October 23, 2025 22:27
Contributor

@goutamvenkat-anyscale goutamvenkat-anyscale left a comment

lgtm

@goutamvenkat-anyscale
Contributor

Can you please address the merge conflict @ArthurBook ?

Signed-off-by: Arthur <atte.book@gmail.com>
@ArthurBook
Contributor Author

@goutamvenkat-anyscale done!

@goutamvenkat-anyscale goutamvenkat-anyscale added the go add ONLY when ready to merge, run all tests label Oct 27, 2025
Contributor

@alexeykudinkin alexeykudinkin left a comment

Thank you for your contribution @ArthurBook!

@alexeykudinkin alexeykudinkin enabled auto-merge (squash) October 28, 2025 18:48
Contributor

@angelinalg angelinalg left a comment

stamp

@alexeykudinkin alexeykudinkin merged commit 9e8b291 into ray-project:master Oct 28, 2025
7 checks passed
YoussefEssDS pushed a commit to YoussefEssDS/ray that referenced this pull request Nov 8, 2025
…AggType, U]) (ray-project#57281)

## Why are these changes needed?
The current Generic types in `AggregateFnV2` are not tied to the class,
so they are not picked up properly by static type checkers such as mypy.

<!-- Please give a short summary of the change and the problem this
solves. -->
By adding the Generic[] in the class definition, we get full type
checking support.


## Checks
- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run pre-commit jobs to lint the changes in this PR.
([pre-commit
setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [N/A] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [N/A] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Arthur <atte.book@gmail.com>
Co-authored-by: Goutam <goutam@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…AggType, U]) (ray-project#57281)

Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…AggType, U]) (ray-project#57281)

Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Labels

community-contribution Contributed by the community data Ray Data-related issues go add ONLY when ready to merge, run all tests

4 participants