-
Notifications
You must be signed in to change notification settings - Fork 7k
[Data] - Make AggregateFnV2 generic over accumulator/result (Generic[AggType, U]) #57281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] - Make AggregateFnV2 generic over accumulator/result (Generic[AggType, U]) #57281
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly makes AggregateFnV2 a generic class, which is a great improvement for type safety. By inheriting from Generic[AggType, U], subclasses can now specify concrete types for the accumulator and result, allowing static type checkers like mypy to perform more robust validation.
To fully leverage this change, I recommend specifying the concrete generic types for all subclasses of AggregateFnV2 in ray/data/aggregate.py. For example, Count could be updated to class Count(AggregateFnV2[int, int]):. Applying this to all subclasses (Sum, Min, Max, etc.) would complete this typing enhancement. As this is outside the lines of the current diff, I'm mentioning it here as a suggestion for a follow-up.
The changes in this PR are correct and a good step forward. No further comments on the current changes.
db34450 to
b2e09a6
Compare
b2e09a6 to
16b0bc3
Compare
9a5517b to
8251e55
Compare
…gType, U]) Signed-off-by: Arthur <atte.book@gmail.com>
8251e55 to
3a05b2a
Compare
python/ray/data/aggregate.py
Outdated
|
|
||
| @PublicAPI | ||
| class Min(AggregateFnV2): | ||
| class Min(AggregateFnV2[Union[int, float], Union[int, float]]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for min, max aggregates it's not guaranteed that we will have int and float types for the result/accumulator cause it ultimately depends on the underlying pyarrow type of the column.
I recommend leaving this as a generic type and on instantiation specify the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the value must at least be comparable be with a float because the (hard coded) default factory is a float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, which default factory are you referring to here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
zero_factory=lambda: float("+inf"),on line 428
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ @ArthurBook Can you please update this thread? Are you referring to the zero_factory? We should ideally change that to reflect the actual type of the column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @goutamvenkat-anyscale I can adjust it, but I'm currently not fully in the clear how we want to address this without changing the actual runtime code.
I believe that current implementation would fail for other than floats and ints because they will initially be compared with a float.
My intention with this PR was only to fix type hints and leave runtime untouched.
Let me know what you think and I'll happily add it in though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to support any type, we can move the zero_factory to an __init__ kwarg and then tie the TypeVar to a SupportsRichComparison protocol that we define ourselves.
8c08234 to
28fd4bf
Compare
Signed-off-by: Arthur <atte.book@gmail.com>
28fd4bf to
bf2686b
Compare
Signed-off-by: Arthur <atte.book@gmail.com>
Signed-off-by: Arthur <atte.book@gmail.com>
9949a02 to
e87c298
Compare
Signed-off-by: Arthur <atte.book@gmail.com>
e87c298 to
fc00f22
Compare
|
Hey @goutamvenkat-anyscale, I was out on vacation but now back! |
Signed-off-by: Arthur <atte.book@gmail.com>
9600612 to
b443078
Compare
goutamvenkat-anyscale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
Can you please address the merge conflict @ArthurBook ? |
Signed-off-by: Arthur <atte.book@gmail.com>
|
@goutamvenkat-anyscale done! |
Signed-off-by: Arthur <atte.book@gmail.com>
alexeykudinkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution @ArthurBook!
angelinalg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stamp
…AggType, U]) (ray-project#57281) ## Why are these changes needed? The current Generic types in `AggregateFnV2` are not tied to the class, so they are not picked up properly by static type checkers such as mypy. <!-- Please give a short summary of the change and the problem this solves. --> By adding the Generic[] in the class definition, we get full type checking support. ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [N/A] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [N/A] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Arthur <atte.book@gmail.com> Co-authored-by: Goutam <goutam@anyscale.com>
…AggType, U]) (ray-project#57281) ## Why are these changes needed? The current Generic types in `AggregateFnV2` are not tied to the class, so they are not picked up properly by static type checkers such as mypy. <!-- Please give a short summary of the change and the problem this solves. --> By adding the Generic[] in the class definition, we get full type checking support. ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [N/A] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [N/A] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Arthur <atte.book@gmail.com> Co-authored-by: Goutam <goutam@anyscale.com>
…AggType, U]) (ray-project#57281) ## Why are these changes needed? The current Generic types in `AggregateFnV2` are not tied to the class, so they are not picked up properly by static type checkers such as mypy. <!-- Please give a short summary of the change and the problem this solves. --> By adding the Generic[] in the class definition, we get full type checking support. ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [N/A] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [N/A] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Arthur <atte.book@gmail.com> Co-authored-by: Goutam <goutam@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Why are these changes needed?
The current Generic types in
AggregateFnV2are not tied to the class, so they are not picked up properly by static type checkers such as mypy.By adding the Generic[] in the class definition, we get full type checking support.
Checks
git commit -s) in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.