Skip to content

_normalize_keyword_aggregation` named agg helper function should be made public #28472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
arpit1997 opened this issue Sep 16, 2019 · 10 comments · Fixed by #51705
Closed

_normalize_keyword_aggregation` named agg helper function should be made public #28472

arpit1997 opened this issue Sep 16, 2019 · 10 comments · Fixed by #51705
Assignees
Labels
API Design good first issue Groupby Needs Tests Unit test(s) needed to prevent regressions

Comments

@arpit1997
Copy link

Slightly related to #28380

Currently an issue in Dask dask/dask#5294 for implementing Named Aggregation (introduced in pandas 0.25.0) is open. To implement this it needs to use _normalize_keyword_aggregation and _is_multi_agg_with_relabel.

Making it public would be useful for frameworks like dask mars

cc: @TomAugspurger

@TomAugspurger
Copy link
Contributor

And as an alternative to making it part of the official public API for end-users, we could add a test to ensuring that it's implementation doesn't move, so that projects like dask can rely on it.

That last option conflicts a bit our desire to deprecate all of pandas.core though.

@zbrookle
Copy link
Contributor

@TomAugspurger Could I work on this? I'm interested in seeing this moved forward since I need the named aggregation feature in Dask to incorporate it into the dataframe_sql framework that I'm building

@TomAugspurger
Copy link
Contributor

TomAugspurger commented May 18, 2020 via email

@jbrockmendel
Copy link
Member

@rhshadrach these now live in core.apply, so im declaring this your call.

@rhshadrach
Copy link
Member

The linked issue is now closed, with dask implementing it using reconstruct_func (which calls normalize_keyword_aggregation). With this, I'm okay with considering reconstruct_func public and adding tests, but I don't think we should adding it to the API.

cc @mroeschke for any thoughts.

@rhshadrach rhshadrach added the Needs Tests Unit test(s) needed to prevent regressions label Feb 8, 2023
@jbrockmendel
Copy link
Member

+1 for something pseudo-public

@mroeschke
Copy link
Member

+1 as well for exposing this publicly somehow

@rhshadrach
Copy link
Member

What would be the reason for moving them?

@renatocmaciel
Copy link
Contributor

renatocmaciel commented Feb 24, 2023

The only path that explicitly mentions pseudo-public api is internals.
reconstruct_func , is_multi_agg_with_relabel and normalize_keyword_aggregation are accessible as they do not start with _ or __.

What are your thoughts on this issue? Should they be documented and made available on pandas/core/api.py or we only need to add tests?

@rhshadrach
Copy link
Member

rhshadrach commented Feb 24, 2023

This is not widely know, but the API documentation (https://pandas.pydata.org/pandas-docs/dev/reference/index.html) states:

The pandas.core, pandas.compat, and pandas.util top-level modules are PRIVATE. Stable functionality in such modules is not guaranteed.

As such, to ensure we don't accidentally break dask, I think it would be sufficient to add a test for just reconstruct_func within pandas/tests/apply with a comment to the effect of "Test is to ensure this method isn't moved; it is used by other libraries (e.g. dask)"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design good first issue Groupby Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants