
ExtensionArray.value_counts #22843


Open
TomAugspurger opened this issue Sep 26, 2018 · 10 comments
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Deprecate Functionality to remove in pandas Docs ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@TomAugspurger
Contributor

Is this required? Right now

    if is_extension_array_dtype(values) or is_sparse(values):
        # handle Categorical and sparse,
        result = Series(values)._values.value_counts(dropna=dropna)

assumes that it's implemented, but we don't document / implement it by default, so we end up with an AttributeError at runtime.
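
A minimal sketch of the failure mode described above, in plain Python. `BareArray` and `dispatch_value_counts` are hypothetical stand-ins, not pandas code; they only mimic a dispatch branch that delegates to a method the array type was never required to define.

```python
class BareArray:
    """Hypothetical array type that follows an interface without value_counts."""

    def __init__(self, data):
        self.data = list(data)


def dispatch_value_counts(values):
    # Same shape as the branch above: delegate without checking that
    # the method exists on the array.
    return values.value_counts()


try:
    dispatch_value_counts(BareArray([1, 1, 2]))
except AttributeError as exc:
    # The missing method only surfaces at call time.
    error_message = str(exc)

print(error_message)
```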

@TomAugspurger TomAugspurger added API Design ExtensionArray Extending pandas with custom dtypes or arrays. labels Sep 26, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Sep 26, 2018
@jorisvandenbossche
Member

Personally I would remove it from the EA interface. value_counts is a very useful method, but IMO it does not really fit on an array (it returns a Series, while I think the arrays should be somewhat independent from pandas Series concepts), and it is also rather easy to do based on _factorize/_values_for_factorize for EAs as a default implementation?
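
As a rough illustration of the factorize-based idea, here is a sketch that uses only the public `pd.factorize` and `np.bincount`. The helper name `value_counts_via_factorize` is hypothetical, and the sketch deliberately returns the uniques and counts separately instead of building a Series, so it says nothing about how the result index should be constructed.

```python
import numpy as np
import pandas as pd


def value_counts_via_factorize(values):
    """Hypothetical helper: count occurrences via factorize + bincount."""
    # factorize assigns each distinct value an integer code;
    # missing values receive the sentinel code -1 by default.
    codes, uniques = pd.factorize(values)
    # Count every non-missing code; minlength keeps counts aligned with uniques.
    counts = np.bincount(codes[codes >= 0], minlength=len(uniques))
    n_missing = int((codes == -1).sum())
    return uniques, counts, n_missing


arr = pd.array([1, 2, 2, None], dtype="Int64")
uniques, counts, n_missing = value_counts_via_factorize(arr)
print(uniques, counts, n_missing)
```

The uniques keep the extension dtype of the input, which is why the counts can stay a plain integer array next to them.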

@TomAugspurger
Contributor Author

Yes, the main sticking point though is the index. Once we have official support for extension indexes, we should be able to update algorithms.value_counts to do the right thing.

@jreback
Contributor

jreback commented Sep 27, 2018

Not against removing value_counts for the time being either, until we have ExtensionIndex, which really comes up a lot when you groupby things.

@jbrockmendel
Member

agreed on removing from EAs. It introduces a nasty dependency of the EA code on index/series code.

@jbrockmendel
Member

and it is also rather easy to do based on _factorize/_values_for_factorize for EAs as a default implementation?

I tried this the other day and found I had to do a lot of special-casing for Categorical and SparseArray. Has anyone else tried this?

@jbrockmendel
Member

AFAICT, based on a) trying to implement this and b) previous discussions which I've lost track of:

  1. It has been suggested that this should be doable using factorize, but I have been unable to get this to work.
  2. AFAICT SparseArray and possibly Categorical are going to have to override whatever general-case implementation we come up with.
    • This means that we will need something on the EA to allow subclasses to override the default behavior
  3. The EA should not implement value_counts -> Series, but instead _value_counts -> Tuple[EA, ndarray[int]] (or Tuple[EA, IntegerArray])
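
Points 2 and 3 can be sketched in plain Python as follows. `SketchArray` and `SketchCategorical` are hypothetical stand-ins rather than pandas classes, and a list of ints stands in for `ndarray[int]`. The sketch only shows the shape of the proposal: a default `_value_counts` returning a (uniques, counts) pair, plus a subclass override for a type whose semantics differ (here, reporting unobserved categories with a count of 0).

```python
from typing import Hashable, List, Sequence, Tuple


class SketchArray:
    """Hypothetical stand-in for an ExtensionArray."""

    def __init__(self, data: Sequence[Hashable]):
        self._data = list(data)

    def _value_counts(self, dropna: bool = True) -> Tuple["SketchArray", List[int]]:
        # Default: uniques in first-seen order with a parallel list of counts.
        tally = {}
        for item in self._data:
            if dropna and item is None:
                continue
            tally[item] = tally.get(item, 0) + 1
        return SketchArray(list(tally)), list(tally.values())


class SketchCategorical(SketchArray):
    """Hypothetical subclass that needs to override the default."""

    def __init__(self, data: Sequence[Hashable], categories: Sequence[Hashable]):
        super().__init__(data)
        self.categories = list(categories)

    def _value_counts(self, dropna: bool = True) -> Tuple["SketchArray", List[int]]:
        uniques, counts = super()._value_counts(dropna=dropna)
        observed = dict(zip(uniques._data, counts))
        # Report every category, including those never observed.
        full_counts = [observed.get(cat, 0) for cat in self.categories]
        return SketchCategorical(self.categories, self.categories), full_counts


plain_uniques, plain_counts = SketchArray(["a", "b", "a", None])._value_counts()
cat_uniques, cat_counts = SketchCategorical(["x", "x"], ["x", "y"])._value_counts()
print(plain_uniques._data, plain_counts)
print(cat_uniques._data, cat_counts)
```

Because the method returns a pair rather than a Series, wrapping the result for the user can live in Series-level code, and the array never has to import Index or Series.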

@jorisvandenbossche
Member

Do you remember more details on the issues you ran into for Categorical?

@jbrockmendel
Member

Not off the top of my head, no

@mroeschke mroeschke added Deprecate Functionality to remove in pandas and removed API Design labels Jun 22, 2021
@samgd

samgd commented Jul 19, 2021

I'm implementing an ExtensionArray interface and am hitting a failing test due to the lack of value_counts. What is the status on this? Required? Can I mark the test as skip?

@simonjayhawkins simonjayhawkins added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Jun 11, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Member

AFAICT value_counts is de facto required but not documented as such. As above, I think it'd be great if we could implement it in terms of lower-level methods, but until then I think it needs to be documented as something authors are expected to implement.


7 participants