BUG: len(df.groupby(..., dropna=False)) raises ValueError: Categorical categories cannot be null #35202
Comments
@ssche thanks for the report!
take
…h NaNs (#216)
* bugfix(snipping): features with empty regions return snips filled with NaNs. Related to 1a63791, which turned out not to be a complete fix.
* bugfix(snipping): Docstrings added, code simplified.
* bugfix+tests(snipping): Snipping adapted for pandas failing sometimes while grouping np.nans. Pandas fails sometimes on groupby with np.nan in the dataframe: pandas-dev/pandas#35202 With this fix, we replace NaN regions (unannotated) with empty string, and then do the grouping. Tests of snipping added, covering NaN in "region" column of the windows dataframe for snipping.
* style(snipping test): improved
* formatting improved
* Update cooltools/snipping.py Co-authored-by: Nezar Abdennur <nabdennur@gmail.com>
Hello, may I ask if anyone's figured out yet how to fix this? I just encountered the bug myself and was rather mystified. Is the |
@HalfWhitt I think you can use |
@ssche Oh, interesting... I don't actually need But you prompted me to look at it again, and now that I've tried iterating over it and found that that works, I guess I can just do |
How about converting the datetime column to a string column before using groupby?
I now get 3 in the OP, could use tests.
Still happening in |
@ssche - the fix is in main, will be released in 3.0.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(only in current master; dropna=False didn't exist previously)
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
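The reporter's own sample is not preserved in this copy of the issue. As a stand-in, here is a minimal sketch of the kind of call the description below is about. The column names and values are hypothetical and are not the original data, and whether len(...) raises depends on the pandas version installed. The sketch also counts groups by iterating, which a commenter reported as working.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the reporter's original sample): a key column with NaN.
df = pd.DataFrame(
    {
        "key": [1.0, np.nan, 2.0, np.nan],
        "value": [10, 20, 30, 40],
    }
)

# dropna=False keeps the NaN key as its own group.
grouped = df.groupby("key", dropna=False)

# Counting groups by iterating over (key, sub-frame) pairs avoids len().
n_by_iteration = sum(1 for _ in grouped)

# On affected versions len() may raise ValueError; catch it so the
# sketch runs either way.
try:
    n_by_len = len(grouped)
except ValueError as exc:
    n_by_len = None
    print(f"len() raised ValueError: {exc}")

print("groups counted by iteration:", n_by_iteration)
print("groups reported by len():", n_by_len)
```

If len(...) fails on the installed version, the iteration count can serve as a workaround until a release containing the fix is available.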
Problem description
When using groupby with the option dropna=False, a ValueError may be raised when len(groupby(..., dropna=False)) is evaluated. This works fine in older versions of pandas (0.25.x), which always dropped NA rows by default. Even in the current master version (1.1.0.dev0+2054.gc15f08084), len(...) still works fine when dropna=True is used. The interesting bit (for me at least) is that categoricals are somehow used even though there are no categoricals in the dataframe to begin with. Perhaps this is a code path that is not meant to be executed.
Expected Output
No exception should be raised (there isn't any categorical after all), and len(...) isn't modifying anything, so an exception is totally unexpected.
Output of pd.show_versions()