Groupby (observed=False) with a categorical multiIndex and integer data values returns zero for categories that do not appear in the data, as seen in the first example (there are no wild parrots).
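The example referred to above is not included in this text. As a rough stand-in, here is a minimal sketch of the integer case, with the same grouping applied to timedelta values for comparison. The column names (`animal`, `kind`, `count`, `duration`) and the data are hypothetical and are not taken from the original report; the timedelta result may differ between pandas versions, so no output is shown for it.

```python
import pandas as pd

# Hypothetical data: names and categories are illustrative only,
# not the reporter's original example.
df = pd.DataFrame(
    {
        "animal": pd.Categorical(
            ["parrot", "dog", "dog"], categories=["parrot", "dog"]
        ),
        "kind": pd.Categorical(
            ["pet", "pet", "wild"], categories=["pet", "wild"]
        ),
        "count": [1, 2, 3],
    }
)
df["duration"] = pd.to_timedelta(df["count"], unit="s")

# observed=False keeps every category combination, including
# ("parrot", "wild"), which never occurs in the data.
int_sums = df.groupby(["animal", "kind"], observed=False)["count"].sum()
print(int_sums)

# Same grouping over timedelta values; the dtype and the value of the
# unobserved entry are the things to inspect here.
td_sums = df.groupby(["animal", "kind"], observed=False)["duration"].sum()
print(td_sums.dtype)
print(repr(td_sums.loc[("parrot", "wild")]))
```

With integer values, the unobserved ("parrot", "wild") group sums to 0. For the timedelta column, checking `td_sums.dtype` is a quick way to see whether the installed pandas version keeps a timedelta result or falls back to something else.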
jamieforth changed the title from "Groupby with categorical multiIndex and timedelta returns incorrect type." to "BUG: Groupby with categorical multiIndex and timedelta returns incorrect type." on Oct 5, 2021
This is now working, except that upon reindexing for categorical with observed=False, we specify the fill value as being 0 and this leads to the following result with object dtype.
0 0 days 00:00:00.000000001
1 0 days 00:00:00.000000002
2 0 days 00:00:00.000000003
3 0
Here is a minimal example of this behavior; I'm not sure whether this is a bug in fillna:
import numpy as np
import pandas as pd

ser = pd.to_timedelta(pd.Series([1, 2, 3, np.nan]))
print(ser)
# 0 0 days 00:00:00.000000001
# 1 0 days 00:00:00.000000002
# 2 0 days 00:00:00.000000003
# 3 NaT
# dtype: timedelta64[ns]
print(ser.fillna(0))
# 0 0 days 00:00:00.000000001
# 1 0 days 00:00:00.000000002
# 2 0 days 00:00:00.000000003
# 3 0
# dtype: object
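For comparison, a minimal sketch of the same series filled with a zero-length `pd.Timedelta` instead of the integer `0`. This only illustrates that a timedelta fill value avoids mixing types; it is an assumption on my part, not necessarily how the groupby reindexing is or should be fixed in pandas.

```python
import numpy as np
import pandas as pd

ser = pd.to_timedelta(pd.Series([1, 2, 3, np.nan]))

# Fill the missing entry with a zero-length Timedelta rather than the
# integer 0, so the fill value has the same type as the other values.
filled = ser.fillna(pd.Timedelta(0))

# The dtype is expected to stay a timedelta64 dtype rather than object.
print(filled)
print(filled.dtype)
```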
Groupby (observed=False) with a categorical multiIndex and integer data values returns zero for categories that do not appear in the data, as seen in the first example (there are no wild parrots). But when using Timedelta data values, an int is returned instead of a Timedelta.
Error:
pd.show_versions()
INSTALLED VERSIONS
commit : 73c6825
python : 3.9.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-8-amd64
Version : #1 SMP Debian 5.10.46-5 (2021-09-23)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.3.3
numpy : 1.19.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.1.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.2
IPython : 7.28.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.10.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.3.24
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.19.0
xlrd : 2.0.1
xlwt : None
numba : 0.54.0