pd.core.groupby.groupby.DataFrameGroupBy.nth yielding false values with Interval MultiIndex Interval #24205
Comments
Can you create a minimal sample to recreate the issue? This is a rather large one, so it is harder to reason about than it needs to be.
Yep, reduced it a bit:
@scootty1 your last example code produces a lot of NaNs in the intervals, though
Aha, and that actually also seems to be the problem. So based on that, creating a smaller example:
So the result is kind of correct, apart from the fact that the NaN in the index gets filled with the previous category. I have a vague recollection we had an issue about this before.
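For readers following along, here is a minimal sketch of how NaNs end up in the intervals in the first place. It is not the example from this thread (that code is not preserved here); the values and bin edges are made up for illustration. It only shows that pd.cut returns NaN for any value that falls outside the given bins.

```python
import pandas as pd

# Hypothetical data: the last value lies outside the bin edges below.
values = pd.Series([1, 4, 7, 12])

# Bins (0, 5] and (5, 10]; 12 is not covered by either interval.
binned = pd.cut(values, bins=[0, 5, 10])

print(binned)
print(binned.isna().tolist())  # [False, False, False, True]
```

Grouping by such a column means some rows have no valid group key, which is the situation discussed above.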
Interestingly, it is not showing the NaNs in my case... But I extended the bins so that they do not produce any NaNs.
But do you still see the bug now? (I don't with your updated example from #24205 (comment))
Nope, seems to be gone. I didn't check that thoroughly enough. But it is still strange that I don't see any NaNs, even when I reproduce joris' example.
Can you print the full code and output of what you did?
No, unfortunately not. After a restart of the console my output is the same as in your example. I should have restarted right away... But before the restart I had just copied your example to the IPython console.
The example in #24205 (comment) looks fixed. Could use a test
take
Hi,
using a solution for my problem posted on SO, I stumbled upon this bug. Thanks @jorisvandenbossche for looking into that matter and confirming this to be a bug.
Problem description
When using groupby with pd.cut, it seems that taking the n-th row of a group with df_grpd.nth(n) with n > 1 results in values that lie outside the group boundaries, as shown with df_grpd.nth(1).T in my code example. Furthermore, sometimes there are multiple rows per group for n=0, as shown with df_grpd.nth(0).loc[(pd.Interval(65, 70), pd.Interval(50, 55), pd.Interval(0.8, 0.85))].T, also with values outside of the interval.
Expected Output
I guess the expected output is quite clear in this case...
For df_grpd.nth(1).T:
Output of pd.show_versions()
pandas: 0.23.4
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.1
openpyxl: 2.5.11
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Is there currently any workaround for this problem? Can I somehow access the group values in any other way?
Thanks in advance!
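One possible workaround, sketched under assumptions rather than taken from this thread: drop the rows whose bin is NaN before grouping, then select the n-th row of each group by its position within the group using cumcount() instead of nth(). The column names a, b, and a_bin, the data, and the bin edges are all hypothetical.

```python
import pandas as pd

# Hypothetical data; the last value of "a" falls outside the bins.
df = pd.DataFrame({"a": [1, 2, 6, 7, 12], "b": [10, 20, 30, 40, 50]})
df["a_bin"] = pd.cut(df["a"], bins=[0, 5, 10])

# Remove rows that did not fall into any bin, so every remaining
# row has a valid interval as its group key.
binned = df.dropna(subset=["a_bin"])

# Number the rows within each group (0, 1, 2, ...) and keep the
# rows at position n. This avoids calling nth() altogether.
n = 1
position = binned.groupby("a_bin", observed=True).cumcount()
nth_rows = binned[position == n]

print(nth_rows)
```

Because the result keeps the original rows and index, it is easy to check that each selected value of a lies inside its own a_bin interval.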