df.stack() behaving differently between 0.19.2 and 0.20.1 #16323
Comments
Simpler case:
passes under 0.19.2 for me and fails under 0.20.1.
In 0.19.2, with the above dataframe, we have
In 0.20.1, we have
or, to use the new syntax,
That is, in 0.20.1, the sort actually affects the frame, and
is no longer justified, because the level order can have changed. I think that just using
Thanks for the report. It looks like a regression. I think @dsm054's fix looks good.
) (pandas-dev#16325) (cherry picked from commit b1ff291)
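To illustrate why an assumption about level order is fragile, here is a minimal sketch. It is not the pandas internals or the actual fix, and the column labels and values are invented for illustration. It shows only that the unique values stored for a level need not be in the order the columns appear, and that sorting the columns moves the data along with the labels.

```python
import pandas as pd

# Hypothetical column labels, chosen so that appearance order differs
# from sorted order. Nothing here is taken from the original report.
columns = pd.MultiIndex.from_tuples(
    [("B", "x"), ("A", "y")], names=["outer", "inner"]
)
frame = pd.DataFrame([[1, 2]], columns=columns)

# Unique values stored for level 0 versus the labels as the columns appear.
level_order = list(frame.columns.levels[0])
label_order = list(frame.columns.get_level_values(0))

# Sorting the columns reorders the labels and the data together.
sorted_frame = frame.sort_index(axis=1)
sorted_labels = list(sorted_frame.columns.get_level_values(0))
sorted_values = sorted_frame.iloc[0].tolist()

print(level_order, label_order, sorted_labels, sorted_values)
```

Code that reads positions off the stored levels would therefore need to re-derive them after any sort, rather than reuse an ordering captured before it.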
Hi, I think there is still a problem.
Code Sample, a copy-pastable example if possible
# Your code here
import pandas as pd
# We create a MultiIndex
PAE = ['ITA', 'FRA']
VAR = ['A1', 'A2']
TYP = ['CRT', 'DBT', 'NET']
MI = pd.MultiIndex.from_product([PAE, VAR, TYP], names=['PAE', 'VAR', 'TYP'])
# We create a dataframe with MultiIndex MI
V = [20, 10, 10, 40, 10, 30, 120, 110, 10, 140, 110, 30]
DF = pd.DataFrame(data=V, index=MI, columns=['VALUE'])
# We unstack the dataframe and drop level 0
DF = DF.unstack(['VAR', 'TYP'])
DF.columns = DF.columns.droplevel(0)
DF[('A0', 'NET')] = 9999
# We stack the dataframe
DF0 = DF.stack(['VAR', 'TYP'])
# DF0 is wrong
DF1 = DF.sort_index(axis=1).stack(['VAR', 'TYP'])
# DF1 is right
Problem description
The data in DF0 doesn't correspond to the original data before the unstack and droplevel.
Expected Output
PAE  VAR  TYP
FRA  A0   NET    9999.0
     A1   CRT     120.0
          DBT     110.0
          NET      10.0
     A2   CRT     140.0
          DBT     110.0
          NET      30.0
ITA  A0   NET    9999.0
     A1   CRT      20.0
          DBT      10.0
          NET      10.0
     A2   CRT      40.0
          DBT      10.0
          NET      30.0
dtype: float64
Output of
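The mismatch reported above can be checked without relying on row order or on how missing combinations are handled. The following is a sketch only: it rebuilds the frame from the code sample and compares every stacked value against the cell it came from. The helper names `wide_lookup` and `long_lookup` are hypothetical, introduced here for illustration.

```python
import pandas as pd

# Rebuild the frame from the report.
PAE = ['ITA', 'FRA']
VAR = ['A1', 'A2']
TYP = ['CRT', 'DBT', 'NET']
MI = pd.MultiIndex.from_product([PAE, VAR, TYP], names=['PAE', 'VAR', 'TYP'])
V = [20, 10, 10, 40, 10, 30, 120, 110, 10, 140, 110, 30]
DF = pd.DataFrame(data=V, index=MI, columns=['VALUE'])
DF = DF.unstack(['VAR', 'TYP'])
DF.columns = DF.columns.droplevel(0)
DF[('A0', 'NET')] = 9999  # appended last, so the columns are no longer sorted


def wide_lookup(frame):
    """Map (PAE, VAR, TYP) -> value, read cell by cell from the wide frame."""
    return {
        (pae, var, typ): float(value)
        for pae, row in frame.iterrows()
        for (var, typ), value in row.items()
    }


def long_lookup(series):
    """Map (PAE, VAR, TYP) -> value from the stacked Series, skipping NaN."""
    ordered = series.reorder_levels(['PAE', 'VAR', 'TYP'])
    return {
        key: float(value) for key, value in ordered.items() if pd.notna(value)
    }


expected = wide_lookup(DF)
actual = long_lookup(DF.stack(['VAR', 'TYP']))

# Any key listed here carries a value that does not match its source cell.
mismatches = {
    key: (expected[key], actual.get(key))
    for key in expected
    if actual.get(key) != expected[key]
}
print(mismatches or "stacked values match the wide frame")
```

On a pandas version affected by the bug, `mismatches` would be expected to be non-empty; on a version with the fix it should be empty.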
@ilmioalias: Can you try installing
I can confirm that it looks like @ilmioalias has found a different failure mode not solved by the first fix, and obviously not caught by the original tests, which I thought were pretty deep. :-/
@dsm054: Are you using
Same under both, more's the pity.
Okay, could you file a separate issue then and cross-reference this one? Afterwards, feel free to patch this since you were the one who spearheaded the initial fix.
Sure, I'll take it. I'm annoyed that something managed to sneak through. 😒 Thanks for the report, @ilmioalias!!
Well, if we could write tests that ALWAYS covered EVERY use-case, then we wouldn't need to take contributions from anyone because our tests would be perfect. 😉
Hi,
INSTALLED VERSIONS
commit: None
pandas: 0.20.2
Code Sample, a copy-pastable example if possible
Problem description
Since switching to 0.20.1, when using df.stack(0), the output looks like this:
The columns change order and the tickers no longer correspond to the correct prices.
Expected Output
0.19.2 maintains the correct hierarchy.
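The workaround mentioned in the comments, sorting the columns before stacking, can be sketched as follows. The tickers, field names, dates, and prices are invented for illustration and are not the reporter's data; the helper `cells` is likewise hypothetical. The check confirms that each stacked value still belongs to its original ticker.

```python
import pandas as pd

# Hypothetical data: column order is deliberately not sorted.
columns = pd.MultiIndex.from_tuples(
    [('MSFT', 'open'), ('MSFT', 'close'), ('AAPL', 'open'), ('AAPL', 'close')],
    names=['ticker', 'field'],
)
prices = pd.DataFrame(
    [[68.0, 69.0, 146.0, 147.0], [69.5, 68.5, 147.5, 148.5]],
    index=pd.Index(['2017-05-01', '2017-05-02'], name='date'),
    columns=columns,
)

# Workaround from the thread: sort the columns, then stack the first level.
stacked = prices.sort_index(axis=1).stack(0)


def cells(frame):
    """Flatten a frame into {row key + column key: value}."""
    out = {}
    for row_key, row in frame.iterrows():
        row_key = row_key if isinstance(row_key, tuple) else (row_key,)
        for col_key, value in row.items():
            col_key = col_key if isinstance(col_key, tuple) else (col_key,)
            out[row_key + col_key] = float(value)
    return out


wide_cells = cells(prices)   # keys: (date, ticker, field)
long_cells = cells(stacked)  # keys: (date, ticker, field)
print(long_cells == wide_cells)
```

Comparing by key rather than by position keeps the check independent of whether a given pandas version sorts the stacked index.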
Output of pd.show_versions()
pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 35.0.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.1
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1