BUG: df.stack() returns wrong data when NaT is in index (regression since 2.1.0, ok in <= 2.0.3) #57152
Open
Labels
Bug
Regression: Functionality that used to work in a prior pandas version
Reshaping: Concat, Merge/Join, Stack/Unstack, Explode
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
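A minimal sketch of the two approaches compared under Issue Description, on a much smaller hypothetical frame than the one that triggered the problem. The function names `unique_dates_v1` and `unique_dates_via_stack`, the level name `date_type` and the value `"MAT"` come from the description; the level name `timestamp`, the `"OTHER"` label, the dates and the cell values are assumptions made for illustration. This small frame is not confirmed to reproduce the wrong output.

```python
import pandas as pd

# Hypothetical two-level column index: (date_type, timestamp).
# One column carries NaT in the timestamp level, as in the report.
columns = pd.MultiIndex.from_tuples(
    [
        ("MAT", pd.Timestamp("2024-01-31")),
        ("MAT", pd.Timestamp("2024-02-29")),
        ("OTHER", pd.NaT),
    ],
    names=["date_type", "timestamp"],
)
df = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], columns=columns)


def unique_dates_v1(frame):
    """Variant a): cut the column index directly with get_level_values."""
    cols = frame.columns
    is_mat = cols.get_level_values("date_type") == "MAT"
    return cols.get_level_values("timestamp")[is_mat].unique()


def unique_dates_via_stack(frame, **stack_kwargs):
    """Variant b): stack every column level, then take the "MAT" cross section."""
    levels = list(range(frame.columns.nlevels))
    stacked = frame.stack(levels, **stack_kwargs)  # Series after stacking all levels
    mat = stacked.xs("MAT", level="date_type")
    return mat.index.get_level_values("timestamp").unique()


# Variant a) does not go through stack at all, so it serves as the reference.
dates_v1 = unique_dates_v1(df)
print(dates_v1)
```

To compare, `unique_dates_via_stack(df)` uses whichever `stack` implementation the installed pandas defaults to, while `unique_dates_via_stack(df, future_stack=True)` selects the new implementation on pandas >= 2.1. Both results can then be checked against `unique_dates_v1(df)`.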
Issue Description
First of all, sorry for the rather complex dataframe. It was already quite challenging to reduce it from the one I was actually using...
Let us consider a DataFrame with a column MultiIndex where a NaT happens to appear in one of the indexes.
Let's try to find the timestamps where `date_type == "MAT"`. This can be done in two ways:
a) `unique_dates_v1`: a simple cut using `get_level_values`. This works fine.
b) `unique_dates_via_stack`: stacking all the columns into a Series, from which a cross section then gives the result. This is the version that fails with pandas >= 2.1.0.
I know there is `future_stack=True` in newer pandas versions, and `future_stack` seems to work fine (and is usually what I prefer). However, the error above surfaced while migrating older code. The legacy `stack` version simply returns wrong data: there is no MAT entry with a 1970 date at all. Even if the old stack variant introduces additional NaNs, it should never return wrong data, not even in a deprecated stack implementation.
Expected Behavior
Behavior as in pandas 2.0, i.e. no wrong data assigned to MAT.
Installed Versions
works fine with pandas <= 2.0.3
fails with pandas >= 2.1.0