pct_change with freq on groupby broken #11811
Seems
>>> rawdat.groupby(['Symbol']).resample('1M', label='right', how='sum')
<Lots of data>
>>> rawdat.groupby(['Symbol']).resample('1M', label='right', how=lambda period: (period[-1]/period[0]) - 1.0)
<Empty> |
Thanks for the report. Can you attach the content of |
I'm unable to provide the data itself because it's licensed (and several GB), unfortunately. When trying to give you a subset, though, I discovered that the bug was only reproducible when I read more than 1032393 (an odd but specific number) rows from the CSV. If I read fewer, the functions above worked perfectly. If I read more, they all broke completely:
>>> rawdat = pd.read_csv('./data.csv', parse_dates=[1], index_col=[1], usecols=[0, 1, 12], nrows=1032393)
>>> rawdat.groupby(['Symbol']).resample('1M', label='right', how=lambda period: (period[-1]/period[0]) - 1.0)
<Data>
>>> rawdat = pd.read_csv('./data.csv', parse_dates=[1], index_col=[1], usecols=[0, 1, 12], nrows=1032394)
>>> rawdat.groupby(['Symbol']).resample('1M', label='right', how=lambda period: (period[-1]/period[0]) - 1.0)
<Empty>
Since that number was so specific, I looked into the data and saw that at that row, there was a gap of a few months in that symbol's data:
I skipped over that with
So I suspect that the problem is really the gap in the data. Also though, I found that grouping by
# Column 0 is a symbol.
>>> rawdat = pd.read_csv('./data.csv', parse_dates=[1], index_col=[0, 1], usecols=[0, 1, 12], nrows=1000)
>>> rawdat.reset_index(level=0).groupby(['Symbol']).resample('1M', label='right', how=lambda period: (period[-1]/period[0]) - 1.0)
<Lots of data>
>>> rawdat.groupby(level=0).resample('1M', label='right', how=lambda period: (period[-1]/period[0]) - 1.0)
<Empty>
I can share 1000 rows with you, so here's all you need to reproduce the |
@bobobo1618 the last might be a bug. Can you put together a copy-pastable example that is reproducible? |
Replacing my function with this caused it to work:
def month_change_resample(arraylike):
    if len(arraylike) == 0:
        return 0
    return (arraylike[-1]/arraylike[0]) - 1.0
It seems some of the arraylike objects were of length zero, which would have caused an IndexError during the execution. Is outputting nothing expected behaviour in that situation? |
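To illustrate how zero-length groups can arise, here is a minimal sketch on made-up data (a single hypothetical series with values in January and May 2015 and nothing in between). It uses the current `resample(...).size()` spelling rather than the `how=` keyword from the thread, and an offset object instead of the `'1M'` alias because alias spellings differ between pandas versions. It is not the reporter's data, only an assumption about what the gap looks like.

```python
import numpy as np
import pandas as pd

# Hypothetical single-symbol series: daily values in January and May 2015,
# with no rows at all for February through April.
index = pd.date_range("2015-01-01", "2015-01-31", freq="D").append(
    pd.date_range("2015-05-01", "2015-05-31", freq="D")
)
prices = pd.Series(np.arange(1.0, len(index) + 1.0), index=index)

# Count the rows that fall into each month-end bin.
bin_sizes = prices.resample(pd.offsets.MonthEnd()).size()
print(bin_sizes)

# Bins inside the gap exist but contain no rows, so any function that
# indexes the first or last element of a bin has nothing to index there.
empty_bins = bin_sizes[bin_sizes == 0]
print(len(empty_bins), "empty monthly bins")
```

The months inside the gap still show up as bins, just with no rows in them, which is consistent with the length-zero objects described above.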
Ah sorry, no it didn't. Copy pastable example:
curl http://ix.io/mJE > data.csv
cat << EOF > test.py
import pandas as pd
rawdat = pd.read_csv('./data.csv', parse_dates=[1], index_col=[0, 1], usecols=[0, 1, 12], nrows=1000)
def month_change_resample(arraylike):
    if len(arraylike) == 0:
        return 0
    return (arraylike[-1]/arraylike[0]) - 1.0
print("Level result: ")
print(rawdat.groupby(level=0).resample('1M', label='right', how=month_change_resample))
print("No level result:")
print(rawdat.reset_index(level=0).groupby(['Symbol']).resample('1M', label='right', how=month_change_resample))
EOF
python test.py |
@bobobo1618 a copy-pastable example is one that I can actually copy and paste. IOW, no files are involved. The dataframe should be much shorter. |
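A file-free example along those lines might look like the sketch below. Everything in it is an assumption made for illustration: the two symbols (`AAA`, `BBB`), the `Close` column, and the values are synthetic, and it uses the current `groupby(...).resample(...).apply(...)` spelling instead of the `how=` keyword used in the thread. It shows the shape of a small reproduction with a gap in one symbol; it does not claim to reproduce the original failure.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the licensed data: two hypothetical symbols,
# where "BBB" has no rows for February through April 2015.
dates_a = pd.date_range("2015-01-01", "2015-06-30", freq="D")
dates_b = pd.date_range("2015-01-01", "2015-01-31", freq="D").append(
    pd.date_range("2015-05-01", "2015-06-30", freq="D")
)
frame = pd.DataFrame(
    {
        "Symbol": ["AAA"] * len(dates_a) + ["BBB"] * len(dates_b),
        "Close": np.arange(1.0, len(dates_a) + len(dates_b) + 1.0),
    },
    index=dates_a.append(dates_b).rename("Date"),
)


def month_change(values):
    """Fractional change from the first to the last value of a bin."""
    if len(values) == 0:
        # Empty bin (for example, a month inside a data gap).
        return np.nan
    # .iloc is positional, so it works regardless of the index type.
    return values.iloc[-1] / values.iloc[0] - 1.0


# An offset object avoids depending on a version-specific string alias.
month_end = pd.offsets.MonthEnd()
result = frame.groupby("Symbol")["Close"].resample(month_end).apply(month_change)
print(result)
```

Returning `NaN` rather than `0` for an empty bin is a design choice in this sketch: it keeps "no data this month" distinguishable from "no change this month".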
not reproducible and likely same issue as in #21200 |