Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New feature: Lag or windows features grouped by #727
base: main
Are you sure you want to change the base?
New feature: Lag or windows features grouped by #727
Changes from 15 commits
4d653a9
4e9d849
7f40391
b476748
02c59bd
0dd92cc
47de2d6
dd43c27
7459811
12aa825
c3bee66
67725dc
9cb01ea
72ce43c
9d999b0
b7b8bc9
ba375a4
90f08f4
66baa75
92f996d
152c037
5343e50
ef1eaa8
09db782
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We resolved this in a different PR. Could we remove this change from here please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are using pandas groupby under the hood, so the docs here should probably be identical or just a summary of what we see here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html and then refer the user to pandas groupby's documentation for more details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated it using a summary of pandas groupby
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we defer the functionality to pandas, then we don't need this check. We just assign and let pandas handle the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, pandas will handle it, Thanks for your help 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are repeating the same string over and over, instead of writing it multiple times, we'd create a single text in the
_docstrings
module, and import it instead. Like we do withfit_transform
for example.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added to _docstring, Thanks for the clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the example in the class' docstrings is just meant for the user to "copy and paste" a simple example, not a full blown demo. For that we have the user guide. Could we please keep the original example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do we need to loop?
Are we creating a grouped df for every variable passed to group_by_variables?
And is this the desired functionality? For time series forecasting, would we not have all ts in 1 col and then we would group by one or more variables that identify the ts, but we would not create many groups?
When would we need to create many groups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me explain what I need to do here, the reason behind adding
group_by_variables
to time series transformers is because of this issue #668 , when we need to create some lags, rolling window, or expanding window features based on a set of groups.the above code loop over the set of groups to create the features for every group then concatenate them, and sort by index to return the dataframe to its original
let me explain it in the following code
the result is the dataframes of every group of ('x3', 'x4')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thank you for the explanation. Pandas should apply shift and rolling and expanding to the groups out of the box, there is no need to loop, as far as I understand. See for example these resources: https://www.statology.org/pandas-lag-by-group/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to loop over each group. Pandas does that under the hood if I recall correctly. So we'd just add groupby before .expanding. Check these resources:
https://www.statology.org/pandas-lag-by-group/
https://stackoverflow.com/questions/37231844/pandas-creating-a-lagged-column-with-grouped-data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a simple way to perform the group_by operation to calculate expanding window features using the .apply() method in pandas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please keep the original example? Demos go in the user-guide :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to loop over the groups to apply the lags? pandas does the lags per group automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried many approaches to simplify this approach, but it is only working when using
periods
argument withshift()
method like the line in 231, however when using
freq
argument withshift()
method it doesn't work, so I used loop to make it work.kindly advice if we can simplify it.