Value of id column returned by roll_time_series is set to column_sort #673
Hi @ironerumi!
Hi @nils-braun, thanks for the quick response! I'm really looking forward to seeing how the function will look after the fix. TSFRESH has helped me a lot, and I certainly hope it keeps evolving!
The PR is merged; now the old
Hi @nils-braun! Is that really the case? I installed the package from pip this morning and faced exactly the same confusion as @ironerumi.
@konradsemsch Thanks for testing it so quickly! You may have seen this on other git projects, which use "master" only for released code (and have a separate develop branch, the so-called "git flow model"). We have chosen a different branching scheme. Two possibilities:
Ok, thanks for the answer! :) Could you perhaps give a small example of how extract_features should be applied after using roll_time_series, so that the features are available per rolled date / per time series id? My goal is a suitable forecasting structure across many different time series ids. Unfortunately, I find the documentation a little cryptic: I'm not 100% sure what each of those parameters is really responsible for after applying rolling to get a structure suitable for forecasting: And the tutorial doesn't discuss feature extraction here: https://tsfresh.readthedocs.io/en/latest/text/forecasting.html Could you shed a bit more light on this?
@konradsemsch you have probably solved it already, but I'll post what works for me here:
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import roll_time_series
df_rolling = roll_time_series(df, 'id', column_sort='time', max_timeshift=1, min_timeshift=1)
df_features = extract_features(df_rolling, column_id='id', column_sort='time')
Basically, as far as I can tell, column_id and column_sort are the same for both the rolling and the extraction function. All columns other than id are kept during the process.
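A sketch of the full forecasting structure asked about above (one feature row per window, paired with the value that follows it), using only pandas so the shape of the data is easy to inspect. This is a hypothetical stand-in, not tsfresh: make_forecast_table and its three rolling statistics are illustrative assumptions, and in practice extract_features would compute its own feature columns from the rolled frame.

```python
import pandas as pd


def make_forecast_table(df, column_id, column_sort, column_value, window):
    """Hypothetical pandas-only stand-in for roll_time_series + extract_features.

    Every row of X summarises the last `window` values of one series ending at one
    time step, indexed by (original id, window end time); y holds the next value of
    the same series, i.e. the forecasting target for that window.
    """
    df = df.sort_values([column_id, column_sort]).reset_index(drop=True)
    grouped = df.groupby(column_id)[column_value]

    def rolled(stat):
        # Rolling statistic computed separately inside each series; incomplete
        # windows (fewer than `window` values) come out as NaN.
        return grouped.transform(
            lambda s: getattr(s.rolling(window, min_periods=window), stat)()
        )

    table = pd.DataFrame({
        "mean": rolled("mean"),
        "max": rolled("max"),
        "last": df[column_value],
        "target": grouped.shift(-1),  # next value of the same series
    })
    table.index = pd.MultiIndex.from_frame(df[[column_id, column_sort]])
    # Drop incomplete windows and the last step of each series (nothing to predict).
    table = table.dropna()
    return table.drop(columns="target"), table["target"]


# Toy long-format data with two series, "a" and "b".
df = pd.DataFrame({
    "id": ["a"] * 4 + ["b"] * 4,
    "time": [1, 2, 3, 4] * 2,
    "value": [1.0, 2.0, 4.0, 7.0, 10.0, 9.0, 7.0, 4.0],
})
X, y = make_forecast_table(df, "id", "time", "value", window=2)
print(X)
print(y)
```

Here window=2 plays roughly the role of max_timeshift=1, min_timeshift=1 in the snippet above (as I read those parameters). With tsfresh, the mean/max/last columns would be replaced by the much larger feature set from extract_features, while y would still be built with a per-id shift so that each window is paired with the next value of its own series.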
Thanks to @ironerumi for your answer, and sorry to @konradsemsch for falling silent :-/ Just so we know where we need to improve the documentation: did https://tsfresh.readthedocs.io/en/latest/text/data_formats.html#data-formats-label not help you either? Now to your question: after rolling (in v0.16.0!), your rolled dataframe will contain a new column called "id", which is different for each combination of rolled date and time series id. If and how you need a I will try to improve the documentation, both for the data formats and with an example of feature extraction.
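To make the "new id per rolled date + time series id" idea concrete, here is a simplified, hypothetical re-implementation of the windowing in plain pandas. roll_windows is an illustrative helper, not tsfresh's code, and its window semantics (at most max_timeshift + 1 rows, at least min_timeshift + 1 rows, id = (original id, last time value)) are my reading of the v0.16 behaviour, so check them against roll_time_series on your installed version.

```python
import pandas as pd


def roll_windows(df, column_id, column_sort, max_timeshift, min_timeshift=0):
    """Hypothetical, simplified illustration of rolling a long-format frame.

    For every original id and every row position (in column_sort order), emit the
    window of at most max_timeshift + 1 consecutive rows ending at that row, and
    drop windows with fewer than min_timeshift + 1 rows. Each window gets a new
    "id" value: the tuple (original id, last column_sort value in the window).
    """
    pieces = []
    ordered = df.sort_values([column_id, column_sort])
    for orig_id, group in ordered.groupby(column_id, sort=True):
        group = group.reset_index(drop=True)
        for end in range(len(group)):
            start = max(0, end - max_timeshift)
            window = group.iloc[start:end + 1].copy()
            if len(window) < min_timeshift + 1:
                continue  # window too short, skip it
            window_end = window[column_sort].iloc[-1]
            # Overwrites the original "id" column when column_id == "id"
            window["id"] = [(orig_id, window_end)] * len(window)
            pieces.append(window)
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "time": [1, 2, 3, 1, 2],
    "value": [10.0, 11.0, 12.0, 5.0, 6.0],
})
rolled = roll_windows(df, "id", "time", max_timeshift=1, min_timeshift=1)
print(rolled)
```

With max_timeshift=1 and min_timeshift=1, series "a" yields two windows with ids ("a", 2) and ("a", 3), and series "b" yields one window, ("b", 2). Grouping by the new "id" column then gives one group per window, which is why feature extraction on the rolled frame produces one feature row per window.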
Ah, I just realized why I was so puzzled: none of the new documentation we wrote on rolling was visible :-) (because of a misconfiguration on readthedocs...)
@nils-braun thanks for the info! The last time I checked the module reference, I suppose it was not yet updated to 0.16. I think forecasting.html now explains the overall flow more clearly.
Thanks for checking! And sorry that all the documentation was outdated; readthedocs was not updated properly.
I think it is indeed better now. Something I would still consider, though, is adding a concrete example to the repo as well. Right now you have two examples, I believe, but neither covers multiple time series at once. Illustrating that would clear things up for a lot of users.
Hi, I probably do not fully understand the usage of tsfresh.utilities.dataframe_functions.roll_time_series. Say I have input data like below and apply roll_time_series to it:
The result is like below. I was expecting the id column to remain the same, but it was set to the value of column_sort.
… time across id? … id?
Many thanks!
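For the situation in the question, where the original id seems to be lost after rolling: as noted in the v0.16.0 comment in this thread, the rolled frame carries a new "id" per combination of rolled date and time series id. Assuming that id is stored as a tuple (original id, window end time), which is my reading of the v0.16 behaviour and worth checking on your version, the original id can be recovered with plain pandas. The frame below is hand-built to that assumed layout rather than produced by tsfresh.

```python
import pandas as pd

# Assumed layout of a rolled frame (v0.16-style, built by hand here):
# the "id" column holds (original id, last time value of the window) tuples.
rolled = pd.DataFrame({
    "id": [("a", 2), ("a", 2), ("a", 3), ("a", 3), ("b", 2), ("b", 2)],
    "time": [1, 2, 2, 3, 1, 2],
    "value": [10.0, 11.0, 11.0, 12.0, 5.0, 6.0],
})

# Split each tuple back into the original series id and the window's end time.
rolled["original_id"] = rolled["id"].map(lambda t: t[0])
rolled["window_end"] = rolled["id"].map(lambda t: t[1])

# Number of distinct windows produced for each original series.
windows_per_series = rolled.groupby("original_id")["window_end"].nunique()
print(windows_per_series)
```

Keeping both parts as separate columns lets you group by the original series (e.g. for a per-series train/test split) while still passing the combined "id" as column_id to the feature extraction.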