[ENH] Remove the use of nested_univ/nested pd.DataFrames in time series machine learning #110
Historically, we have stored time series in Pandas DataFrames where every cell is a pd.Series. This gave maximum flexibility: we could have unequal length series and time stamps. However, it adds a layer of conversion and checking that is inefficient and does not suit scikit time very well. The first stage of removing nested_univ altogether is to make sure no classifiers use it. We will restrict to equal length problems until we introduce a nested numpy array format for unequal length series, which will happen once we deprecate explicit Pandas use for the internal fit and predict. Conversion can still happen at the base class level, so the user experience will not change.

Classifiers still using the nested_univ internal type
This is now complete for classifiers; onwards to transformers.
Is your feature request related to a problem? Please describe.
Currently, the default is to load tsml problems into nested data frames, where each row is a time series, each column is a dimension/channel and each cell is a pd.Series. I have never liked this usage. It is confusing and inefficient. Furthermore, this use case is being deprecated in Pandas.
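To make the nested layout concrete, here is a small sketch that builds the same equal length data both ways, assuming the (n_cases, n_channels, n_timepoints) numpy layout. The helper `to_nested` and the `dim_0`, `dim_1` column names are illustrative assumptions, not a loader's actual output.

```python
import numpy as np
import pandas as pd

# Equal length data: 2 cases, 2 channels, 3 time points each.
values = np.arange(12, dtype=float).reshape(2, 2, 3)  # (n_cases, n_channels, n_timepoints)


def to_nested(arr):
    """Hypothetical helper: row = case, column = channel, cell = pd.Series."""
    columns = {}
    for j in range(arr.shape[1]):
        # an object array lets each cell hold a whole pd.Series
        cells = np.empty(arr.shape[0], dtype=object)
        for i in range(arr.shape[0]):
            cells[i] = pd.Series(arr[i, j])
        columns[f"dim_{j}"] = cells
    return pd.DataFrame(columns)


nested = to_nested(values)
print(nested.shape)            # (2, 2): two cases by two channels
print(type(nested.iat[0, 1]))  # <class 'pandas.core.series.Series'>

# One observation needs two lookups in the nested form (frame cell, then Series)
# but a single indexed lookup in the numpy form.
print(nested.iat[1, 0].iloc[2], values[1, 0, 2])
```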
We could switch to pd.MultiIndex, but nearly all use cases in tsml are with numpy arrays: scikit-learn, TensorFlow and PyTorch all work with numpy. The only problem is when a problem has unequal length series. I propose switching to numpy whenever possible, removing nested pandas, and, for the problem case of unequal length, using a list of numpy arrays and providing a good range of transformers to pad, truncate, downsample, etc.
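A minimal sketch of the pad and truncate idea, assuming each unequal length case is a 2D numpy array of shape (n_channels, n_timepoints_i) held in a Python list. `pad_series`, `truncate_series` and `_check_channels` are hypothetical names, not existing transformers; both functions return a regular 3D array that numpy-only estimators can consume.

```python
import numpy as np


def _check_channels(X):
    """All cases must share the same number of channels; only length may vary."""
    channels = {x.shape[0] for x in X}
    if len(channels) != 1:
        raise ValueError("all cases must have the same number of channels")
    return channels.pop()


def pad_series(X, length=None, fill_value=0.0):
    """Right-pad (or cut) every case to `length` time points.

    `length` defaults to the longest series. Returns an array of shape
    (n_cases, n_channels, length).
    """
    n_channels = _check_channels(X)
    if length is None:
        length = max(x.shape[1] for x in X)
    out = np.full((len(X), n_channels, length), fill_value, dtype=float)
    for i, x in enumerate(X):
        m = min(x.shape[1], length)
        out[i, :, :m] = x[:, :m]
    return out


def truncate_series(X, length=None):
    """Keep the first `length` time points of every case (default: the shortest)."""
    _check_channels(X)
    shortest = min(x.shape[1] for x in X)
    if length is None:
        length = shortest
    if length > shortest:
        raise ValueError(f"length={length} exceeds the shortest series ({shortest})")
    return np.stack([x[:, :length] for x in X]).astype(float)


# Usage: two univariate cases of length 3 and 2.
X = [np.array([[1.0, 2.0, 3.0]]), np.array([[4.0, 5.0]])]
print(pad_series(X).shape)       # (2, 1, 3)
print(truncate_series(X).shape)  # (2, 1, 2)
```

Padding keeps every observation but introduces artificial values at the end of shorter series; truncating avoids that at the cost of discarding the tails of longer series.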
Describe the solution you'd like
I'll add occurrences here as I find them
"Convert data to format easily useable for applying cv"