Description
Is your feature request related to a problem? Please describe.
Currently, the default is to load tsml problems into nested DataFrames, where each row is a time series, each column is a dimension/channel and each cell is a pd.Series. I have never liked this format: it is confusing and inefficient. Furthermore, this use case is being deprecated in pandas.
We could switch to pd.MultiIndex, but nearly all use cases in tsml are with numpy arrays: scikit-learn, TensorFlow and PyTorch all work with numpy. The only difficulty is when a problem has unequal length series. I propose switching to numpy wherever possible, removing nested pandas, and, for the unequal length case, using a list of numpy arrays and providing a good range of transformers to pad, truncate, downsample, etc.
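To make the difference concrete, here is a rough sketch of converting an equal length nested DataFrame into a 3D numpy array. The function name `nested_to_numpy3d` is hypothetical and the `(n_cases, n_channels, n_timepoints)` layout is an assumption for illustration, not a confirmed tsml convention.

```python
import numpy as np
import pandas as pd


def nested_to_numpy3d(X_nested):
    """Convert a nested DataFrame (each cell a pd.Series) to a 3D array.

    Hypothetical helper. Assumes all series have equal length and returns
    an array of shape (n_cases, n_channels, n_timepoints).
    """
    return np.stack(
        [
            np.stack([cell.to_numpy() for cell in row])
            for row in X_nested.itertuples(index=False, name=None)
        ]
    )


# Two cases, two channels, three time points per series.
X_nested = pd.DataFrame(
    [
        [pd.Series([1.0, 2.0, 3.0]), pd.Series([4.0, 5.0, 6.0])],
        [pd.Series([7.0, 8.0, 9.0]), pd.Series([10.0, 11.0, 12.0])],
    ],
    columns=["dim_0", "dim_1"],
)
X = nested_to_numpy3d(X_nested)  # plain float array, no per-cell objects
```

With the array form, slicing a channel or a time window is ordinary numpy indexing, e.g. `X[:, 0, :]` for the first channel of every case.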
Describe the solution you'd like
- [ ] Remove the default behaviour of loading nested DataFrame in the univariate single problem loaders (#109: [ENH] make single problem loaders for equal length problems return numpy arrays)
- [ ] Change the transformers that internally use nested DataFrame to use numpy (#85: [ENH] Write a function to perform the logic of DerivativeSlopeTransformer using numpy arrays instead of pd.DataFrame; #98: [ENH] Slow transformers that may not be needed)
- [ ] Check the code base for all uses of nested DataFrame and all current internal handling of unequal length series (e.g. distances)
- [ ] Switch unequal length loading to return a list of numpy arrays
- [ ] Adapt/provide transformers from list of numpy arrays to a single numpy array (padding etc.)
- [ ] Rewrite notebooks and webpages to reflect the changes
- [ ] Refactor transformers to use a list of numpy arrays for unequal length series
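As a rough illustration of the "list of numpy to numpy" step in the list above, the sketch below pads or truncates each series to a common length. `pad_or_truncate` and its parameters are hypothetical names, and each list element is assumed to be a 2D array of shape `(n_channels, n_timepoints_i)`.

```python
import numpy as np


def pad_or_truncate(X_list, length=None, fill_value=0.0):
    """Turn a list of 2D arrays into one 3D array of equal length series.

    Hypothetical helper. Each element of X_list is assumed to have shape
    (n_channels, n_timepoints_i) with the same n_channels. Series shorter
    than `length` are padded at the end with `fill_value`; longer series
    are truncated. If `length` is None, the longest series length is used.
    """
    if length is None:
        length = max(x.shape[1] for x in X_list)
    n_channels = X_list[0].shape[0]
    out = np.full((len(X_list), n_channels, length), fill_value, dtype=float)
    for i, x in enumerate(X_list):
        n = min(x.shape[1], length)
        out[i, :, :n] = x[:, :n]
    return out


# Two univariate series of lengths 3 and 5.
X_list = [np.ones((1, 3)), np.ones((1, 5))]
X_padded = pad_or_truncate(X_list)               # shape (2, 1, 5)
X_truncated = pad_or_truncate(X_list, length=3)  # shape (2, 1, 3)
```

Padding with a constant is only one option; the same interface could cover other fill strategies or downsampling.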
I'll add occurrences here as I find them:
- `_fit_predict_boilerplate` in `BaseClassifier` converts everything to nested DataFrame, with the following comment: "Convert data to format easily useable for applying cv"
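For what it's worth, selecting cross-validation folds does not seem to need a nested DataFrame. The sketch below is not the tsml implementation; `take_cases` and `kfold_indices` are hypothetical helpers showing that both a 3D array and a list of 2D arrays can be indexed by case directly.

```python
import numpy as np


def take_cases(X, idx):
    """Select cases by integer index from a 3D array or a list of 2D arrays."""
    if isinstance(X, np.ndarray):
        return X[idx]
    return [X[i] for i in idx]


def kfold_indices(n_cases, n_splits):
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold splits.

    Hypothetical helper; requires n_splits >= 2.
    """
    folds = np.array_split(np.arange(n_cases), n_splits)
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, test_idx


# Equal length data as a 3D array, unequal length data as a list.
X_equal = np.zeros((6, 1, 4))
X_unequal = [np.zeros((1, 3 + i)) for i in range(6)]

for train_idx, test_idx in kfold_indices(6, n_splits=3):
    X_train = take_cases(X_equal, train_idx)    # shape (4, 1, 4)
    X_train_u = take_cases(X_unequal, train_idx)  # list of 4 arrays
```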