[ENH] removing the use of nested_univ/nested pd.DataFrames in time series machine learning #110

Closed
@TonyBagnall

Description

Is your feature request related to a problem? Please describe.

Currently, the default is to load tsml problems into nested data frames, where each row is a time series, each column is a dimension/channel, and each cell is a pd.Series. I have never liked this usage: it is confusing and inefficient. Furthermore, this use case is being deprecated in pandas.
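To make the layout concrete, here is a minimal sketch of the nested format described above next to the equivalent 3D numpy array. The helper names (`numpy3d_to_nested`, `nested_to_numpy3d`), the `dim_` column prefix, and the `(n_cases, n_channels, n_timepoints)` axis order are assumptions for illustration, not the library's API, and the conversion back assumes equal length series.

```python
import numpy as np
import pandas as pd


def numpy3d_to_nested(X):
    """Build a nested DataFrame: one row per case, one column per channel,
    and a pd.Series in every cell. (Hypothetical helper for illustration.)"""
    n_cases, n_channels, _ = X.shape
    columns = {}
    for c in range(n_channels):
        # Fill an object array explicitly so each cell holds a whole Series.
        col = np.empty(n_cases, dtype=object)
        for i in range(n_cases):
            col[i] = pd.Series(X[i, c])
        columns[f"dim_{c}"] = col
    return pd.DataFrame(columns)


def nested_to_numpy3d(nested):
    """Convert a nested DataFrame of equal length series back to a 3D array
    of shape (n_cases, n_channels, n_timepoints). (Hypothetical helper.)"""
    return np.stack(
        [
            np.stack([np.asarray(cell, dtype=float) for cell in row])
            for row in nested.to_numpy()
        ]
    )


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2, 5))  # 4 cases, 2 channels, 5 time points

nested = numpy3d_to_nested(X)
X_back = nested_to_numpy3d(nested)
```

Every cell access in the nested form goes through a Python object, whereas the 3D array is a single contiguous block that can be passed directly to numpy-based libraries.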

We could switch to pd.multiindex, but nearly all use cases in tsml are with numpy arrays. scikit-learn, tensorflow and pytorch all use numpy. The only difficulty arises when the problem has unequal length series. I propose switching to numpy whenever possible, removing nested pandas, and, for the problem case of unequal length series, using a list of numpy arrays and providing a good range of transformers to pad/truncate/downsample etc.
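As a rough sketch of the proposal for unequal length series, the following stores each case as a 2D numpy array of shape `(n_channels, n_timepoints)` in a plain list, and shows two simple ways to obtain an equal length 3D array. The function names `pad_to_length` and `truncate_to_shortest` are hypothetical, and a real transformer would likely follow the scikit-learn fit/transform pattern rather than being a bare function.

```python
import numpy as np


def pad_to_length(X, length=None, fill_value=0.0):
    """Pad (or cut) every series in a list to a common length.

    X is a list of arrays of shape (n_channels, n_timepoints_i).
    If length is None, the longest series in X sets the target length.
    (Hypothetical helper for illustration.)
    """
    if length is None:
        length = max(x.shape[-1] for x in X)
    n_channels = X[0].shape[0]
    out = np.full((len(X), n_channels, length), fill_value, dtype=float)
    for i, x in enumerate(X):
        n = min(x.shape[-1], length)  # cut series longer than the target
        out[i, :, :n] = x[:, :n]
    return out


def truncate_to_shortest(X):
    """Truncate every series to the length of the shortest one."""
    length = min(x.shape[-1] for x in X)
    return np.stack([x[:, :length] for x in X])


# Three cases, two channels each, with 3, 5 and 4 time points.
X = [np.ones((2, 3)), np.ones((2, 5)), np.ones((2, 4))]

X_padded = pad_to_length(X)            # shape (3, 2, 5)
X_truncated = truncate_to_shortest(X)  # shape (3, 2, 3)
```

Once padded or truncated, the result is an ordinary 3D array, so downstream estimators only ever need to handle the equal length case.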

Describe the solution you'd like

  1. _fit_predict_boilerplate in BaseClassifier converts everything to a nested DataFrame, with the following comment:
    "Convert data to format easily useable for applying cv"
