[ENH] removing the use of nested_univ/nested np.DataFrames in time series machine learning #110

Closed
TonyBagnall opened this issue Feb 25, 2023 · 2 comments · Fixed by #163
Labels
enhancement New feature, improvement request or other non-bug code enhancement

TonyBagnall commented Feb 25, 2023

Is your feature request related to a problem? Please describe.

Currently, the default is to load tsml problems into nested data frames, where each row is a time series, each column is a dimension/channel and each cell is a pd.Series. I have never liked this usage. It is confusing and inefficient. Furthermore, this use case is being deprecated in Pandas.

We could switch to a pd.MultiIndex, but nearly all use cases in tsml are with numpy arrays. scikit-learn, tensorflow and pytorch all work with numpy. The only difficulty is when a problem has unequal length series. I propose switching to numpy whenever possible, removing nested pandas, and, for the unequal length case, using a list of numpy arrays and providing a good range of transformers to pad, truncate, downsample, etc.
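To make the proposal concrete, here is a minimal sketch of what padding and truncating a list of numpy arrays could look like. The function names `pad_to_equal_length` and `truncate_to_equal_length` are hypothetical and not part of any existing API, and the sketch assumes each series is stored as an array of shape (n_channels, n_timepoints) with the same number of channels in every case.

```python
import numpy as np


def pad_to_equal_length(series_list, fill_value=0.0):
    """Right-pad each (n_channels, n_timepoints_i) array to the longest length.

    Hypothetical helper: returns a 3D array of shape
    (n_cases, n_channels, max_length).
    """
    max_len = max(x.shape[1] for x in series_list)
    n_channels = series_list[0].shape[0]
    out = np.full((len(series_list), n_channels, max_len), fill_value, dtype=float)
    for i, x in enumerate(series_list):
        # Copy the observed values; the remainder keeps fill_value.
        out[i, :, : x.shape[1]] = x
    return out


def truncate_to_equal_length(series_list):
    """Cut every series down to the length of the shortest one."""
    min_len = min(x.shape[1] for x in series_list)
    return np.stack([x[:, :min_len] for x in series_list])


# Two univariate series of unequal length, held as a plain Python list.
X = [np.array([[1.0, 2.0, 3.0]]), np.array([[4.0, 5.0]])]
padded = pad_to_equal_length(X)
truncated = truncate_to_equal_length(X)
```

Either result is an ordinary 3D numpy array, so it can be passed on to estimators that expect equal length input.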

Describe the solution you'd like

  1. _fit_predict_boilerplate in BaseClassifier converts everything to a nested DataFrame, with the following comment:
    "Convert data to format easily useable for applying cv"

TonyBagnall commented Mar 7, 2023

Historically, we have stored time series in pandas DataFrames, where every cell is a pd.Series. This gave maximum flexibility: we could have unequal length series and time stamps. However, it adds a level of conversion and checking that is inefficient and does not suit scikit time very well. The first stage of removing nested_univ altogether is to make sure no classifiers use it. We will restrict classifiers to equal length problems until we introduce a nested numpy array format for unequal length series, which will happen once we deprecate explicit pandas use for the internal fit and predict. Conversion can still happen at the base class level, so the user experience will not change.
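As a rough illustration of conversion at the base class level, the sketch below accepts either a nested DataFrame or a numpy array in fit and predict, and always hands a 3D numpy array of shape (n_cases, n_channels, n_timepoints) to the internal methods. `SketchBaseClassifier`, `NearestMeanClassifier` and `nested_to_numpy3d` are hypothetical names for illustration only, not the actual BaseClassifier code, and equal length series are assumed.

```python
import numpy as np
import pandas as pd


def nested_to_numpy3d(nested):
    """Convert a nested DataFrame (each cell a pd.Series) to a 3D numpy array."""
    n_cases, n_channels = nested.shape
    return np.stack(
        [
            np.stack(
                [np.asarray(nested.iloc[i, j], dtype=float) for j in range(n_channels)]
            )
            for i in range(n_cases)
        ]
    )


class SketchBaseClassifier:
    """Hypothetical base class: converts input once, subclasses see numpy only."""

    def fit(self, X, y):
        self._fit(self._to_numpy3d(X), np.asarray(y))
        return self

    def predict(self, X):
        return self._predict(self._to_numpy3d(X))

    @staticmethod
    def _to_numpy3d(X):
        if isinstance(X, pd.DataFrame):
            return nested_to_numpy3d(X)
        X = np.asarray(X, dtype=float)
        if X.ndim == 2:
            # Treat (n_cases, n_timepoints) as univariate series.
            X = X[:, np.newaxis, :]
        if X.ndim != 3:
            raise ValueError("expected a 2D or 3D array, or a nested DataFrame")
        return X


class NearestMeanClassifier(SketchBaseClassifier):
    """Toy classifier: assigns the class whose overall training mean is closest."""

    def _fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean() for c in self.classes_])

    def _predict(self, X):
        case_means = X.mean(axis=(1, 2))
        idx = np.abs(case_means[:, None] - self.means_[None, :]).argmin(axis=1)
        return self.classes_[idx]


# Train on a numpy array, predict on a nested DataFrame.
X_train = np.array([[[0.0, 0.0, 0.0]], [[10.0, 10.0, 10.0]]])
y_train = ["low", "high"]

cells = np.empty((2, 1), dtype=object)
cells[0, 0] = pd.Series([1.0, 1.0, 1.0])
cells[1, 0] = pd.Series([9.0, 9.0, 9.0])
X_nested = pd.DataFrame(cells, columns=["dim_0"])

clf = NearestMeanClassifier().fit(X_train, y_train)
predictions = clf.predict(X_nested)
```

The subclass never touches pandas, which is the point of the first stage described above: the internal type is numpy, while users can keep passing the format they already have.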

Classifiers still using nested_univ internal type

@TonyBagnall

This is now complete for classifiers; onwards to transformers.
