Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] kNN Classifier and Regressor reimplementations #66

Merged
merged 8 commits into from
Feb 16, 2023
Merged

[ENH] kNN Classifier and Regressor reimplementations #66

merged 8 commits into from
Feb 16, 2023

Conversation

GuiArcencio
Copy link
Contributor

@GuiArcencio GuiArcencio commented Feb 15, 2023

Reference Issues/PRs

What does this implement/fix? Explain your changes.

  • Reimplemented both KNeighborsTimeSeriesClassifier and KNeighborsTimeSeriesRegressor in order to fix memory leaks and replace hard coded "distances" string list.
  • Removed computation of distance matrices in fit (which was useless in any case) and predict, as well as sklearn's KNeighbors instances contained within the models. The k-Neighbors algorithm is now implemented here.
  • Nearest neighbors are found by np.argpartitioning the distance vector into [0..k-1], [k], [k+1..], which is $O(n)$ as opposed to the $O(n \log n)$ in sorting.
  • Distance metrics are now generated by distance_factory, uncoupling the selection of possible metrics from the model.

What should a reviewer concentrate their feedback on?

  • Documentation for the new implementations might be lacking or inconsistent.
  • Other mtypes could be added for unequal-length support.
  • n_jobs is a parameter for the new classifier for compatibility purposes only. There is no parallelism implementation yet.

The following graphs show the results of current and new implementation benchmarks. The experiments were made on a regression dataset which consists of univariate time series of length 365. At each sample size, data was split between train and test in 70% - 30% proportions, respectively. The distance was always set to 'euclidean'.

euclidean

More benchmarks are underway, one using distance='dtw' and two others fixing train size and varying test size, and vice-versa.

Did you add any tests for the change?

Some tests were removed and/or changed due to being implementation-specific.

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.

@TonyBagnall
Copy link
Contributor

fantastic, thanks @GuiArcencio. @chrisholder could you take a look please?

@TonyBagnall
Copy link
Contributor

hey @GuiArcencio do you have those timing/memory graphs? If so, good to post them here

Copy link
Contributor

@TonyBagnall TonyBagnall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks really good, thanks!

@TonyBagnall TonyBagnall self-requested a review February 16, 2023 19:19
@GuiArcencio GuiArcencio marked this pull request as ready for review February 16, 2023 19:27
Copy link
Contributor

@TonyBagnall TonyBagnall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@TonyBagnall TonyBagnall merged commit 9797acf into aeon-toolkit:main Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Redesign KNN classifier and regressor
2 participants