[ENH] kNN Classifier and Regressor reimplementations #66
Conversation
fantastic, thanks @GuiArcencio. @chrisholder could you take a look please?
hey @GuiArcencio do you have those timing/memory graphs? If so, good to post them here
looks really good, thanks!
Resolved review comments on sktime/regression/distance_based/tests/test_time_series_neighbors.py
looks good to me
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Reimplements `KNeighborsTimeSeriesClassifier` and `KNeighborsTimeSeriesRegressor` in order to fix memory leaks and replace the hard-coded "distances" string list. The reimplementation removes code in `fit` (which was useless in any case) and `predict`, as well as the `sklearn` `KNeighbors` instances contained within the models. The k-Neighbors algorithm is now implemented here, `np.argpartition`ing the distance vector into `[0..k-1]`, `[k]`, `[k+1..]`, which is cheaper than fully sorting it. Distance functions are obtained through `distance_factory`, uncoupling the selection of possible metrics from the model.
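As a rough illustration of the selection step (not the PR's actual code), the sketch below shows how `np.argpartition` can pick the k nearest neighbours from a precomputed distance vector without a full sort; the helper name `k_nearest_indices` and the toy data are hypothetical.

```python
import numpy as np

def k_nearest_indices(distances: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k smallest entries of `distances`."""
    # argpartition places the k smallest distances in positions [0..k-1],
    # the next smallest at position [k], and the rest in [k+1..], in linear
    # time on average rather than O(n log n) for a full sort.
    nearest = np.argpartition(distances, k)[:k]
    # Order only the selected k neighbours by distance (cheap: k elements).
    return nearest[np.argsort(distances[nearest])]

# Toy example: distances from one query series to five training series.
dists = np.array([0.9, 0.2, 0.7, 0.1, 0.5])
print(k_nearest_indices(dists, k=3))  # -> [3 1 4]
```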
What should a reviewer concentrate their feedback on?
`n_jobs` is a parameter of the new classifier for compatibility purposes only; there is no parallelism implementation yet.

The following graphs (attached to the PR) show the results of benchmarks comparing the current and new implementations. The experiments were run on a regression dataset consisting of univariate time series of length 365. At each sample size, the data was split 70% / 30% between train and test, respectively. The distance was always set to 'euclidean'.
More benchmarks are underway: one using distance='dtw', and two others fixing the train size while varying the test size, and vice versa.
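For context, a minimal sketch of how such a timing run could be set up is shown below. It assumes the sktime-style import path visible in the review thread, the `distance="euclidean"` parameter, and synthetic data in place of the real regression dataset; the actual benchmarking scripts are not part of this PR description.

```python
import time

import numpy as np
from sklearn.model_selection import train_test_split
from sktime.regression.distance_based import KNeighborsTimeSeriesRegressor

# Synthetic stand-in for the benchmark data: univariate series of length 365.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 1, 365))  # (n_instances, n_channels, n_timepoints)
y = rng.standard_normal(1000)

# 70% / 30% train/test split, as in the experiments described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

reg = KNeighborsTimeSeriesRegressor(distance="euclidean")

start = time.perf_counter()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
print(f"fit + predict: {time.perf_counter() - start:.2f}s")
```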
Did you add any tests for the change?
Some tests were removed and/or changed due to being implementation-specific.
PR checklist
For all contributions