Generalize KNNRegressor to multitarget case #328
Conversation
@ablaom could you hold on merging this while I review this PR?
@OkonSamuel I think those suggestions are fair. Are you happy to push the relevant changes to save @mateuszbaran opening a new PR?
@ablaom Ok
@OkonSamuel Thanks for that! Looking more carefully, it seems the testing around weights for regression is a bit light. I'm going to look into this, unless you want to do it?
src/NearestNeighbors.jl (outdated diff)
```julia
if m.weights == :uniform
    preds[i,:] .= sum(values .* w_) / sum(w_)
else
    preds[i,:] .= sum(values .* w_ .* (1.0 .- dists_ ./ sum(dists_))) / (sum(w_) - 1)
```
I'm having trouble understanding the formula on line 147 (which predates this PR). @tlienart, can you recall how this was deduced or where it came from?

Naively, it seems to me that we want to write the prediction as a weighted sum of the target values, where the kth weight is simultaneously proportional to the prescribed sample weight `w_[k]` and inversely proportional to the distance `dist_[k]`. That is, `prediction[i,:] = sum(w_ ./ dist_ .* values) / c`, where `c` is a normalisation constant chosen so that the weights (the coefficients of `values` in the sum) add up to one: `c = sum(w_ ./ dist_)`. However, I can't reconcile this with the given formula. Am I missing something?
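For concreteness, here is a minimal sketch (toy data, not code from the PR; the explicit `dims=1` handling of the multitarget case is an assumption) comparing the prediction described above with the formula currently in the source:

```julia
# Toy data for a single query point: k = 3 neighbours, 2 targets.
values = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # k × ntargets matrix of neighbour targets
w_     = [1.0, 2.0, 1.0]               # per-observation (sample) weights
dists_ = [0.5, 1.0, 2.0]               # distances to the k neighbours

# Prediction proposed in the comment above: coefficients proportional to the
# sample weight and inversely proportional to the distance, summing to one.
c = sum(w_ ./ dists_)
pred_proposed = sum(w_ ./ dists_ .* values; dims=1) / c

# The `else` branch currently in the source, for comparison.
pred_current = sum(values .* w_ .* (1.0 .- dists_ ./ sum(dists_)); dims=1) / (sum(w_) - 1)
```

Running both on the same toy data makes the discrepancy raised here easy to inspect.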
Yeah, I had the same thoughts when I went through. I would propose avoiding passing weights to `fit`, i.e. setting the `supports_weights` trait to `false`, since the weights needed for KNN models are not per-observation weights but per-neighbour weights. So we should stick to using the weights derived from the `weights` parameter passed during KNN model construction (i.e. `:uniform`, `:distance`).
> Yeah had the same thoughts when i went through. I would propose avoid passing weights to fit. i.e setting supports_weights trait to false since the weights […]

I don't see anything wrong with mixing per-sample weights with an inverse square law for the "neighbour" weights (if it's done in a meaningful way). Also, presently, these two KNN models are among the few models that support sample weights and are therefore used for testing 🙌
The numerator in this formula does make sense to me: multiplying by `1 - dist[i]/sum(dist)` should behave numerically better when the distance to one of the neighbors is close to 0. The normalization constant in the denominator looks wrong, though. Here is a comparison of different distance-based weights: https://perun.pmf.uns.ac.rs/radovanovic/publications/2016-kais-knn-weighting.pdf
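To make the concern about the denominator concrete, here is a small check (toy numbers, not from the PR) of whether the coefficients that the current `else` branch assigns to `values` sum to one:

```julia
# Coefficient given to each neighbour's target values by the current formula,
# for one query point.
coeffs(w_, dists_) = w_ .* (1.0 .- dists_ ./ sum(dists_)) ./ (sum(w_) - 1)

# With unit sample weights the coefficients do sum to one ...
sum(coeffs([1.0, 1.0, 1.0], [0.5, 1.0, 2.0]))   # == 1.0

# ... but with non-uniform sample weights they do not, which is the sense in
# which the `sum(w_) - 1` normalisation looks suspect.
sum(coeffs([1.0, 2.0, 1.0], [0.5, 1.0, 2.0]))   # ≈ 0.905
```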
@mateuszbaran Thanks for pointing out the paper. Worth noting that the evaluation there is for classifiers making point predictions (the MLJ classifier is probabilistic and so needs a normalisation not needed there), and the testing was restricted to time-series applications, whereas the current PR is about regression. That said, the paper nicely summarises a number of weighting schemes that probably cover the cases in common use for both probabilistic classification and deterministic regression (the MLJ cases).

(Interestingly, I don't see the `1 - dist[i]/sum(dist)` weighting in the paper, although maybe it's a special case of Macleod??)

It would be nice to implement them all and cite the paper for the definitions, but I would deem that out of scope for the current PR.

For the record, sk-learn implements `1/dist[i]` (with no epsilon-smoothing) and uniform weighting, but also allows a custom function.

I propose we keep the current `1 - dist[i]/sum(dist)` weight (and the `1/dist[i]` weight currently used for the classifier) and do the normalisation post facto, as we do for classification. (I don't believe there's a more efficient way if we are mixing in sample weights.) We'll view this as a "bug fix" (patch release) and open a new issue to generalise and get consistency between regression and classification, which will be breaking. We can do that at the same time as migrating the interface to its own package (#244).

@OkonSamuel @mateuszbaran Happy with this?
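To illustrate what normalising post facto could look like for the regressor, here is a minimal sketch (the function name `predict_row`, the toy data, and the explicit `dims=1` multitarget handling are illustrative assumptions, not code from the PR):

```julia
# Prediction for one query point: mix the per-sample weights `w_` with the
# existing `1 - dist/sum(dist)` neighbour weights, then divide by the actual
# sum of the combined weights rather than `sum(w_) - 1`.
function predict_row(values, w_, dists_)
    w_combined = w_ .* (1.0 .- dists_ ./ sum(dists_))
    return vec(sum(w_combined .* values; dims=1) ./ sum(w_combined))
end

# Toy example: k = 3 neighbours, 2 targets.
predict_row([1.0 2.0; 3.0 4.0; 5.0 6.0], [1.0, 2.0, 1.0], [0.5, 1.0, 2.0])
```

This keeps the current weighting scheme but guarantees the coefficients of `values` sum to one, in line with the post-facto normalisation already done for classification.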
Okay, I'll just merge this as is, despite the mysterious normalization, and we'll fix that when we migrate.
Yes, that paper isn't about kNN regression, but it still nicely collects many weighting functions that could be used here as well.
Your plan sounds great 👍.
Thanks for your help with the PR; I realise it was a bit more involved than you probably thought it would be 😄
No problem! Also thank you for your quick help in getting this done. I learned quite a bit and I think it was worth it.
Continuation of #327