-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add KNN outlier detector #677
Add KNN outlier detector #677
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #677 +/- ##
==========================================
+ Coverage 80.33% 80.79% +0.46%
==========================================
Files 137 144 +7
Lines 9304 9619 +315
==========================================
+ Hits 7474 7772 +298
- Misses 1830 1847 +17
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Small number of comments around defaults and threshold inference, but otherwise looks good to me. Nice one! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a few very minor comments remaining...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This reverts commit e09bd32.
What is this:
This PR implements the kNN outlier detector. See simple example.
This detector has a
PyTorch
backend which can be directly serialized usingtorchscript
. To do so the user should use:The kNN detector can be used as a single detector for a single
k
or an ensemble if passed multiplek
. In the latter case, the detector must also be passed an aggregator to accumulate the multiple scores output for the multiplek
s. For example:This PR also implements multiple aggregators and normalisers that can be used to normalise and aggregate ensemble outlier scores.
Out of scope
This PR does not Implement human-readable configuration or a
pykeops
backend. Neither does it implement documentation (i.e. example or method pages.) These will all be implemented in a later PR.Notes:
Sub Module responsibilities:
A feature we want is torchscript-able components in order to ease deployment (See notes here). In order to ensure that we can do so while also ensuring that the torchscript language constraints don't contaminate too much of the code base we scope this functionality to the relevant subcomponents. In particular, each module has separate responsibilities:
KNNTorch:
torch.nn.Module
fit
,infer_threshold
andpredict
methods.KNN:
fit
,infer_threshold
andpredict
torch.nn.module
mypy issue:
For some reason,
mypy
interprets the type of theaccumulator
attribute andcheck_fitted
method astorch.Tensor
This raises an error in mypy becausetorch.Tensor
doesn't have a call method and we call both of these objects within the predict method. Explicitly defining these objects types in the class fixes the issue but then causes problems with torch script compiling.Todo:
pytest -s --no-cov alibi_detect/od/backend/tests/test_knn_backend.py::test_knn_kernel
testtorchscript
serliaziation testsKNN
classMore Todo:
_to_numpy
pattern!torch.compile
on detectorfrom utils.types import Protocol
logic and replace withfrom typing_extensions import Protocol
_to_numpy
pattern as it messes up typing 🤦