Add KNN outlier detector #677

mauicv · 2022-11-17T17:06:11Z

What is this:

This PR implements the kNN outlier detector. See simple example.

This detector has a PyTorch backend that can be serialized directly with TorchScript. To do so, script the detector's backend and save it:

import torch

knn_detector = KNN(k=5)
knn_detector.fit(X_ref)
knn_detector.infer_threshold(X_ref, 0.1)
# Script only the PyTorch backend, then save the scripted module to disk.
ts_backend = torch.jit.script(knn_detector.backend)
ts_backend.save('./knn_detector.pt')
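For readers less familiar with the method, here is a rough plain-Python sketch of what the fit / infer_threshold steps above amount to conceptually. It assumes the outlier score is the Euclidean distance to the k-th nearest reference point and that the second argument of infer_threshold is a target false-positive rate; the helper names are made up for illustration and this is not the PR's implementation.

```python
import math


def kth_nn_distance(x, X_ref, k):
    """Euclidean distance from x to its k-th nearest neighbour in X_ref."""
    dists = sorted(math.dist(x, r) for r in X_ref)
    return dists[k - 1]


def infer_threshold(scores, fpr):
    """Pick a threshold so that roughly a fraction `fpr` of `scores` lies above it."""
    ordered = sorted(scores)
    # Index of the (1 - fpr) quantile, clamped to a valid position.
    idx = min(len(ordered) - 1, max(0, math.ceil((1 - fpr) * len(ordered)) - 1))
    return ordered[idx]


# Toy reference data: a 5x5 grid of 2-D points.
X_ref = [(float(i), float(j)) for i in range(5) for j in range(5)]
k = 3

# Scoring the reference set against itself means each point counts as its own
# nearest neighbour (distance 0); a held-out set avoids this.
ref_scores = [kth_nn_distance(x, X_ref, k) for x in X_ref]
threshold = infer_threshold(ref_scores, fpr=0.1)


def is_outlier(x):
    """Flag x when its score exceeds the inferred threshold."""
    return kth_nn_distance(x, X_ref, k) > threshold
```

A point far from the grid, such as (10.0, 10.0), is flagged, while a grid point such as (2.0, 2.0) is not.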

The kNN detector can be used as a single detector with one value of k, or as an ensemble when passed several values of k. In the latter case, the detector must also be passed an aggregator that combines the per-k scores into a single score. For example:

from alibi_detect.od.backend import ShiftAndScaleNormaliserTorch, AverageAggregatorTorch

knn_detector = KNN(
    k=[5, 8, 12], 
    aggregator=AverageAggregatorTorch(), 
    normaliser=ShiftAndScaleNormaliserTorch()
)

knn_detector.fit(X_ref)
knn_detector.infer_threshold(X_ref, 0.1)
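The ensemble idea can be sketched without any backend. The snippet below is only an illustration under stated assumptions: "shift and scale" is taken to mean subtracting the mean and dividing by the standard deviation of the reference scores for each k, and "average" to mean the arithmetic mean over k. The function names are hypothetical stand-ins, not the classes imported above.

```python
import math
from statistics import mean, pstdev


def kth_nn_distance(x, X_ref, k):
    """Euclidean distance from x to its k-th nearest neighbour in X_ref."""
    return sorted(math.dist(x, r) for r in X_ref)[k - 1]


def fit_shift_and_scale(ref_scores):
    """Build a normaliser from reference scores: (score - mean) / std."""
    mu, sigma = mean(ref_scores), pstdev(ref_scores)
    sigma = sigma if sigma > 0 else 1.0  # guard against constant scores

    def normalise(score):
        return (score - mu) / sigma

    return normalise


def average_aggregate(scores):
    """Combine the per-k scores into a single ensemble score."""
    return mean(scores)


X_ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (2.0, 2.0), (4.0, 4.0)]
ks = [2, 3, 4]

# One normaliser per k, fitted on the reference scores for that k.
normalisers = [
    fit_shift_and_scale([kth_nn_distance(x, X_ref, k) for x in X_ref])
    for k in ks
]


def ensemble_score(x):
    """Normalise each per-k score so they are comparable, then average them."""
    per_k = [norm(kth_nn_distance(x, X_ref, k)) for k, norm in zip(ks, normalisers)]
    return average_aggregate(per_k)
```

Normalising before aggregating matters because distances to the 2nd and 4th neighbour live on different scales; without it the largest k would dominate the average.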

This PR also implements multiple aggregators and normalisers that can be used to normalise and aggregate ensemble outlier scores.

Out of scope

This PR does not implement human-readable configuration or a PyKeOps backend. Neither does it implement documentation (i.e. example or method pages). These will all be implemented in a later PR.

Notes:

Submodule responsibilities:

We want torchscript-able components in order to ease deployment (see notes here). To get this without letting the TorchScript language constraints contaminate too much of the code base, we scope this functionality to the relevant subcomponents. In particular, each module has separate responsibilities:

KNNTorch:

Implements the torch kNN functionality, specifically the forward method
Extends torch.nn.Module
Torchscript-able
Has fit, infer_threshold and predict methods

KNN:

Wrapper for the KNNTorch functionality, specifically fit, infer_threshold and predict
Deals with validation
Config <- (NOT IMPLEMENTED HERE)
Does not extend torch.nn.Module
Uses backend functionality to convert inputs to the correct backend tensor type
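The split described above can be sketched in plain Python. This is a simplified illustration, not the PR's code: KNNBackendSketch and KNNSketch are hypothetical stand-ins for KNNTorch and KNN, the backend uses lists instead of tensors, and only fit and score are shown.

```python
import math


class NotFittedError(Exception):
    """Raised when a detector is used before fit has been called."""


class KNNBackendSketch:
    """Stand-in for the backend: holds state and does the numeric work only."""

    def __init__(self, k):
        self.k = k
        self.x_ref = None

    def fit(self, x_ref):
        self.x_ref = x_ref

    def score(self, x):
        # Distance to the k-th nearest reference point for every row of x.
        return [sorted(math.dist(p, r) for r in self.x_ref)[self.k - 1] for p in x]


class KNNSketch:
    """Stand-in for the wrapper: validates, converts inputs, then delegates."""

    def __init__(self, k):
        if not isinstance(k, int) or k < 1:
            raise ValueError("k must be a positive integer")
        self.backend = KNNBackendSketch(k)

    @staticmethod
    def _to_backend(x):
        # Convert arbitrary nested sequences to the type the backend expects.
        return [tuple(float(v) for v in row) for row in x]

    def fit(self, x_ref):
        x_ref = self._to_backend(x_ref)
        if len(x_ref) < self.backend.k:
            raise ValueError("need at least k reference points")
        self.backend.fit(x_ref)

    def score(self, x):
        if self.backend.x_ref is None:
            raise NotFittedError("call fit before score")
        return self.backend.score(self._to_backend(x))
```

Keeping validation and conversion in the wrapper means the backend can stay small and restricted to whatever subset of the language the serialization tool accepts.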

mypy issue:

For some reason, mypy interprets the type of the accumulator attribute and the check_fitted method as torch.Tensor. This raises an error in mypy because torch.Tensor doesn't have a __call__ method and we call both of these objects within the predict method. Explicitly defining these objects' types in the class fixes the mypy issue but then causes problems with TorchScript compilation.

Todo:

More Todo:

I've just realised that the detectors define an outlier detector return dictionary:

import copy

def outlier_prediction_dict():
    data = {
        'instance_score': None,
        'feature_score': None,
        'is_outlier': None
    }
    return copy.deepcopy({"data": data, "meta": DEFAULT_META})

This should be used instead of the custom one I'm using in the kNN detector.
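As a rough illustration of how a detector could fill in that shared dictionary, here is a self-contained sketch. The DEFAULT_META contents below are placeholders invented for the example (the library's real value may differ), and the scores are made-up numbers.

```python
import copy

# Placeholder metadata for illustration only; not the library's actual DEFAULT_META.
DEFAULT_META = {"name": None, "detector_type": None}


def outlier_prediction_dict():
    data = {
        'instance_score': None,
        'feature_score': None,
        'is_outlier': None
    }
    # deepcopy so that callers can mutate the result without touching DEFAULT_META.
    return copy.deepcopy({"data": data, "meta": DEFAULT_META})


# A detector's predict method could populate the template like this.
pred = outlier_prediction_dict()
pred["data"]["instance_score"] = [0.2, 3.1]
pred["data"]["is_outlier"] = [0, 1]
pred["meta"]["name"] = "KNN"
```

Because of the deep copy, setting pred["meta"]["name"] leaves DEFAULT_META untouched, so every call starts from a clean template.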

Consider alternatives/unifications of the _to_numpy pattern!
Test torch.compile on the detector
Expose the backend aggregator and normalizer classes and rename backend to PyTorch, i.e. 1 and 3 in this comment.
Remove the from utils.types import Protocol logic and replace it with from typing_extensions import Protocol
Revert the _to_numpy pattern as it messes up typing 🤦
- I've replaced this with a single dispatch pattern. This is a little exotic 😬 so happy to remove if peeps want!
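For anyone who hasn't met single dispatch before, the standard-library functools.singledispatch gives the general shape. The sketch below is not the code in this PR: to_list is a hypothetical converter, and it targets plain lists rather than NumPy arrays so that it runs with the standard library alone.

```python
from array import array
from functools import singledispatch


@singledispatch
def to_list(x):
    """Fallback for types with no registered conversion."""
    raise TypeError(f"unsupported type: {type(x).__name__}")


@to_list.register
def _(x: list):
    # Already the target type; return unchanged.
    return x


@to_list.register
def _(x: tuple):
    return list(x)


@to_list.register
def _(x: array):
    return x.tolist()


@to_list.register
def _(x: dict):
    # Convert each value, dispatching again on the value's own type.
    return {key: to_list(value) for key, value in x.items()}
```

Each input type gets its own small function with a precise signature, which tends to be easier for a type checker to follow than one function with a chain of isinstance branches.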

codecov-commenter · 2022-11-25T16:27:32Z

Codecov Report

Merging #677 (a535e8d) into master (40f4121) will increase coverage by 0.46%.
The diff coverage is 94.98%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   80.33%   80.79%   +0.46%     
==========================================
  Files         137      144       +7     
  Lines        9304     9619     +315     
==========================================
+ Hits         7474     7772     +298     
- Misses       1830     1847      +17

Flag	Coverage Δ
macos-latest-3.9	`?`
ubuntu-latest-3.10	`80.72% <94.98%> (+0.50%)`	⬆️
ubuntu-latest-3.7	`80.62% <94.92%> (+0.50%)`	⬆️
ubuntu-latest-3.8	`?`
ubuntu-latest-3.9	`80.67% <94.98%> (+0.50%)`	⬆️
windows-latest-3.9	`?`

Impacted Files	Coverage Δ
alibi_detect/od/__init__.py	`100.00% <ø> (ø)`
alibi_detect/base.py	`85.45% <50.00%> (ø)`
alibi_detect/od/base.py	`75.75% <75.75%> (ø)`
alibi_detect/od/pytorch/base.py	`97.10% <97.10%> (ø)`
alibi_detect/od/pytorch/ensemble.py	`97.11% <97.11%> (ø)`
alibi_detect/od/_knn.py	`97.95% <97.95%> (ø)`
alibi_detect/exceptions.py	`100.00% <100.00%> (ø)`
alibi_detect/od/pytorch/__init__.py	`100.00% <100.00%> (ø)`
alibi_detect/od/pytorch/knn.py	`100.00% <100.00%> (ø)`

... and 2 files with indirect coverage changes

ojcobb · 2023-03-15T18:21:15Z

Small number of comments around defaults and threshold inference, but otherwise looks good to me. Nice one!

alibi_detect/exceptions.py

alibi_detect/od/pytorch/base.py

alibi_detect/od/tests/test_ensemble.py

alibi_detect/od/_knn.py

ascillitoe

LGTM! Just a few very minor comments remaining...

ascillitoe

LGTM!

This reverts commit e09bd32.

mauicv added 5 commits November 10, 2022 14:54

Initial commit

df5dfff

Add transforms

4a4d19d

Minor progress commit

e729855

Add transforms and fitted transforms

b2ba7ab

Fix flake8 errors

1c0c7c0

mauicv added the WIP (PR is a Work in Progress) label Nov 17, 2022

mauicv added 12 commits November 21, 2022 14:14

Add accumulator into KNNTorch backend

2bf0056

Add BaseTorchDetector functionality

c7f4825

Add torchscript tests for knn backend module

02ac5bc

Fix GaussianRBF knn kernel test

9068ecc

Minor correction

e95c884

Rewrite knn outlier detector

47eb6c7

Surface errors if for unfit detectors

4b201df

Merge backend test features into test_knn_backend

5e3bc7b

Make knn tests better

e272c74

Remove test file

bc7de79

Fix mypy errors

ef1eb83

Import Literal from typing_extensions for python version compatibility

945a15a

mauicv added 11 commits November 28, 2022 14:50

Add docstrings for backend ensemble and knn objects

7363cb5

Add docstrings for base torch outlier detector class

6ca8062

Add docstrings for kNN detector

4674f8e

Minor fixes

07f0ffc

Minor fixes

5e0c3f7

Add docstrings for outlier detector base class

e884e94

Fix mypy issue and test

756def5

Reorder imports

96009e7

Replace normaliser with normalizer

b6ac822

Add optional dependency tests

3dc8408

Add make_moons dataset tests for ensemble and single kNN detectors

322989e