
Add KNN outlier detector #677

Merged: 122 commits merged on Mar 17, 2023


@mauicv (Collaborator) commented Nov 17, 2022

What is this:

This PR implements the kNN outlier detector. See simple example.

  • This detector has a PyTorch backend that can be serialized directly with TorchScript. To do so, the user should use:

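    import torch

    knn_detector = KNN(k=5)
    knn_detector.fit(X_ref)
    knn_detector.infer_threshold(X_ref, 0.1)  # 0.1 = target false positive rate
    # Script the torch.nn.Module backend rather than the KNN wrapper itself
    scripted_backend = torch.jit.script(knn_detector.backend)
    scripted_backend.save('./knn_detector.pt')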
  • The kNN detector can be used as a single detector (when passed a single k) or as an ensemble (when passed multiple values of k). In the latter case, the detector must also be passed an aggregator to combine the scores produced for each k. For example:

    from alibi_detect.od.backend import ShiftAndScaleNormaliserTorch, AverageAggregatorTorch
    
    knn_detector = KNN(
        k=[5, 8, 12], 
        aggregator=AverageAggregatorTorch(), 
        normaliser=ShiftAndScaleNormaliserTorch()
    )
    
    knn_detector.fit(X_ref)
    knn_detector.infer_threshold(X_ref, 0.1)
  • This PR also implements multiple aggregators and normalisers that can be used to normalise and aggregate ensemble outlier scores.
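To make the ensemble idea concrete, here is a minimal pure-Python sketch of how per-k kNN scores might be normalised and aggregated. It is not the alibi-detect implementation: `KNNEnsembleSketch` and `kth_nn_distance` are hypothetical names, the normaliser is assumed to be a mean/standard-deviation shift-and-scale fitted on calibration scores during threshold inference, the aggregator is a plain average, and a calibration set separate from the reference set is used so that points are not matched against themselves.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional, Sequence

Point = Sequence[float]


def kth_nn_distance(x: Point, ref: Sequence[Point], k: int) -> float:
    """Euclidean distance from x to its k-th nearest neighbour in ref."""
    return sorted(math.dist(x, r) for r in ref)[k - 1]


class KNNEnsembleSketch:
    """Hypothetical kNN ensemble: one score per k, normalised, then averaged."""

    def __init__(self, ks: List[int]):
        self.ks = ks
        self.ref: List[Point] = []
        self.shift: List[float] = []
        self.scale: List[float] = []
        self.threshold: Optional[float] = None

    def fit(self, x_ref: Sequence[Point]) -> None:
        # kNN "training" just stores the reference data.
        self.ref = list(x_ref)

    def raw_scores(self, x: Point) -> List[float]:
        # One outlier score per k: distance to the k-th nearest reference point.
        return [kth_nn_distance(x, self.ref, k) for k in self.ks]

    def score(self, x: Point) -> float:
        raw = self.raw_scores(x)
        # Shift-and-scale normalisation per k, then average aggregation.
        normed = [(s - m) / sd for s, m, sd in zip(raw, self.shift, self.scale)]
        return mean(normed)

    def infer_threshold(self, x_cal: Sequence[Point], fpr: float) -> None:
        # Fit the normaliser: one column of raw calibration scores per k.
        columns = list(zip(*(self.raw_scores(x) for x in x_cal)))
        self.shift = [mean(c) for c in columns]
        self.scale = [pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero
        scores = sorted(self.score(x) for x in x_cal)
        # Threshold so that at most a fraction fpr of calibration scores exceed it.
        idx = math.ceil((1 - fpr) * len(scores)) - 1
        self.threshold = scores[max(0, min(idx, len(scores) - 1))]

    def predict(self, x: Point) -> bool:
        if self.threshold is None:
            raise RuntimeError("call infer_threshold before predict")
        return self.score(x) > self.threshold


# Usage: a 5x5 grid of reference points and cell-centre calibration points.
x_ref = [(float(i), float(j)) for i in range(5) for j in range(5)]
x_cal = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
detector = KNNEnsembleSketch(ks=[1, 3, 5])
detector.fit(x_ref)
detector.infer_threshold(x_cal, fpr=0.1)
print(detector.predict((2.0, 2.0)), detector.predict((50.0, 50.0)))
```

Averaging only makes sense after normalisation: raw distances for larger k are systematically larger, so without the shift-and-scale step the largest k would dominate the ensemble score.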

Out of scope

This PR does not implement human-readable configuration or a pykeops backend, nor does it add documentation (i.e. example or method pages). These will all be implemented in a later PR.

Notes:

Submodule responsibilities:

A feature we want is TorchScript-able components, to ease deployment (see notes here). To support this without letting TorchScript's language constraints leak into too much of the code base, we scope this functionality to the relevant subcomponents. In particular, each module has separate responsibilities:

KNNTorch:

  • Implements the torch kNN functionality, specifically the forward method
  • Extends torch.nn.Module
  • TorchScript-able
  • Has fit, infer_threshold and predict methods

KNN:

  • Wrapper for the KNNTorch functionality, specifically fit, infer_threshold and predict
  • Handles input validation
  • Config (not implemented in this PR)
  • Does not extend torch.nn.Module
  • Uses backend functionality to convert inputs to the correct backend tensor type
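The division of responsibilities above can be sketched in plain Python. This is a hypothetical illustration, not the PR's code: `KNNBackendSketch`, `KNNWrapperSketch` and `NotFittedError` are made-up names, lists of floats stand in for tensors so the sketch has no torch dependency (it therefore shows only the split, not TorchScript compilation), and only fit and score are shown.

```python
import math
from typing import List, Optional, Sequence


class NotFittedError(RuntimeError):
    """Hypothetical error raised when a detector is used before fit()."""


class KNNBackendSketch:
    """Plays the KNNTorch role: numeric work only, no input validation.

    In the PR this is a TorchScript-able torch.nn.Module; plain Python lists
    stand in for tensors here.
    """

    def __init__(self, k: int):
        self.k = k
        self.x_ref: Optional[List[List[float]]] = None

    def fit(self, x_ref: List[List[float]]) -> None:
        self.x_ref = x_ref

    def score(self, x: List[List[float]]) -> List[float]:
        assert self.x_ref is not None  # the wrapper guarantees this
        # Distance from each instance to its k-th nearest reference instance.
        return [sorted(math.dist(row, ref) for ref in self.x_ref)[self.k - 1] for row in x]


class KNNWrapperSketch:
    """Plays the KNN role: validation and input conversion, then delegation."""

    def __init__(self, k: int):
        if k < 1:
            raise ValueError(f"k must be a positive integer, got {k}")
        self.backend = KNNBackendSketch(k)
        self._fitted = False

    @staticmethod
    def _convert(x: Sequence[Sequence[float]]) -> List[List[float]]:
        # Convert user input to the backend's native type (here: lists of floats).
        rows = [[float(v) for v in row] for row in x]
        if len({len(row) for row in rows}) > 1:
            raise ValueError("all instances must have the same number of features")
        return rows

    def fit(self, x_ref: Sequence[Sequence[float]]) -> None:
        rows = self._convert(x_ref)
        if len(rows) < self.backend.k:
            raise ValueError("need at least k reference instances")
        self.backend.fit(rows)
        self._fitted = True

    def score(self, x: Sequence[Sequence[float]]) -> List[float]:
        if not self._fitted:
            raise NotFittedError("call fit() before score()")
        return self.backend.score(self._convert(x))
```

Keeping every check in the wrapper means the backend only ever sees well-formed data, so its code can stay within whatever subset of Python the compiler (TorchScript, in the PR's case) supports.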

mypy issue:

For some reason, mypy infers the type of both the accumulator attribute and the check_fitted method as torch.Tensor. This raises a mypy error, because torch.Tensor is not callable and we call both of these objects within the predict method. Explicitly annotating these attributes in the class fixes the mypy error but then breaks TorchScript compilation.

Todo:

  • Add more here
  • Fix the test_knn_kernel test (pytest -s --no-cov alibi_detect/od/backend/tests/test_knn_backend.py::test_knn_kernel)
  • Decide on fit/infer_threshold error behaviour
  • Add TorchScript serialization tests
  • Add integration tests
  • Add validation in the KNN class
  • Protect optional dependency imports in `KNN`
  • Correctly reorder imports
  • Add optional dependency tests
  • Convert all uses of normaliser to normalizer
  • Fix mypy

More Todo:

  • I've just realised that the detectors already define an outlier detector return dictionary:
    import copy

    def outlier_prediction_dict():
        # DEFAULT_META is the detectors' default metadata dictionary (defined elsewhere)
        data = {
            'instance_score': None,
            'feature_score': None,
            'is_outlier': None
        }
        return copy.deepcopy({"data": data, "meta": DEFAULT_META})
    This should be used instead of the custom one I'm using in the kNN detector.
  • Consider alternatives to, or a unification of, the _to_numpy pattern!
  • Test torch.compile on the detector
  • Expose the backend aggregator and normalizer classes and rename backend to PyTorch (i.e. points 1 and 3 in this comment)
  • Remove the `from utils.types import Protocol` logic and replace it with `from typing_extensions import Protocol`
  • Revert the _to_numpy pattern, as it messes up typing 🤦
    • I've replaced this with a single-dispatch pattern. This is a little exotic 😬 so happy to remove it if peeps want!
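For readers unfamiliar with the single-dispatch idea mentioned above, here is a minimal sketch using the standard library's `functools.singledispatch`. The function name `to_numpy` is hypothetical and this is not the code in the PR: it dispatches on the argument's type, recurses into dicts and lists (such as the nested outlier prediction dictionary), and registers a tensor handler only if torch can be imported, which also illustrates protecting an optional dependency.

```python
from functools import singledispatch
from typing import Any


@singledispatch
def to_numpy(x: Any) -> Any:
    """Fallback: return values of unregistered types unchanged."""
    return x


@to_numpy.register
def _(x: dict) -> dict:
    # Recurse into dictionaries, e.g. {"data": {...}, "meta": {...}}.
    return {key: to_numpy(value) for key, value in x.items()}


@to_numpy.register
def _(x: list) -> list:
    return [to_numpy(value) for value in x]


try:  # optional dependency: only register the tensor handler if torch exists
    import torch

    @to_numpy.register
    def _(x: torch.Tensor):
        # Move to CPU and detach from the autograd graph before converting.
        return x.detach().cpu().numpy()
except ImportError:
    pass
```

Compared with `isinstance` chains, each type's conversion lives in its own small function, and new backends can register handlers without touching the existing ones.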

@mauicv added the WIP (PR is a Work in Progress) label Nov 17, 2022
@codecov-commenter commented Nov 25, 2022

Codecov Report

Merging #677 (a535e8d) into master (40f4121) will increase coverage by 0.46%.
The diff coverage is 94.98%.


@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   80.33%   80.79%   +0.46%     
==========================================
  Files         137      144       +7     
  Lines        9304     9619     +315     
==========================================
+ Hits         7474     7772     +298     
- Misses       1830     1847      +17     

| Flag | Coverage Δ |
|---|---|
| macos-latest-3.9 | ? |
| ubuntu-latest-3.10 | 80.72% <94.98%> (+0.50%) ⬆️ |
| ubuntu-latest-3.7 | 80.62% <94.92%> (+0.50%) ⬆️ |
| ubuntu-latest-3.8 | ? |
| ubuntu-latest-3.9 | 80.67% <94.98%> (+0.50%) ⬆️ |
| windows-latest-3.9 | ? |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| alibi_detect/od/__init__.py | 100.00% <ø> (ø) |
| alibi_detect/base.py | 85.45% <50.00%> (ø) |
| alibi_detect/od/base.py | 75.75% <75.75%> (ø) |
| alibi_detect/od/pytorch/base.py | 97.10% <97.10%> (ø) |
| alibi_detect/od/pytorch/ensemble.py | 97.11% <97.11%> (ø) |
| alibi_detect/od/_knn.py | 97.95% <97.95%> (ø) |
| alibi_detect/exceptions.py | 100.00% <100.00%> (ø) |
| alibi_detect/od/pytorch/__init__.py | 100.00% <100.00%> (ø) |
| alibi_detect/od/pytorch/knn.py | 100.00% <100.00%> (ø) |

... and 2 files with indirect coverage changes
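The headline figures in the coverage diff follow from coverage = hits / lines. The sketch below recomputes them, assuming Codecov's default rounding (round down, two decimal places); `coverage_pct` is a hypothetical helper. Rounding down explains why 7772 / 9619 = 80.798...% is reported as 80.79% rather than 80.80%.

```python
import math


def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    """Percentage of lines hit, rounded *down* to `precision` decimals."""
    factor = 10 ** precision
    return math.floor(hits / lines * 100 * factor) / factor


before = coverage_pct(7474, 9304)  # master (40f4121)
after = coverage_pct(7772, 9619)   # #677 (a535e8d)
print(before, after, round(after - before, 2))
```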

@ojcobb (Contributor) commented Mar 15, 2023

Small number of comments around defaults and threshold inference, but otherwise looks good to me. Nice one!

@ascillitoe (Contributor) left a comment

LGTM! Just a few very minor comments remaining...

@ascillitoe (Contributor) left a comment

LGTM!

@mauicv merged commit e09bd32 into SeldonIO:master Mar 17, 2023
mauicv added a commit to mauicv/alibi-detect that referenced this pull request Apr 18, 2023
Labels: Type: New method (New method proposals)

5 participants