
Feat/show_anomalies for multivariate #2544

Merged


cnhwl (Contributor) commented Sep 27, 2024

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #2114.

Summary

This feature is enabled through a new parameter, multivariate_plot: bool = False, added to show_anomalies(); the plotting itself is implemented in the show_anomalies_from_scores() function.

The general idea is to iterate over the components of the series and plot each component separately (including series, pred_series, pred_scores and anomalies). Below is a simple example, followed by its output:

```python
import numpy as np

from darts import TimeSeries
from darts.ad import (
    ForecastingAnomalyModel,
    NormScorer,
    WassersteinScorer,
)
from darts.models import RegressionModel


def generate_data_ex1(random_state: int):
    np.random.seed(random_state)

    # create two independent components drawn from a standard normal distribution
    comp1 = np.expand_dims(np.random.normal(loc=0, scale=1, size=200), axis=1)
    comp2 = np.expand_dims(np.random.normal(loc=0, scale=1, size=200), axis=1)

    # compute the mean and standard deviation of each component
    mean1, std1 = np.mean(comp1), np.std(comp1)
    mean2, std2 = np.mean(comp2), np.std(comp2)

    # label as anomalous every value more than 2 standard deviations above the mean
    anomalies1 = (comp1 > mean1 + 2 * std1).astype(int)
    anomalies2 = (comp2 > mean2 + 2 * std2).astype(int)

    # stack the per-component labels and values into arrays of shape (200, 2)
    anomalies = np.concatenate([anomalies1, anomalies2], axis=1)
    vals = np.concatenate([comp1, comp2], axis=1)

    return vals, anomalies


# example usage
data, anomalies = generate_data_ex1(random_state=42)

series = TimeSeries.from_values(data, columns=["comp1", "comp2"])
series_train = series[:120]
series_test = series[120:]
anomalies_series = TimeSeries.from_values(
    anomalies, columns=["comp1_anomalies", "comp2_anomalies"]
)
anomalies_series_test = anomalies_series[120:]

# component_wise=True makes each scorer return one score per component
anomaly_model = ForecastingAnomalyModel(
    model=RegressionModel(lags=10),
    scorer=[
        NormScorer(component_wise=True),
        WassersteinScorer(component_wise=True),
    ],
)

anomaly_model.fit(series_train, allow_model_training=True, verbose=True)

# parameter added by this PR: plot each component of the series separately
anomaly_model.show_anomalies(
    series=series_test,
    anomalies=anomalies_series_test,
    multivariate_plot=True,
)
```

[Figure: output of the example above]
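The per-component layout described in the summary can be illustrated independently of darts. The following is a minimal sketch using only NumPy and matplotlib, not the code from this PR: `plot_component_wise` is a hypothetical helper, all inputs are assumed to be NumPy arrays of shape (n_timesteps, n_components) already aligned on the same time steps, and each component gets its own row of subplots. The figure and axes are created before checking whether scores were passed, so they exist even when `scores=None`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_component_wise(series, pred, anomalies, scores=None, names=None):
    """Hypothetical helper: one row of subplots per series component.

    Columns: values (series + prediction), scores (only if given), anomaly flags.
    """
    n_comp = series.shape[1]
    names = names or [f"comp{i}" for i in range(n_comp)]
    n_cols = 3 if scores is not None else 2
    # Create fig/axes up front so they are defined even without scores.
    fig, axes = plt.subplots(
        n_comp, n_cols, figsize=(4 * n_cols, 2.5 * n_comp), squeeze=False
    )
    t = np.arange(series.shape[0])
    for i, name in enumerate(names):
        ax_vals = axes[i, 0]
        ax_vals.plot(t, series[:, i], label="series")
        ax_vals.plot(t, pred[:, i], label="pred_series")
        ax_vals.set_title(name)
        ax_vals.legend(fontsize="small")
        if scores is not None:
            axes[i, 1].plot(t, scores[:, i], color="tab:red")
            axes[i, 1].set_title(f"{name} scores")
        # the last column always holds the binary anomaly flags
        ax_anom = axes[i, -1]
        ax_anom.step(t, anomalies[:, i], where="post")
        ax_anom.set_title(f"{name} anomalies")
    fig.tight_layout()
    return fig, axes


# synthetic, already-aligned arrays standing in for series/predictions/scores
rng = np.random.default_rng(0)
vals = rng.normal(size=(80, 2))
pred = vals + rng.normal(scale=0.3, size=vals.shape)
scores = np.abs(vals - pred)
anoms = (vals > 2).astype(int)
fig, axes = plot_component_wise(vals, pred, anoms, scores, names=["comp1", "comp2"])
```

Using `squeeze=False` keeps `axes` two-dimensional even for a univariate series, so the loop indexes it the same way for any number of components.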

I would appreciate more input on how to further improve this feature, thank you very much!


codecov bot commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 83.58209% with 11 lines in your changes missing coverage. Please review.

Project coverage is 94.15%. Comparing base (b441192) to head (95dd225).
Report is 1 commits behind head on master.

Files with missing lines | Patch % | Lines
darts/ad/utils.py | 83.58% | 11 Missing ⚠️
@@            Coverage Diff             @@
##           master    #2544      +/-   ##
==========================================
- Coverage   94.24%   94.15%   -0.09%     
==========================================
  Files         141      141              
  Lines       15466    15491      +25     
==========================================
+ Hits        14576    14586      +10     
- Misses        890      905      +15     


cnhwl changed the title "Add new feature to plot each series's component separately" to "Feat/show_anomalies-for-multivariate" on Sep 27, 2024
cnhwl changed the title "Feat/show_anomalies-for-multivariate" to "Feat/show_anomalies for multivariate" on Sep 27, 2024
madtoinou (Collaborator) left a comment

Thank you for the contribution @cnhwl, and sorry for making you wait so long for a review.

The documentation of the new argument is missing; also, some refactoring to reduce code duplication would be great. Other than that, it looks great!

darts/ad/anomaly_model/anomaly_model.py (outdated, resolved)
darts/ad/anomaly_model/forecasting_am.py (outdated, resolved)
darts/ad/scorers/scorers.py (outdated, resolved)
darts/ad/utils.py (outdated, resolved)
cnhwl (Contributor, Author) commented Dec 31, 2024

Thank you very much for your code review and guidance! @madtoinou
I've addressed the requested changes, and the PR is ready to merge whenever you are! 🚀

madtoinou (Collaborator) left a comment

Thank you a lot for the changes @cnhwl, I still have some improvements to suggest.

darts/ad/utils.py (6 review threads, outdated, resolved)
cnhwl (Contributor, Author) commented Dec 31, 2024

Thanks again for your help! @madtoinou 🤝

madtoinou (Collaborator) left a comment

Last change request; after this, the PR should be ready to be reviewed by @dennisbader 🚀

darts/ad/utils.py (5 review threads, outdated, resolved)
cnhwl (Contributor, Author) commented Jan 2, 2025

Happy New Year! @madtoinou @dennisbader
Please recheck the code and feel free to request any changes. 🤝

madtoinou (Collaborator) left a comment

Thanks for addressing the comments! LGTM, let's wait for Dennis' opinion :D

dennisbader (Collaborator) left a comment

Thanks @cnhwl for the PR which looks nice, and @madtoinou for the review.

I saw one or two issues and took the liberty of fixing them:

  • fixed an issue where fig and axes were not defined when pred_scores=None
  • fixed an issue that raised an error when the input series was multivariate but the anomalies and scorers were not component-wise (component_wise=False)
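The second fix concerns a multivariate series whose scores or anomalies have only a single column. One way to handle that mismatch while looping over components is sketched below; `component_values` is a hypothetical NumPy helper, not the change that was actually made in `darts/ad/utils.py`. It picks column `i` when there is one column per component, reuses the single column when scoring was not component-wise, and raises a clear error for any other shape.

```python
import numpy as np


def component_values(values: np.ndarray, i: int, n_components: int) -> np.ndarray:
    """Hypothetical helper: column of `values` to draw next to series component `i`.

    `values` holds scores or anomaly flags with shape (n_timesteps, n_cols).
    """
    n_cols = values.shape[1]
    if n_cols == n_components:
        # component-wise: one column per series component
        return values[:, i]
    if n_cols == 1:
        # not component-wise: a single column shared by all components
        return values[:, 0]
    raise ValueError(f"expected 1 or {n_components} columns, got {n_cols}")


# scores for a 2-component series, computed jointly vs. component-wise
joint_scores = np.array([[0.1], [0.9], [0.2]])
cw_scores = np.array([[0.1, 0.5], [0.9, 0.0], [0.2, 0.3]])
shared = component_values(joint_scores, 1, n_components=2)
own = component_values(cw_scores, 1, n_components=2)
```

Reusing the shared column is only one possible policy; drawing the joint scores once in a separate panel would be an equally valid design.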

I'm not sure whether multivariate_plot is the best name. To me it sounds like it should plot all components in one plot when True (but it does the opposite). Should we rather use component_wise: bool = False, as we also do for the scorers?

My last concern is about plotting all the components separately in one figure: this can get quite big for larger numbers of components. But maybe improving this is a bit of an overkill.

At the very least, I think we should improve the spacing between the suptitle and the axes a bit.
The space reserved for the suptitle should be fixed regardless of the number of components, whereas currently it scales with the number of subplots (see my comment).
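A fixed-size title band takes a little arithmetic because `Figure.subplots_adjust` works in figure fractions: if the band height is fixed in inches, the `top` fraction has to shrink as the figure grows taller. A minimal matplotlib sketch under that assumption follows; `figure_with_fixed_title_space` and its default sizes are hypothetical and not necessarily the layout darts ended up using.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def figure_with_fixed_title_space(n_rows, row_height=2.0, title_height=0.6, width=8.0):
    """Hypothetical helper: n_rows stacked subplots under a title band of fixed height (inches)."""
    # the figure grows with the number of rows; the title band does not
    fig_height = n_rows * row_height + title_height
    fig, axes = plt.subplots(n_rows, 1, figsize=(width, fig_height), squeeze=False)
    # subplots_adjust expects fractions of the figure height, so convert
    # the fixed band (in inches) into a fraction of the current height
    top = 1.0 - title_height / fig_height
    fig.subplots_adjust(top=top, hspace=0.4)
    # centre the suptitle vertically inside the reserved band
    fig.suptitle("Anomaly detection", y=1.0 - 0.5 * title_height / fig_height)
    return fig, axes[:, 0]
```

With this approach the gap above the first subplot stays at `title_height` inches whether the series has 1 or 20 components.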

Let me know what you think, happy to discuss further :)

darts/ad/utils.py (outdated, resolved)
cnhwl (Contributor, Author) commented Jan 6, 2025

Thank you very much for your code review and fix! @dennisbader

Regarding the parameter name, multivariate_plot was just a placeholder. I agree that component_wise is better for consistency across the code and documentation, and I'll change it.

Regarding the size of the generated figure, perhaps we could warn users in the documentation that plotting many components makes the figure very large?

dennisbader (Collaborator) left a comment

@cnhwl, thanks for the updates. I made some final changes to address the figure size issue and updated the documentation.

Thanks again for the contribution and this nice PR 🚀 Always a pleasure to work together!

dennisbader merged commit 5bc1960 into unit8co:master on Jan 8, 2025
9 checks passed
cnhwl deleted the feat/show_anomalies-for-multivariate branch on January 9, 2025 00:58
Development

Successfully merging this pull request may close these issues.

show_anomalies for multivariate
3 participants