Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multivariate range #39

Merged
merged 27 commits into from
Feb 8, 2022
Merged

Multivariate range #39

merged 27 commits into from
Feb 8, 2022

Conversation

laurafroelich
Copy link
Contributor

Add simple range detector for multivariate data. A time point is defined to be anomalous if any of the time series is outside of min/max range.

@laurafroelich laurafroelich marked this pull request as draft February 2, 2022 13:14
@laurafroelich laurafroelich marked this pull request as ready for review February 2, 2022 13:44

Parameters
----------
min_value : float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min and max values must be able to be varied for each individual time-series, e.g. for simultaneous measurement of temperature and humidity, you need different ranges. But keep the option to provide a single value applied for all timeseries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having a look quickly! Sure, I can do that. The intent was to have a very simple method with simple inputs. But the idea of allowing for both options (either same min/max for all or specified indvidually) is also reasonable.

The question is of course how many options to pack into one thing versus constructing another detector with this specific purpose. I think I might actually prefer pursuing the original intent and instead adding another detector that has min/max for each time series. It would also avoid some if/else in e.g. fitting, which will keep the code clearer.

Let me know what you think though, if you still prefer combining both options in one detector we can also do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should follow common conventions in NumPy and Pandas e.g pandas.DataFrame.clip

The clip method accepts both scalar and array as input.
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to use broadcasting to avoid if/else and have added changes that allow different ranges for different time series.

@ecomodeller ecomodeller merged commit 62c3bf6 into main Feb 8, 2022
@laurafroelich laurafroelich deleted the multivariate_range branch February 8, 2022 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants