[ENH] Replace `prts` metrics #2400
base: main
Conversation
Thank you for contributing to aeon!
Some of these functions look like they should be private/protected. In the tests, please run both the new and current functions (using pytest) to ensure the output is the same.
aeon/benchmarking/metrics/anomaly_detection/tests/test_metrics.py
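A parity test along these lines might look like the following sketch. The aeon import path and the new functions' signatures (argument order `y_pred, y_real`, parameters `bias_type` and `alpha`) are assumptions based on this thread, not the actual API:

```python
"""Sketch of a parity test against prts; import path and signatures assumed."""
import numpy as np
import pytest
from prts import ts_precision as prts_ts_precision
from prts import ts_recall as prts_ts_recall

# assumed location of the new implementation in this PR
from aeon.benchmarking.metrics.anomaly_detection import ts_precision, ts_recall

# one fixture in both formats: bit masks for prts, (start, end) index
# ranges (end inclusive, assumed) for the new range-based functions
Y_REAL_MASK = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1])
Y_PRED_MASK = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 0])
Y_REAL_RANGES = [(2, 4), (8, 9)]
Y_PRED_RANGES = [(3, 5), (8, 8)]


@pytest.mark.parametrize("bias", ["flat", "front", "middle", "back"])
def test_ts_precision_matches_prts(bias):
    expected = prts_ts_precision(Y_REAL_MASK, Y_PRED_MASK, bias=bias)
    actual = ts_precision(Y_PRED_RANGES, Y_REAL_RANGES, bias_type=bias)
    np.testing.assert_allclose(actual, expected)


@pytest.mark.parametrize("bias", ["flat", "front", "middle", "back"])
def test_ts_recall_matches_prts(bias):
    expected = prts_ts_recall(Y_REAL_MASK, Y_PRED_MASK, bias=bias, alpha=0.0)
    actual = ts_recall(Y_PRED_RANGES, Y_REAL_RANGES, bias_type=bias, alpha=0.0)
    np.testing.assert_allclose(actual, expected)
```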
Changes made from the previous commit:
@MatthewMiddlehurst Please let me know if the test cases are fine; I can separate them into different functions if necessary.
Thanks for this, it's really helpful. We will look again next week.
Done. Thanks a bunch for reviewing!
Thank you for your valuable contribution! The code looks good, and I especially appreciate your effort to compare the results of your implementation to the existing one. Which code did you use to generate the fixtures?

However, I am not 100% happy with the current state:

- The code lacks information and details in the documentation, e.g.:
  - There is no reference to the original paper: http://papers.nips.cc/paper/7462-precision-and-recall-for-time-series.pdf
  - There is no description of how `ts_precision` or `ts_recall` are calculated. What is "Global Precision"?
- It is unclear how the parameters influence the calculations. E.g., for `ts_precision`, the combination `bias_type="udf_gamma" && udf_gamma != 0` (I guess this is how one would want to set the existence reward weight) is not a good idea: because precision by definition emphasizes prediction quality, there is no need for an existence reward, and the gamma should always be set to 0.
- From the documentation, it is unclear how the inputs `y_pred` and `y_real` should look. How are the "real (actual) ranges" defined? In this case, we require a list of tuples with the begin and end indices. Is the end index inclusive or exclusive? How does the nested list work?
- The other AD performance metrics use bit masks as input; the new functions are incompatible. I guess we can merge this without properly integrating the functions yet. Then we can look at how to transform the bit masks to index ranges in a separate PR, when we remove the `prts` package (a possible conversion is sketched below).
- The tests all use the same parameter configuration. We should also test other parameters to see if the code is compatible with the original authors' implementation.
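For that integration point, the bit-mask-to-ranges conversion could be as small as the following sketch (the helper name is hypothetical, and end indices are assumed inclusive):

```python
import numpy as np


def _mask_to_ranges(mask):
    """Convert a binary anomaly mask to a list of (start, end) index ranges.

    End indices are inclusive, e.g. [0, 1, 1, 0, 1] -> [(1, 2), (4, 4)].
    """
    idx = np.flatnonzero(np.asarray(mask))
    if idx.size == 0:
        return []
    # split wherever consecutive anomalous indices are not adjacent
    groups = np.split(idx, np.where(np.diff(idx) > 1)[0] + 1)
    return [(int(g[0]), int(g[-1])) for g in groups]
```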
I would like someone to independently verify that the output of old and new are the same, given that the expected results are hard-coded. Alternatively, if you could provide a code snippet running both functions to help us verify, that would be helpful (this would not go in the repo).
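Something like this throwaway script (not for the repo) could serve as that snippet; it sweeps random fixtures and the bias parameter, again assuming the new signature discussed in this thread:

```python
"""Throwaway verification script (not for the repo); signatures assumed."""
import numpy as np
from prts import ts_precision as prts_ts_precision

# assumed import path for the new range-based function
from aeon.benchmarking.metrics.anomaly_detection import ts_precision


def random_ranges(rng, n_points, n_ranges=3, max_len=8):
    """Draw non-overlapping (start, end) ranges, end inclusive."""
    slots = rng.choice(n_points // (2 * max_len), size=n_ranges, replace=False)
    starts = np.sort(slots) * 2 * max_len
    return [(int(s), int(s + rng.integers(0, max_len))) for s in starts]


def ranges_to_mask(ranges, n_points):
    """Build the equivalent binary mask for the prts functions."""
    mask = np.zeros(n_points, dtype=int)
    for start, end in ranges:
        mask[start : end + 1] = 1
    return mask


rng = np.random.default_rng(0)
n_points = 200
for trial in range(50):
    y_real = random_ranges(rng, n_points)
    y_pred = random_ranges(rng, n_points)
    real_mask = ranges_to_mask(y_real, n_points)
    pred_mask = ranges_to_mask(y_pred, n_points)
    for bias in ("flat", "front", "middle", "back"):
        expected = prts_ts_precision(real_mask, pred_mask, bias=bias)
        actual = ts_precision(y_pred, y_real, bias_type=bias)
        assert np.isclose(actual, expected), (trial, bias, actual, expected)
print("all trials matched")
```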
The current implementation is a classical metric one, but the issue is to remove the `prts` package. Apologies for any confusion caused.
@SebastianSchmidl @MatthewMiddlehurst Do you want me to add point-based metrics in a different file? I'll alter the code in this file to be range-based.
Hi @aryanpola, I do not get what you mean by "point based metrics". We are interested in the metrics described in the paper by Tatbul et al., "Precision and Recall for Time Series" (NeurIPS 2018), linked in the issue. The paper introduces "range-based recall" in Sect. 4.1 and "range-based precision" in Sect. 4.2. Those are the metrics I want to have in aeon. They can then also be combined into a "range-based F1" score. As I understand it, the repository https://github.com/IntelLabs/TSAD-Evaluator was created by the paper's authors and also links to this paper, so I assume that it contains the implementation of the metrics. But I have not looked into the code in detail, so I cannot tell you which portions correspond to what is described in the paper. Excuse any confusion caused by me not explaining my view on the code earlier. Can you identify the relevant code for "range-based recall" and "range-based precision" in https://github.com/IntelLabs/TSAD-Evaluator?
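For reference, a sketch of the paper's definitions as I read them (Sect. 4; $R$ are the real ranges, $P$ the predicted ranges):

$$\mathrm{Recall}_T(R, P) = \frac{\sum_{i=1}^{N_r} \mathrm{Recall}_T(R_i, P)}{N_r}, \qquad \mathrm{Recall}_T(R_i, P) = \alpha \cdot \mathrm{ExistenceReward}(R_i, P) + (1 - \alpha) \cdot \mathrm{OverlapReward}(R_i, P)$$

$$\mathrm{Precision}_T(R, P) = \frac{\sum_{j=1}^{N_p} \mathrm{Precision}_T(R, P_j)}{N_p}, \qquad \mathrm{Precision}_T(R, P_j) = \mathrm{CardinalityFactor}(P_j, R) \times \sum_{i=1}^{N_r} \omega(P_j, P_j \cap R_i, \delta)$$

Here $\alpha$ weights the existence reward (hence the remark above that it should be 0 for precision), and $\omega$ with $\delta$ encode the positional bias.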
Yes yes, it's just that I got carried away trying to implement classic metrics (obviously not what the issue stated).
Also, value errors will be raised when …
It's fine to use bits of documentation from the other function when applicable. I'll leave the input type call to Sebastian, but it's a bit odd to change it IMO.
Again, I would like the output of the new functions verified against the current ones, or the current functions to be used in testing for now, just to verify that the output is the same.
@MatthewMiddlehurst There is repetition of documentation in public functions, so changing them to point towards other functions would be a bad idea.
I'm not quite sure what you mean. In case there is confusion about the original issue, the end goal is to remove the other version of these functions eventually.
The implementation supports range metrics (ts_precision, ts_recall, ts_fscore) and removes the dependency on `prts`.
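For illustration, usage might look like this (the import path, signatures, and the inclusive-end convention are assumptions, per the open questions above):

```python
# assumed import path for the new range-based metrics
from aeon.benchmarking.metrics.anomaly_detection import (
    ts_fscore,
    ts_precision,
    ts_recall,
)

# anomaly ranges as (start, end) tuples, end inclusive (assumed)
y_real = [(10, 20), (45, 50)]
y_pred = [(12, 18), (44, 47)]

precision = ts_precision(y_pred, y_real, bias_type="flat")
recall = ts_recall(y_pred, y_real, bias_type="flat", alpha=0.0)
f_score = ts_fscore(y_pred, y_real, bias_type="flat")
```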
Also, there are some changes to be made; I'll update in some time.
@MatthewMiddlehurst @SebastianSchmidl @TonyBagnall Changes from my side are done; please review when free and suggest changes if any :)
I may just be missing something here. Why would there be repetition in documentation if you use the documentation from the functions we are removing and replacing? Could you please:
I meant that the precision and recall functions in the newer implementation have the same docstrings.
The already existing implementation (`_binary.py`) takes inputs as binary arrays. The one we are trying to replace it with takes inputs as a list of tuples. So the change would be to convert the list of tuples to binary arrays for compatibility with the already existing functions? Thanks for pointing out the conversion; I hadn't checked the input data type for `_binary.py`.
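A conversion in that direction could be a small helper like this sketch (the name and the inclusive-end convention are assumptions):

```python
import numpy as np


def ranges_to_mask(ranges, n_points):
    """Convert (start, end) ranges (end inclusive, assumed) to a binary mask.

    Example: [(1, 2), (4, 4)] with n_points=5 -> array([0, 1, 1, 0, 1]).
    """
    mask = np.zeros(n_points, dtype=int)
    for start, end in ranges:
        if not 0 <= start <= end < n_points:
            raise ValueError(f"invalid range ({start}, {end})")
        mask[start : end + 1] = 1
    return mask
```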
Reference Issues/PRs
Addresses #2066
What does this implement/fix? Explain your changes.
Implementation of the range-based Precision, Recall, and F1-score metrics.
PR checklist
For all contributions