
[ENH] Replace prts metrics #2400

Open · wants to merge 29 commits into base: main

Conversation

@aryanpola (Contributor) commented Nov 26, 2024

Reference Issues/PRs

Addresses #2066

What does this implement/fix? Explain your changes.

Implementation of Precision, Recall and F1-score metrics.

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you.
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.

@aeon-actions-bot added the labels benchmarking (Benchmarking package) and enhancement (New feature, improvement request or other non-bug code enhancement) — Nov 26, 2024
@aeon-actions-bot (Contributor)

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ enhancement ].
I have added the following labels to this PR based on the changes made: [ benchmarking ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

@aryanpola marked this pull request as draft November 26, 2024 05:43
@MatthewMiddlehurst changed the title from "Recall [ENH]" to "[ENH] Recall" — Nov 26, 2024
@MatthewMiddlehurst (Member) left a comment

Some of these functions look like they should be private/protected. In the tests, please run both the new and current functions to ensure the output is the same (using pytest).

@MatthewMiddlehurst changed the title from "[ENH] Recall" to "[ENH] Replace prts metrics" — Dec 27, 2024
@aryanpola (Contributor, Author)

Changes made since the previous commit:

  1. Added a function for flattening lists of lists into a single list (see the sketch below).
  2. Removed overlapping functions.
  3. Changed the metric calculation from per-range averaging (w.r.t. cardinality) to a global calculation (w.r.t. overlapping positions) over the real and predicted ranges.
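A minimal sketch of such a flattening helper (illustrative only, not the PR's actual code):

```python
def flatten(list_of_lists):
    """Flatten a list of lists (e.g. nested lists of (start, end) ranges) into a flat list."""
    return [item for sublist in list_of_lists for item in sublist]


print(flatten([[(1, 5), (7, 9)], [(12, 15)]]))  # [(1, 5), (7, 9), (12, 15)]
```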

@aryanpola marked this pull request as ready for review December 30, 2024 18:41
@aryanpola (Contributor, Author)

@MatthewMiddlehurst Please let me know if the test cases are fine; I can separate them into different functions if necessary.

@TonyBagnall (Contributor)

Thanks for this, it's really helpful. We will look again next week.

@aryanpola (Contributor, Author)

> lgtm, just make the functions protected and its good to go imo

Done. Thanks a bunch for reviewing!

@aryanpola requested a review from TonyBagnall January 14, 2025 10:27
@SebastianSchmidl (Member) left a comment

Thank you for your valuable contribution! The code looks good, and I especially appreciate your effort to compare the results of your implementation to the existing one. Which code did you use to generate the fixtures?

However, I am not 100% happy with the current state:

  • The code lacks information and details in the documentation, e.g.
    • There is no reference to the original paper: http://papers.nips.cc/paper/7462-precision-and-recall-for-time-series.pdf
    • There is no description of how ts_precision or ts_recall are calculated. What is "Global Precision"?
    • It is unclear how the parameters influence the calculations. E.g., for ts_precision, the combination bias_type="udf_gamma" && udf_gamma!=0 (I guess this is how one would set the existence reward weight) is not a good idea. Because precision by definition emphasizes prediction quality, there is no need for an existence reward, and the gamma should always be set to 0.
    • From the documentation, it is unclear how the inputs y_pred and y_real should look. How are the "real (actual) ranges" defined? In this case, we require a list of tuples with the begin and end indices. Is the end index inclusive or exclusive? How does the nested list work?
  • The other AD performance metrics use bit masks as input. The new functions are incompatible. I guess we can merge this without properly integrating the functions yet. Then we can look at how to transform the bit masks to index ranges in a separate PR, when we remove the prts package (a rough sketch of such a conversion is shown after this list).
  • The tests all use the same parameter configuration. We should also test other parameters to see if the code is compatible with the original authors' implementation.
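A rough sketch of that bit-mask-to-ranges conversion (illustrative only; the function name and the inclusive end-index convention are assumptions, not the PR's code):

```python
import numpy as np


def mask_to_ranges(mask):
    """Convert a binary anomaly mask into a list of (start, end) index ranges.

    The end index is inclusive in this sketch, e.g. [0, 1, 1, 0, 1] -> [(1, 2), (4, 4)].
    """
    mask = np.asarray(mask, dtype=bool).astype(int)
    # Positions where the mask switches between 0 and 1 mark range boundaries.
    diffs = np.diff(np.concatenate(([0], mask, [0])))
    starts = np.where(diffs == 1)[0]
    ends = np.where(diffs == -1)[0] - 1
    return list(zip(starts.tolist(), ends.tolist()))


print(mask_to_ranges([0, 1, 1, 0, 1]))  # [(1, 2), (4, 4)]
```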

@MatthewMiddlehurst (Member)

I would like someone to independently verify that the output of the old and new functions is the same, given that the expected results are hard-coded.

Alternatively, if you could provide a code snippet running both functions to help us verify, that would be helpful (this would not go in the repo).

@aryanpola (Contributor, Author)

The current implementation is a classical (point-based) metric. But the issue is to remove the prts package, which provides range-based metrics; my previous commits cover the code for range-based metrics.
It was a fault on my end while trying to run TSAD-Evaluator, which provides both classical and range-based metrics.
TSAD-Evaluator: https://github.com/IntelLabs/TSAD-Evaluator

Apologies for any confusion caused.

@aryanpola (Contributor, Author)

@SebastianSchmidl @MatthewMiddlehurst Do you want me to add the point-based metrics in a different file? I'll change the code in this file to the range-based metrics.

@SebastianSchmidl (Member)

Hi @aryanpola,

I do not get what you mean by "point-based metrics". We are interested in the metrics described in the paper by Tatbul et al., "Precision and Recall for Time Series" (NeurIPS 2018), linked in the issue. The paper introduces "range-based recall" in Sect. 4.1 and "range-based precision" in Sect. 4.2. Those are the metrics I want to have in aeon. They can then also be combined into a "range-based F1" score.
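
For reference, a compact sketch of those definitions from Sect. 4 of the paper (my paraphrase of the notation: R_i are the real anomaly ranges, P_j the predicted ranges, α the existence-reward weight, ω the positional-overlap function with bias δ, and CardinalityFactor the penalty for fragmented overlaps; see the paper for the exact formulation):

$$
\mathrm{Recall}_T(R, P) = \frac{1}{N_r} \sum_{i=1}^{N_r} \Bigl[ \alpha \cdot \mathrm{ExistenceReward}(R_i, P) + (1 - \alpha) \cdot \mathrm{OverlapReward}(R_i, P) \Bigr]
$$

$$
\mathrm{Precision}_T(R, P) = \frac{1}{N_p} \sum_{j=1}^{N_p} \mathrm{CardinalityFactor}(P_j, R) \sum_{i=1}^{N_r} \omega\bigl(P_j,\, P_j \cap R_i,\, \delta\bigr)
$$

$$
F_1 = \frac{2 \cdot \mathrm{Precision}_T \cdot \mathrm{Recall}_T}{\mathrm{Precision}_T + \mathrm{Recall}_T}
$$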

As I understand it, the repository https://github.com/IntelLabs/TSAD-Evaluator was created by the paper's authors and also links to the paper. So I assume that it contains the implementation of the metrics. But I have not looked into the code in detail, so I cannot tell you which portion corresponds to what they describe in the paper. Excuse any confusion caused by me not explaining my view on the code earlier.

Can you identify the relevant code for "range-based recall" and "range-based precision" in https://github.com/IntelLabs/TSAD-Evaluator?

@aryanpola (Contributor, Author)

> Can you identify the relevant code for "range-based recall" and "range-based precision" in https://github.com/IntelLabs/TSAD-Evaluator?

Yes, it's just that I got carried away trying to implement classic metrics (obviously not what the issue asked for).
I've updated the code with the necessary changes and added more test cases, which should cover everything.

@aryanpola (Contributor, Author)

Also, a ValueError is raised when gamma = "udf_gamma" in precision, since udf_gamma would not affect the precision calculation.

@MatthewMiddlehurst (Member) left a comment

It's fine to use bits of documentation from the other functions when applicable. I'll leave the input-type call to Sebastian, but it seems a bit odd to change it IMO.

Again, I would like the output of the new functions verified against the current ones, or the current functions used in the tests for now, just to verify that the output is the same.

@aryanpola (Contributor, Author)

@MatthewMiddlehurst There is repetition of documentation in the public functions, so changing them to point towards other functions would be a bad idea.
Let me know if you want me to change them.

@MatthewMiddlehurst (Member)

I'm not quite sure what you mean. In case there is confusion about the original issue, the end goal is to eventually remove the other version of these functions.

@aryanpola (Contributor, Author)

> I'm not quite sure what you mean. In case there is confusion about the original issue, the end goal is to eventually remove the other version of these functions.

The implementation supports the range-based metrics (ts_precision, ts_recall, ts_fscore) and removes the dependency on the prts package.

@aryanpola (Contributor, Author) commented Jan 23, 2025

Also, there are some changes to be made:

  1. Separate the parameters in ts_fscore: split alpha and bias into p_alpha, r_alpha, p_bias, and r_bias (see the sketch below).
  2. Remove udf_gamma from precision. Currently, it only raises errors.

I'll update it in some time.
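For context, the F-score itself is just the standard F-beta combination of the two range-based scores. A minimal, self-contained sketch of that combination; the commented ts_fscore usage with p_alpha/r_alpha/p_bias/r_bias is paraphrased from the list above and is an assumption, not the PR's actual API:

```python
def combine_fbeta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta combination of a (range-based) precision and recall value."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


# With the split parameters, ts_fscore would roughly do (pseudocode, not the actual API):
#   precision = ts_precision(..., alpha=p_alpha, bias=p_bias)
#   recall    = ts_recall(..., alpha=r_alpha, bias=r_bias)
#   return combine_fbeta(precision, recall)
print(combine_fbeta(0.6, 0.75))  # ≈ 0.667
```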

@aryanpola (Contributor, Author) commented Jan 24, 2025

@MatthewMiddlehurst @SebastianSchmidl @TonyBagnall Changes from my side are done; please review when free and suggest changes if any :)

@MatthewMiddlehurst (Member) commented Jan 24, 2025

> The implementation supports the range-based metrics (ts_precision, ts_recall, ts_fscore) and removes the dependency on the prts package.

I may just be missing something here. Why would there be repetition in documentation if you use the documentation from the functions we are removing and replacing? The prts functions are still present and prts is still a dependency of the package. The replacement functions do not take the same input. I would currently disagree with removing the old ones in this PR, given that the ones you have implemented are different and unverified.

Could you please:

@aryanpola (Contributor, Author) commented Jan 25, 2025

> I may just be missing something here. Why would there be repetition in documentation if you use the documentation from the functions we are removing and replacing?

I meant the precision and recall functions in the newer implementation; they have the same docstrings.

@aryanpola (Contributor, Author) commented Jan 25, 2025

The existing implementation (_binary.py) takes binary values (arrays) as input. The one we are replacing it with takes a list of tuples as input. So the change would be to convert the list of tuples to binary arrays for compatibility with the existing functions?

Thanks for pointing out the conversion; I hadn't checked the input data type for _binary.py.
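A minimal sketch of that direction of the conversion, ranges to a binary mask (illustrative only; the function name, the series_length parameter, and the inclusive end index are assumptions):

```python
import numpy as np


def ranges_to_mask(ranges, series_length):
    """Convert a list of (start, end) index ranges into a binary anomaly mask.

    The end index is treated as inclusive here, e.g.
    ranges_to_mask([(1, 2), (4, 4)], 5) -> [0, 1, 1, 0, 1].
    """
    mask = np.zeros(series_length, dtype=int)
    for start, end in ranges:
        mask[start : end + 1] = 1
    return mask


print(ranges_to_mask([(1, 2), (4, 4)], 5))  # [0 1 1 0 1]
```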
