Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created benford_correlation(x) function #689

Merged
merged 11 commits into from
May 19, 2020

Conversation

gubertoli
Copy link
Contributor

Hi,

I have been using tsfresh for about 2 months, one attribute that came up to my mind in my cases is to use the deviation from Newcomb-Benford distribution, specifically the correlation between a data array and the expected distribution.

Newcomb-Benford distribution considers a log distribution of the first digits from data.

This analysis has been used for fraud and anomaly detection. More uses - https://scholar.google.com/scholar?q=benford+law+time+series

This is my first pull request to a major package, please if anything is wrong, please let me know.

@coveralls
Copy link

coveralls commented May 15, 2020

Coverage Status

Coverage increased (+0.6%) to 97.093% when pulling 1590938 on gubertoli:master into 8dd3f72 on blue-yonder:master.

Copy link
Collaborator

@nils-braun nils-braun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gubertoli!
Thank you very much for your pull request! We are super happy for new code, especially if they add more feature calculators! The PR is well documented and adds a very reasonable feature calculator.

So, welcome in the open-source community and welcome to tsfresh!

As you said that this is your first commit to a "major" package, I was especially nitpicky :-) So do not be discouraged by the number of review comments.

One thing which is however crucial is that you add a test for your calculator. Please have a look into the file test_feature_calculators.py and add one for your calculator there. Test it for small, large, negative, nan, floating-point, integer, whatever-you-can-think-of numbers. I will not approve the PR without tests ;-)

tsfresh/feature_extraction/feature_calculators.py Outdated Show resolved Hide resolved
tsfresh/feature_extraction/feature_calculators.py Outdated Show resolved Hide resolved
tsfresh/feature_extraction/feature_calculators.py Outdated Show resolved Hide resolved
tsfresh/feature_extraction/feature_calculators.py Outdated Show resolved Hide resolved
tsfresh/feature_extraction/feature_calculators.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@nils-braun nils-braun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks very good now!
Once I understood why the result of the equally distributed numbers is nan and if this is expected, we can merge!

@nils-braun
Copy link
Collaborator

Thanks again @gubertoli for your work! Appreciate the changes!

@nils-braun nils-braun merged commit f8a952e into blue-yonder:master May 19, 2020
@gubertoli
Copy link
Contributor Author

Thanks again @gubertoli for your work! Appreciate the changes!

Thank you @nils-braun for the patience and all the processual knowledge shared during this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants