Fix #695 Add "subseq_isconstant" param to API #789

Merged
seanlaw merged 77 commits into TDAmeritrade:main from the subseq_constant_in_API branch on Mar 12, 2023

Conversation

NimaSarajpoor
Collaborator

No description provided.

@codecov-commenter

codecov-commenter commented Jan 28, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (c7d5321) 99.24% compared to head (f1519ea) 99.25%.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #789    +/-   ##
========================================
  Coverage   99.24%   99.25%            
========================================
  Files          82       82            
  Lines       12974    13121   +147     
========================================
+ Hits        12876    13023   +147     
  Misses         98       98            
Impacted Files              Coverage Δ
stumpy/core.py              100.00% <100.00%> (ø)
stumpy/gpu_stump.py         100.00% <100.00%> (ø)
stumpy/stump.py             100.00% <100.00%> (ø)
stumpy/stumped.py           100.00% <100.00%> (ø)
tests/naive.py              100.00% <100.00%> (ø)
tests/test_core.py          100.00% <100.00%> (ø)
tests/test_gpu_stump.py     100.00% <100.00%> (ø)
tests/test_stump.py         100.00% <100.00%> (ø)
tests/test_stumped.py       100.00% <100.00%> (ø)

@NimaSarajpoor
Collaborator Author

NimaSarajpoor commented Jan 28, 2023

@seanlaw
For the sake of consistency, I think we should add the "subseq_isconstant" param to the following modules as well.

  • stumped
  • gpu_stump
  • floss
  • mpdist
  • mmotifs
  • mstump.py::multi_distance_profile
  • scrump (and prescrump)
  • snippets
  • stimp?
  • ....

@seanlaw
Contributor

seanlaw commented Jan 29, 2023

For the sake of consistency, I think we should add the "subseq_isconstant" param to the following modules as well.

What about stumpi, mstumped, ostinato, ostinatoed, gpu_ostinato, mpdisted, gpu_mpdist, stimped, gpu_stimp?

@NimaSarajpoor
Collaborator Author

NimaSarajpoor commented Jan 29, 2023

What about stumpi, mstumped, ostinato, ostinatoed, gpu_ostinato, mpdisted, gpu_mpdist, stimped, gpu_stimp?

Well... I haven't explored all modules yet, but we should definitely check them. I also need to check the ones I mentioned in my previous comment again to make sure the implementation is doable and reasonable.

For instance, I have some difficulty understanding how users would benefit from this feature when the data is updated dynamically. In stumpi, the matrix profile is computed in the context of streaming data. While users may provide their own subseq_isconstant for the initial input T, I do not understand how it should be updated as new data is appended to the time series. (Alternative option: ask users to provide a stddev threshold.)

What about the PAN matrix profile? I haven't studied its module yet, but it seems to compute the matrix profile for different window lengths. In that case, I think we should avoid letting users pass their own "subseq_isconstant" array for just one window size. Or, we should allow them to provide this array for each window size.

I will try to explore modules one by one to see if we can add this new support for them. Please let me know if you have any suggestion.
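
To make the stddev-threshold alternative concrete, here is a minimal sketch (not STUMPY code). The names `make_stddev_rule` and `rule` are hypothetical, and the assumption is that a rule expressed as a callable of `(a, w)` can simply be re-evaluated on the newest window whenever a point is appended, which a fixed boolean array cannot do.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_stddev_rule(threshold):
    # Hypothetical factory: returns a callable with the (a, w) signature
    def rule(a, w):
        # A window counts as constant when its standard deviation
        # is at or below the chosen threshold
        return sliding_window_view(a, w).std(axis=-1) <= threshold

    return rule


rule = make_stddev_rule(threshold=0.1)

T = np.array([3.0, 3.0, 3.0, 3.05, 7.0])
m = 3
flags = rule(T, m)  # one flag per subsequence of the initial T

# Streaming update: append a point, then evaluate the rule on the last window only
T = np.append(T, 7.0)
flags = np.append(flags, rule(T[-m:], m)[0])
```

With this approach the flags always have length `len(T) - m + 1`, because each appended point contributes exactly one new subsequence.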

@seanlaw
Contributor

seanlaw commented Jan 30, 2023

Please let me know if you have any suggestion.

So, I'm wondering if we could do something like:

# core.py
import inspect
import warnings

import numpy as np

def rolling_isconstant(a, w, custom_func=None):
    """
    Apply a rolling "is constant" check along the last axis of `a`.
    """
    axis = a.ndim - 1
    rolling_isconstant_func = _rolling_isconstant

    if custom_func is not None:
        custom_func_args = set(inspect.signature(custom_func).parameters.keys())
        # Accept the custom function only if it takes no parameters other than `a` and `w`
        if len(custom_func_args.difference({"a", "w"})) == 0:
            rolling_isconstant_func = custom_func
        else:
            msg = "Incompatible parameters found in custom function (in `rolling_isconstant`)"
            warnings.warn(msg)

    return np.apply_along_axis(
        lambda a_row, w: rolling_isconstant_func(a_row, w), axis=axis, arr=a, w=w
    )

And then, in a function like stumpy.stump (or other API functions), we can do something like:

if callable(T_subseq_isconstant):
    isconstant_func = T_subseq_isconstant  # save the function in case we need it for later??
    T_subseq_isconstant = core.rolling_isconstant(T, m, isconstant_func)
if T_subseq_isconstant is None:
    T_subseq_isconstant = core.rolling_isconstant(T, m)    

I haven't thought this through and it is still somewhat convoluted but some variation of this might work after we clean it up. It should even be usable for stimp since the user's custom function would always be applied in place of our _rolling_isconstant and it would be dynamic for each window size (i.e., stimp would only accept a custom function for T_subseq_isconstant and not a numpy array).
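
A self-contained sketch of that dispatch may help, under stated assumptions: `rolling_isconstant_default` is a hypothetical stand-in for the private `_rolling_isconstant`, and `resolve_subseq_isconstant` and `nearly_constant` are illustrative names, not part of the library. It shows why a callable works for several window sizes while a precomputed array is tied to one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_isconstant_default(a, w):
    # Hypothetical default: a window is constant when its max equals its min
    windows = sliding_window_view(a, w)
    return windows.max(axis=-1) == windows.min(axis=-1)


def resolve_subseq_isconstant(T, m, T_subseq_isconstant=None):
    # Accept None, a callable taking (a, w), or a precomputed boolean array
    if T_subseq_isconstant is None:
        return rolling_isconstant_default(T, m)
    if callable(T_subseq_isconstant):
        return np.asarray(T_subseq_isconstant(T, m), dtype=bool)
    arr = np.asarray(T_subseq_isconstant, dtype=bool)
    if arr.shape != (len(T) - m + 1,):
        # A fixed array only fits a single window size
        raise ValueError("expected one flag per subsequence")
    return arr


def nearly_constant(a, w):
    # Example custom rule: treat windows with a small value range as constant
    return np.ptp(sliding_window_view(a, w), axis=-1) < 0.5


T = np.array([1.0, 1.0, 1.0, 2.0, 2.1, 2.2, 5.0])
default_flags = resolve_subseq_isconstant(T, 3)
custom_flags_m3 = resolve_subseq_isconstant(T, 3, nearly_constant)
custom_flags_m2 = resolve_subseq_isconstant(T, 2, nearly_constant)
```

The same `nearly_constant` callable produces flags of the right length for both `m=3` and `m=2`, which is the property a multi-window function such as stimp would rely on.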

Again, just a soft proposal for you to consider.

@NimaSarajpoor
Collaborator Author

It should even be usable for stimp since the user's custom function would always be applied in place of our _rolling_isconstant and it would be dynamic for each window size (i.e., stimp would only accept a custom function for T_subseq_isconstant and not a numpy array).

This actually sounds great! We can also let users know that subsequences with at least one nan or inf will be treated as "not constant" regardless of the provided custom function. (Otherwise, we need to modify T_subseq_isfinite.)
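
One way this rule could look, as a rough sketch only: `mask_nonfinite` is a hypothetical helper name, and the assumption is that the override is applied after the user's function has run, by combining its result with a per-subsequence finiteness mask.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mask_nonfinite(T, m, subseq_isconstant):
    # A subsequence is finite only if every value in its window is finite
    subseq_isfinite = np.isfinite(sliding_window_view(T, m)).all(axis=-1)
    # Force subsequences containing nan/inf to "not constant",
    # whatever the custom function returned for them
    return np.logical_and(subseq_isconstant, subseq_isfinite)


T = np.array([1.0, 1.0, np.nan, 1.0, 1.0, 1.0])
m = 2
# Pretend a custom function flagged every subsequence as constant
custom_result = np.ones(len(T) - m + 1, dtype=bool)
masked = mask_nonfinite(T, m, custom_result)
```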

@NimaSarajpoor NimaSarajpoor force-pushed the subseq_constant_in_API branch from 5f6704d to 5be22a7 Compare February 4, 2023 05:53
Contributor

@seanlaw seanlaw left a comment

Added one minor suggestion. Also, I wonder if it makes sense to break this up into smaller individual PRs rather than one giant one? The current size of this PR is okay but maybe we merge this one when it is ready and then add other files in a separate PR?

@NimaSarajpoor
Collaborator Author

NimaSarajpoor commented Mar 10, 2023

@seanlaw

I wonder if it makes sense to break this up into smaller individual PRs rather than one giant one? The current size of this PR is okay but maybe we merge this one when it is ready and then add other files in a separate PR?

Based on our experience with the top-k PR, I think what you are suggesting is reasonable. I checked the changed files and I think this PR is ready. We have already added the param to stump, stumped, and gpu_stump, so I think it is good to be merged. Please allow me to address your comment and take a look at the changes one last time.

@NimaSarajpoor
Collaborator Author

@seanlaw

Please allow me to address your comment and take a look at the changes for one last time.

[Update]
I addressed your comment, and checked the changed files. They look good to me. Please feel free to merge.

@seanlaw
Contributor

seanlaw commented Mar 10, 2023

@NimaSarajpoor It looks like we are missing some code coverage:

Name                 Stmts   Miss  Cover   Missing
--------------------------------------------------
tests/naive.py        1216      1    99%   243
tests/test_core.py     993      7    99%   89, 1577, 1583, 1589, 1595, 1601, 1607
--------------------------------------------------
TOTAL                13037      8    99%

Note that even though these are in naive.py and test_core.py, this implies that some paths are not traversed within these functions, which is a problem (i.e., please do not simply add # pragma: no cover).

@NimaSarajpoor
Collaborator Author

@seanlaw

It looks like we are missing some code coverage

I really need to understand the importance of checking code coverage by heart :)

def test_find_incompatible_args():
    # case1: having exact required argument
    def func_case1(x, y):
        return
Collaborator Author

This function and other functions below it are designed to test the functionality of core._find_incompatible_args. However, since we do not call these functions (right?), these functions are skipped according to the result shown in code coverage. Any suggestion @seanlaw ?

Collaborator Author

@seanlaw
FYI: To fix code coverage, I added # pragma: no cover here and for the next few functions.
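
For reference, a possible alternative to the pragma is to call the helper functions inside the test so their bodies are executed. The sketch below is an assumption-laden illustration: `find_incompatible_args` is a hypothetical stand-in written for this example, and its signature and return value are not taken from `core._find_incompatible_args`.

```python
import inspect


def find_incompatible_args(func, required_args):
    # Hypothetical stand-in: names in required_args that func does not accept
    params = set(inspect.signature(func).parameters)
    return [arg for arg in required_args if arg not in params]


def test_find_incompatible_args():
    # case1: having exact required arguments
    def func_case1(x, y):
        return None

    # case2: one required argument is missing
    def func_case2(x):
        return None

    assert find_incompatible_args(func_case1, ["x", "y"]) == []
    assert find_incompatible_args(func_case2, ["x", "y"]) == ["y"]

    # Calling the helpers runs their bodies, so a line-coverage tool
    # records the `return` lines as executed
    assert func_case1(1, 2) is None
    assert func_case2(1) is None
```

Whether the extra calls are preferable to `# pragma: no cover` is a judgment call; they add assertions that say nothing about the function under test.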

@seanlaw
Contributor

seanlaw commented Mar 10, 2023

I really need to understand the importance of checking code coverage by heart :)

I recommend running the tests locally first for non-trivial PRs. Having said that, I'm going to add something to our coverage reporting to force it to fail if the coverage is below 100%. Hopefully, that'll help. I should've done it a long time ago.

@NimaSarajpoor
Collaborator Author

I recommend running the tests locally first for non-trivial PRs.

Right! Need to keep that in mind!

I'm going to add something to our coverage reporting to force it to fail if the coverage is below 100%.

Cool!! I think that would be a great idea!

@seanlaw
Contributor

seanlaw commented Mar 10, 2023

@NimaSarajpoor I just pushed a new commit that I think/hope will cause a failure. Would you mind pulling it into this branch?

@NimaSarajpoor
Collaborator Author

That was quick :) I will update my branch.

@seanlaw
Contributor

seanlaw commented Mar 11, 2023

Please pull the latest commit (the last one wasn't enough).

@NimaSarajpoor
Collaborator Author

@seanlaw
I ran the test on google colab, and I got this:

Name                 Stmts   Miss  Cover   Missing
--------------------------------------------------
tests/naive.py        1216      1    99%   243
tests/test_core.py     993      7    99%   89, 1577, 1583, 1589, 1595, 1601, 1607
--------------------------------------------------
TOTAL                13037      8    99%

78 files skipped due to complete coverage.
Cleaning Up

I am going to push the commits...

@NimaSarajpoor
Collaborator Author

We got an error. That is good. I will wait until you handle the error that occurred in your last commit (you may want to see this: nedbat/coveragepy#198).

@seanlaw
Contributor

seanlaw commented Mar 11, 2023

We got an error. That is good. I will wait until you handle the error that occurred in your last commit (you may want to see this: nedbat/coveragepy#198).

It should be fixed now.

@NimaSarajpoor
Collaborator Author

@seanlaw
Please let me know if I should take care of anything else for this PR.

@seanlaw
Contributor

seanlaw commented Mar 12, 2023

@NimaSarajpoor Everything looks good here. Merging now. Thanks!

@seanlaw seanlaw merged commit c06f0e9 into TDAmeritrade:main Mar 12, 2023
@NimaSarajpoor NimaSarajpoor deleted the subseq_constant_in_API branch September 4, 2023 03:26