-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addition of matrix_profile feature #793
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Do you think you can also add some small tests for this feature?
And before we can merge, we would need the conda package (do not know how far you are already with that, just want to mention it)
m_p = mp.compute(x,**kwargs) | ||
|
||
else: | ||
m_p = mp.algorithms.maximum_subsequence(x, include_pmp=True)['pmp'][-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case the kwargs
are not used in the function call. Is this on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
He should be using the threshold parameter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerwmarrs I don't think threshold needs to be set here, as we planned to go with the default value of 0.95, and that's already set in the maximum_sequence function. That said, I'll re-insert kwargs for maximum flexibility.
@@ -152,6 +152,8 @@ def __init__(self): | |||
"lempel_ziv_complexity": [{"bins": x} for x in [2, 3, 5, 10, 100]], | |||
"fourier_entropy": [{"bins": x} for x in [2, 3, 5, 10, 100]], | |||
"permutation_entropy": [{"tau": 1, "dimension": x} for x in [3, 4, 5, 6, 7]], | |||
"matrix_profile": [{"sample_pct": 1, "threshold": 0.98, "feature": f} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, as window
is not included, those kwargs are never used. Should we remove them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nils-braun per the earlier comment, I'll remove "sample_pct" and leave "threshold" for the time being.
requirements.txt
Outdated
@@ -8,3 +8,4 @@ scikit-learn>=0.19.2 | |||
tqdm>=4.10.0 | |||
dask[dataframe]>=2.9.0 | |||
distributed>=2.11.0 | |||
matrixprofile>=1.1.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our versioning and the requirement of 1.1.7, this should be:
matrixprofile>=1.1.7<2.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Is there a reason we're specifying version < 2.0.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use semantic versioning like most packages. A version less than 2 should guarantee compatibility. See
|
||
else: | ||
m_p = mp.algorithms.maximum_subsequence(x, include_pmp=True)['pmp'][-1] | ||
return m_p[(~np.isnan(m_p)) & (~np.isinf(m_p))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you want to return a modified version of the matrix profile at this stage. What if additional features in the future handle imputation or something?
@nils-braun based on our conversation in the matrixprofile github issue, how do you want to handle any exception vs the specific "NoSolutionPossible" case?
return m_p
except mp.exceptions.NoSolutionPossible as e:
warnings.warn(str(e))
return None
except Exception as e:
# ?????????
return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think a warning would make sense here! The question is, do we expect any other exception? If not, let's do not catch it and let it actually fail for the user
matrix_profiles[featureless_key] = _calculate_mp(**kwargs) | ||
|
||
m_p = matrix_profiles[featureless_key] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can find the finite indices and store them for functions that do not work on non-finite data.
finite_indices = np.finite(m_p)
|
||
|
||
if feature == "min": | ||
res[key] = np.min(m_p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you would use the finite indices and in additional places where it makes sense. This is not really a performance hit because numpy is highly optimized when working with memory views.
res[key] = np.min(m_p[finite_indices])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerwmarrs this is a good callout. I like the idea of pulling out finite data later and leaving the full MP in case other features need it later.
@nils-braun tests are in there! Let me know if I should approach them differently. |
@set_property("fctype", "combiner") | ||
def matrix_profile(x, param): | ||
""" | ||
TODO: Documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to mention, documentation is still missing :-)
@nils-braun thanks for the feedback! I'll update the documentation and try adding the NaN test. Would you happen to know why the checks above are failing? The error message isn't making sense to me (I checked the Matrix Profile code, and it runs just fine). |
Ok, all corrections have been made. |
I'll take a look later on. The error is basically saying that there is no valid indices from finite_indices variable. This could be that you are always returning [np.nan]. So something is always swallowing the real exception making it less obvious of what is really going on. If the dataset is the robot example that @nils-braun raised the issue in our repository about, he said every time he used a threshold and not a window, it always threw an exception because there is no correlation. |
Just to be sure (as @vanbenschoten said it is running fine): can you also reproduce the error locally? You can run the tests with |
@nils-braun to confirm, are you saying the code works when you run it?
…On Mon, Jan 18, 2021, 1:56 PM Nils Braun ***@***.***> wrote:
Just to be sure (as @vanbenschoten <https://github.com/vanbenschoten>
said it is running fine): can you also reproduce the error locally? You can
run the tests with pytest if you like
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#793 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB53ISDEJD3W5RS4HMOGMG3S2SG6RANCNFSM4WDMNIMQ>
.
|
Ah sorry - no, it does also fail. I just thought it is working for you.
|
Got it - thanks for the clarification. I had meant that the Matrix Profile
code worked locally, so these errors makes sense. Unless @tylerwmarrs can
push a quick fix I'll take a look in a bit.
…On Mon, Jan 18, 2021, 2:41 PM Nils Braun ***@***.***> wrote:
Ah sorry - no, it does also fail. I just thought it is working for you.
Concerning the error, there are actually a few ones:
- TypeError: only integer scalar arrays can be conver...: this happens
when you return the list [np.NaN]. You need to turn it into a np.array
to use the np.isfinite function properly
- NameError: name 'NoSolutionPossible' is not defined your test misses
an import :-)
- TypeError: matrix_profile() missing 1 required positional argument:
'param' I do not know what is going on here - that is probably related
to your package. This is probably also true for TypeError:
matrix_profile() got an unexpected keyword argument 'windows'
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#793 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB53ISAUYERWKZHDKDU76E3S2SMGVANCNFSM4WDMNIMQ>
.
|
@vanbenschoten I cannot work on this. You need to just wrap the return as Nils mentions
|
@nils-braun Ok, all three Matrix Profile tests are passing (the failures are stemming from unrelated parts of the codebase)! What are next steps here? |
These other failures are not unrelated :-) |
Ah, got it. Ok, I'll push some updated code later today.
…On Wed, Jan 20, 2021, 12:27 AM Nils Braun ***@***.***> wrote:
@nils-braun <https://github.com/nils-braun> Ok, all three Matrix Profile
tests are passing (the failures are stemming from unrelated parts of the
codebase)! What are next steps here?
These other failures are not unrelated :-)
They all boil down to the same problem:
You did just wrap everything with a big try catch block and return NaN on
exception. While I would generally not recommend doing so (catching known
and expected exceptions like the No Solution one is fine, but in the rest
of the code an exception is unexpected and should ready be seen), you are
also breaking the return type convention: the non-exception case returns a
list of tuples - feature name to float. Now you are just returning a float.
What I would recommend is what I have implemented at the very beginning:
only do this while calculating the actual matrix profile and only catch
those exception you expect.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#793 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB53ISHFAE4EOPNQXI5RUFTS2ZZV7ANCNFSM4WDMNIMQ>
.
|
Just to make sure I'm understanding correctly, the expected feature return for the No Solution case should be: [('feature_"min"_threshold_0.98', NaN), Sorry for missing this the first time! |
@nils-braun I've updated the code to return the "expected feature" listed above, but I'm not sure what's going on with the other errors listed in the pytest runs. It seems as though the "feature" key in the dictionary isn't being passed through - as an example, line 1460 in the Python 3.6 (lowest) shows: param = [{'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}] This is odd, because "feature" is being explicitly set in settings.py. Any idea what might be taking place? If it's still my NaN return case let me know and I'll adjust :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, I am currently involved in a lot of different things :-/ But with the one additional line I propose below all your tests should work!
|
||
return m_p | ||
|
||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still vote for only catching the NoSolutionException, but this is up to you to decide :-)
|
||
for kwargs in param: | ||
key = convert_to_output_format(kwargs) | ||
feature = kwargs.pop('feature') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that took some time for me to debug! And unfortunately I think it was me introducing the bug in one of my previous commits :-/
The problem is a bit complicated to describe, so here is the short version:
the parameters you are using here come from the settings object given to the extract_features
function. Due to reference/pointer magic happening in python, the kwargs
you are using here is actually the exact one stored in the settings object. If you now use these setings twice in the same test (which all of those failed tests do), you actually remove features
from the original settings object and will not be present the next time :-)
So, simple fix: add kwargs = kwargs.copy()
before that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! I've updated the code to reflect this :)
m_p = matrix_profiles[featureless_key] | ||
|
||
#Set all features to nan if Matrix Profile is nan (cannot be computed) | ||
if len(m_p) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the len be 1 also for "normal" cases? If that can ever happen, I would propose to make this one simpler:
do not return [np.nan]
on errors, but actually None
and here only check for if m_p is None
- I also think that is more pythonic, but that might be a matter of taste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The good news is that the length cannot be 1 for normal cases, otherwise I'd definitely go with your approach.
Codecov Report
@@ Coverage Diff @@
## main #793 +/- ##
==========================================
- Coverage 95.88% 95.87% -0.02%
==========================================
Files 18 18
Lines 1774 1817 +43
Branches 347 358 +11
==========================================
+ Hits 1701 1742 +41
- Misses 36 37 +1
- Partials 37 38 +1
Continue to review full report at Codecov.
|
@nils-braun tests are passing now! Thanks much for your help. What are next steps here? |
There is just a minor style error - but once you did also resolve these, I will merge:
|
Sorry, just noticed I didn't push my style updates :/ Pushing now. |
All set - @nils-braun over to you! |
Nice, its in! |
Thanks! I greatly enjoyed working together - excited to keep partnering in
the future!
…On Mon, Jan 25, 2021, 3:29 PM Nils Braun ***@***.***> wrote:
Merged #793 <#793> into main.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#793 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB53ISA3PM43WU2HVWS2QY3S3XPDXANCNFSM4WDMNIMQ>
.
|
@nils-braun @tylerwmarrs let me know what you think!