Addition of matrix_profile feature #793

vanbenschoten · 2021-01-15T02:27:46Z

@nils-braun @tylerwmarrs let me know what you think!

nils-braun

Thanks!
Do you think you can also add some small tests for this feature?
And before we can merge, we would need the conda package (I do not know how far along you are with that; I just want to mention it).

nils-braun · 2021-01-15T08:13:55Z

tsfresh/feature_extraction/feature_calculators.py

+                m_p = mp.compute(x,**kwargs)
+
+            else:
+                m_p = mp.algorithms.maximum_subsequence(x, include_pmp=True)['pmp'][-1]


In this case the kwargs are not used in the function call. Is this on purpose?

He should be using the threshold parameter here.

@tylerwmarrs I don't think threshold needs to be set here, as we planned to go with the default value of 0.95, and that's already set in the maximum_subsequence function. That said, I'll re-insert kwargs for maximum flexibility.

nils-braun · 2021-01-15T08:15:17Z

tsfresh/feature_extraction/settings.py

@@ -152,6 +152,8 @@ def __init__(self):
            "lempel_ziv_complexity": [{"bins": x} for x in [2, 3, 5, 10, 100]],
            "fourier_entropy":  [{"bins": x} for x in [2, 3, 5, 10, 100]],
            "permutation_entropy":  [{"tau": 1, "dimension": x} for x in [3, 4, 5, 6, 7]],
+                        "matrix_profile": [{"sample_pct": 1, "threshold": 0.98, "feature": f}


Currently, as window is not included, those kwargs are never used. Should we remove them?

@nils-braun per the earlier comment, I'll remove "sample_pct" and leave "threshold" for the time being.

tylerwmarrs · 2021-01-15T10:48:57Z

requirements.txt

@@ -8,3 +8,4 @@ scikit-learn>=0.19.2
 tqdm>=4.10.0
 dask[dataframe]>=2.9.0
 distributed>=2.11.0
+matrixprofile>=1.1.6


Based on our versioning and the requirement of 1.1.7, this should be:

matrixprofile>=1.1.7,<2.0.0

Good catch. Is there a reason we're specifying version < 2.0.0?

We use semantic versioning like most packages. A version less than 2 should guarantee compatibility. See

https://link.medium.com/rIKVIvX34cb
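
A minimal sketch of what a `>=1.1.7,<2.0.0` pin means in practice, assuming plain `MAJOR.MINOR.PATCH` version strings. The helper names `parse_version` and `satisfies_pin` are hypothetical and only illustrate the range check; a real installer handles many more cases (pre-releases and so on).

```python
def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints (assumes exactly that shape)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(version, lower="1.1.7", upper="2.0.0"):
    """Return True if lower <= version < upper, comparing component-wise."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Illustrative candidates only; these are not a list of real releases.
for candidate in ["1.1.6", "1.1.7", "1.9.0", "2.0.0"]:
    print(candidate, satisfies_pin(candidate))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.1.10" would sort before "1.1.7".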

tylerwmarrs · 2021-01-15T11:00:12Z

tsfresh/feature_extraction/feature_calculators.py

+
+            else:
+                m_p = mp.algorithms.maximum_subsequence(x, include_pmp=True)['pmp'][-1]
+            return m_p[(~np.isnan(m_p)) & (~np.isinf(m_p))]


I don't think you want to return a modified version of the matrix profile at this stage. What if additional features in the future handle imputation or something?

@nils-braun based on our conversation in the matrixprofile github issue, how do you want to handle any exception vs the specific "NoSolutionPossible" case?

        return m_p
    except mp.exceptions.NoSolutionPossible as e:
        warnings.warn(str(e))
        return None
    except Exception as e:
        # ?????????
        return None

Yes, I think a warning would make sense here! The question is, do we expect any other exception? If not, let's not catch it and let it actually fail for the user.
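
A minimal sketch of the "catch only what you expect" approach discussed here. The `NoSolutionPossible` class and `compute_profile` function below are local stand-ins defined purely for illustration; they are not the matrixprofile library's actual code.

```python
import warnings


class NoSolutionPossible(Exception):
    """Stand-in for the library's exception of the same name (hypothetical here)."""


def compute_profile(x):
    """Hypothetical placeholder for the real matrix profile computation."""
    if len(set(x)) <= 1:
        raise NoSolutionPossible("no window satisfies the threshold")
    return [abs(a - b) for a, b in zip(x, x[1:])]


def calculate_mp(x):
    """Return the profile, or None if the expected 'no solution' error occurs."""
    try:
        return compute_profile(x)
    except NoSolutionPossible as e:
        # Expected failure: warn and signal "no profile" to the caller.
        warnings.warn(str(e))
        return None
    # Any other exception is unexpected and propagates to the user.
```

With this shape, a genuine bug (for example a `TypeError` from bad input) still surfaces as a failure instead of being silently turned into a missing value.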

tylerwmarrs · 2021-01-15T11:04:29Z

tsfresh/feature_extraction/feature_calculators.py

+            matrix_profiles[featureless_key] = _calculate_mp(**kwargs)
+
+        m_p = matrix_profiles[featureless_key]
+


Here you can find the finite indices and store them for functions that do not work on non-finite data.

finite_indices = np.isfinite(m_p)

tylerwmarrs · 2021-01-15T11:06:37Z

tsfresh/feature_extraction/feature_calculators.py

+
+
+        if feature == "min":
+            res[key] = np.min(m_p)


Here, and in any other places where it makes sense, you would use the finite indices. This is not really a performance hit because numpy is highly optimized when working with memory views.

res[key] = np.min(m_p[finite_indices])

@tylerwmarrs this is a good callout. I like the idea of pulling out finite data later and leaving the full MP in case other features need it later.
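
A minimal sketch of that idea, assuming the matrix profile is a one-dimensional NumPy array. The values in `m_p` are made up for illustration.

```python
import numpy as np

# A toy profile containing non-finite entries (values are made up).
m_p = np.array([0.5, np.nan, 2.0, np.inf, 1.0])

# Boolean mask of usable entries; m_p itself stays untouched.
finite_indices = np.isfinite(m_p)

# Apply the mask only where a feature needs finite data.
features = {
    "min": np.min(m_p[finite_indices]),
    "max": np.max(m_p[finite_indices]),
    "mean": np.mean(m_p[finite_indices]),
}
```

Because the full profile is kept, a later feature that wants to count or impute the non-finite entries can still do so.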

vanbenschoten · 2021-01-16T01:37:39Z

@nils-braun tests are in there! Let me know if I should approach them differently.

nils-braun · 2021-01-16T12:37:16Z

tsfresh/feature_extraction/feature_calculators.py

+@set_property("fctype", "combiner")
+def matrix_profile(x, param):
+    """
+    TODO: Documentation


Just to mention, documentation is still missing :-)

tests/units/feature_extraction/test_feature_calculations.py

vanbenschoten · 2021-01-16T13:08:28Z

@nils-braun thanks for the feedback! I'll update the documentation and try adding the NaN test.

Would you happen to know why the checks above are failing? The error message isn't making sense to me (I checked the Matrix Profile code, and it runs just fine).

vanbenschoten · 2021-01-16T13:32:07Z

Ok, all corrections have been made.

tylerwmarrs · 2021-01-17T15:29:58Z

@nils-braun thanks for the feedback! I'll update the documentation and try adding the NaN test.

Would you happen to know why the checks above are failing? The error message isn't making sense to me (I checked the Matrix Profile code, and it runs just fine).

I'll take a look later on. The error is basically saying that there are no valid indices in the finite_indices variable. This could be because you are always returning [np.nan], so something is always swallowing the real exception, making it less obvious what is really going on. If the dataset is the robot example that @nils-braun raised the issue in our repository about, he said that every time he used a threshold and not a window, it threw an exception because there is no correlation.

nils-braun · 2021-01-18T19:56:09Z

Just to be sure (as @vanbenschoten said it is running fine): can you also reproduce the error locally? You can run the tests with pytest if you like

vanbenschoten · 2021-01-18T20:08:59Z

@nils-braun to confirm, are you saying the code works when you run it?


nils-braun · 2021-01-18T20:40:59Z

Ah sorry - no, it does also fail. I just thought it is working for you.
Concerning the error, there are actually a few ones:

- TypeError: only integer scalar arrays can be conver...: this happens when you return the list [np.NaN]. You need to turn it into a np.array to use the np.isfinite function properly.
- NameError: name 'NoSolutionPossible' is not defined: your test misses an import :-)
- TypeError: matrix_profile() missing 1 required positional argument: 'param': I do not know what is going on here; that is probably related to your package. This is probably also true for TypeError: matrix_profile() got an unexpected keyword argument 'windows'.

vanbenschoten · 2021-01-18T20:44:34Z

Got it - thanks for the clarification. I had meant that the Matrix Profile code worked locally, so these errors make sense. Unless @tylerwmarrs can push a quick fix I'll take a look in a bit.


tylerwmarrs · 2021-01-18T21:05:59Z

Ah sorry - no, it does also fail. I just thought it is working for you.
Concerning the error, there are actually a few ones:

TypeError: only integer scalar arrays can be conver...: this happens when you return the list [np.NaN]. You need to turn it into a np.array to use the np.isfinite function properly

@vanbenschoten I cannot work on this. You need to just wrap the return as Nils mentions

np.array([np.nan])
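
A short sketch of why the wrapping matters, assuming the error comes from indexing the returned value with a boolean mask. The variable names are illustrative only.

```python
import numpy as np

as_list = [np.nan]
mask = np.isfinite(as_list)  # works on a list and yields a boolean array

try:
    as_list[mask]  # a plain list cannot be indexed with a boolean array
    list_error = None
except TypeError as e:
    list_error = e

as_array = np.array([np.nan])
finite_values = as_array[np.isfinite(as_array)]  # empty array, no exception
```

The empty result still needs handling afterwards, since reductions such as `np.min` raise on a zero-size array.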

vanbenschoten · 2021-01-20T02:41:33Z

@nils-braun Ok, all three Matrix Profile tests are passing (the failures are stemming from unrelated parts of the codebase)! What are next steps here?

nils-braun · 2021-01-20T06:27:29Z

@nils-braun Ok, all three Matrix Profile tests are passing (the failures are stemming from unrelated parts of the codebase)! What are next steps here?

These other failures are not unrelated :-)
They all boil down to the same problem:
You wrapped everything in one big try/except block and return NaN on any exception. I would generally not recommend that: catching known and expected exceptions like the No Solution one is fine, but anywhere else in the code an exception is unexpected and should really be seen. You are also breaking the return type convention: the non-exception case returns a list of tuples (feature name, float), whereas now you are returning just a float.
What I would recommend is what I have implemented at the very beginning: only do this while calculating the actual matrix profile, and only catch those exceptions you expect.

vanbenschoten · 2021-01-20T12:25:17Z

Ah, got it. Ok, I'll push some updated code later today.


vanbenschoten · 2021-01-20T12:36:48Z

Just to make sure I'm understanding correctly, the expected feature return for the No Solution case should be:

[('feature_"min"__threshold_0.98', NaN),
('feature_"max"__threshold_0.98', NaN),
('feature_"mean"__threshold_0.98', NaN),
('feature_"median"__threshold_0.98', NaN),
('feature_"25"__threshold_0.98', NaN),
('feature_"75"__threshold_0.98', NaN)]

Sorry for missing this the first time!
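
A sketch of how such a list could be built while keeping the (name, value) return shape. The `output_key` helper is a hypothetical stand-in written for this example; it assumes keys of the form `key_value` joined by a double underscore, which is an assumption about the naming scheme rather than a quote of the project's code.

```python
import math


def output_key(kwargs):
    """Hypothetical key formatter: 'key_value' pairs joined by '__', strings quoted."""
    def fmt(value):
        return '"{}"'.format(value) if isinstance(value, str) else str(value)
    return "__".join("{}_{}".format(k, fmt(kwargs[k])) for k in sorted(kwargs))


def nan_features(param):
    """Keep the list-of-(name, value) return shape, with NaN as every value."""
    return [(output_key(kwargs), float("nan")) for kwargs in param]


param = [{"feature": f, "threshold": 0.98}
         for f in ["min", "max", "mean", "median", "25", "75"]]
result = nan_features(param)
```

The point is that the no-solution case returns one tuple per requested feature, so callers never have to special-case a bare float.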

vanbenschoten · 2021-01-20T18:59:11Z

@nils-braun I've updated the code to return the "expected feature" listed above, but I'm not sure what's going on with the other errors listed in the pytest runs. It seems as though the "feature" key in the dictionary isn't being passed through - as an example, line 1460 in the Python 3.6 (lowest) shows:

param = [{'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}]

This is odd, because "feature" is being explicitly set in settings.py. Any idea what might be taking place? If it's still my NaN return case let me know and I'll adjust :)

nils-braun

Sorry for the late reply, I am currently involved in a lot of different things :-/ But with the one additional line I propose below all your tests should work!

nils-braun · 2021-01-20T21:51:06Z

tsfresh/feature_extraction/feature_calculators.py

+
+            return m_p
+
+        except:


I would still vote for only catching the NoSolutionPossible exception, but this is up to you to decide :-)

nils-braun · 2021-01-24T18:04:44Z

tsfresh/feature_extraction/feature_calculators.py

+
+    for kwargs in param:
+        key = convert_to_output_format(kwargs)
+        feature = kwargs.pop('feature')


Sorry, that took some time for me to debug! And unfortunately I think it was me introducing the bug in one of my previous commits :-/
The problem is a bit complicated to describe, so here is the short version:
the parameters you are using here come from the settings object given to the extract_features function. Because Python passes references, the kwargs you are using here is actually the exact dictionary stored in the settings object. If you now use these settings twice in the same test (which all of those failed tests do), the pop removes "feature" from the original settings object, and it will not be present the next time :-)
So, simple fix: add kwargs = kwargs.copy() before that.

No worries! I've updated the code to reflect this :)
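
A minimal reproduction of the aliasing problem described above, using a toy `settings` dictionary. The function names `features_buggy` and `features_fixed` are hypothetical and exist only for this illustration.

```python
settings = {"matrix_profile": [{"feature": "min", "threshold": 0.98}]}


def features_buggy(param):
    # pop() mutates the caller's dicts, i.e. the ones stored in `settings`.
    return [kwargs.pop("feature") for kwargs in param]


def features_fixed(param):
    out = []
    for kwargs in param:
        kwargs = kwargs.copy()  # shallow copy; the original dict stays intact
        out.append(kwargs.pop("feature"))
    return out


first = features_fixed(settings["matrix_profile"])
second = features_fixed(settings["matrix_profile"])  # still sees "feature"

features_buggy(settings["matrix_profile"])
after_buggy = settings["matrix_profile"]  # the "feature" key is now gone
```

After the buggy call, the stored parameters contain only the threshold, which matches the `param` value seen in the failing test output. A shallow copy is enough here because the values are plain strings and numbers.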

nils-braun · 2021-01-24T18:07:31Z

tsfresh/feature_extraction/feature_calculators.py

+        m_p = matrix_profiles[featureless_key]
+
+        #Set all features to nan if Matrix Profile is nan (cannot be computed)
+        if len(m_p) == 1:


Can the len be 1 also for "normal" cases? If that can ever happen, I would propose to make this one simpler:
do not return [np.nan] on errors, but actually None, and here only check whether m_p is None. I also think that is more pythonic, but that might be a matter of taste.

The good news is that the length cannot be 1 for normal cases, otherwise I'd definitely go with your approach.

codecov-io · 2021-01-24T19:07:55Z

Codecov Report

Merging #793 (d7e3c50) into main (c071fd8) will decrease coverage by 0.01%.
The diff coverage is 95.00%.

@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
- Coverage   95.88%   95.87%   -0.02%     
==========================================
  Files          18       18              
  Lines        1774     1817      +43     
  Branches      347      358      +11     
==========================================
+ Hits         1701     1742      +41     
- Misses         36       37       +1     
- Partials       37       38       +1

Impacted Files	Coverage Δ
tsfresh/feature_extraction/settings.py	`100.00% <ø> (ø)`
tsfresh/feature_extraction/feature_calculators.py	`97.19% <95.00%> (-0.14%)`	⬇️
tsfresh/feature_selection/relevance.py	`95.28% <0.00%> (ø)`
tsfresh/transformers/relevant_feature_augmenter.py	`94.87% <0.00%> (+0.20%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

vanbenschoten · 2021-01-24T19:12:39Z

@nils-braun tests are passing now! Thanks much for your help. What are next steps here?

nils-braun · 2021-01-24T19:15:18Z

There are just some minor style errors left - once you have resolved these as well, I will merge:

./tsfresh/feature_extraction/feature_calculators.py:2226:121: E501 line too long (151 > 120 characters)
./tsfresh/feature_extraction/feature_calculators.py:2243:121: E501 line too long (123 > 120 characters)
./tsfresh/feature_extraction/feature_calculators.py:2246:35: E231 missing whitespace after ','
./tsfresh/feature_extraction/feature_calculators.py:2249:76: E231 missing whitespace after ','
./tsfresh/feature_extraction/feature_calculators.py:2253:9: E722 do not use bare 'except'
./tsfresh/feature_extraction/feature_calculators.py:2263:5: E303 too many blank lines (2)
./tsfresh/feature_extraction/feature_calculators.py:2274:9: E265 block comment should start with '# '
./tsfresh/feature_extraction/feature_calculators.py:2278:9: E265 block comment should start with '# '
./tsfresh/feature_extraction/feature_calculators.py:2284:13: E303 too many blank lines (2)
./tests/units/feature_extraction/test_feature_calculations.py:1306:9: E265 block comment should start with '# '
./tests/units/feature_extraction/test_feature_calculations.py:1314:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1315:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1316:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1317:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1318:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1319:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1322:49: E231 missing whitespace after ','
./tests/units/feature_extraction/test_feature_calculations.py:1322:68: E231 missing whitespace after ','
./tests/units/feature_extraction/test_feature_calculations.py:1325:5: E303 too many blank lines (2)
./tests/units/feature_extraction/test_feature_calculations.py:1326:9: E265 block comment should start with '# '
./tests/units/feature_extraction/test_feature_calculations.py:1335:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1336:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1337:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1338:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1339:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1340:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1343:9: E265 block comment should start with '# '
./tests/units/feature_extraction/test_feature_calculations.py:1344:69: E231 missing whitespace after ','
./tests/units/feature_extraction/test_feature_calculations.py:1347:5: E303 too many blank lines (2)
./tests/units/feature_extraction/test_feature_calculations.py:1348:9: E265 block comment should start with '# '
./tests/units/feature_extraction/test_feature_calculations.py:1353:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1354:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1355:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1356:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1357:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1358:9: E122 continuation line missing indentation or outdented

vanbenschoten · 2021-01-24T19:18:02Z

Sorry, just noticed I didn't push my style updates :/ Pushing now.

vanbenschoten · 2021-01-24T19:31:06Z

All set - @nils-braun over to you!

nils-braun · 2021-01-25T21:29:06Z

Nice, it's in!

vanbenschoten · 2021-01-25T21:49:40Z

Thanks! I greatly enjoyed working together - excited to keep partnering in the future!


New code for Matrix Profile

24c47b8

nils-braun reviewed Jan 15, 2021

tylerwmarrs reviewed Jan 15, 2021

vanbenschoten added 2 commits January 15, 2021 19:14

PR revision post first comments

9b52630

Adding tests

4dd8d36

vanbenschoten added 3 commits January 15, 2021 19:47

Fixed index vs value issue

a51c840

forgot a quotation mark

f4e4f34

isfinite instead of finite

202f1d3

nils-braun reviewed Jan 16, 2021

Nan test plus documentation

8c63411

Get the right version of matrixprofile listed

faa5bb8

vanbenschoten added 2 commits January 18, 2021 15:36

Minor file updates

04ac3e5

Working tests

0c76e85

Updating Nan case to return Nan feature value

63ee5b7

nils-braun reviewed Jan 24, 2021

nils-braun mentioned this pull request Jan 24, 2021

First test implementation of a matrix_profile feature #787

Closed

Minor fiz

d7e3c50

Style updates

4979fbc

Imports are still a thing

4219122

nils-braun merged commit 04b473f into blue-yonder:main Jan 25, 2021
