Addition of matrix_profile feature #793

Merged: 14 commits, Jan 25, 2021

Conversation

vanbenschoten
Contributor

@nils-braun @tylerwmarrs let me know what you think!

Collaborator

@nils-braun nils-braun left a comment

Thanks!
Do you think you can also add some small tests for this feature?
Also, before we can merge, we need the conda package (I do not know how far along you are with that; I just want to mention it).

    m_p = mp.compute(x, **kwargs)

else:
    m_p = mp.algorithms.maximum_subsequence(x, include_pmp=True)['pmp'][-1]
Collaborator

In this case the kwargs are not used in the function call. Is this on purpose?

He should be using the threshold parameter here.

Contributor Author

@tylerwmarrs I don't think threshold needs to be set here, as we planned to go with the default value of 0.95, and that's already set in the maximum_subsequence function. That said, I'll re-insert kwargs for maximum flexibility.

@@ -152,6 +152,8 @@ def __init__(self):
"lempel_ziv_complexity": [{"bins": x} for x in [2, 3, 5, 10, 100]],
"fourier_entropy": [{"bins": x} for x in [2, 3, 5, 10, 100]],
"permutation_entropy": [{"tau": 1, "dimension": x} for x in [3, 4, 5, 6, 7]],
"matrix_profile": [{"sample_pct": 1, "threshold": 0.98, "feature": f}
Collaborator

Currently, as window is not included, those kwargs are never used. Should we remove them?

Contributor Author

@nils-braun per the earlier comment, I'll remove "sample_pct" and leave "threshold" for the time being.

requirements.txt Outdated
@@ -8,3 +8,4 @@ scikit-learn>=0.19.2
tqdm>=4.10.0
dask[dataframe]>=2.9.0
distributed>=2.11.0
matrixprofile>=1.1.6

Based on our versioning and the requirement of 1.1.7, this should be:

matrixprofile>=1.1.7,<2.0.0

Contributor Author

Good catch. Is there a reason we're specifying version < 2.0.0?

We use semantic versioning like most packages. A version less than 2 should guarantee compatibility. See

https://link.medium.com/rIKVIvX34cb
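
As a rough illustration of which versions a `>=1.1.7,<2.0.0` range admits, here is a minimal sketch using plain tuple comparison. The helper names `parse_version` and `in_supported_range` are hypothetical, and the parsing only handles simple `X.Y.Z` strings; pip's real specifier matching covers many more cases.

```python
def parse_version(text):
    """Turn a simple 'X.Y.Z' string into a tuple of ints (hypothetical helper)."""
    return tuple(int(part) for part in text.split("."))


def in_supported_range(text, lower="1.1.7", upper="2.0.0"):
    """Return True if lower <= version < upper, mirroring '>=1.1.7,<2.0.0'."""
    return parse_version(lower) <= parse_version(text) < parse_version(upper)


for candidate in ["1.1.6", "1.1.7", "1.9.0", "2.0.0"]:
    print(candidate, in_supported_range(candidate))
```

Under semantic versioning, only the jump to a new major version (here 2.0.0) is expected to break compatibility, which is why the upper bound sits there.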


else:
    m_p = mp.algorithms.maximum_subsequence(x, include_pmp=True)['pmp'][-1]
return m_p[(~np.isnan(m_p)) & (~np.isinf(m_p))]

I don't think you want to return a modified version of the matrix profile at this stage. What if additional features in the future handle imputation or something?

@nils-braun based on our conversation in the matrixprofile github issue, how do you want to handle any exception vs the specific "NoSolutionPossible" case?

    return m_p
except mp.exceptions.NoSolutionPossible as e:
    warnings.warn(str(e))
    return None
except Exception as e:
    # ?????????
    return None

Collaborator

Yes, I think a warning would make sense here! The question is, do we expect any other exception? If not, let's not catch it and let it actually fail for the user.
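
A minimal sketch of that idea, assuming a stand-in exception class in place of `mp.exceptions.NoSolutionPossible` and a hypothetical wrapper `calculate_profile`: only the expected failure is turned into a warning, and anything else propagates to the caller.

```python
import warnings


class NoSolutionPossible(Exception):
    """Stand-in for mp.exceptions.NoSolutionPossible (assumption for this sketch)."""


def calculate_profile(compute, x):
    """Run compute(x); warn and return None only for the expected failure."""
    try:
        return compute(x)
    except NoSolutionPossible as e:
        warnings.warn(str(e))
        return None
    # Any other exception is deliberately not caught and reaches the caller.


def no_solution(x):
    # Hypothetical computation that hits the expected failure.
    raise NoSolutionPossible("no window could be found")


def broken(x):
    # Hypothetical computation with an unexpected bug.
    raise ValueError("unexpected bug")
```

With this shape, `calculate_profile(no_solution, data)` returns None with a warning, while `calculate_profile(broken, data)` still raises the ValueError so the real problem stays visible.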

matrix_profiles[featureless_key] = _calculate_mp(**kwargs)

m_p = matrix_profiles[featureless_key]

Here you can find the finite indices and store them for functions that do not work on non-finite data.

finite_indices = np.isfinite(m_p)



if feature == "min":
    res[key] = np.min(m_p)

Here you would use the finite indices, and likewise in any other places where it makes sense. This is not really a performance hit: boolean indexing copies the selected values, but numpy does that efficiently.

res[key] = np.min(m_p[finite_indices])

Contributor Author

@tylerwmarrs this is a good callout. I like the idea of pulling out finite data later and leaving the full MP in case other features need it later.
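
To make the suggestion concrete, here is a small sketch with a made-up profile array: the mask is computed once with `np.isfinite`, the full profile stays untouched, and each summary statistic indexes through the mask. The values in `m_p` and the `res` keys are invented for illustration.

```python
import numpy as np

# Hypothetical matrix profile containing non-finite entries.
m_p = np.array([0.5, np.nan, 2.0, np.inf, 1.0])

# Boolean mask of finite entries; m_p itself is left unchanged.
finite_indices = np.isfinite(m_p)

res = {
    "min": np.min(m_p[finite_indices]),
    "max": np.max(m_p[finite_indices]),
    "mean": np.mean(m_p[finite_indices]),
}
print(res)
```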

@vanbenschoten
Contributor Author

@nils-braun tests are in there! Let me know if I should approach them differently.

@set_property("fctype", "combiner")
def matrix_profile(x, param):
    """
    TODO: Documentation
Collaborator

Just to mention, documentation is still missing :-)

@vanbenschoten
Contributor Author

@nils-braun thanks for the feedback! I'll update the documentation and try adding the NaN test.

Would you happen to know why the checks above are failing? The error message isn't making sense to me (I checked the Matrix Profile code, and it runs just fine).

@vanbenschoten
Contributor Author

Ok, all corrections have been made.

@tylerwmarrs

tylerwmarrs commented Jan 17, 2021

@nils-braun thanks for the feedback! I'll update the documentation and try adding the NaN test.

Would you happen to know why the checks above are failing? The error message isn't making sense to me (I checked the Matrix Profile code, and it runs just fine).

I'll take a look later on. The error is basically saying that the finite_indices variable contains no valid indices. This could be because you are always returning [np.nan], so something is always swallowing the real exception and making it less obvious what is really going on. If the dataset is the robot example that @nils-braun raised the issue about in our repository, he said that every time he used a threshold and not a window, it threw an exception because there is no correlation.

@nils-braun
Collaborator

Just to be sure (as @vanbenschoten said it is running fine): can you also reproduce the error locally? You can run the tests with pytest if you like

@vanbenschoten
Contributor Author

vanbenschoten commented Jan 18, 2021 via email

@nils-braun
Collaborator

Ah sorry - no, it also fails for me. I just thought it was working for you.
Concerning the error, there are actually a few:

  • TypeError: only integer scalar arrays can be conver...: this happens when you return the list [np.NaN]. You need to turn it into an np.array so that the mask from np.isfinite can be used to index it.
  • NameError: name 'NoSolutionPossible' is not defined: your test is missing an import :-)
  • TypeError: matrix_profile() missing 1 required positional argument: 'param': I do not know what is going on here - that is probably related to your package. The same is probably true for TypeError: matrix_profile() got an unexpected keyword argument 'windows'
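
The first error in the list can be reproduced in isolation. This is a sketch with made-up variable names, not the PR's code: `np.isfinite` accepts a plain list, but the resulting boolean array can only index a numpy array, not the list itself.

```python
import numpy as np

as_list = [np.nan]
mask = np.isfinite(as_list)  # works on a list: returns array([False])

try:
    as_list[mask]  # a plain list cannot be indexed with a boolean array
except TypeError as e:
    print("list:", e)

as_array = np.array([np.nan])
print("array:", as_array[np.isfinite(as_array)])  # empty array, no error
```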

@vanbenschoten
Contributor Author

vanbenschoten commented Jan 18, 2021 via email

@tylerwmarrs

Ah sorry - no, it does also fail. I just thought it is working for you.
Concerning the error, there are actually a few ones:

  • TypeError: only integer scalar arrays can be conver...: this happens when you return the list [np.NaN]. You need to turn it into a np.array to use the np.isfinite function properly

@vanbenschoten I cannot work on this. You need to just wrap the return as Nils mentions

np.array([np.nan])

@vanbenschoten
Contributor Author

@nils-braun Ok, all three Matrix Profile tests are passing (the failures are stemming from unrelated parts of the codebase)! What are next steps here?

@nils-braun
Collaborator

@nils-braun Ok, all three Matrix Profile tests are passing (the failures are stemming from unrelated parts of the codebase)! What are next steps here?

These other failures are not unrelated :-)
They all boil down to the same problem:
You just wrapped everything in one big try/except block and return NaN on any exception. While I would generally not recommend doing so (catching known and expected exceptions like the No Solution one is fine, but in the rest of the code an exception is unexpected and should really be seen), you are also breaking the return type convention: the non-exception case returns a list of tuples - feature name to float. Now you are just returning a float.
What I would recommend is what I implemented at the very beginning: only do this while calculating the actual matrix profile and only catch those exceptions you expect.
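
A simplified sketch of the return-type convention, using only the standard library. The function `summarize`, its `functions` table, and the shortened key format are all hypothetical; the point is only that the failure case returns the same list-of-tuples shape as the normal case, with NaN values.

```python
import math
import statistics


def summarize(profile, param):
    """Return one (name, value) tuple per requested feature.

    `profile` is a list of floats, or None when no profile could be computed.
    """
    functions = {"min": min, "max": max, "mean": statistics.mean}
    results = []
    for kwargs in param:
        name = 'feature_"{}"'.format(kwargs["feature"])  # simplified key format
        if profile is None:
            # Same shape as the normal case, just with NaN values.
            results.append((name, math.nan))
        else:
            results.append((name, functions[kwargs["feature"]](profile)))
    return results


print(summarize([1.0, 3.0], [{"feature": "min"}, {"feature": "max"}]))
print(summarize(None, [{"feature": "min"}, {"feature": "max"}]))
```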

@vanbenschoten
Contributor Author

vanbenschoten commented Jan 20, 2021 via email

@vanbenschoten
Contributor Author

Just to make sure I'm understanding correctly, the expected feature return for the No Solution case should be:

[('feature_"min"__threshold_0.98', NaN),
('feature_"max"__threshold_0.98', NaN),
('feature_"mean"__threshold_0.98', NaN),
('feature_"median"__threshold_0.98', NaN),
('feature_"25"__threshold_0.98', NaN),
('feature_"75"__threshold_0.98', NaN)]

Sorry for missing this the first time!

@vanbenschoten
Contributor Author

@nils-braun I've updated the code to return the "expected feature" listed above, but I'm not sure what's going on with the other errors listed in the pytest runs. It seems as though the "feature" key in the dictionary isn't being passed through - as an example, line 1460 in the Python 3.6 (lowest) shows:

param = [{'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}, {'threshold': 0.98}]

This is odd, because "feature" is being explicitly set in settings.py. Any idea what might be taking place? If it's still my NaN return case let me know and I'll adjust :)

Collaborator

@nils-braun nils-braun left a comment

Sorry for the late reply, I am currently involved in a lot of different things :-/ But with the one additional line I propose below all your tests should work!


return m_p

except:
Collaborator

I would still vote for only catching the NoSolutionPossible exception, but this is up to you to decide :-)


for kwargs in param:
    key = convert_to_output_format(kwargs)
    feature = kwargs.pop('feature')
Collaborator

Sorry, that took some time for me to debug! And unfortunately I think it was me introducing the bug in one of my previous commits :-/
The problem is a bit complicated to describe, so here is the short version:
the parameters you are using here come from the settings object given to the extract_features function. Because Python passes references, the kwargs you are using here is actually the exact dictionary stored in the settings object. If you now use these settings twice in the same test (which all of those failed tests do), you actually remove the "feature" entries from the original settings object, so they will not be present the next time :-)
So, simple fix: add kwargs = kwargs.copy() before that.
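
The effect can be shown with a plain dictionary. This is a sketch, not the tsfresh code: `settings` and `read_features` are made-up names, and `pop` is given a default of None here only so that the second uncopied call prints something instead of raising KeyError.

```python
settings = {"matrix_profile": [{"threshold": 0.98, "feature": "min"}]}


def read_features(param, copy_first):
    features = []
    for kwargs in param:
        if copy_first:
            kwargs = kwargs.copy()  # work on a private copy
        features.append(kwargs.pop("feature", None))
    return features


print(read_features(settings["matrix_profile"], copy_first=True))   # ['min']
print(read_features(settings["matrix_profile"], copy_first=True))   # ['min']
print(read_features(settings["matrix_profile"], copy_first=False))  # ['min']
print(read_features(settings["matrix_profile"], copy_first=False))  # [None]
```

Without the copy, the first call removes "feature" from the shared dictionary, so the second call no longer finds it. A shallow copy is enough here because only top-level keys are popped.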

Contributor Author

No worries! I've updated the code to reflect this :)

m_p = matrix_profiles[featureless_key]

# Set all features to nan if Matrix Profile is nan (cannot be computed)
if len(m_p) == 1:
Collaborator

Can the len also be 1 for "normal" cases? If that can ever happen, I would propose making this simpler:
do not return [np.nan] on errors but None, and here only check if m_p is None. I also think that is more pythonic, but that might be a matter of taste.

Contributor Author

The good news is that the length cannot be 1 for normal cases, otherwise I'd definitely go with your approach.

@codecov-io

codecov-io commented Jan 24, 2021

Codecov Report

Merging #793 (d7e3c50) into main (c071fd8) will decrease coverage by 0.01%.
The diff coverage is 95.00%.

@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
- Coverage   95.88%   95.87%   -0.02%     
==========================================
  Files          18       18              
  Lines        1774     1817      +43     
  Branches      347      358      +11     
==========================================
+ Hits         1701     1742      +41     
- Misses         36       37       +1     
- Partials       37       38       +1     
Impacted Files                                         Coverage Δ
tsfresh/feature_extraction/settings.py                 100.00% <ø> (ø)
tsfresh/feature_extraction/feature_calculators.py      97.19% <95.00%> (-0.14%) ⬇️
tsfresh/feature_selection/relevance.py                 95.28% <0.00%> (ø)
tsfresh/transformers/relevant_feature_augmenter.py     94.87% <0.00%> (+0.20%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@vanbenschoten
Contributor Author

@nils-braun tests are passing now! Thanks much for your help. What are next steps here?

@nils-braun
Collaborator

There are just some minor style errors - once you have resolved these as well, I will merge:

./tsfresh/feature_extraction/feature_calculators.py:2226:121: E501 line too long (151 > 120 characters)
./tsfresh/feature_extraction/feature_calculators.py:2243:121: E501 line too long (123 > 120 characters)
./tsfresh/feature_extraction/feature_calculators.py:2246:35: E231 missing whitespace after ','
./tsfresh/feature_extraction/feature_calculators.py:2249:76: E231 missing whitespace after ','
./tsfresh/feature_extraction/feature_calculators.py:2253:9: E722 do not use bare 'except'
./tsfresh/feature_extraction/feature_calculators.py:2263:5: E303 too many blank lines (2)
./tsfresh/feature_extraction/feature_calculators.py:2274:9: E265 block comment should start with '# '
./tsfresh/feature_extraction/feature_calculators.py:2278:9: E265 block comment should start with '# '
./tsfresh/feature_extraction/feature_calculators.py:2284:13: E303 too many blank lines (2)
./tests/units/feature_extraction/test_feature_calculations.py:1306:9: E265 block comment should start with '# '
./tests/units/feature_extraction/test_feature_calculations.py:1314:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1315:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1316:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1317:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1318:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1319:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1322:49: E231 missing whitespace after ','
./tests/units/feature_extraction/test_feature_calculations.py:1322:68: E231 missing whitespace after ','
./tests/units/feature_extraction/test_feature_calculations.py:1325:5: E303 too many blank lines (2)
./tests/units/feature_extraction/test_feature_calculations.py:1326:9: E265 block comment should start with '# '
./tests/units/feature_extraction/test_feature_calculations.py:1335:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1336:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1337:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1338:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1339:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1340:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1343:9: E265 block comment should start with '# '
./tests/units/feature_extraction/test_feature_calculations.py:1344:69: E231 missing whitespace after ','
./tests/units/feature_extraction/test_feature_calculations.py:1347:5: E303 too many blank lines (2)
./tests/units/feature_extraction/test_feature_calculations.py:1348:9: E265 block comment should start with '# '
./tests/units/feature_extraction/test_feature_calculations.py:1353:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1354:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1355:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1356:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1357:9: E122 continuation line missing indentation or outdented
./tests/units/feature_extraction/test_feature_calculations.py:1358:9: E122 continuation line missing indentation or outdented

@vanbenschoten
Contributor Author

Sorry, just noticed I didn't push my style updates :/ Pushing now.

@vanbenschoten
Contributor Author

All set - @nils-braun over to you!

@nils-braun
Collaborator

Nice, it's in!

@nils-braun nils-braun merged commit 04b473f into blue-yonder:main Jan 25, 2021
@vanbenschoten
Contributor Author

vanbenschoten commented Jan 25, 2021 via email


4 participants