Improved speed #681
…on from Wikipedia
You have style errors. See them below.
./tsfresh/scripts/measure_execution_time.py:163:34: W292 no newline at end of file
I posted my findings on the execution time studies here: https://nils-braun.github.io/execution-time/
@kempa-liehr Do you have some spare time to have a look into this PR?
# This script extracts the execution time for
# various different settings of tsfresh
# using different input data
# Attention: it will run for ~half a day
on how many cores?
On two. I have described the machine setup on my blog-post: https://nils-braun.github.io/execution-time/
Sorry, to be clearer:
it does not matter how many cores the machine has, as the number of cores is a parameter which is set to 0, 1 and 4 for the tests (to see the scaling).
But I did my studies on a Google Cloud 2-core (4 threads) virtual machine.
@@ -794,7 +794,7 @@ def test_sample_entropy(self):
ts = [1, 4, 5, 1, 7, 3, 1, 2, 5, 8, 9, 7, 3, 7, 9, 5, 4, 3, 9, 1, 2, 3, 4, 2, 9, 6, 7, 4, 9, 2, 9, 9, 6, 5, 1,
Can you add some more unit tests for sample_entropy?
I am missing:
- short time series
- negative values in time series
- only negative values in time series
- np.NaN in time series
etc.
I added more tests and some documentation on the tests in my last commit.
N = len(x)

# Split time series and save all templates of length m
xmi = np.array([x[i:i + m] for i in range(N - m)])
can't we replace those lines with https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html?
No. np.split splits without overlap.
We want to turn the array [1, 2, 3, 4] into [1, 2], [2, 3], [3, 4], but np.split would only give [1, 2], [3, 4].
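To illustrate the difference described above, here is a minimal sketch. The helper name overlapping_windows is hypothetical and not part of the PR; it uses range(len(x) - m + 1) so that every window of length m is returned, whereas the quoted line in the PR stops one window earlier with range(N - m).

```python
import numpy as np


def overlapping_windows(x, m):
    """Return all overlapping windows of length m (hypothetical helper)."""
    x = np.asarray(x)
    # One window starts at every index that still leaves room for m values.
    return np.array([x[i:i + m] for i in range(len(x) - m + 1)])


x = np.array([1, 2, 3, 4])

# Overlapping templates, as needed for sample entropy.
windows = overlapping_windows(x, 2)

# np.split cuts the array into disjoint, equally sized pieces.
pieces = np.split(x, 2)

print(windows.tolist())
print([p.tolist() for p in pieces])
```

The first print shows three windows that share elements with their neighbours; the second shows only two pieces with no shared elements.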
Thanks for the review @MaxBenChrist. Your comments were reasonable and I implemented your feedback.
You have style errors. See them below.
./tsfresh/examples/driftbif_simulation.py:135:22: E741 ambiguous variable name 'l'
IssueID #3924: v0.17.9 - Readded baseline unit tests - Revert to the original sum_of_reoccurring_values v0.4.0 method which was changed and the new feature called sum_of_reoccurring_data_points was added which results in the same value as the original v0.4.0 sum_of_reoccurring_values method. The new sum_of_reoccurring_values method introduced results in different results as per: NOT in baseline :: [['value__sum_of_reoccurring_values', '49922.0']] NOT in calculated :: [['value__sum_of_reoccurring_values', '109822.0']] - Disable estimate_friedrich_coefficients feature added in v0.6.0 - Disable friedrich_coefficients feature added in v0.6.0 - Disabled max_langevin_fixed_point added in v0.6.0 - Disabled friedrich_coefficients and max_langevin_fixed_point in settings added in v0.6.0 - Updated very minor precision changes in the following features which changed in v0.6.0 value__autocorrelation__lag_6 old: 0.5124801685138611, new: 0.5124801685138614, diff: -0.00000000000000022204 value__autocorrelation__lag_8 old: 0.3600822542968588, new: 0.3600822542968586, diff: 0.00000000000000022204 value__autocorrelation__lag_5 old: 0.46463952576506423, new: 0.46463952576506445, diff: -0.00000000000000022204 value__autocorrelation__lag_1 old: 0.5154799442499527, new: 0.5154799442499526, diff: 0.00000000000000011102 value__autocorrelation__lag_7 old: 0.6538534951469427, new: 0.6538534951469428, diff: -0.00000000000000011102 value__autocorrelation__lag_2 old: 0.36765813197781533, new: 0.36765813197781516, diff: 0.00000000000000016653 value__autocorrelation__lag_9 old: 0.21748400096837436, new: 0.21748400096837414, diff: 0.00000000000000022204 value__augmented_dickey_fuller old: -0.8041220342033505, new: -0.8041220342033477, diff: -0.00000000000000277556 value__mean_autocorrelation old: 1.1720475293977406, new: 1.1720475293977404, diff: 0.00000000000000022204 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_2" old: -40.265846960764975, new: -40.26584696076512, diff: 
0.00000000000014210855 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_2" old: 5485.741180131765, new: 5485.741180131762, diff: 0.00000000000272848411 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_2__w_2" old: 7535.022844459651, new: 7535.02284445965, diff: 0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_3__w_2" old: 6017.192007927548, new: 6017.192007927546, diff: 0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_4__w_2" old: 3308.4304014332156, new: 3308.4304014332133, diff: 0.00000000000227373675 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_5__w_2" old: 1295.7433671924819, new: 1295.7433671924832, diff: -0.00000000000136424205 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_7__w_2" old: 39.916767258584514, new: 39.91676725858371, diff: 0.00000000000080291329 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_8__w_2" old: 17.955485691823014, new: 17.95548569182395, diff: -0.00000000000093436370 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_9__w_2" old: 50.259030087877306, new: 50.25903008787768, diff: -0.00000000000037658765 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_2" old: 35.90470247450105, new: 35.90470247450137, diff: -0.00000000000031974423 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_11__w_2" old: -24.14602386100944, new: -24.14602386100941, diff: -0.00000000000002842171 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_12__w_2" old: -61.88712524130847, new: -61.88712524130824, diff: -0.00000000000022737368 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_2" old: -33.668504325219715, new: -33.66850432521918, diff: -0.00000000000053290705 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_14__w_2" old: 24.20883821024688, new: 24.2088382102474, diff: -0.00000000000051869620 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_5" old: -20.257597134272146, new: -20.25759713427192, diff: 
-0.00000000000022737368 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_5" old: 3771.325441515319, new: 3771.32544151532, diff: -0.00000000000090949470 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_2__w_5" old: 7120.960920890311, new: 7120.960920890312, diff: -0.00000000000090949470 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_4__w_5" old: 11207.92940647991, new: 11207.929406479912, diff: -0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_5__w_5" old: 11696.157551031656, new: 11696.157551031654, diff: 0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_6__w_5" old: 11253.943680982826, new: 11253.943680982822, diff: 0.00000000000363797881 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_7__w_5" old: 10110.89944351567, new: 10110.899443515671, diff: -0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_8__w_5" old: 8545.47382821769, new: 8545.473828217693, diff: -0.00000000000363797881 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_9__w_5" old: 6826.238621617836, new: 6826.238621617837, diff: -0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_5" old: 5169.353887616803, new: 5169.353887616802, diff: 0.00000000000090949470 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_11__w_5" old: 3717.969303101324, new: 3717.9693031013257, diff: -0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_12__w_5" old: 2542.0196875693546, new: 2542.019687569354, diff: 0.00000000000045474735 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_5" old: 1652.101855511854, new: 1652.1018555118546, diff: -0.00000000000068212103 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_14__w_5" old: 1019.5707851504084, new: 1019.5707851504081, diff: 0.00000000000022737368 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_10" old: 836.6419785398183, new: 836.6419785398173, diff: 
0.00000000000102318154 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_10" old: 3543.0796763032777, new: 3543.079676303278, diff: -0.00000000000045474735 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_3__w_10" old: 8634.724847532967, new: 8634.724847532969, diff: -0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_4__w_10" old: 10876.523736377072, new: 10876.52373637707, diff: 0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_5__w_10" old: 12835.398940237148, new: 12835.39894023715, diff: -0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_6__w_10" old: 14466.10948981898, new: 14466.109489818979, diff: 0.00000000000181898940 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_7__w_10" old: 15737.72244365614, new: 15737.722443656134, diff: 0.00000000000545696821 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_9__w_10" old: 17169.076640994837, new: 17169.07664099483, diff: 0.00000000000727595761 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_11__w_10" old: 17183.302683017104, new: 17183.302683017107, diff: -0.00000000000363797881 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_14__w_10" old: 15154.905872253841, new: 15154.905872253847, diff: -0.00000000000545696821 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_0__w_20" old: 18718.957258866503, new: 18718.957258866507, diff: -0.00000000000363797881 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_20" old: 20645.63503140842, new: 20645.635031408423, diff: -0.00000000000363797881 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_5__w_20" old: 28065.04062099347, new: 28065.040620993466, diff: 0.00000000000363797881 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_7__w_20" old: 31428.519814904776, new: 31428.519814904783, diff: -0.00000000000727595761 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_8__w_20" old: 32985.81511950059, new: 32985.8151195006, diff: 
-0.00000000000727595761 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_9__w_20" old: 34437.5408408601, new: 34437.54084086011, diff: -0.00000000000727595761 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_10__w_20" old: 35770.92323199827, new: 35770.923231998284, diff: -0.00000000001455191523 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_11__w_20" old: 36992.814788488264, new: 36992.81478848827, diff: -0.00000000000727595761 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_12__w_20" old: 38098.193912726434, new: 38098.19391272645, diff: -0.00000000001455191523 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_13__w_20" old: 39076.9898057395, new: 39076.98980573952, diff: -0.00000000002182787284 "value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_14__w_20" old: 39919.05725014527, new: 39919.05725014526, diff: 0.00000000000727595761 value__spkt_welch_density__coeff_2 old: 1843.821171807498, new: 1843.8211718074986, diff: -0.00000000000045474735 value__spkt_welch_density__coeff_8 old: 2536.9954700088933, new: 2536.9954700088906, diff: 0.00000000000272848411 value__ar_coefficient__k_10__coeff_0 old: 904.439185079118, new: 904.4391850794491, diff: -0.00000000033105607145 value__ar_coefficient__k_10__coeff_1 old: 0.16357894811580564, new: 0.1635789481157781, diff: 0.00000000000002753353 value__ar_coefficient__k_10__coeff_2 old: -0.04324700014744565, new: -0.0432470001474492, diff: 0.00000000000000355271 value__ar_coefficient__k_10__coeff_3 old: -0.06654237068303814, new: -0.06654237068301239, diff: -0.00000000000002575717 value__ar_coefficient__k_10__coeff_4 old: 0.2836853193919353, new: 0.2836853193919273, diff: 0.00000000000000799361 value__fft_coefficient__coeff_1 old: -0.8045103874789135, new: -0.8045103874789561, diff: 0.00000000000004263256 value__fft_coefficient__coeff_2 old: -53.13286168327596, new: -53.13286168327602, diff: 0.00000000000005684342 value__fft_coefficient__coeff_3 old: -338.00000000000006, new: -338.0, diff: 
-0.00000000000005684342 value__fft_coefficient__coeff_4 old: 122.44503935479224, new: 122.44503935479203, diff: 0.00000000000021316282 value__fft_coefficient__coeff_5 old: -58.930796134231116, new: -58.930796134230846, diff: -0.00000000000027000624 value__fft_coefficient__coeff_6 old: 13.000000000000057, new: 13.0, diff: 0.00000000000005684342 value__fft_coefficient__coeff_7 old: 112.23530652170982, new: 112.23530652170984, diff: -0.00000000000002842171 value__fft_coefficient__coeff_8 old: 118.18782232848393, new: 118.18782232848395, diff: -0.00000000000001421085 - Readded baseline unit tests removed in v0.7.0 - Readded large_number_of_peaks removed in v0.9.0 - Readded mean_autocorrelation removed in v0.9.0 - Reverted to original augmented_dickey_fuller that was changed in v0.9.0 - Reverted to original fft_coefficient that was changed in v0.9.0 - Readded mean_abs_change_quantiles that was removed in v0.9.0 - Readded the original time_reversal_asymmetry_statistic that was in use pre v0.9.0 - blue-yonder#198 - Readded original autocorrelation that was removed in v0.9.0 - Disabled partial_autocorrelation added in v0.10.0 - Disabled cid_ce added in v0.11.1 - Disabled fft_aggregated added in v0.11.0 - Disabled Fix agg change made to agg_autocorrelation added in v0.11.1 blue-yonder@a53fb6a - Changed to new value_count and range_count method added in v0.11.1 - Hardcoded TSFRESH_BASELINE_VERSION = '0.9.1' in tests - Disabled linear_trend_timewise added in v0.12.0 - Readded tsfresh/examples/test_tsfresh_baseline_dataset.py which was removed in v0.12.0 - Use v0.11.01 value_count and range_count method not as per v0.13.0 - Disabled count_above and count_below features that were added in v0.15.0 - Readded the original percentage_of_reoccurring_datapoints_to_all_datapoints before the feature name change to percentage_of_reoccurring_values_to_all_values implemented in v0.17.0 (feature names should be immutable) blue-yonder#725 blue-yonder@6f9c795 blue-yonder#724 - Rename the new 
feature percentage_of_reoccurring_values_to_all_values to v0170_percentage_of_reoccurring_values_to_all_values and disabled - Readded the original percentage_of_reoccurring_values_to_all_values before the feature name change to percentage_of_reoccurring_datapoints_to_all_datapoints implemented in v0.17.0 (feature names should be immutable) - Rename the new feature percentage_of_reoccurring_datapoints_to_all_datapoints to v0170_percentage_of_reoccurring_datapoints_to_all_datapoints and disabled - Disabled lempel_ziv_complexity,fourier_entropy and permutation_entropy features that were added in v0.17.0 - Revert to the original cwt_coefficients feature names changed in v0.16.0 - Renamed the new sample_entropy introduced in v0.16.0 to v0160_sample_entropy and readded sample_entropy from v0.15.1 as this is a breaking change as per: blue-yonder#681 and blue-yonder@ce493e5 - Configured settings for pre v0.9.0 features - Hardcoded TSFRESH_BASELINE_VERSION = '0.17.9' in tests Added: tests/baseline/tsfresh-0.1.2.py2.data.json.features.transposed.csv tests/baseline/tsfresh-0.3.0.py2.data.json.features.transposed.csv tests/baseline/tsfresh-0.3.0.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.3.1.py2.data.json.features.transposed.csv tests/baseline/tsfresh-0.3.1.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.4.0.py2.data.json.features.transposed.csv tests/baseline/tsfresh-0.4.0.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.5.0.py2.data.json.features.transposed.csv tests/baseline/tsfresh-0.5.0.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.5.1.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.6.0.py2.data.json.features.transposed.csv tests/baseline/tsfresh-0.6.0.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.6.1.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.7.2.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.8.2.py3.data.json.features.transposed.csv 
tests/baseline/tsfresh-0.9.1.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.10.2.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.11.3.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.12.1.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.13.1.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.14.1.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.15.2.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.16.1.py3.data.json.features.transposed.csv tests/baseline/tsfresh-0.17.9.py3.data.json.features.transposed.csv tests/baseline/tsfresh_features_test.py Modified: CHANGES.rst README.md tsfresh/feature_extraction/feature_calculators.py tsfresh/feature_extraction/settings.py
This PR introduces three things:
- the numpy.quantile function instead of the one from pandas, because it is much faster. For this, we need numpy >= 1.15.0, which is fine as the current version is 1.18 anyway.
- a reworked sample_entropy function. Honestly, I did not really understand the function, and when I compare it with Wikipedia, I think (think!) that the implementation was not 100% correct. Actually, Wikipedia has a sample implementation in Python, which uses numpy functions instead of for loops and looks much more like the formula. Unfortunately, it gives slightly different numbers (so this is a breaking change), but I trust the new numbers more. Maybe someone can comment on this?

Using the code of this PR, I achieve a speed-up between 1.17 (for short time series, e.g. length 100) and 2.4 and growing (for long time series, e.g. length 5000), when using all features. I did not test it, but I assume it will get even better with even larger time series.
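For readers who want to see what a loop-free sample entropy can look like, here is a rough sketch. It is not the code of this PR and not a copy of the Wikipedia snippet; the function name sample_entropy_sketch, the default m = 2 and the tolerance r = 0.2 * std are assumptions made for illustration. It counts template pairs within Chebyshev distance r for lengths m and m + 1 and returns -log(A / B).

```python
import numpy as np


def sample_entropy_sketch(x, m=2, r=None):
    """Illustrative sample entropy; name and defaults are assumptions."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)  # assumed tolerance

    def count_matches(length):
        # Use n - m templates for both lengths so the counts are comparable.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Subtract the diagonal to exclude self-matches.
        return np.sum(dist <= r) - len(templates)

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    return -np.log(a / b)


periodic = sample_entropy_sketch([1, 2] * 10)
noisy = sample_entropy_sketch(np.random.default_rng(0).normal(size=200))
print(periodic, noisy)
```

A perfectly periodic series gives a value of zero, while noise gives a clearly positive value. The pairwise distance matrix needs memory quadratic in the series length, which may matter for very long time series.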
The "sample_entropy" feature is not part of the efficient features, so people with runtime constraints are not using it anyhow, but the "quantile" fix will help, especially for short time series.
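As a quick sanity check of the quantile change, the following sketch compares both calls on random data. The sample data and the variable names are made up for illustration; with their default settings both functions interpolate linearly, so they are expected to agree up to floating-point precision.

```python
import numpy as np
import pandas as pd

# Made-up example data, not taken from the PR.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
q = 0.25

# Old approach: go through a pandas Series.
via_pandas = pd.Series(x).quantile(q)

# New approach: call numpy directly (needs numpy >= 1.15.0).
via_numpy = np.quantile(x, q)

print(via_pandas, via_numpy, np.isclose(via_pandas, via_numpy))
```

Timing both calls, for example with timeit, on series of the lengths used in your own data is the easiest way to check how large the gain is on a given machine.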
@MaxBenChrist, feel free to re-assign (e.g. to @kempa-liehr ) if you do not have time!