
Fix #998: Speed up stumpi and aampi #1001

Merged · 24 commits · Sep 13, 2024

Conversation

@NimaSarajpoor commented Jul 8, 2024

See #998 .

  • Speed up stumpi, _update_egress method
  • Speed up stumpi, _update method
  • Speed up aampi, _update_egress method
  • Speed up aampi, _update method
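
The common pattern behind all four items, per the review discussion below, is to refactor each update loop into a helper that can be compiled with numba's `njit`. The sketch below illustrates that pattern only; `_update_profile` and its signature are hypothetical, not STUMPY's actual internals, and a fallback decorator is included so the sketch runs even without numba installed:

```python
import numpy as np

try:
    from numba import njit  # STUMPY compiles its kernels with numba
except ImportError:  # fallback so this sketch runs without numba
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func

@njit
def _update_profile(D, P, I, new_idx):
    # Hypothetical kernel: given the distances `D` between the newest
    # subsequence and all existing subsequences, refresh the matrix
    # profile `P` and profile indices `I` in a single compiled pass.
    for i in range(D.shape[0]):
        if D[i] < P[i]:
            P[i] = D[i]
            I[i] = new_idx

P = np.array([1.0, 2.0, 3.0])
I = np.array([5, 6, 7])
D = np.array([0.5, 5.0, 1.0])  # distances to the newest subsequence
_update_profile(D, P, I, 8)
```

Moving this loop out of interpreted Python and into a compiled kernel is what produces the per-`update` speedups reported below.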


codecov bot commented Jul 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.33%. Comparing base (fb9a125) to head (8fc35ea).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1001      +/-   ##
==========================================
+ Coverage   97.32%   97.33%   +0.01%     
==========================================
  Files          89       89              
  Lines       14964    15027      +63     
==========================================
+ Hits        14563    14626      +63     
  Misses        401      401              


@NimaSarajpoor changed the title from "[WIP] Fix #998: Speed up stumpi and aampi" to "Fix #998: Speed up stumpi and aampi" on Jul 28, 2024
@NimaSarajpoor

@seanlaw
I think it is ready. What do you think?


seanlaw commented Jul 28, 2024

@NimaSarajpoor Please allow me some time to review it


seanlaw commented Jul 29, 2024

@NimaSarajpoor For completeness, are you able to provide some timings for the speedup (before and after your code changes) here in the comments? I trust that the code is indeed faster but at least we can document it here.

@NimaSarajpoor

@seanlaw
I checked the performance for time series of lengths 1000, 10_000, and 100_000.

```python
import time

import numpy as np

import stumpy


def get_running_time(n, m=50):
    seed = 0
    np.random.seed(seed)
    T = np.random.rand(n)

    n_iter = 100
    vals = np.random.rand(n_iter)

    # try stumpy.stumpi / stumpy.aampi, with egress=True / False
    obj = stumpy.stumpi(T, m, egress=True)
    t_lst = []
    for val in vals:
        start = time.time()
        obj.update(val)
        t_lst.append(time.time() - start)

    # exclude the first update, which acts as a warm-up
    return np.mean(t_lst[1:]), np.std(t_lst[1:])


if __name__ == "__main__":
    n = 1000  # try 1000 / 10_000 / 100_000
    out = get_running_time(n)
    print(f"mean: {out[0]}, std: {out[1]}")
```

In the following tables, n is the length of the time series. The results were obtained by running the script above on an Apple M1 with 8 GB of memory. Each value in the running-time columns is the average running time of 100 `.update()` calls, excluding the first one. The speedup percentage is given in the right-most column.

**n = 1000**

|  | running time (current version) | running time (PR's version) | Speedup (%) |
| --- | --- | --- | --- |
| stumpi(egress=True) | 0.00035 | 0.00014 | 60.0 |
| stumpi(egress=False) | 0.00033 | 0.00015 | 54.5 |
| aampi(egress=True) | 0.00026 | 0.00004 | 84.6 |
| aampi(egress=False) | 0.00022 | 0.00006 | 72.7 |

**n = 10_000**

|  | running time (current version) | running time (PR's version) | Speedup (%) |
| --- | --- | --- | --- |
| stumpi(egress=True) | 0.00138 | 0.00022 | 84.1 |
| stumpi(egress=False) | 0.00114 | 0.00023 | 79.8 |
| aampi(egress=True) | 0.00147 | 0.00032 | 78.2 |
| aampi(egress=False) | 0.00121 | 0.00032 | 73.6 |

**n = 100_000**

|  | running time (current version) | running time (PR's version) | Speedup (%) |
| --- | --- | --- | --- |
| stumpi(egress=True) | 0.01162 | 0.00079 | 93.2 |
| stumpi(egress=False) | 0.00943 | 0.00088 | 90.7 |
| aampi(egress=True) | 0.01387 | 0.00282 | 79.7 |
| aampi(egress=False) | 0.01154 | 0.00285 | 75.3 |


seanlaw commented Aug 1, 2024

@NimaSarajpoor Can you tell me how you are computing "Speedup %"? The numbers don't look right to me. I think perhaps the wording should be "Percent Reduction" (i.e., 100 * (new-old)/old).

I think Percent Speedup would be 100 * (old-new)/new OR you can say "X times faster" by simply doing old/new. I prefer "X times faster" for our comparison
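
For concreteness, here is how the three measures compare on the stumpi(egress=True), n = 100_000 timings from the tables above:

```python
old, new = 0.01162, 0.00079  # current vs. PR running time (seconds)

percent_reduction = 100 * (new - old) / old  # ~ -93.2, i.e. a 93.2% reduction
percent_speedup = 100 * (old - new) / new    # ~ 1370.9, i.e. ~1371% faster
times_faster = old / new                     # ~ 14.7, i.e. "14.7x faster"
```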

@NimaSarajpoor

@seanlaw

I think perhaps the wording should be "Percent Reduction" (i.e., 100 * (new-old)/old).

Right.... that is how I calculated the numbers.

OR you can say "X times faster" by simply doing old/new. I prefer "X times faster" for our comparison.

Noted. I like this more as it is clearer. To avoid confusion for future readers who follow the comments, I am going to provide the tables with the new numbers below:

**n = 1000**

|  | running time (current version) | running time (PR's version) | Times faster |
| --- | --- | --- | --- |
| stumpi(egress=True) | 0.00035 | 0.00014 | 2.5 |
| stumpi(egress=False) | 0.00033 | 0.00015 | 2.2 |
| aampi(egress=True) | 0.00026 | 0.00004 | 6.5 |
| aampi(egress=False) | 0.00022 | 0.00006 | 3.7 |

**n = 10_000**

|  | running time (current version) | running time (PR's version) | Times faster |
| --- | --- | --- | --- |
| stumpi(egress=True) | 0.00138 | 0.00022 | 6.27 |
| stumpi(egress=False) | 0.00114 | 0.00023 | 4.9 |
| aampi(egress=True) | 0.00147 | 0.00032 | 4.6 |
| aampi(egress=False) | 0.00121 | 0.00032 | 3.8 |

**n = 100_000**

|  | running time (current version) | running time (PR's version) | Times faster |
| --- | --- | --- | --- |
| stumpi(egress=True) | 0.01162 | 0.00079 | 14.7 |
| stumpi(egress=False) | 0.00943 | 0.00088 | 10.7 |
| aampi(egress=True) | 0.01387 | 0.00282 | 4.9 |
| aampi(egress=False) | 0.01154 | 0.00285 | 4.0 |


seanlaw commented Aug 1, 2024

@NimaSarajpoor Considering that all of the existing tests are passing and the performance is improved, I feel pretty good about merging this. Do you think it's ready? Was there anything that you had doubts about? It looks like there's a refactor of the code and then njit-ing of that code.

@NimaSarajpoor

@seanlaw

It looks like there's a refactor of the code and then njit-ing of that code.

Right. That's it!

Was there anything that you had doubts about?

My first concern is whether the added test function is clear. My second concern is regarding the comment I added for case 2 in the test function, i.e.

    # case 2: For a given time series `T`, obtain the matrix profile `P` and
    # matrix profile indices `I` of `T[1:]` based on the matrix profile and
    # matrix profile indices of `T[:-1]`.
    # In the following test: n_appended = 1

I think the comment above is slightly wrong. I think I need to make it clear that the updated profile is different from just doing `stumpy.stump(T[1:], m)`. So, I think it should have been something like this:

    # case 2: For a given time series `T`, obtain the matrix profile `P` and
    # matrix profile indices `I` of `T[1:]` based on the matrix profile and
    # matrix profile indices of `T[:-1]`, WITHOUT DISREGARDING THE NEAREST
    # NEIGHBOURS IN THE PROFILE THAT REFERS TO ALREADY-REMOVED DATA.
    # In the following test: n_appended = 1
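
To make that statement concrete, here is a toy illustration in plain NumPy (not STUMPY's implementation; the series, the `nn` helper, and the exclusion-zone handling are all simplified for the example): after the oldest point is egressed, the newest subsequence's already-computed profile entry still points at a removed motif, so it legitimately differs from a from-scratch `stumpy.stump(T[1:], m)`.

```python
import numpy as np

def nn(T, q, m, candidates):
    # nearest-neighbour distance and start index of subsequence `q`
    # among the given candidate start positions in `T`
    d = [np.linalg.norm(T[i:i + m] - q) for i in candidates]
    j = int(np.argmin(d))
    return d[j], candidates[j]

m = 4
excl = m // 4 + 1  # half-width of a simple trivial-match exclusion zone

# a motif at the very start, its near-copy at the end, noise in between
T = np.array([0., 1., 0., 1., 5., 9., 2., 7., 0., 1., 0., 1.01])
last = len(T) - m          # start index of the newest subsequence
q = T[last:last + m]

# profile entry for the newest subsequence over the full series:
# its nearest neighbour is the motif at index 0 (distance 0.01)
cand = [i for i in range(last + 1) if abs(i - last) > excl]
d_kept, i_kept = nn(T, q, m, cand)

# recomputing from scratch on T[1:] removes that motif entirely, so the
# same subsequence ends up with a much farther "nearest neighbour"
T2, last2 = T[1:], len(T) - 1 - m
cand2 = [i for i in range(last2 + 1) if abs(i - last2) > excl]
d_scratch, _ = nn(T2, q, m, cand2)

# an egressing update keeps the already-computed entry (d_kept ~ 0.01)
# even though its neighbour refers to removed data, so the two differ
assert d_kept < d_scratch
```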


seanlaw commented Aug 4, 2024

My first concern is whether the added test function is clear. My second concern is regarding the comment I added for case 2 in the test function

Okay, I will take a closer look.

[Update]

@NimaSarajpoor For the most part, I think case 1 is fine, as it looks like it is simply updating things by adding a single new data point. Having said that, I can't understand case 2. There seems to be too much happening, and your intent isn't clear even with the comment(s). Also, you refer to n_appended, but it would be nice if you could leave a note to remind people why it is important or what the relevance of that variable is (it's been a long time; I can guess at it, but I too have forgotten).

WITHOUT DISREGARDING THE NEAREST NEIGHBOURS IN THE PROFILE THAT REFERS TO ALREADY-REMOVED DATA.

I think this is probably the most important thing to highlight. It sounds like this test is trying to make sure that your newly added function respects this point and does not ignore it. Is that right? And all of this is associated with n_appended?

I think it would make sense to split case 1 and case 2 into two separate tests with more specific names (i.e., the name of the first case can be kept but it seems like you are testing something more nuanced in the second case).


NimaSarajpoor commented Aug 6, 2024

@seanlaw

I can't understand case 2. There seems to be too much happening and your intent isn't clear even with the comment(s)

I have the same feeling regarding case 2, which represents the egress=True case.

WITHOUT DISREGARDING THE NEAREST NEIGHBOURS IN THE PROFILE THAT REFERS TO ALREADY-REMOVED DATA.

I think this is probably the most important thing to highlight. It sounds like this test is trying to make sure that your newly added function respects this point and does not ignore it. Is that right? And all of this is associated with n_appended?

Right.

My main point was to help future me remember why I used a particular approach for calculating P_ref and I_ref. Now that you mention it, I think it would be good to have a test to just check that specific statement.

I think it would make sense to split case 1 and case 2 into two separate tests with more specific names (i.e., the name of the first case can be kept but it seems like you are testing something more nuanced in the second case).

Noted. Please allow me to separate the cases, and revise the code.


seanlaw commented Aug 6, 2024

My main point was to help future me remember why I used a particular approach for calculating P_ref and I_ref. Now that you mention it, I think it would be good to have a test to just check that specific statement.

Exactly! Thank you for persisting

@NimaSarajpoor

@seanlaw
I am making some minor changes. I will let you know once I am done so that you can provide me with your comments.

@NimaSarajpoor

@seanlaw
I think it is ready for your review. You may want to pay closer attention to the docstring of the new function in core.py and the test functions, particularly `test_update_incremental_PI_egressTrue_MemoryCheck`. This test function still looks a bit odd, but let's see what you think.


seanlaw commented Sep 11, 2024

@NimaSarajpoor I will take a look

seanlaw left a comment


@NimaSarajpoor I think the docstrings are fine. Do you think we are ready to merge?

@NimaSarajpoor

I replaced `random` with `np.random`.

@seanlaw

I think the docstrings are fine. Do you think we are ready to merge?

Thanks for checking that! Feel free to merge it once all tests pass.

@seanlaw merged commit 692c99c into TDAmeritrade:main on Sep 13, 2024
33 checks passed

seanlaw commented Sep 13, 2024

@NimaSarajpoor Thanks again for the wonderful contribution!
