
talib.MIN performance issue #715

@tiiiecherle

Description

Hey,

First of all, thanks a lot for this great project.

While line-profiling an indicator with @profile, I stumbled across an issue: the talib.MIN calculation is a lot slower than the talib.MAX calculation.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   335                                           @profile
   337                                           
   338      9999     394194.0     39.4      3.0      highs_talib = talib.MAX(highs, timeperiod=if_length)
   341                                           
   343      9999    4080059.0    408.0     30.6      lows_talib = rolling_min_pandas(lows, if_length) 
   344      9999    6372217.0    637.3     47.9      lows_talib = talib.MIN(lows, timeperiod=if_length)

The fastest workaround I could find was the pandas rolling function, but it is still a lot slower than talib.MAX.

The test was done with 10,000 calculations.

I am sorry, I am quite new to this and don't understand the source code well, but in my opinion, with the same amount of data formatted the same way, talib.MIN should run at the same speed as talib.MAX, right?

Thanks for looking into this and for a fix in advance.

Kind regards

Activity

mrjbq7 (Member) commented on Mar 8, 2025

Can you share a bit about how you tested that? I don't see any meaningful difference:

In [2]: import talib as ta

In [3]: import numpy as np

In [5]: c = np.random.randn(100)

In [13]: %timeit ta.MAX(c, 14)
447 ns ± 1.99 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [14]: %timeit ta.MIN(c, 14)
470 ns ± 2.26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [16]: import pandas as pd

In [17]: s = pd.Series(c)

In [20]: %timeit s.rolling(14).min()
16.3 μs ± 231 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
mrjbq7 (Member) commented on Mar 8, 2025

If your input is a pandas.Series, it is a lot slower. Is there any way that your highs is a numpy.ndarray but your lows is a Series?

In [25]: %timeit ta.MIN(s)
5.94 μs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [26]: %timeit ta.MAX(s)
6.01 μs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
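One quick way to rule out the mixed-container scenario is to check the types right before the TA-Lib calls and normalize everything to float64 ndarrays up front. A generic sketch (the variable names here are illustrative, not taken from the profiled code):

```python
import numpy as np
import pandas as pd

highs = np.random.randn(1000)   # ndarray: TA-Lib's fast path
lows = pd.Series(highs)         # Series: goes through a slower conversion

# Confirm what TA-Lib actually receives; mixed container types would
# explain MAX being fast while MIN is slow on "the same" data.
print(type(highs).__name__, type(lows).__name__)

# Normalizing to a contiguous float64 ndarray up front avoids the
# per-call conversion overhead.
lows_arr = np.ascontiguousarray(lows, dtype=np.float64)
assert isinstance(lows_arr, np.ndarray)
```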
mrjbq7 (Member) commented on Mar 8, 2025

In case you're curious, polars is faster than pandas, but still not as fast as numpy. I've always thought our integration could maybe be improved:

In [5]: import polars as pl

In [6]: s = pl.Series(c)

In [7]: %timeit ta.MIN(s)
2.14 μs ± 22.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [8]: %timeit ta.MAX(s)
2.13 μs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
tiiiecherle (Author) commented on Mar 8, 2025

Hey @mrjbq7,

thanks for the fast and helpful feedback; it pointed me in the right direction.

My strategy in vectorbt combines different indicators, and for performance reasons I filter out duplicate indicator parameter sets before calculation. I need talib.MIN for a modified version of the ADX indicator. I use the option to convert all data to numpy when passing it into the indicator factory, so both lows and highs were numpy arrays.

With this parameter set I get the slower talib.MIN times posted above:

parameters_if_adx = {
    "if_adx_name": ['adx'],
    "if_adx_length": custom_range(2, 10000, 1),
    "if_adx_threshold": [25],
}

Then I added some code and a variable to allow duplicate parameter sets, and sent the indicator the same parameter set 10,000 times:

parameters_if_adx = {
    "if_adx_name": ['adx'],
    "if_adx_length": [5] * 10000,
    "if_adx_threshold": [25],
}

and the issue is gone ;)

I use line_profiler to measure execution speed. I tried so hard to find a workaround for talib.MIN, and now it is back to being the fastest option ;)))

   305     10648     551500.0     51.8      3.3      high_talib = talib.MAX(high, timeperiod=if_adx_length)
   306                                               #high_talib = rolling_max_pandas(high, if_adx_length)  
   307                                           
   308     10648    4069940.0    382.2     24.6      low_talib = rolling_min_pandas(low, if_adx_length) 
   309     10648     513987.0     48.3      3.1      low_talib = talib.MIN(high, timeperiod=if_adx_length)
   310     10648    1610292.0    151.2      9.7      low_talib = rolling_min_numpy_sliding_window_view(low, if_adx_length)
   311     10648    1537948.0    144.4      9.3      low_talib = rolling_min_numpy2(low, if_adx_length)
   312     10648    1502746.0    141.1      9.1      low_talib = rolling_min_cython(low, if_adx_length)
   313                                               #low_talib = rolling_min_torch(low, if_adx_length)      # slowest by far
   314     10648    2954782.0    277.5     17.8      low_talib = rolling_min_numba(low, if_adx_length)
   315     10648     660048.0     62.0      4.0      low_talib = rolling_min_vectorbt_numba(low, if_adx_length) 

It seems the different options handle larger values of timeperiod very differently in terms of performance.

I don't think I will ever use timeperiods this high, so with realistic timeperiod values everything works as expected. What is strange to me is that talib.MAX did not show this performance variation.

If you don't want to dive into the slower performance of talib.MIN at higher timeperiods, the issue can be closed.

Thanks for your fast help.
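As an aside for anyone benchmarking alternatives like the ones in the listing above: a monotonic-deque rolling minimum runs in O(n) regardless of the timeperiod or the ordering of the data. This is a stdlib-only sketch, not TA-Lib code; padding the first period-1 entries with NaN mimics talib.MIN's default output and is an assumption worth verifying:

```python
from collections import deque
from math import nan

def rolling_min_deque(values, period):
    """O(n) rolling minimum via a monotonic deque.

    Each index is pushed and popped at most once, so the cost per element
    is amortized O(1), independent of `period` and of data ordering.
    """
    out = [nan] * len(values)
    dq = deque()  # indices whose values are strictly increasing front to back
    for i, v in enumerate(values):
        while dq and values[dq[-1]] >= v:
            dq.pop()            # drop values that can never be the minimum again
        dq.append(i)
        if dq[0] <= i - period:
            dq.popleft()        # front index fell out of the window
        if i >= period - 1:
            out[i] = values[dq[0]]
    return out
```

A symmetric version with `<=` flipped gives the rolling maximum with the same complexity.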

mrjbq7 (Member) commented on Mar 8, 2025

I'm confused: I don't know what custom_range does, and I don't have a code example that reproduces a slow-down.

mrjbq7 (Member) commented on Mar 8, 2025

Using longer time periods, I show a 1.68x difference between MAX and MIN. Is that what you are seeing also?

In [1]: import talib as ta

In [2]: import numpy as np

In [3]: c = np.random.randn(100000)

In [4]: %timeit ta.MAX(c, 10000)
83.1 μs ± 184 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit ta.MIN(c, 10000)
140 μs ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
tiiiecherle (Author) commented on Mar 8, 2025

I am sorry if I caused any confusion. custom_range is just a little function that specifies a range; here it hands 9,999 values from 2 to 10,000 to the indicator, since the indicator does not accept the value 1.

All blocks in my script are split into 10,000 values per block for RAM and performance reasons. That's why I am testing the performance across this many calculations.

And yes, with these values I am seeing a drop in performance that is even bigger than your 1.68x. In the first post, with these large timeperiod values, MAX accounted for 3.0% of the calculation time versus 47.9% for MIN.

I hope I have made it clearer now. High values of timeperiod seem to affect talib.MIN's performance very strongly, and talib.MAX's not at all.

mrjbq7 (Member) commented on Mar 8, 2025

I have demonstrated the testable performance, and you have not shared your reproduction. I'm going to close this issue. If you'd like it investigated more, you can provide a test case. Thanks!

tiiiecherle (Author) commented on Mar 8, 2025

As I said, it is OK to close the issue. But I would like to add that I tried to explain the test case.

I ran talib.MIN 10,000 times like this:

talib.MIN(low, VALUES_FROM_2_TO_10_000)

low is a numpy array of 20,000 values.

I am sorry, I don't know a short way to reproduce this; my setup calculates it during the indicator calculation. Perhaps you can reproduce it more easily.

But nevertheless thanks for your help.

mrjbq7 (Member) commented on Mar 8, 2025

I'm still seeing only a 1.5x performance difference when iterating over range(2, 10_000):

In [1]: def test_min(a):
   ...:     for i in range(2, 10_000):
   ...:         ta.MIN(a, i)
   ...:

In [2]: def test_max(a):
   ...:     for i in range(2, 10_000):
   ...:         ta.MAX(a, i)
   ...:

In [3]: import talib as ta

In [4]: import numpy as np

In [5]: c = np.random.randn(100000)

In [6]: %timeit test_max(c)
860 ms ± 3.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit test_min(c)
1.3 s ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Telling me what's slow is kinda useful, but if I can't reproduce it, it's not going to get fixed.

mrjbq7 (Member) commented on Mar 8, 2025

I never asked, but maybe you're on a different OS / architecture than I am...

mrjbq7 (Member) commented on Mar 8, 2025

Testing an array of 20,000 values shows the same result:

In [8]: c = np.random.randn(20_000)

In [9]: %timeit test_min(c)
262 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: %timeit test_max(c)
192 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
tiiiecherle (Author) commented on Mar 8, 2025

I am sorry that I cannot help more; my knowledge is not as deep as yours. I can only repeat what I have already said. It is strange that your test results do not differ that much.

I am on macOS with Python 3.12.9.

mrjbq7 (Member) commented on Mar 8, 2025

Okay, well, if you are ever able to share your test script, or a test case that I can run to see the performance loss, I'm happy to investigate further.

All of the tests above were Python 3.12.9 on macOS 15.3.1, on a MacBook Pro M4.

tiiiecherle (Author) commented on Mar 8, 2025

Thanks again, I really appreciate your help. The difference in our testing could be that I used every timeperiod from 1 to 10,000, rather than a single timeperiod of 10,000 on random data.

tiiiecherle (Author) commented on Mar 8, 2025

If I use the value 5 as the timeperiod 10,000 times, MIN and MAX look equal performance-wise. But if I use every value from 1 to 10,000, the performance difference is pretty big for me.

mrjbq7 (Member) commented on Mar 8, 2025

Okay, well now we're getting somewhere. With monotonically increasing values, the work that MIN has to do on each observation is much greater, and I'm seeing it run about 8,300x slower:

In [1]: import numpy as np

In [2]: c = np.array(range(1, 20_001), dtype=float)

In [3]: c
Out[3]:
array([1.0000e+00, 2.0000e+00, 3.0000e+00, ..., 1.9998e+04, 1.9999e+04,
       2.0000e+04], shape=(20000,))

In [4]: import talib as ta

In [5]: %timeit ta.MAX(c, 10000)
10.7 μs ± 311 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: %timeit ta.MIN(c, 10000)
89.2 ms ± 131 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

But if you reverse your array, then MAX is 3000x slower than MIN:

In [14]: c = np.array(range(20_000, 0, -1), dtype=float)

In [15]: %timeit ta.MIN(c, 10000)
14.7 μs ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [16]: %timeit ta.MAX(c, 10000)
44.6 ms ± 283 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
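This asymmetry is consistent with an extremum-tracking implementation: remember the index of the current min (or max) and rescan the whole window only when that index drops out of it. Treat it as an assumption that TA-Lib's C code works this way; the sketch below just illustrates why monotonically increasing input is the worst case for MIN (the minimum always sits in the oldest slot, so every step forces a full rescan) while decreasing input is the worst case for the analogous MAX routine:

```python
from math import nan

def rolling_min_rescan(values, period):
    """Rolling minimum that tracks the index of the current minimum and
    rescans the window only when that index falls out of it.

    On monotonically increasing data this degrades to O(n * period),
    because the tracked minimum leaves the window on every step; on
    decreasing data the newest value is always the minimum, so it is O(n).
    """
    out = [nan] * len(values)
    min_idx = -1
    for i in range(period - 1, len(values)):
        start = i - period + 1
        if min_idx < start:
            # The tracked minimum left the window: full O(period) rescan.
            min_idx = start
            for j in range(start + 1, i + 1):
                if values[j] <= values[min_idx]:
                    min_idx = j
        elif values[i] <= values[min_idx]:
            min_idx = i
        out[i] = values[min_idx]
    return out
```

Under this model, reversing the input simply moves the pathological case from MIN to MAX, matching the two timings above.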
reopened this on Mar 8, 2025
tiiiecherle (Author) commented on Mar 8, 2025

Ok, I am happy I was able to explain the issue. Yes, that big difference is what I am seeing.

tiiiecherle (Author) commented on Mar 8, 2025

Thanks again.
