Description
Hey,
first of all thanks a lot for this great project.
While line-profiling an indicator with @profile, I came across an issue: the talib.MIN calculation is a lot slower than the talib.MAX calculation.
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   335                                           @profile
   337
   338      9999     394194.0     39.4      3.0  highs_talib = talib.MAX(highs, timeperiod=if_length)
   341
   343      9999    4080059.0    408.0     30.6  lows_talib = rolling_min_pandas(lows, if_length)
   344      9999    6372217.0    637.3     47.9  lows_talib = talib.MIN(lows, timeperiod=if_length)
The fastest workaround I found was the pandas rolling function, but it is still a lot slower than talib.MAX.
The test was done with 10.000 calculations.
I am sorry, I am quite new to this and don't understand the source code well, but in my opinion, given the same amount of data formatted the same way, talib.MIN should be as fast as talib.MAX, right?
Thanks in advance for looking into this and for a fix.
Kind regards
Activity
mrjbq7 commented on Mar 8, 2025
Can you share a bit about how you tested that? I don't see any meaningful difference:
mrjbq7 commented on Mar 8, 2025
If your input is a pandas.Series, it is a lot slower. Is there any way that your highs is a numpy.ndarray but your lows is a Series?
mrjbq7 commented on Mar 8, 2025
In case you're curious, polars is faster than pandas, but still not as fast as numpy. I've always thought our integration could maybe be improved:
tiiiecherle commented on Mar 8, 2025
Hey @mrjbq7,
thanks for the fast and helpful feedback that brought up the right idea.
My strategy in vectorbt combines different indicators, and for performance reasons I filter out duplicate indicator parameter sets before calculation. I need talib.MIN to calculate a modified version of the ADX indicator. I use the option to convert all data to numpy when passing it into the indicator factory, so both lows and highs were numpy arrays.
If I use this I get the slower times posted above on talib.MIN:
Now I added some code and a variable to allow duplicate parameter sets, and sent this to the indicator so that it calculates the same parameter set 10.000 times:
and the issue is gone ;)
I use line_profiler to measure the speed of the code execution and I tried so hard to find a solution around talib.MIN and now it is back to being the fastest option ;)))
It seems the different options handle greater values for timeperiod very differently in regard to performance.
I don't think I will use time periods this high, so using relevant values for the time period seems to work as expected. What is strange to me is that talib.MAX did not show the same performance variation.
If you don't want to dive into the slower performance of talib.MIN at higher timeperiods, the issue can be closed.
Thanks for your fast help.
mrjbq7 commented on Mar 8, 2025
I'm confused, I don't know what custom_range does, and I don't know what code example can reproduce a slow-down.
mrjbq7 commented on Mar 8, 2025
Using longer time periods, I show a 1.68x difference between MAX and MIN. Is that what you are seeing also?
tiiiecherle commented on Mar 8, 2025
I am sorry if I caused any confusion. custom_range is just a little function that specifies a range, so this hands 9.999 values from 2 to 10.000 to the indicator for calculation, as it does not accept the value 1.
All blocks in my script are split into 10.000 values per block for RAM and performance reasons. That's why I am testing the performance for this many calculations.
And yes, I am seeing a drop in performance with these values that is even bigger than your 1.68x.
In the first post, the profile showed MAX at 3.0% of the calculation time and MIN at 47.9% with these large values for timeperiod.
I hope I made it clearer now. High values for the timeperiod seem to affect the performance of talib.MIN very much and that of talib.MAX not at all.
mrjbq7 commented on Mar 8, 2025
I have demonstrated the testable performance, and you have not shared your reproduction. I'm going to close this issue. If you'd like it investigated more, you can provide a test case. Thanks!
tiiiecherle commented on Mar 8, 2025
As I said, it is ok to close the issue. But I would like to add that I tried to explain the test case.
I ran talib.MIN 10.000 times like this
talib.MIN(low, VALUES_FROM_2_TO_10.000)
low is a numpy array of 20.000 values.
I am sorry, I don't know a short way to reproduce this. My setup calculates this during the indicator calculation.
Perhaps you can reproduce this more easily.
But nevertheless thanks for your help.
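A short reproduction for a case like this could be a small timing loop over the timeperiod values. The sketch below is only an illustration: `time_over_periods` and `naive_rolling_min` are hypothetical names, the input data is made up, and the pure-Python rolling minimum is a stand-in so the snippet runs without TA-Lib installed.

```python
import time


def naive_rolling_min(values, period):
    """Stand-in for the function under test; rescans every full window."""
    return [
        min(values[i - period + 1:i + 1])
        for i in range(period - 1, len(values))
    ]


def time_over_periods(func, values, periods):
    """Return the total seconds spent calling func(values, p) for each p."""
    start = time.perf_counter()
    for p in periods:
        func(values, p)
    return time.perf_counter() - start


# Made-up input data; replace with the real series when reproducing.
low = [float(x % 97) for x in range(2_000)]
elapsed = time_over_periods(naive_rolling_min, low, range(2, 50))
print(f"{elapsed:.4f} s")
```

With TA-Lib available, the same helper could be called with `lambda v, p: talib.MIN(v, timeperiod=p)` and again with `talib.MAX`, passing a float64 numpy array and `range(2, 10_000)`, so that both functions are timed on identical input.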
mrjbq7 commented on Mar 8, 2025
I'm still seeing 1.5x performance difference when calculating range(2, 10_000):
Telling me what's slow is kinda useful, but if I can't reproduce it, it's not going to get fixed.
mrjbq7 commented on Mar 8, 2025
I never asked, but maybe you're on a different OS / architecture than I am...
mrjbq7 commented on Mar 8, 2025
Testing an array of 20,000 values shows the same result:
tiiiecherle commented on Mar 8, 2025
I am sorry that I cannot help more, my knowledge is not as deep as yours. I can only repeat what I already said. It is strange that your test results do not differ that much.
I am on macOS with Python 3.12.9.
mrjbq7 commented on Mar 8, 2025
Okay, well if you ever are able to share your test script, or a test case that I can run and see the performance loss, I'm happy to investigate further.
All of the tests above were Python 3.12.9 on macOS 15.3.1, on a MacBook Pro M4.
tiiiecherle commented on Mar 8, 2025
Thanks again, I really appreciate your help. But the difference in our testing could be that I took 1 to 10.000 and not a random 10.000 for the timeframe.
tiiiecherle commented on Mar 8, 2025
If I use the value 5 for the timeframe 10.000 times, MIN and MAX look equal performance-wise. But if I use each value from 1 to 10.000, the performance difference is pretty big for me.
mrjbq7 commented on Mar 8, 2025
Okay, well now we're getting somewhere. So, in the case of increasing values, the work that MIN has to do on each observation is a lot more and I'm seeing 8300x worse:
But if you reverse your array, then MAX is 3000x slower than MIN:
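The asymmetry described here is what one would expect from a rolling minimum that remembers where the current minimum sits and rescans the whole window whenever that position drops out of it. The sketch below assumes that general strategy; it is not TA-Lib's actual source, and `rolling_extreme` and its comparison counter are hypothetical names used only to make the cost visible.

```python
def rolling_extreme(values, period, better):
    """Rolling min/max that caches the index of the current extreme.

    `better(a, b)` returns True when `a` should replace `b`
    (for example `a <= b` for a minimum). Returns the outputs and the
    number of comparisons performed, as a rough proxy for work done.
    """
    out = []
    comparisons = 0
    best_idx = -1  # forces a full scan of the first window
    for today in range(period - 1, len(values)):
        trailing = today - period + 1
        if best_idx < trailing:
            # The cached extreme left the window: rescan the whole window.
            best_idx = trailing
            for i in range(trailing + 1, today + 1):
                comparisons += 1
                if better(values[i], values[best_idx]):
                    best_idx = i
        else:
            # The cached extreme is still in the window: one comparison.
            comparisons += 1
            if better(values[today], values[best_idx]):
                best_idx = today
        out.append(values[best_idx])
    return out, comparisons


increasing = list(range(2_000))
period = 500

_, min_work = rolling_extreme(increasing, period, lambda a, b: a <= b)
_, max_work = rolling_extreme(increasing, period, lambda a, b: a >= b)
print(min_work, max_work)
```

On strictly increasing input the minimum is always the oldest value in the window, so it expires at every step and each step pays for a full rescan, while the maximum is always the newest value and costs one comparison per step. Reversing the input swaps the two roles.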
tiiiecherle commented on Mar 8, 2025
Ok, I am happy I was able to explain the issue. Yes, that big difference is what I am seeing.
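For readers who hit the same slowdown on sorted input, one technique whose cost does not depend on the order of the data is a rolling minimum built on a monotonic deque. This is a different algorithm from the one discussed in this thread, shown only as a sketch: `rolling_min_deque` is a hypothetical helper, and unlike talib.MIN it returns only the full-window values without leading NaN padding.

```python
from collections import deque


def rolling_min_deque(values, period):
    """Rolling minimum in O(n) total work, independent of input order."""
    out = []
    candidates = deque()  # indices whose values are strictly increasing
    for i, v in enumerate(values):
        # Drop candidates that can never be the minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it falls out of the window.
        if candidates[0] <= i - period:
            candidates.popleft()
        if i >= period - 1:
            out.append(values[candidates[0]])
    return out


print(rolling_min_deque([3, 1, 2, 5, 4], 3))
```

Each index is appended once and removed at most once, which is why the total work stays linear for increasing, decreasing, and random input alike. A pure-Python loop like this will still be much slower than compiled code on typical data, so it illustrates the idea and is not meant as a drop-in replacement.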
tiiiecherle commented on Mar 8, 2025
Thanks again.