PERF cache find_stack_level #48023

MarcoGorelli · 2022-08-10T13:20:49Z

This would be a precursor to #47998 and #47828. If repeatedly calling find_stack_level, then performance takes a hit

Demo of impact: if we run the file

import cProfile
import pstats, cProfile

import re
import pandas as pd

data = pd.date_range('1990', '2000').strftime('%d/%m/%Y').tolist()

def fun():
    pd.to_datetime(data, dayfirst=False)

cProfile.runctx("fun()", globals(), locals(), "Profile.prof")

s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()

On main:

#         1199 function calls (1187 primitive calls) in 0.046 seconds
#
#   Ordered by: internal time
#
#   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#        1    0.034    0.034    0.035    0.035 {built-in method pandas._libs.tslib.array_to_datetime}
#      262    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
#      2/1    0.000    0.000    0.006    0.006 series.py:342(__init__)
#        2    0.000    0.000    0.000    0.000 {built-in method numpy.array}
#        1    0.000    0.000    0.046    0.046 {built-in method builtins.exec}
#        1    0.000    0.000    0.001    0.001 sre_parse.py:493(_parse)
#        1    0.000    0.000    0.046    0.046 datetimes.py:702(to_datetime)
#      2/1    0.000    0.000    0.000    0.000 sre_compile.py:71(_compile)
#      2/1    0.000    0.000    0.001    0.001 base.py:431(__new__)
#        2    0.000    0.000    0.000    0.000 {pandas._libs.lib.maybe_convert_objects}

Using find_stack_level in

pandas/pandas/_libs/tslibs/parsing.pyx

Line 217 in 6db95e7

stacklevel=4,

and

pandas/pandas/_libs/tslibs/parsing.pyx

Line 225 in 6db95e7

stacklevel=4,

#         224611 function calls (224599 primitive calls) in 0.582 seconds
#
#   Ordered by: internal time
#
#   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#    11060    0.106    0.000    0.342    0.000 inspect.py:655(getfile)
#    82106    0.081    0.000    0.082    0.000 {built-in method builtins.isinstance}
#     2212    0.063    0.000    0.517    0.000 _exceptions.py:30(find_stack_level)
#        1    0.045    0.045    0.572    0.572 {built-in method pandas._libs.tslib.array_to_datetime}
#    22128    0.027    0.000    0.027    0.000 {method 'startswith' of 'str' objects}
#    11060    0.023    0.000    0.035    0.000 inspect.py:64(ismodule)
#    11060    0.023    0.000    0.034    0.000 inspect.py:81(ismethod)
#    11060    0.023    0.000    0.033    0.000 inspect.py:261(iscode)
#    11060    0.023    0.000    0.034    0.000 inspect.py:237(istraceback)
#    11060    0.023    0.000    0.034    0.000 inspect.py:159(isfunction)

Using find_stack_level in the same places as above, but with caching (as in this branch):

#         7916 function calls (7904 primitive calls) in 0.065 seconds
#
#   Ordered by: internal time
#
#   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#        1    0.031    0.031    0.056    0.056 {built-in method pandas._libs.tslib.array_to_datetime}
#     2212    0.018    0.000    0.024    0.000 inspect.py:1546(currentframe)
#     2231    0.003    0.000    0.003    0.000 {built-in method builtins.hasattr}
#     2212    0.003    0.000    0.003    0.000 {built-in method sys._getframe}
#      292    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
#        1    0.000    0.000    0.065    0.065 {built-in method builtins.exec}
#      2/1    0.000    0.000    0.005    0.005 series.py:342(__init__)
#        1    0.000    0.000    0.001    0.001 sre_parse.py:493(_parse)
#        2    0.000    0.000    0.000    0.000 {built-in method numpy.array}
#    49/41    0.000    0.000    0.000    0.000 {built-in method builtins.len}

pandas/util/_exceptions.py

alimcmaster1

Nice - LGTM!

lithomas1

Generally LGTM. I'm just a little uneasy at the number of changes the change to the function signature caused.

pandas/util/_exceptions.py

…k-level

mroeschke · 2022-08-11T16:54:04Z

Thanks @MarcoGorelli

cache stacklevel

jbinary · 2022-10-12T10:02:47Z

Good day,
This change leads to some memory leaks that are really hard to debug becuase @lru_cache stores stack frames for the sake of look ups and those frames refer all the variables of that frame, so garbage collector will never remove them from memory because of that? Since pandas is very likely to be called from frames with extensive memory use, that may lead to significant memory overheads?
Thanks.

MarcoGorelli · 2022-10-12T10:08:51Z

Thanks @jbinary for the report

I'd be fine with reverting this - as of PDEP0004, we're no longer going with #47828, so repeatedly calling find_stacklevel is less of an issue

opening a PR soon

This reverts commit 2f8d0a3.

Expost · 2022-10-14T06:39:40Z

Good day, This change leads to some memory leaks that are really hard to debug becuase @lru_cache stores stack frames for the sake of look ups and those frames refer all the variables of that frame, so garbage collector will never remove them from memory because of that? Since pandas is very likely to be called from frames with extensive memory use, that may lead to significant memory overheads? Thanks.

Coincidentally. The day before yesterday I spent nearly two days to locate an OOM problem, and finally found that it was caused by the lru_cache of the find_stack_level method.

Revert "PERF cache find_stack_level (#48023)" This reverts commit 2f8d0a3. Co-authored-by: MarcoGorelli <>

Revert "PERF cache find_stack_level (pandas-dev#48023)" This reverts commit 2f8d0a3. Co-authored-by: MarcoGorelli <> (cherry picked from commit 0106c26)

MarcoGorelli · 2022-10-14T08:00:43Z

Thanks for reporting, version 1.5.1 (probably coming out Monday) will fix this

cache stacklevel

Revert "PERF cache find_stack_level (pandas-dev#48023)" This reverts commit 2f8d0a3. Co-authored-by: MarcoGorelli <>

cache stacklevel

6358e23

MarcoGorelli force-pushed the lru-cache-find-stack-level branch from b5af79e to 6358e23 Compare August 10, 2022 15:12

MarcoGorelli requested a review from lithomas1 August 10, 2022 17:46

MarcoGorelli marked this pull request as ready for review August 10, 2022 17:46

mroeschke added Warnings Warnings that appear or should be added to pandas Performance Memory or execution speed performance labels Aug 10, 2022

mroeschke reviewed Aug 10, 2022

View reviewed changes

pandas/util/_exceptions.py Show resolved Hide resolved

mroeschke approved these changes Aug 10, 2022

View reviewed changes

alimcmaster1 approved these changes Aug 10, 2022

View reviewed changes

lithomas1 reviewed Aug 10, 2022

View reviewed changes

pandas/util/_exceptions.py Show resolved Hide resolved

MarcoGorelli marked this pull request as draft August 10, 2022 21:19

MarcoGorelli marked this pull request as ready for review August 11, 2022 14:04

Merge remote-tracking branch 'upstream/main' into lru-cache-find-stac…

8ae2f44

…k-level

lithomas1 approved these changes Aug 11, 2022

View reviewed changes

lithomas1 added this to the 1.5 milestone Aug 11, 2022

mroeschke merged commit 2f8d0a3 into pandas-dev:main Aug 11, 2022

YYYasin19 pushed a commit to YYYasin19/pandas that referenced this pull request Aug 23, 2022

PERF cache find_stack_level (pandas-dev#48023)

db4b26f

cache stacklevel

MarcoGorelli mentioned this pull request Oct 12, 2022

REVERT caching in find_stack_level #49053

Merged

MarcoGorelli pushed a commit to MarcoGorelli/pandas that referenced this pull request Oct 13, 2022

Revert "PERF cache find_stack_level (pandas-dev#48023)"

07a99e6

This reverts commit 2f8d0a3.

MarcoGorelli added a commit that referenced this pull request Oct 14, 2022

REVERT caching in find_stack_level (#49053)

0106c26

Revert "PERF cache find_stack_level (#48023)" This reverts commit 2f8d0a3. Co-authored-by: MarcoGorelli <>

MarcoGorelli mentioned this pull request Oct 14, 2022

Backport PR #49053 on branch 1.5.x (REVERT caching in find_stack_level) #49079

Merged

5 tasks

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022

PERF cache find_stack_level (pandas-dev#48023)

899c361

cache stacklevel

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022

REVERT caching in find_stack_level (pandas-dev#49053)

077ff18

Revert "PERF cache find_stack_level (pandas-dev#48023)" This reverts commit 2f8d0a3. Co-authored-by: MarcoGorelli <>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

PERF cache find_stack_level #48023

PERF cache find_stack_level #48023

Uh oh!

MarcoGorelli commented Aug 10, 2022 •

edited

Loading

Uh oh!

Uh oh!

alimcmaster1 left a comment

Uh oh!

lithomas1 left a comment

Uh oh!

Uh oh!

mroeschke commented Aug 11, 2022

Uh oh!

jbinary commented Oct 12, 2022

Uh oh!

MarcoGorelli commented Oct 12, 2022 •

edited

Loading

Uh oh!

Expost commented Oct 14, 2022

Uh oh!

MarcoGorelli commented Oct 14, 2022

Uh oh!

Uh oh!

Uh oh!

PERF cache find_stack_level #48023

PERF cache find_stack_level #48023

Uh oh!

Conversation

MarcoGorelli commented Aug 10, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

alimcmaster1 left a comment

Choose a reason for hiding this comment

Uh oh!

lithomas1 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mroeschke commented Aug 11, 2022

Uh oh!

jbinary commented Oct 12, 2022

Uh oh!

MarcoGorelli commented Oct 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Expost commented Oct 14, 2022

Uh oh!

MarcoGorelli commented Oct 14, 2022

Uh oh!

Uh oh!

MarcoGorelli commented Aug 10, 2022 •

edited

Loading

MarcoGorelli commented Oct 12, 2022 •

edited

Loading