gh-107674: Lazy load line number to improve performance of tracing #118127

gaogaotiantian · 2024-04-20T20:29:33Z

The new monitoring mechanism has a huge performance regression when a function body (or module body) is very long. It's not uncommon to put a very large data structure (a dict, say) as a constant in a Python file just because it's easy to load. This regression makes coverage and debugging miserable when the user depends on libraries that contain such a file.

The core reason is line number calculation.

The line number for each opcode is calculated once when the monitoring line data is initialized. This part is relatively fast because it walks the addresses linearly, so a single pass yields all the line numbers. However, because we are conservative on memory usage, we only have an 8-bit value per bytecode to store the line number. Currently we use a heuristic that shifts the index of the bytecode and adjusts by an offset, which covers functions under a couple hundred lines well. However, performance is horrible when the heuristic does not work.
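
As a rough illustration of the 8-bit encoding described above, the following Python sketch models the idea of storing a small delta relative to a shifted bytecode offset. The shift amount, the sentinel values, and the function names are assumptions chosen for illustration; they are not taken from this PR and may not match CPython's actual definitions.

```python
# Illustrative model only: the constants below are assumed values.
OFFSET_SHIFT = 4      # assumed: expect roughly one source line per 16 code units
NO_LINE = -128        # assumed sentinel: instruction has no line number
COMPUTED_LINE = -127  # assumed sentinel: delta does not fit, recompute on demand


def compute_line_delta(first_lineno, offset, line):
    """Return a signed 8-bit delta, or a sentinel when the heuristic fails."""
    if line is None:
        return NO_LINE
    delta = line - first_lineno - (offset >> OFFSET_SHIFT)
    if COMPUTED_LINE < delta <= 127:
        return delta
    return COMPUTED_LINE


def compute_line(first_lineno, offset, delta, slow_lookup):
    """Invert compute_line_delta, falling back to slow_lookup(offset)."""
    if delta == NO_LINE:
        return None
    if delta == COMPUTED_LINE:
        # The slow path: a linear scan in the scenario described above.
        return slow_lookup(offset)
    return first_lineno + (offset >> OFFSET_SHIFT) + delta
```

In this model, a short function stays entirely on the fast path, while a 3k-line dict literal quickly exceeds the 8-bit range and every lookup goes through `slow_lookup`.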

For each line event, we need to calculate two line numbers: that of the current bytecode and that of the previous one. If the heuristic does not work, we need to scan the addresses from the beginning, which makes each calculation O(N) in the length of the code object. That gives O(N^2) complexity for really long code objects like the constant mentioned above.

For example, a dict with 3k lines takes about 0.03s to execute on my computer. With an empty trace function, it takes 0.3s. That's a 10x overhead for the mechanism itself. It would get worse when the dict is longer.
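
A measurement along these lines can be reproduced with a small script. This is a minimal sketch, not the benchmark used for the numbers above: the helper names (`build_source`, `run`, `empty_trace`) and the generated dict contents are made up for illustration, and timings will vary with the machine and the Python version.

```python
import sys
import time


def build_source(n_lines):
    """Build source text for a dict literal with one entry per line."""
    body = "\n".join(f"    {i}: {i}," for i in range(n_lines))
    return "DATA = {\n" + body + "\n}\n"


def run(code, trace=None):
    """Execute code, optionally under a trace function.

    Returns (elapsed_seconds, namespace).
    """
    namespace = {}
    old_trace = sys.gettrace()
    sys.settrace(trace)
    start = time.perf_counter()
    try:
        exec(code, namespace)
    finally:
        elapsed = time.perf_counter() - start
        sys.settrace(old_trace)
    return elapsed, namespace


def empty_trace(frame, event, arg):
    # Returning the function itself keeps line events enabled for the frame.
    return empty_trace


code = compile(build_source(3000), "<big_dict>", "exec")
plain_time, plain_ns = run(code)
traced_time, traced_ns = run(code, empty_trace)
print(f"no trace: {plain_time:.4f}s, empty trace: {traced_time:.4f}s")
```

Compiling once and timing only `exec` keeps parsing cost out of the comparison, so the difference between the two runs is the tracing overhead itself.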

So, can we do better?

The reason we need the line numbers of both the current and the previous bytecode is that we need to check whether the line number changed: if it did not, we don't emit the event. However, that is information we already have when we initialize the monitoring line data. If the bytecode is not a jump target, we know for certain whether it will trigger a line event.

Thus, we can add a new special value for line_delta to indicate that this bytecode must start a new line, which avoids calculating the previous line when the heuristic fails. That's 50% less work.
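
The idea of deciding at initialization time whether an instruction must start a new line can be sketched in Python as follows. This is a simplified model under stated assumptions, not the C implementation: `mark_line_starts`, the string labels, and the representation of instructions as a flat list of line numbers are all hypothetical.

```python
def mark_line_starts(lines, jump_targets):
    """Classify each instruction in a single linear pass.

    lines[i] is the source line of instruction i.
    jump_targets is a set of instruction indices that can be reached by a jump.
    Returns one label per instruction:
      "new_line"         - always starts a new line, no comparison needed
      "same_line"        - never triggers a line event when reached sequentially
      "check_at_runtime" - jump target, the previous line is not known statically
    """
    marks = []
    prev = None
    for i, line in enumerate(lines):
        if i in jump_targets:
            marks.append("check_at_runtime")
        elif line != prev:
            marks.append("new_line")
        else:
            marks.append("same_line")
        prev = line
    return marks
```

For a long dict literal almost every instruction is reached sequentially, so in this model nearly all of them get a static answer and the previous line never has to be recomputed.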

Even better: for sys.settrace, in many cases we do not even need the line number, just whether there should be a new line event, so we don't have to calculate the current line number before triggering the event. (Think of the n command of pdb: when the frame is not the right one, it just continues.) So we can make the line number lazily loaded and calculate it only when we need it. The mechanism is already there; it just needs to be hooked up.

I added a cache so we don't need to recalculate the line number every time it is accessed. To make this work, I also fixed an issue where f_lineno was not cleared after an exception event.
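
The lazy-load-plus-cache pattern can be modelled in a few lines of Python. This is only a sketch of the general technique: the class `LazyLineno`, its `calls` counter, and the convention that 0 means "not computed yet" are assumptions for illustration, not the actual frame object code.

```python
class LazyLineno:
    """Compute a line number on first access and cache the result."""

    def __init__(self, compute):
        self._compute = compute  # expensive callable returning a line number
        self._cached = 0         # assumed convention: 0 means "not computed yet"
        self.calls = 0           # how many times the expensive path ran

    def invalidate(self):
        """Clear the cache, e.g. after an event that may change the line."""
        self._cached = 0

    @property
    def lineno(self):
        if self._cached > 0:
            return self._cached
        self.calls += 1
        line = self._compute()
        if line > 0:
            # Only cache valid results; failures are returned but not stored.
            self._cached = line
        return line


lazy = LazyLineno(lambda: 42)
first = lazy.lineno   # runs the expensive computation
second = lazy.lineno  # served from the cache
```

A trace function that never reads the line number never pays for the computation, and one that reads it repeatedly pays once until `invalidate()` is called.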

After all the changes above, executing the dict with an empty trace function is as fast as executing it with no trace. I believe this will also improve tracing in general whenever the heuristic does not work, at the very small cost of losing one offset.

Issue: sys.settrace dramatic slowdown in 3.12 #107674

markshannon

One minor issue, but looks good overall.

Ultimately we will need to fix the line number tables for large functions.
It causes problems not only for monitoring, but tracebacks as well.
This seems like a nice improvement until then.

markshannon · 2024-04-26T15:24:36Z

Objects/frameobject.c

-    else {
-        return PyUnstable_InterpreterFrame_GetLine(f->f_frame);
+
+    if (f->f_lineno > 0) {


If f->f_lineno == 0 we end up recalculating the line number, so if PyUnstable_InterpreterFrame_GetLine() returns -1 it will get called twice.

Yeah that makes sense.

Objects/frameobject.c

Co-authored-by: Mark Shannon <mark@hotpy.org>

markshannon

Looks good, thanks.

bedevere-bot · 2024-04-29T11:04:26Z

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE RHEL7 LTO 3.x has failed when building commit 375c94c.

What do you need to do:

Don't panic.
Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/503/builds/4903) and take a look at the build logs.
Check if the failure is related to this commit (375c94c) or if it is a false positive.
If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/503/builds/4903

Failed tests:

test_capi

remote: Enumerating objects: 22, done.        
remote: Counting objects: 100% (22/22), done.        
remote: Compressing objects: 100% (10/10), done.        
remote: Total 12 (delta 10), reused 4 (delta 2), pack-reused 0        
From https://github.com/python/cpython
 * branch            main       -> FETCH_HEAD
Note: checking out '375c94c75dd9eaefaddd89a7f704a031441af286'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 375c94c... gh-107674: Lazy load line number to improve performance of tracing (GH-118127)
Switched to and reset branch 'main'

In file included from Python/optimizer_analysis.c:437:
Python/optimizer_cases.c.h: In function ‘optimize_uops’:
Python/optimizer_cases.c.h:1444:23: warning: assignment to ‘_PyInterpreterFrame *’ {aka ‘struct _PyInterpreterFrame *’} from incompatible pointer type ‘_Py_UopsSymbol *’ {aka ‘struct _Py_UopsSymbol *’} [-Wincompatible-pointer-types]
             gen_frame = sym_new_not_null(ctx);
                       ^

make: *** [Makefile:2232: buildbottest] Error 2

Lazy load f_lineno to improve performance of tracing

af1ed33

gaogaotiantian requested a review from markshannon as a code owner April 20, 2024 20:29

bedevere-app bot mentioned this pull request Apr 20, 2024

sys.settrace dramatic slowdown in 3.12 #107674

Open

2 tasks

bedevere-app bot added the awaiting review label Apr 20, 2024

📜🤖 Added by blurb_it.

6acff3d

gaogaotiantian marked this pull request as draft April 20, 2024 20:34

bedevere-app bot removed the awaiting review label Apr 20, 2024

Fix the lineno cache issue

a6cee96

gaogaotiantian marked this pull request as ready for review April 20, 2024 21:09

bedevere-app bot added the awaiting review label Apr 20, 2024

markshannon reviewed Apr 26, 2024

early return if calculation fails

4fecbd4

Co-authored-by: Mark Shannon <mark@hotpy.org>

markshannon self-requested a review April 29, 2024 08:53

markshannon approved these changes Apr 29, 2024

bedevere-app bot added awaiting merge and removed awaiting review labels Apr 29, 2024

markshannon merged commit 375c94c into python:main Apr 29, 2024
37 checks passed

bedevere-app bot removed the awaiting merge label Apr 29, 2024

gaogaotiantian deleted the lazy-calculate-lineno branch April 29, 2024 17:23

gaogaotiantian mentioned this pull request Dec 15, 2024

sys.settrace suffers quadratic behavior for large dictionary literals on 3.12+ #127953

Open
