gh-107674: Lazy load line number to improve performance of tracing #118127

Merged
merged 4 commits into from
Apr 29, 2024

@gaogaotiantian gaogaotiantian commented Apr 20, 2024

The new monitoring mechanism has a huge performance regression when a function body (or module body) is very long. It is not uncommon to put a very large data structure (a dict, say) as a constant in a Python file simply because it is easy to load. This regression makes coverage and debugging miserable when the user depends on libraries that contain such a file.

The core reason is line number calculation.

The line number for each opcode is calculated once, when the monitoring line data is initialized. This part is relatively fast because it walks the addresses linearly, so a single pass yields all the line numbers. However, because we are conservative about memory usage (per bytecode), we only have an 8-bit value in which to store the line number. Currently we use a heuristic that shifts the index of the bytecode and adjusts by an offset, which covers functions of up to a couple hundred lines well. When the heuristic does not work, however, the performance is horrible.

For each line event, we need to calculate two line numbers: that of the current bytecode and that of the previous one. If the heuristic does not work, we have to scan from the first address, which makes each calculation about O(N) in the length of the code object. That gives O(N^2) complexity for really long code objects, such as the constant mentioned above.

For example, a dict with 3k lines takes about 0.03s to execute on my computer. With an empty trace function, it takes 0.3s. That's a 10x overhead for the mechanism itself. It would get worse when the dict is longer.
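A rough way to reproduce this kind of measurement is sketched below. The generated dict, the file name `<big_dict>`, and the helper names are assumptions for illustration, not the script used for the numbers above, and timings will differ by machine and Python version.

```python
import sys
import time


def build_dict_source(n_lines):
    """Return source for a module holding one dict literal with n_lines entries."""
    body = "\n".join(f"    'key{i}': {i}," for i in range(n_lines))
    return "DATA = {\n" + body + "\n}\n"


def empty_trace(frame, event, arg):
    # Returning the function itself keeps local (line) tracing enabled.
    return empty_trace


def time_exec(code, trace=None):
    """Execute code once and return (seconds, namespace)."""
    namespace = {}
    old = sys.gettrace()
    sys.settrace(trace)
    try:
        start = time.perf_counter()
        exec(code, namespace)
        elapsed = time.perf_counter() - start
    finally:
        sys.settrace(old)  # always restore the previous trace function
    return elapsed, namespace


if __name__ == "__main__":
    code = compile(build_dict_source(3000), "<big_dict>", "exec")
    plain, _ = time_exec(code)
    traced, _ = time_exec(code, empty_trace)
    print(f"no trace: {plain:.4f}s, empty trace: {traced:.4f}s")
```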

So, can we do better?

We need the line numbers of both the current and the previous bytecode in order to check whether the line number changed: if it did not change, we don't emit the event. However, that is information we already have when we initialize the monitoring line data. If a bytecode is not a jump target, we know for certain whether it will trigger a line event.

Thus, we can add a new special value for the line_delta to indicate that this bytecode must start a new line, which avoids calculating the previous line when the heuristic fails. That halves the workload.
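The idea can be sketched as a one-time classification pass. This is a toy model only: the marker names `NEW_LINE`, `SAME_LINE`, and `CHECK` and the list-based layout are hypothetical, whereas the actual change encodes a special value in the 8-bit line_delta.

```python
NEW_LINE = "new_line"    # hypothetical marker: always emits a line event
SAME_LINE = "same_line"  # never emits a line event
CHECK = "check"          # jump target: lines must be compared at run time


def classify_instructions(lines, jump_targets):
    """Classify each instruction index by whether it starts a new line.

    lines[i] is the source line of instruction i; jump_targets is the set of
    instruction indices that some jump can land on.
    """
    kinds = []
    for i, line in enumerate(lines):
        if i in jump_targets:
            # The previously executed instruction is unknown until run time.
            kinds.append(CHECK)
        elif i == 0 or line != lines[i - 1]:
            kinds.append(NEW_LINE)
        else:
            kinds.append(SAME_LINE)
    return kinds
```

For an instruction marked `NEW_LINE`, the event can be emitted without looking up the previous line at all.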

Even better, with sys.settrace we often do not need the line number at all, only whether a new line event should fire, so we do not have to calculate the current line number before triggering the event. (Think of the n command of pdb: when the frame is not the right one, it just continues.) So we can make the line number lazily loaded and calculate it only when it is needed. The mechanism is already there; it just needs to be hooked up.
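As an illustration of a tracer that rarely needs the line number, the sketch below reads `frame.f_lineno` only for one code object of interest and ignores line events everywhere else. The helper name `make_line_filter` is hypothetical, and this is a simplified stand-in for what a debugger does, not pdb's actual logic.

```python
import sys


def make_line_filter(target_code):
    """Return (tracer, hits); hits collects line numbers seen in target_code."""
    hits = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is target_code:
            # Only here is the line number actually read.
            hits.append(frame.f_lineno)
        # For every other frame the line number is never touched.
        return tracer

    return tracer, hits
```

With a lazily computed line number, the events this tracer ignores no longer pay for a lookup.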

I added a cache so that we don't need to calculate the line number every time it is accessed. To make this work, I also fixed an issue where f_lineno was not cleared after an exception event.
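The lazy-load-plus-cache pattern looks roughly like this in Python. The class `LazyLineFrame` and its `invalidate` method are hypothetical illustrations of the idea; the real change lives in the C frame object, and the `compute_line` callable stands in for the expensive lookup.

```python
class LazyLineFrame:
    """Toy frame whose line number is computed on first access only."""

    def __init__(self, compute_line):
        self._compute_line = compute_line  # stand-in for the expensive lookup
        self._cached = None                # None means "not computed yet"

    @property
    def f_lineno(self):
        if self._cached is None:
            self._cached = self._compute_line()
        return self._cached

    def invalidate(self):
        """Drop the cached value, e.g. once execution has moved on."""
        self._cached = None
```

A stale cache is the failure mode to watch for: if the cached value is not cleared when it should be, later reads report the wrong line.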

After all the changes above, the dict with an empty trace function runs as fast as with no trace. I believe this will also improve the tracing mechanism in general whenever the heuristic does not work, at the very small cost of losing one offset.

@gaogaotiantian gaogaotiantian marked this pull request as draft April 20, 2024 20:34
@gaogaotiantian gaogaotiantian marked this pull request as ready for review April 20, 2024 21:09

@markshannon markshannon left a comment

One minor issue, but looks good overall.

Ultimately we will need to fix the line number tables for large functions.
It causes problems not only for monitoring, but tracebacks as well.
This seems like a nice improvement until then.

else {
return PyUnstable_InterpreterFrame_GetLine(f->f_frame);

if (f->f_lineno > 0) {

If f->f_lineno == 0 we end up recalculating the line number, so if PyUnstable_InterpreterFrame_GetLine() returns -1 it will get called twice.


Yeah that makes sense.

Co-authored-by: Mark Shannon <mark@hotpy.org>
@markshannon markshannon self-requested a review April 29, 2024 08:53

@markshannon markshannon left a comment

Looks good, thanks.

@markshannon markshannon merged commit 375c94c into python:main Apr 29, 2024
37 checks passed
@bedevere-bot
⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE RHEL7 LTO 3.x has failed when building commit 375c94c.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/503/builds/4903) and take a look at the build logs.
  4. Check if the failure is related to this commit (375c94c) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/503/builds/4903

Failed tests:

  • test_capi

Summary of the results of the build (if available):

==

Click to see traceback logs
remote: Enumerating objects: 22, done.        
remote: Counting objects: 100% (22/22), done.        
remote: Compressing objects: 100% (10/10), done.        
remote: Total 12 (delta 10), reused 4 (delta 2), pack-reused 0        
From https://github.com/python/cpython
 * branch            main       -> FETCH_HEAD
Note: checking out '375c94c75dd9eaefaddd89a7f704a031441af286'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 375c94c... gh-107674: Lazy load line number to improve performance of tracing (GH-118127)
Switched to and reset branch 'main'

In file included from Python/optimizer_analysis.c:437:
Python/optimizer_cases.c.h: In function ‘optimize_uops’:
Python/optimizer_cases.c.h:1444:23: warning: assignment to ‘_PyInterpreterFrame *’ {aka ‘struct _PyInterpreterFrame *’} from incompatible pointer type ‘_Py_UopsSymbol *’ {aka ‘struct _Py_UopsSymbol *’} [-Wincompatible-pointer-types]
             gen_frame = sym_new_not_null(ctx);
                       ^

make: *** [Makefile:2232: buildbottest] Error 2
