gh-107674: Lazy load line number to improve performance of tracing #118127
One minor issue, but looks good overall.
Ultimately we will need to fix the line number tables for large functions.
It causes problems not only for monitoring, but tracebacks as well.
This seems like a nice improvement until then.
else {
    return PyUnstable_InterpreterFrame_GetLine(f->f_frame);
if (f->f_lineno > 0) {
If f->f_lineno == 0 we end up recalculating the line number, so if PyUnstable_InterpreterFrame_GetLine() returns -1 it will get called twice.
Yeah that makes sense.
Co-authored-by: Mark Shannon <mark@hotpy.org>
Looks good, thanks.
The new monitoring mechanism has a huge performance regression when the function body (or the module body) is very long. It's not that uncommon to put a very large data structure (a dict, say) as a constant in a Python file just because it's easy to load. This regression makes coverage and debugging miserable when the user depends on libraries that contain such a file.
The core reason is line number calculation.
The line number for each opcode is calculated once, when the monitoring line data is initialized. This part is relatively fast because it goes through the addresses linearly: one pass to get all the line numbers. However, because we are conservative about memory usage, we only have an 8-bit value per bytecode to store the line number. Currently we use a heuristic of shifting the index of the bytecode and adjusting by an offset, which covers functions under a couple hundred lines well. However, performance is horrible when the heuristic does not work.
For each line event, we need to calculate two line numbers: that of the current bytecode and that of the previous one. If the heuristic does not work, we need to scan from the beginning of the line table, which makes each calculation O(N) in the length of the code object. That gives O(N^2) complexity for really long code objects like the constant mentioned above.
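The quadratic growth can be illustrated with a toy model. This is only a sketch, not CPython's line table format: `table`, `line_by_linear_scan` and `total_scan_cost` are hypothetical names, and the model assumes one instruction per line and a table that can only be decoded from its start. It counts a single lookup per event, whereas the text above describes two.

```python
def line_by_linear_scan(table, offset):
    """Return (line, entries_visited) for a bytecode offset.

    `table` is a list of (start_offset, line) pairs sorted by start_offset,
    a hypothetical stand-in for a line table that must be read from the start.
    """
    visited = 0
    line = None
    for start, entry_line in table:
        if start > offset:
            break
        visited += 1
        line = entry_line
    return line, visited


def total_scan_cost(n_lines):
    """Count table entries visited when every line fires one line event."""
    # Assumption: one instruction per line, at offsets 0, 2, 4, ...
    table = [(2 * i, i + 1) for i in range(n_lines)]
    cost = 0
    for offset, expected_line in table:
        line, visited = line_by_linear_scan(table, offset)
        assert line == expected_line
        cost += visited
    return cost
```

In this model the total is n(n+1)/2 visited entries, so doubling the number of lines roughly quadruples the work.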
For example, a dict with 3k lines takes about 0.03s to execute on my computer. With an empty trace function, it takes 0.3s. That's a 10x overhead for the mechanism itself. It would get worse when the dict is longer.
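A measurement like this could be reproduced with a script along the following lines. It is a sketch under assumptions: the generated dict contents, the file name `<big_dict>` and the helper names are made up for illustration, and the timings will differ by machine and Python version.

```python
import sys
import time


def make_big_dict_source(n_lines):
    """Build source for a module-level dict literal with n_lines entries."""
    body = "\n".join(f"    'key{i}': {i}," for i in range(n_lines))
    return "DATA = {\n" + body + "\n}\n"


def empty_trace(frame, event, arg):
    # Does no work; returning itself keeps line events enabled for the frame.
    return empty_trace


def timed_exec(code, trace=None):
    """Execute a code object and return (elapsed_seconds, namespace)."""
    namespace = {}
    previous = sys.gettrace()
    sys.settrace(trace)
    try:
        start = time.perf_counter()
        exec(code, namespace)
        elapsed = time.perf_counter() - start
    finally:
        # Restore whatever trace function was active before.
        sys.settrace(previous)
    return elapsed, namespace


if __name__ == "__main__":
    code = compile(make_big_dict_source(3000), "<big_dict>", "exec")
    plain, _ = timed_exec(code)
    traced, _ = timed_exec(code, empty_trace)
    print(f"no trace: {plain:.4f}s, empty trace: {traced:.4f}s")
```

Compiling once and timing only `exec` keeps parsing and compilation out of the comparison.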
So, can we do better?
The reason we need the line numbers of both the current and the previous bytecode is that we need to check whether the line number changed; if it did not change, we don't emit the event. However, that is information we already have when we initialize the monitoring line data. If a bytecode is not a jump target, we know for certain whether it will trigger a line event.
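The observation about jump targets can be sketched as a single pass over the instructions. This is a simplified model, not the CPython code: `LineStatus` and `classify_instructions` are hypothetical names, and it assumes an instruction that is not a jump target is only ever reached by falling through from the instruction before it.

```python
from enum import Enum


class LineStatus(Enum):
    NO_EVENT = "same line as the previous instruction"
    ALWAYS_NEW_LINE = "guaranteed line event"
    CHECK_AT_RUNTIME = "jump target: depends on where execution came from"


def classify_instructions(lines, jump_targets):
    """Classify every instruction in one linear pass.

    lines[i] is the source line of instruction i; jump_targets is the set of
    instruction indices that some jump can land on.
    """
    statuses = []
    previous_line = None
    for index, line in enumerate(lines):
        if index in jump_targets:
            # The previous line is only known at runtime for jump targets.
            statuses.append(LineStatus.CHECK_AT_RUNTIME)
        elif line != previous_line:
            statuses.append(LineStatus.ALWAYS_NEW_LINE)
        else:
            statuses.append(LineStatus.NO_EVENT)
        previous_line = line
    return statuses
```

Only the instructions classified as `CHECK_AT_RUNTIME` still need a comparison against the previous line when the event fires.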
Thus, we can add a new special value for line_delta to indicate that this bytecode must start a new line, which avoids calculating the previous line when the heuristic fails. That's 50% less work.
Even better, for sys.settrace we often do not need the line number at all, only whether a new line event should fire, so we don't have to calculate the current line number before we trigger the event. (Think of the n command of pdb: when the frame is not the right one, it just continues.) So we can make the line number lazily loaded and only calculate it when we need it. The mechanism is already there; it just needs to be hooked up.
I added a cache so we don't need to recalculate the line number every time it is accessed. To make this work, I also fixed an issue where f_lineno was not cleared after an exception event.
After all the changes above, the dict with an empty trace function runs as fast as with no trace. I believe this will also improve the tracing mechanism in general whenever the heuristic does not work, at the very small cost of losing one offset.
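The lazy loading and caching described above can be modelled in a few lines. This is an illustration of the idea only, not the C implementation in this PR: `LazyLineFrame`, `advance` and the `compute_line` callback are hypothetical names standing in for the frame object and the expensive line lookup.

```python
class LazyLineFrame:
    """Toy frame whose line number is computed on first access, then cached."""

    def __init__(self, compute_line):
        self._compute_line = compute_line  # hypothetical expensive lookup
        self._cached_line = None
        self.computations = 0  # how many times the lookup actually ran

    @property
    def lineno(self):
        if self._cached_line is None:
            self.computations += 1
            self._cached_line = self._compute_line()
        return self._cached_line

    def advance(self):
        """Drop the cached value when execution moves to another instruction."""
        self._cached_line = None
```

A trace callback that never reads `lineno` triggers no lookups at all in this model, and one that reads it several times for the same event pays for it once.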