gh-107674: Improve performance of `sys.settrace` #117133

gaogaotiantian · 2024-03-21T18:35:05Z

This PR is reverted in #116178 because of a refleak failure. The test was proved to be wrong(not useful anymore?) with main and was removed in #116687. So this PR is back!

There is a slight change compared to the original PR thanks to @brandtbucher - in RESUME_CHECK, we should not DEOPT when eval_breaker is different than version if we are tracing. If we are not tracing, that's fine, but if we are tracing, we shoud only check the events, not the code version - otherwise we will just specialize and deopt that RESUME instruction everytime we execute it.

This introduces an extra branch check in RESUME_CHECK, which I think is fine (a check is introduced to RESUME too). There are some techniques that we can use to slightly optimize the fast path (non-tracing) by checking the eval_breaker == version first then check tracing. I don't think that helps a lot and it would make code a bit harder to read, but I can try it if that's desired.

Issue: sys.settrace dramatic slowdown in 3.12 #107674

gaogaotiantian · 2024-03-27T19:41:51Z

Hi @markshannon , could you take a look at this PR when you have some time? It's almost identical to the optimization that's merged and reverted a while ago. The only difference is mentioned above. Thanks!

brandtbucher · 2024-03-29T00:26:31Z

I ran some benchmarks on this branch, and it doesn't seem that the "coverage" benchmark (currently our slowest vs. 3.12) is affected at all by this change. Is that expected?

Benchmarks vs main.

gaogaotiantian · 2024-03-29T00:59:04Z

I believe coveragepy uses CTracer by default and the key improvement in this PR is to not instrument code while tracing. If the tracer is done completely in C, I don't believe there will be a significant change - because no Python code is executed during tracing.

gaogaotiantian · 2024-04-15T18:45:17Z

Hi @markshannon, I just fixed the conflict for the generated file. The test failure does not seem relevant. Anything you need me to do on this PR? This is basically the original PR that's reverted.

gvanrossum · 2024-04-15T18:53:30Z

I happened to look here and I've restarted the failed test -- it looks like a timing-sensitive test, so hopefully that will fix it. If it keeps failing with the same error (AssertionError: 0.015625 not less than or equal to 0.0078125 in test_denial_of_service_prevented_str_to_int in test_int.py) I recommend looking into it a bit deeper to see if the changes to the RESUME instruction could possibly have affected that specific test.

gaogaotiantian · 2024-04-15T19:19:13Z

It seemed like the test passed. I would be surprised if the changes to RESUME affects the JIT+msvc version only on a integer test. Any real issue might have a much larger impact on all platforms as it's such a commonly used bytecode.

gaogaotiantian · 2024-04-20T19:48:10Z

I have no idea why the test is failing, seems unrelated.

markshannon · 2024-04-26T15:59:36Z

I don't see the need for changes to RESUME and its variants.
RESUME and RESUME_CHECK do no monitoring, so don't seem relevant to the problem.

INSTRUMENTED_RESUME could perhaps be refactored to make the check for version updates/eval breaker fully separate from the monitoring part. Then the monitoring part could be guarded with a if (tstate->tracing == 0).

Or am I missing something?

markshannon · 2024-04-26T16:09:36Z

Now that I've read the original PR, it makes more sense why you want to change RESUME and RESUME_CHECK.

I'd be inclined to leave them alone though. For trace functions implemented in C, modifying RESUME etc makes no difference.
When monitoring is off, it adds extra code to RESUME_CHECK which is a very common instruction.

gaogaotiantian · 2024-04-26T17:41:45Z

The ability to not instrument the code object while tracing is a very significant improvement for sys.settrace. The code object could be instrumented and de-instrumented quite frequently if it was used in both tracing mode and non-tracing mode (imagine a helper function).

Yes, this has no imapct on C functions, but that's for pure C trace functions. It does not have impact on our coverage benchmark because we only used the basic features. Even for coveragepy, there could be calls to a Python function inside the C tracer - which will benefit from this change. And coveragepy is not the only tool out in the market that uses sys.settrace (arguably one of the most important ones). The improvement to tools with Python implementation is still significant (even coveragepy has a Python tracer option).

When monitoring is off, there will be one extra branch in the common path (RESUME_CHECK), which is super branch prediction friendly (always false). This adds a tiny overhead for each call, which is already not cheap anyway.

And if that's still something bothers, we can do some smart circuit shorts to make it exactly the same in non-monitoring case:

if (eval_breaker != version) {
    DEOPT_IF(tstate->tracing == 0 || (eval_breaker & _PY_EVAL_EVENTS_MASK))
}

That way, in the most common path, the code is exactly the same - most of the time without the monitoring eval_break == version and the branch number is the same as before.

markshannon · 2024-04-26T18:25:01Z

The code object could be instrumented and de-instrumented quite frequently if it was used in both tracing mode and non-tracing mode (imagine a helper function).

Turning tracing on and off is going to be bad for performance no matter what.

I'm not against the idea of not instrumenting tracing code, but I really don't want to negatively impact RESUME_CHECK in any way. Even this:

if (eval_breaker != version) {
    DEOPT_IF(tstate->tracing == 0 || (eval_breaker & _PY_EVAL_EVENTS_MASK))
}

adds bulk to the code generated by the JIT significantly.

If RESUME were to incorporate the tstate->tracing == 0, but RESUME_CHECK were not, that might work.
Execution would be slower, as it would have to deopt from RESUME_CHECK to RESUME, but it wouldn't be too bad I think.

gaogaotiantian · 2024-04-26T19:41:56Z

Turning tracing on and off is going to be bad for performance no matter what.

It's not only about turning tracing on and off, it could be code crossing the tracing boundary.

def getname(o):
    return o.name

def trace(*args):
    name = getname(o)
    return trace

def f():
    name = getname(o)

sys.settrace(trace)
for _ in range(10):
    f()

In the example above, getname will be repeatedly instrumented/deinstrumented because it is used in both tracing and non-tracing code.

If JIT is a concern, then I suppose we can put all the logic in RESUME - it would keep the same speed when it's not tracing, but the tracing case would be slightly impacted (all the RESUME_CHECK would go back to RESUME as the code versions never match). But that's still better than instrumenting the code.

markshannon · 2024-04-29T13:05:25Z

In the example above, getname will be repeatedly instrumented/deinstrumented because it is used in both tracing and non-tracing code.

Why will it be deinstrumented?
Once it is instrumented, it should stay instrumented unless the set of monitored events changes.

markshannon · 2024-04-29T13:16:50Z

I feel this is getting too complex for this one scenario.

Perhaps we should allows tools to explicitly state which functions should not be instrumented?
Something like a @nomonitoring decorator?

gaogaotiantian · 2024-04-29T17:32:01Z

Ah I was wrong about the re-instrumenting.

Okay so let's dial it back a little. Leave the RESUME_CHECK as it is and only check tracing status in RESUME, which should keep the full speed for execution without monitoring.

I know sys.monitoring is out there but there are still a large amount of tools using sys.settrace. Unless we plan to deprecate it or we see an obvious trend people starting converting their tools to it (I don't think it would happen until 3.11 is EOL), optimization to sys.settrace overall is still important.

The key thing I want to preserve here is to avoid instrumenting the code object when it's only used by tracing functions, that could save plenty of time.

gaogaotiantian · 2024-05-02T21:02:48Z

Okay I'm not sure if we have the time to merge this is - this now has conflicts with tier2 opcodes. There's an assertion in _TIER2_RESUME_CHECK that the code instrumentation version is the same as global version which is the thing we try to prevent here. I'm okay with this optimization living only in RESUME because that optimization by itself is pretty good. However, it seems like tier2 optimizer will just convert RESUME unconditionally - so our logic for tracing is not enough to prevent it from being converted which might triggered the assertion failure.

If that's still too much before the freeze, I'll live with only avoiding unnecessary calls to intrumented line callback - no changes to bytecode.c.

Basically I want to improve this in 3.13 and anything is better than nothing.

markshannon · 2024-05-03T14:03:05Z

I've fixed the assert. The assert was checking "is the instrumentation up to date?" Now it checks "is the instrumentation up to date, or are we tracing?". Which is what we want, I believe.

gaogaotiantian · 2024-05-03T16:52:21Z

Yes, changing the assertion would do as well. I'm not entirely familiar with tier2, but for RESUME_CHECK, it falls back to RESUME if it's not tracing and the code object is out of date. How about the tier2? Is there a similar mechanism to fall back to?

markshannon · 2024-05-03T18:14:50Z

For tier 2, _TIER2_RESUME_CHECK drops to tier 1 and re-executes the RESUME_CHECK, which de-opts to RESUME and the code gets instrumented.

Are you OK with me merging this now?

gaogaotiantian · 2024-05-03T18:23:41Z

Then yes I believe the code is in mergable state. Thanks!

* Check tracing in RESUME_CHECK * Only change to RESUME_CHECK if not tracing

gaogaotiantian added 2 commits March 21, 2024 10:35

pythongh-107674: Improve performance of sys.settrace (pythonGH-114986)

ce40742

Check tracing in RESUME_CHECK

104fb6c

gaogaotiantian requested a review from brandtbucher March 21, 2024 18:35

gaogaotiantian requested review from markshannon and gvanrossum as code owners March 21, 2024 18:35

bedevere-app bot added the awaiting review label Mar 21, 2024

bedevere-app bot mentioned this pull request Mar 21, 2024

sys.settrace dramatic slowdown in 3.12 #107674

Open

2 tasks

Merge branch 'main' into optimize-sys-settrace

a65be06

Merge branch 'main' into optimize-sys-settrace

5157da6

Merge branch 'main' into optimize-sys-settrace

5583b96

gaogaotiantian added 5 commits April 29, 2024 10:55

Remove the optimization in RESUME_CHECK

49545b7

Merge branch 'main' into optimize-sys-settrace

83e3934

Merge branch 'main' into optimize-sys-settrace

ee16257

Fix merge errors

0307e48

Only change to RESUME_CHECK if not tracing

d3fbd29

Fix assert

968a5f3

markshannon merged commit 9c14ed0 into python:main May 3, 2024
54 checks passed

bedevere-app bot removed the awaiting review label May 3, 2024

gaogaotiantian deleted the optimize-sys-settrace branch May 3, 2024 18:54

SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024

pythongh-107674: Improve performance of sys.settrace (pythonGH-117133)

e8e745c

* Check tracing in RESUME_CHECK * Only change to RESUME_CHECK if not tracing

gaogaotiantian mentioned this pull request Jul 25, 2024

gh-122247: Move instruction instrumentation sanity check after tracing check #122251

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-107674: Improve performance of `sys.settrace` #117133

gh-107674: Improve performance of `sys.settrace` #117133

gaogaotiantian commented Mar 21, 2024 •

edited by bedevere-app bot

Loading

gaogaotiantian commented Mar 27, 2024

brandtbucher commented Mar 29, 2024

gaogaotiantian commented Mar 29, 2024

gaogaotiantian commented Apr 15, 2024

gvanrossum commented Apr 15, 2024

gaogaotiantian commented Apr 15, 2024

gaogaotiantian commented Apr 20, 2024

markshannon commented Apr 26, 2024

markshannon commented Apr 26, 2024

gaogaotiantian commented Apr 26, 2024

markshannon commented Apr 26, 2024

gaogaotiantian commented Apr 26, 2024

markshannon commented Apr 29, 2024

markshannon commented Apr 29, 2024

gaogaotiantian commented Apr 29, 2024

gaogaotiantian commented May 2, 2024

markshannon commented May 3, 2024

gaogaotiantian commented May 3, 2024

markshannon commented May 3, 2024

gaogaotiantian commented May 3, 2024

gh-107674: Improve performance of sys.settrace #117133

gh-107674: Improve performance of sys.settrace #117133

Conversation

gaogaotiantian commented Mar 21, 2024 • edited by bedevere-app bot Loading

gaogaotiantian commented Mar 27, 2024

brandtbucher commented Mar 29, 2024

gaogaotiantian commented Mar 29, 2024

gaogaotiantian commented Apr 15, 2024

gvanrossum commented Apr 15, 2024

gaogaotiantian commented Apr 15, 2024

gaogaotiantian commented Apr 20, 2024

markshannon commented Apr 26, 2024

markshannon commented Apr 26, 2024

gaogaotiantian commented Apr 26, 2024

markshannon commented Apr 26, 2024

gaogaotiantian commented Apr 26, 2024

markshannon commented Apr 29, 2024

markshannon commented Apr 29, 2024

gaogaotiantian commented Apr 29, 2024

gaogaotiantian commented May 2, 2024

markshannon commented May 3, 2024

gaogaotiantian commented May 3, 2024

markshannon commented May 3, 2024

gaogaotiantian commented May 3, 2024

gh-107674: Improve performance of `sys.settrace` #117133

gh-107674: Improve performance of `sys.settrace` #117133

gaogaotiantian commented Mar 21, 2024 •

edited by bedevere-app bot

Loading