-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-107674: Improve performance of sys.settrace
#117133
gh-107674: Improve performance of sys.settrace
#117133
Conversation
Hi @markshannon , could you take a look at this PR when you have some time? It's almost identical to the optimization that's merged and reverted a while ago. The only difference is mentioned above. Thanks! |
I ran some benchmarks on this branch, and it doesn't seem that the "coverage" benchmark (currently our slowest vs. 3.12) is affected at all by this change. Is that expected? |
I believe coveragepy uses CTracer by default and the key improvement in this PR is to not instrument code while tracing. If the tracer is done completely in C, I don't believe there will be a significant change - because no Python code is executed during tracing. |
Hi @markshannon, I just fixed the conflict for the generated file. The test failure does not seem relevant. Anything you need me to do on this PR? This is basically the original PR that's reverted. |
I happened to look here and I've restarted the failed test -- it looks like a timing-sensitive test, so hopefully that will fix it. If it keeps failing with the same error ( |
It seemed like the test passed. I would be surprised if the changes to |
I have no idea why the test is failing, seems unrelated. |
I don't see the need for changes to
Or am I missing something? |
Now that I've read the original PR, it makes more sense why you want to change I'd be inclined to leave them alone though. For trace functions implemented in C, modifying |
The ability to not instrument the code object while tracing is a very significant improvement for Yes, this has no imapct on C functions, but that's for pure C trace functions. It does not have impact on our coverage benchmark because we only used the basic features. Even for coveragepy, there could be calls to a Python function inside the C tracer - which will benefit from this change. And coveragepy is not the only tool out in the market that uses When monitoring is off, there will be one extra branch in the common path ( And if that's still something bothers, we can do some smart circuit shorts to make it exactly the same in non-monitoring case: if (eval_breaker != version) {
DEOPT_IF(tstate->tracing == 0 || (eval_breaker & _PY_EVAL_EVENTS_MASK))
} That way, in the most common path, the code is exactly the same - most of the time without the monitoring |
Turning tracing on and off is going to be bad for performance no matter what. I'm not against the idea of not instrumenting tracing code, but I really don't want to negatively impact if (eval_breaker != version) {
DEOPT_IF(tstate->tracing == 0 || (eval_breaker & _PY_EVAL_EVENTS_MASK))
} adds bulk to the code generated by the JIT significantly. If |
It's not only about turning tracing on and off, it could be code crossing the tracing boundary. def getname(o):
return o.name
def trace(*args):
name = getname(o)
return trace
def f():
name = getname(o)
sys.settrace(trace)
for _ in range(10):
f() In the example above, If JIT is a concern, then I suppose we can put all the logic in |
Why will it be deinstrumented? |
I feel this is getting too complex for this one scenario. Perhaps we should allows tools to explicitly state which functions should not be instrumented? |
Ah I was wrong about the re-instrumenting. Okay so let's dial it back a little. Leave the I know The key thing I want to preserve here is to avoid instrumenting the code object when it's only used by tracing functions, that could save plenty of time. |
Okay I'm not sure if we have the time to merge this is - this now has conflicts with tier2 opcodes. There's an assertion in If that's still too much before the freeze, I'll live with only avoiding unnecessary calls to intrumented line callback - no changes to Basically I want to improve this in 3.13 and anything is better than nothing. |
I've fixed the assert. The assert was checking "is the instrumentation up to date?" Now it checks "is the instrumentation up to date, or are we tracing?". Which is what we want, I believe. |
Yes, changing the assertion would do as well. I'm not entirely familiar with tier2, but for |
For tier 2, Are you OK with me merging this now? |
Then yes I believe the code is in mergable state. Thanks! |
* Check tracing in RESUME_CHECK * Only change to RESUME_CHECK if not tracing
This PR is reverted in #116178 because of a refleak failure. The test was proved to be wrong(not useful anymore?) with
main
and was removed in #116687. So this PR is back!There is a slight change compared to the original PR thanks to @brandtbucher - in
RESUME_CHECK
, we should not DEOPT when eval_breaker is different than version if we are tracing. If we are not tracing, that's fine, but if we are tracing, we shoud only check the events, not the code version - otherwise we will just specialize and deopt thatRESUME
instruction everytime we execute it.This introduces an extra branch check in
RESUME_CHECK
, which I think is fine (a check is introduced toRESUME
too). There are some techniques that we can use to slightly optimize the fast path (non-tracing) by checking theeval_breaker == version
first then check tracing. I don't think that helps a lot and it would make code a bit harder to read, but I can try it if that's desired.