Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-107674: Improve performance of sys.settrace #117133

Merged
merged 11 commits into from
May 3, 2024

Conversation

gaogaotiantian
Copy link
Member

@gaogaotiantian gaogaotiantian commented Mar 21, 2024

This PR is reverted in #116178 because of a refleak failure. The test was proved to be wrong(not useful anymore?) with main and was removed in #116687. So this PR is back!

There is a slight change compared to the original PR thanks to @brandtbucher - in RESUME_CHECK, we should not DEOPT when eval_breaker is different than version if we are tracing. If we are not tracing, that's fine, but if we are tracing, we shoud only check the events, not the code version - otherwise we will just specialize and deopt that RESUME instruction everytime we execute it.

This introduces an extra branch check in RESUME_CHECK, which I think is fine (a check is introduced to RESUME too). There are some techniques that we can use to slightly optimize the fast path (non-tracing) by checking the eval_breaker == version first then check tracing. I don't think that helps a lot and it would make code a bit harder to read, but I can try it if that's desired.

@gaogaotiantian
Copy link
Member Author

Hi @markshannon , could you take a look at this PR when you have some time? It's almost identical to the optimization that's merged and reverted a while ago. The only difference is mentioned above. Thanks!

@brandtbucher
Copy link
Member

I ran some benchmarks on this branch, and it doesn't seem that the "coverage" benchmark (currently our slowest vs. 3.12) is affected at all by this change. Is that expected?

Benchmarks vs main.

@gaogaotiantian
Copy link
Member Author

I believe coveragepy uses CTracer by default and the key improvement in this PR is to not instrument code while tracing. If the tracer is done completely in C, I don't believe there will be a significant change - because no Python code is executed during tracing.

@gaogaotiantian
Copy link
Member Author

Hi @markshannon, I just fixed the conflict for the generated file. The test failure does not seem relevant. Anything you need me to do on this PR? This is basically the original PR that's reverted.

@gvanrossum
Copy link
Member

I happened to look here and I've restarted the failed test -- it looks like a timing-sensitive test, so hopefully that will fix it. If it keeps failing with the same error (AssertionError: 0.015625 not less than or equal to 0.0078125 in test_denial_of_service_prevented_str_to_int in test_int.py) I recommend looking into it a bit deeper to see if the changes to the RESUME instruction could possibly have affected that specific test.

@gaogaotiantian
Copy link
Member Author

It seemed like the test passed. I would be surprised if the changes to RESUME affects the JIT+msvc version only on a integer test. Any real issue might have a much larger impact on all platforms as it's such a commonly used bytecode.

@gaogaotiantian
Copy link
Member Author

I have no idea why the test is failing, seems unrelated.

@markshannon
Copy link
Member

I don't see the need for changes to RESUME and its variants.
RESUME and RESUME_CHECK do no monitoring, so don't seem relevant to the problem.

INSTRUMENTED_RESUME could perhaps be refactored to make the check for version updates/eval breaker fully separate from the monitoring part. Then the monitoring part could be guarded with a if (tstate->tracing == 0).

Or am I missing something?

@markshannon
Copy link
Member

Now that I've read the original PR, it makes more sense why you want to change RESUME and RESUME_CHECK.

I'd be inclined to leave them alone though. For trace functions implemented in C, modifying RESUME etc makes no difference.
When monitoring is off, it adds extra code to RESUME_CHECK which is a very common instruction.

@gaogaotiantian
Copy link
Member Author

The ability to not instrument the code object while tracing is a very significant improvement for sys.settrace. The code object could be instrumented and de-instrumented quite frequently if it was used in both tracing mode and non-tracing mode (imagine a helper function).

Yes, this has no imapct on C functions, but that's for pure C trace functions. It does not have impact on our coverage benchmark because we only used the basic features. Even for coveragepy, there could be calls to a Python function inside the C tracer - which will benefit from this change. And coveragepy is not the only tool out in the market that uses sys.settrace (arguably one of the most important ones). The improvement to tools with Python implementation is still significant (even coveragepy has a Python tracer option).

When monitoring is off, there will be one extra branch in the common path (RESUME_CHECK), which is super branch prediction friendly (always false). This adds a tiny overhead for each call, which is already not cheap anyway.

And if that's still something bothers, we can do some smart circuit shorts to make it exactly the same in non-monitoring case:

if (eval_breaker != version) {
    DEOPT_IF(tstate->tracing == 0 || (eval_breaker & _PY_EVAL_EVENTS_MASK))
}

That way, in the most common path, the code is exactly the same - most of the time without the monitoring eval_break == version and the branch number is the same as before.

@markshannon
Copy link
Member

The code object could be instrumented and de-instrumented quite frequently if it was used in both tracing mode and non-tracing mode (imagine a helper function).

Turning tracing on and off is going to be bad for performance no matter what.

I'm not against the idea of not instrumenting tracing code, but I really don't want to negatively impact RESUME_CHECK in any way. Even this:

if (eval_breaker != version) {
    DEOPT_IF(tstate->tracing == 0 || (eval_breaker & _PY_EVAL_EVENTS_MASK))
}

adds bulk to the code generated by the JIT significantly.

If RESUME were to incorporate the tstate->tracing == 0, but RESUME_CHECK were not, that might work.
Execution would be slower, as it would have to deopt from RESUME_CHECK to RESUME, but it wouldn't be too bad I think.

@gaogaotiantian
Copy link
Member Author

Turning tracing on and off is going to be bad for performance no matter what.

It's not only about turning tracing on and off, it could be code crossing the tracing boundary.

def getname(o):
    return o.name

def trace(*args):
    name = getname(o)
    return trace

def f():
    name = getname(o)

sys.settrace(trace)
for _ in range(10):
    f()

In the example above, getname will be repeatedly instrumented/deinstrumented because it is used in both tracing and non-tracing code.

If JIT is a concern, then I suppose we can put all the logic in RESUME - it would keep the same speed when it's not tracing, but the tracing case would be slightly impacted (all the RESUME_CHECK would go back to RESUME as the code versions never match). But that's still better than instrumenting the code.

@markshannon
Copy link
Member

In the example above, getname will be repeatedly instrumented/deinstrumented because it is used in both tracing and non-tracing code.

Why will it be deinstrumented?
Once it is instrumented, it should stay instrumented unless the set of monitored events changes.

@markshannon
Copy link
Member

I feel this is getting too complex for this one scenario.

Perhaps we should allows tools to explicitly state which functions should not be instrumented?
Something like a @nomonitoring decorator?

@gaogaotiantian
Copy link
Member Author

Ah I was wrong about the re-instrumenting.

Okay so let's dial it back a little. Leave the RESUME_CHECK as it is and only check tracing status in RESUME, which should keep the full speed for execution without monitoring.

I know sys.monitoring is out there but there are still a large amount of tools using sys.settrace. Unless we plan to deprecate it or we see an obvious trend people starting converting their tools to it (I don't think it would happen until 3.11 is EOL), optimization to sys.settrace overall is still important.

The key thing I want to preserve here is to avoid instrumenting the code object when it's only used by tracing functions, that could save plenty of time.

@gaogaotiantian
Copy link
Member Author

Okay I'm not sure if we have the time to merge this is - this now has conflicts with tier2 opcodes. There's an assertion in _TIER2_RESUME_CHECK that the code instrumentation version is the same as global version which is the thing we try to prevent here. I'm okay with this optimization living only in RESUME because that optimization by itself is pretty good. However, it seems like tier2 optimizer will just convert RESUME unconditionally - so our logic for tracing is not enough to prevent it from being converted which might triggered the assertion failure.

If that's still too much before the freeze, I'll live with only avoiding unnecessary calls to intrumented line callback - no changes to bytecode.c.

Basically I want to improve this in 3.13 and anything is better than nothing.

@markshannon
Copy link
Member

I've fixed the assert. The assert was checking "is the instrumentation up to date?" Now it checks "is the instrumentation up to date, or are we tracing?". Which is what we want, I believe.

@gaogaotiantian
Copy link
Member Author

Yes, changing the assertion would do as well. I'm not entirely familiar with tier2, but for RESUME_CHECK, it falls back to RESUME if it's not tracing and the code object is out of date. How about the tier2? Is there a similar mechanism to fall back to?

@markshannon
Copy link
Member

For tier 2, _TIER2_RESUME_CHECK drops to tier 1 and re-executes the RESUME_CHECK, which de-opts to RESUME and the code gets instrumented.

Are you OK with me merging this now?

@gaogaotiantian
Copy link
Member Author

Then yes I believe the code is in mergable state. Thanks!

@markshannon markshannon merged commit 9c14ed0 into python:main May 3, 2024
54 checks passed
@gaogaotiantian gaogaotiantian deleted the optimize-sys-settrace branch May 3, 2024 18:54
SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
* Check tracing in RESUME_CHECK

* Only change to RESUME_CHECK if not tracing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants