Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys.settrace dramatic slowdown in 3.12 #107674

Open
2 tasks done
nedbat opened this issue Aug 5, 2023 · 34 comments
Open
2 tasks done

sys.settrace dramatic slowdown in 3.12 #107674

nedbat opened this issue Aug 5, 2023 · 34 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-bug An unexpected behavior, bug, or error

Comments

@nedbat
Copy link
Member

nedbat commented Aug 5, 2023

Bug report

Checklist

  • I am confident this is a bug in CPython, not a bug in a third-party project
  • I have searched the CPython issue tracker, and am confident this bug has not been reported before

A clear and concise description of the bug

A bug report in coverage (nedbat/coveragepy#1665) is reduced here to a pure do-nothing trace function for sys.settrace.

Reproduction:

% git clone https://github.com/nedbat/ndindex

% cd ndindex

% python3.11 -VV
Python 3.11.4 (main, Jun  7 2023, 08:42:37) [Clang 14.0.3 (clang-1403.0.22.14.1)]

% python3.11 -m venv v311

% ./v311/bin/pip install numpy

% python3.12 -VV
Python 3.12.0b4 (main, Jul 11 2023, 20:38:26) [Clang 14.0.3 (clang-1403.0.22.14.1)]

% python3.12 -m venv v312

% ./v312/bin/pip install --pre --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy

% # Run the code without the trace function.
% for x in 1 2 3; time ./v311/bin/python -m ndindex.tests.justdoit notrace
./v311/bin/python -m ndindex.tests.justdoit notrace  7.05s user 0.37s system 108% cpu 6.813 total
./v311/bin/python -m ndindex.tests.justdoit notrace  7.04s user 0.34s system 109% cpu 6.724 total
./v311/bin/python -m ndindex.tests.justdoit notrace  7.00s user 0.35s system 109% cpu 6.696 total

% # 3.12 is slightly faster without the trace function.
% for x in 1 2 3; time ./v312/bin/python -m ndindex.tests.justdoit notrace
./v312/bin/python -m ndindex.tests.justdoit notrace  6.56s user 0.30s system 106% cpu 6.422 total
./v312/bin/python -m ndindex.tests.justdoit notrace  6.48s user 0.31s system 106% cpu 6.359 total
./v312/bin/python -m ndindex.tests.justdoit notrace  6.38s user 0.28s system 107% cpu 6.217 total

% # 3.11 with tracing is about 3x slower.
% for x in 1 2 3; time ./v311/bin/python -m ndindex.tests.justdoit trace
./v311/bin/python -m ndindex.tests.justdoit trace  22.64s user 0.51s system 101% cpu 22.772 total
./v311/bin/python -m ndindex.tests.justdoit trace  21.92s user 0.46s system 101% cpu 21.979 total
./v311/bin/python -m ndindex.tests.justdoit trace  21.55s user 0.35s system 102% cpu 21.379 total

% # 3.12 with tracing is 7x slower.
% for x in 1 2 3; time ./v312/bin/python -m ndindex.tests.justdoit trace
./v312/bin/python -m ndindex.tests.justdoit trace  49.47s user 0.40s system 100% cpu 49.676 total
./v312/bin/python -m ndindex.tests.justdoit trace  49.53s user 0.39s system 100% cpu 49.784 total
./v312/bin/python -m ndindex.tests.justdoit trace  50.44s user 0.38s system 100% cpu 50.739 total

I don't know if the unusual numpy build has something to do with this.

Linked PRs

@nedbat nedbat added the type-bug An unexpected behavior, bug, or error label Aug 5, 2023
@neonene
Copy link
Contributor

neonene commented Aug 6, 2023

This issue can be seen starting from the commit 411b169 with usual numpy-1.25.2.

@AlexWaygood
Copy link
Member

cc. @markshannon

@nedbat
Copy link
Member Author

nedbat commented Oct 24, 2023

@markshannon Is there any news about this?

@nedbat
Copy link
Member Author

nedbat commented Oct 30, 2023

@iritkatriel perhaps you have some idea?

@iritkatriel
Copy link
Member

@iritkatriel perhaps you have some idea?

Did #107780 help?

@gaogaotiantian
Copy link
Member

Hi @nedbat , I circled back to this issue. If you are still interested, could you try the patch of #114986 and see if the result is improved?

@nedbat
Copy link
Member Author

nedbat commented Feb 15, 2024

@gaogaotiantian Thanks. I tried #114986, and it didn't run faster. I used the https://github.com/Quansight-Labs/ndindex repo test suite:
3.11: 160s
3.12: 255s
3.13 main: 248s
pr114986: 245s

I wish I had better news.

@gaogaotiantian
Copy link
Member

Oh, did you test it with an empty trace function?

@nedbat
Copy link
Member Author

nedbat commented Feb 15, 2024

I will do more tests, but the ultimate goal is to perform better in real-world scenarios. Here is more data from a subset of the test suite (-k test_chunking), which looks much more promising. It's hard to get consistent numbers, but this looks good. I would merge this change.

Python version no tracing Python trace C trace
3.11.8 1.19s 7.39s 2.46s
3.12.2 1.12s 11.57s 3.71s
3.13.0a3+ heads/main:a95b1a56bb 1.14s 10.37s 3.89s
3.13.0a3+ heads/pr/114986:cb09c55758 1.14s 6.48s 2.91s

@gaogaotiantian
Copy link
Member

Thank you for the test. Yes the PR only improves the performance of the actual trace function - which is normally the major cost for tracing tools. Good to know it helps with the more real-life benchmark. Unfortunately, due to the change of the tracing mechanism, the overhead for sys.settrace will always be larger than the old way - I did not see a quick solution to that.

@markshannon
Copy link
Member

Now that the eval_breaker is part of the thread state, here's an alternative approach that might work.

tstate->tracing only needs one bit, so use the low bit in the version part of the eval breaker.
For non-tracing functions we expect to always see the low bit set to 0.
For tracing functions we expect to always see the low bit set to 1.

If the code and global versions differ do the following:

  • If tracing bit is 0, update the version number (setting the low bit to 0), and instrument.
  • If the tracing bit is 1:
    • If the version number matches except the low bit: do nothing
    • Otherwise: update the version number (setting the low bit to 1), and clear any instrumentation.

I think this is correct, but I've not drawn up a full state diagram.

@encukou
Copy link
Member

encukou commented Feb 29, 2024

#114986 caused refleaks tests (./python -m test -R 3:3 test_frame) to fail.
If not fixed in 24 hours, I'll revert the commit to unblock buildbots. (I hope I'll get have time to investigate & come up with a proper fix, but can't promise that.)

@gaogaotiantian
Copy link
Member

@encukou See response in #116098

gaogaotiantian added a commit to gaogaotiantian/cpython that referenced this issue Mar 1, 2024
encukou pushed a commit that referenced this issue Mar 1, 2024
…H-114986)" (GH-116178)

Revert "gh-107674: Improve performance of `sys.settrace` (GH-114986)"

This reverts commit 0a61e23.
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
joostvanzwieten added a commit to evalf/nutils that referenced this issue May 23, 2024
Python 3.12 has a significant regression in coverage analysis. Related issue:
python/cpython#107674 . This patch replaces coverage
with a custom framework built using `sys.monitoring`, which is only available
as of Python 3.12.
joostvanzwieten added a commit to evalf/nutils that referenced this issue May 24, 2024
Python 3.12 has a significant regression in coverage analysis. Related issue:
python/cpython#107674 . This patch replaces coverage
with a custom framework built using `sys.monitoring`, which is only available
as of Python 3.12.
joostvanzwieten added a commit to evalf/nutils that referenced this issue May 24, 2024
Python 3.12 has a significant regression in coverage analysis. Related issue:
python/cpython#107674 . This patch replaces coverage
with a custom framework built using `sys.monitoring`, which is only available
as of Python 3.12.
joostvanzwieten added a commit to evalf/nutils that referenced this issue May 24, 2024
Python 3.12 has a significant regression in coverage analysis. Related issue:
python/cpython#107674 . This PR replaces coverage
with a custom framework built using `sys.monitoring`, which is only available
as of Python 3.12.
@gaogaotiantian
Copy link
Member

Hi @nedbat , do you have a chance to test the performance of coverage on 3.13b against 3.12? I made a few optimizations and I'd hope the performance issue is relieved a bit.

@nedbat
Copy link
Member Author

nedbat commented May 28, 2024

I'm having a hard time re-creating the scenarios I used before :(

@gaogaotiantian
Copy link
Member

That's okay. Does coveragepy have nightly validations for 3.13b? Is there a basic timing benchmark that we can refer to? Just a ballpark would be helpful.

@asmeurer
Copy link

You can try running the ndindex tests, which was the original thing I noticed this issue with in nedbat/coveragepy#1665. Just clone

https://github.com/Quansight-Labs/ndindex

then install the packages in requirements-dev.txt and run

time pytest

(coverage is enabled automatically). On my computer with Python 3.11.9, the tests take 2:48 and with Python 3.12.0 they take 4:21 (both with coverage 7.5.2). If you want to run the tests with coverage disabled you should remove the coverage options from pytest.ini. Note that there is a degree of variation in the test times due to the use of hypothesis. You might want to set the --hypothesis-seed to avoid this.

@nedbat
Copy link
Member Author

nedbat commented May 28, 2024

Coverage.py has an overnight job that runs its test suite on nightly builds of Python, including 3.13 and 3.14 now: https://github.com/nedbat/coveragepy/actions/runs/9265205433 There isn't a timing test in there though.

@gaogaotiantian
Copy link
Member

Okay thanks @nedbat and @asmeurer . I was hoping for some quick and existing benchmark but it's okay that we don't have any for now. Just wondering if we should close this issue now that we have implemented some optimizations.

@nedbat
Copy link
Member Author

nedbat commented May 29, 2024

I have a benchmarking framework, but it's fragile. I will try to patch it up later this week.

@nedbat
Copy link
Member Author

nedbat commented Sep 7, 2024

I ran benchmarks again on 3.11, 3.12, and 3.13:

cov proj python3.11 python3.12 python3.13 3.12 vs 3.11 3.13 vs 3.12
761 mashumaro 56.4s 62.6s 67.9s 111% 108%
761 pygments 25.1s 56.2s 38.9s 224% 69%
761 mypy 109.4s 113.8s 104.4s 104% 92%

The specific versions were the newly released 3.11.10, 3.12.6, and 3.13.0rc2. The test suites were run three times, and the median time used.

@gaogaotiantian
Copy link
Member

Thanks @nedbat . I'm a bit surprised that on some cases 3.13 is slower than 3.12 - I did not think of anything that we did to slow down the tracing. Glad to see we had some significant improvements on certain cases.

@nedbat
Copy link
Member Author

nedbat commented Feb 12, 2025

I continue to get people reporting serious performance problems on 3.12. Is there any chance of an improvement?

@junkmd
Copy link
Contributor

junkmd commented Feb 14, 2025

Considering the issue observed in enthought/comtypes#775, where execution time increased nearly tenfold, there is a performance "regression" in Python 3.12.

Only one bugfix release remains for Python 3.12:
https://pythonpeps.python.org/pep-0693/

If #107841 is not merged by then, I believe the Python community will miss an opportunity to reconsider this issue.

If "the fix patch" is merged before Python 3.12 transitions to security-fix only status, I would accept enabling coverage measurement in the comtypes CI tests for all Python versions (and waiting until 3.12.10 is released). Otherwise, only Python 3.12 will be tested without coverage measurement in the comtypes CI pipeline.

As discussed in #107841, I understand that this issue doesn't involve "broken" programs but rather a "performance improvement", and that the changes are complex to accept during the bugfix phase. However, fixing this could potentially benefit projects and the wider ecosystem running on Python 3.12.

I would like to, once again, hear the opinions of the core developers and the community on this matter.

@gaogaotiantian
Copy link
Member

Ah, @markshannon merged #128350 in main but did not backport it to 3.12 and 3.13 due to conflicts. #107841 is not the problem fixer. I believe backporting #128350 should fix the issue in 3.12 and 3.13.

junkmd added a commit to junkmd/comtypes-python-dev-compatibility that referenced this issue Feb 17, 2025
- v3.12.8
- cc90ef8
junkmd added a commit to junkmd/comtypes-python-dev-compatibility that referenced this issue Feb 18, 2025
@junkmd
Copy link
Contributor

junkmd commented Feb 18, 2025

I tested whether #107841 is the problem fixer.

I built Python from v3.12.8 and also from the 3.12 branch with 37d8b90 cherry-picked.
And I compared the execution time of comtypes test cases.

with cov 37d8b90 changes Python 3.12.8
yes 1101.106s 1254.337s
no 57.869s 46.481s

The build with 37d8b90 changes is about 153s faster than v3.12.8.
However, this performance improvement is still not significant.

I haven't tested backporting #128350 yet because I'm not sure how to resolve conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests