Further overhead reductions #203

Theelx · 2023-02-25T23:16:13Z

This PR changes the data structures used in order to lower overhead. Specifically, a preshmap (pre-hashed map) is now used to check whether a thread has been seen before, which is significantly faster than a simple threading.get_ident() call every time. Additionally, a significant source of overhead is removed by eliminating one of the hpTimer() calls. Each call can take up to 50 cycles at nanosecond resolution (on Linux), and together the two calls accounted for half of the overhead. Removing one of them hasn't measurably reduced accuracy, since the callback is now much faster than it was when hpTimer() was originally added. The last major change is that the STL unordered_map container is no longer used: hashing its keys was causing significant overhead, and a parallelized version can take better advantage of multiple cores. Preshed and cymem are both included as submodules, because trying to include them from a PyPI installation breaks in a pyenv virtual environment (on my system).
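To illustrate the single-timer idea, here is a minimal pure-Python sketch, not the PR's Cython code. It assumes the callback reads the clock once per line event and charges the elapsed time to the previously executing line. The class name `SingleTimestampRecorder`, its methods, and the injectable `clock` parameter are hypothetical names chosen for this example.

```python
import time
from collections import defaultdict


class SingleTimestampRecorder:
    """Attribute elapsed time to lines using one clock read per event."""

    def __init__(self, clock=time.perf_counter_ns):
        self._clock = clock             # injectable so the sketch is testable
        self._last_line = None          # line that is currently "running"
        self._last_time = 0             # timestamp taken when that line began
        self.totals = defaultdict(int)  # line number -> accumulated time
        self.hits = defaultdict(int)    # line number -> times entered

    def on_line(self, lineno):
        now = self._clock()             # the only clock read in the callback
        if self._last_line is not None:
            # Charge everything since the previous event to the previous line.
            self.totals[self._last_line] += now - self._last_time
        self.hits[lineno] += 1
        self._last_line = lineno
        self._last_time = now

    def finish(self):
        """Close out the line that was running when profiling stopped."""
        now = self._clock()
        if self._last_line is not None:
            self.totals[self._last_line] += now - self._last_time
            self._last_line = None
```

The trade-off in this sketch is that the callback's own bookkeeping time lands inside the measured interval of the next line, so the approach is only reasonable when the callback is cheap relative to the lines being timed.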

Another fairly minor change: with the release of Cython 3.0.0b1, cdef functions now propagate Python exceptions by default. This imposes a 3x speed penalty, raising the overhead to unacceptable levels, so I added noexcept to the function signatures to avoid it, since our cdef functions don't raise Python exceptions.

Here's a table showing the current state of overhead. Here's a comment explaining jsonpickle.

| Kernprof Version | jsonpickle Slowdown | Worst-case Slowdown |
| --- | --- | --- |
| 3.5.1 | 3x | 60x |
| 4.0.2 | ~1.9x | 30x |
| This PR | ~1.5x | 10x |

Note: Please test this on async code! I haven't had a chance to, and I don't know if the threading handling is entirely accurate with async code.
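One concrete thing to check when testing async code: coroutines scheduled on the same event loop all run on one OS thread, so any state keyed by thread identity is shared between interleaved tasks. The sketch below only demonstrates that fact with the standard library. `record_ident` and `collect_idents` are hypothetical helper names, and whether this affects the profiler's timings is exactly what still needs testing.

```python
import asyncio
import threading


async def record_ident(name, seen):
    """Record the thread ident before and after yielding to the event loop."""
    before = threading.get_ident()
    await asyncio.sleep(0)  # yield so the other task can run in between
    after = threading.get_ident()
    seen[name] = (before, after)


async def _main():
    seen = {}
    # Two tasks interleave on the same event loop.
    await asyncio.gather(record_ident("a", seen), record_ident("b", seen))
    return seen


def collect_idents():
    """Run both tasks and return {task name: (ident before, ident after)}."""
    return asyncio.run(_main())
```

Calling `collect_idents()` returns the same ident for both tasks, so a per-thread lookup alone cannot tell the two coroutines apart.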

Theelx · 2023-02-26T00:25:39Z

It seems as if there are a decent number of test failures. Please consider this a draft pull request for now, until I can get them all fixed (hopefully in the next week or so).

ta946 · 2023-04-12T10:04:17Z

hey guys, any updates on this?
Just wrote some recursive functions with heavily nested but light for loops eg: dict update stuff (i know, worst case scenario. Also loops in python 😭)
Saw a 10x slowdown compared to manual timing

might create a new branch in the autoprofiler fork and merge the current state of this pr to reduce overhead until you have some free time to spare for this
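For anyone who wants to reproduce this kind of comparison, the following is a rough sketch of measuring per-line tracing slowdown against manual timing. It uses a do-nothing Python-level tracer installed with `sys.settrace`, not line_profiler's C-level callback, so the ratio it reports will not match the numbers in the table above. `workload`, `time_once`, and `slowdown` are hypothetical names for this example.

```python
import sys
import time


def workload(n=20000):
    """Many cheap lines: the worst case for a per-line callback."""
    total = 0
    for i in range(n):
        total += i
    return total


def _tracer(frame, event, arg):
    # Returning the tracer itself requests per-line events for this frame.
    return _tracer


def time_once(func, traced=False):
    """Time a single call, optionally with the per-line tracer installed."""
    previous = sys.gettrace()
    if traced:
        sys.settrace(_tracer)
    try:
        start = time.perf_counter()
        func()
        return time.perf_counter() - start
    finally:
        if traced:
            sys.settrace(previous)  # restore whatever was installed before


def slowdown(func, repeats=5):
    """Ratio of best traced time to best untraced time."""
    plain = min(time_once(func) for _ in range(repeats))
    traced = min(time_once(func, traced=True) for _ in range(repeats))
    return traced / plain
```

Taking the minimum over several repeats reduces noise from other processes. Tight loops with very little work per line, like the one in `workload`, are where the ratio tends to be largest.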

Theelx added 21 commits February 22, 2023 17:03

Lower cython overhead by changing data structures

3c990ea

Remove unnecessary submodule

5f06d36

Increase portability

383f448

Fix preshed

f38c412

Merge branch 'main' of https://github.com/Theelx/line_profiler

55a6a56

Fix cymem install error

f7cbe31

Fix runtime issues

105695f

Use raw PyObject_Hash for less overhead

4c2b3e9

Add Cython3.0.0b1 compatability + small speedups

15a12f9

Make changelog more descriptive

8ef120c

Fix submodules

5c4df70

Fix old macos builds

34cc1a4

fix macos part 2

8864c40

Fix build errors part 3

c1ca12e

Fix setup.py issues

4e6fd87

Try to fix build errors: part 5

23e803a

Fix build errors: part 6

b84a936

Fix build errors: part 7

1e07e72

Fix build errors: part 8

a8b9185

Test not including macos

d2f7756

Ensure non-intel macos system is used

0c42514

alexmv mentioned this pull request Mar 2, 2023

Add Cython3.0.0b1 compatibility. #205

Closed

Theelx mentioned this pull request Aug 12, 2023

Different approach with a few benefits #230

Open

Theelx mentioned this pull request Oct 15, 2023

Support 3.12 #246

Merged
