
bpo-47177: Replace f_lasti with prev_instr #32208


Merged · 14 commits · Apr 7, 2022

Conversation

@brandtbucher (Member) commented Mar 31, 2022:

A nice 2% improvement:

Slower (2):
- telco: 38.6 ms +- 2.7 ms -> 40.9 ms +- 2.6 ms: 1.06x slower
- regex_dna: 1.33 sec +- 0.01 sec -> 1.34 sec +- 0.01 sec: 1.01x slower

Faster (32):
- logging_silent: 640 ns +- 27 ns -> 582 ns +- 35 ns: 1.10x faster
- scimark_sor: 736 ms +- 16 ms -> 689 ms +- 13 ms: 1.07x faster
- pycparser: 7.52 sec +- 0.10 sec -> 7.12 sec +- 0.10 sec: 1.06x faster
- chameleon: 39.8 ms +- 2.7 ms -> 37.8 ms +- 2.4 ms: 1.05x faster
- html5lib: 417 ms +- 35 ms -> 397 ms +- 31 ms: 1.05x faster
- scimark_monte_carlo: 422 ms +- 13 ms -> 402 ms +- 11 ms: 1.05x faster
- pyflate: 2.65 sec +- 0.02 sec -> 2.53 sec +- 0.02 sec: 1.05x faster
- deltablue: 22.5 ms +- 1.3 ms -> 21.6 ms +- 1.4 ms: 1.05x faster
- unpickle_pure_python: 1.43 ms +- 0.08 ms -> 1.37 ms +- 0.07 ms: 1.04x faster
- spectral_norm: 628 ms +- 10 ms -> 604 ms +- 13 ms: 1.04x faster
- hexiom: 41.0 ms +- 3.1 ms -> 39.5 ms +- 2.4 ms: 1.04x faster
- raytrace: 1.84 sec +- 0.02 sec -> 1.78 sec +- 0.02 sec: 1.03x faster
- scimark_lu: 669 ms +- 20 ms -> 648 ms +- 20 ms: 1.03x faster
- xml_etree_generate: 474 ms +- 11 ms -> 459 ms +- 13 ms: 1.03x faster
- richards: 296 ms +- 12 ms -> 287 ms +- 14 ms: 1.03x faster
- nbody: 555 ms +- 17 ms -> 540 ms +- 15 ms: 1.03x faster
- pickle_dict: 170 us +- 8 us -> 165 us +- 7 us: 1.03x faster
- crypto_pyaes: 510 ms +- 12 ms -> 497 ms +- 13 ms: 1.03x faster
- scimark_fft: 1.99 sec +- 0.03 sec -> 1.95 sec +- 0.02 sec: 1.02x faster
- regex_compile: 819 ms +- 11 ms -> 800 ms +- 10 ms: 1.02x faster
- chaos: 423 ms +- 12 ms -> 414 ms +- 10 ms: 1.02x faster
- meteor_contest: 643 ms +- 13 ms -> 630 ms +- 12 ms: 1.02x faster
- pidigits: 1.18 sec +- 0.01 sec -> 1.16 sec +- 0.01 sec: 1.02x faster
- sympy_expand: 2.86 sec +- 0.02 sec -> 2.81 sec +- 0.02 sec: 1.02x faster
- nqueens: 498 ms +- 13 ms -> 490 ms +- 13 ms: 1.02x faster
- xml_etree_parse: 953 ms +- 20 ms -> 938 ms +- 16 ms: 1.02x faster
- 2to3: 1.60 sec +- 0.01 sec -> 1.58 sec +- 0.01 sec: 1.02x faster
- xml_etree_process: 331 ms +- 12 ms -> 326 ms +- 11 ms: 1.02x faster
- sympy_str: 1.71 sec +- 0.02 sec -> 1.69 sec +- 0.02 sec: 1.01x faster
- fannkuch: 2.42 sec +- 0.02 sec -> 2.39 sec +- 0.02 sec: 1.01x faster
- dulwich_log: 381 ms +- 11 ms -> 377 ms +- 11 ms: 1.01x faster
- xml_etree_iterparse: 624 ms +- 14 ms -> 618 ms +- 13 ms: 1.01x faster

Benchmark hidden because not significant (28): django_template, float, go, json, json_dumps, json_loads, logging_format, logging_simple, mako, pathlib, pickle, pickle_list, pickle_pure_python, python_startup, python_startup_no_site, regex_effbot, regex_v8, scimark_sparse_mat_mult, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, sympy_integrate, sympy_sum, thrift, tornado_http, unpack_sequence, unpickle, unpickle_list

Geometric mean: 1.02x faster

https://bugs.python.org/issue47177

@markshannon (Member) left a comment:

Looks very promising.

I think there is an off-by-one error, and I have a few suggestions and questions.

Code under review:

```c
code->co_filename,
PyCode_Addr2Line(frame->f_code, frame->f_lasti*sizeof(_Py_CODEUNIT)),
```

Suggested change:

```c
int addr = _PyInterpreterFrame_LASTI(frame) * sizeof(_Py_CODEUNIT);
int line = PyCode_Addr2Line(frame->f_code, addr);
```
@markshannon:

Slightly off-topic, but I wonder why we are going to the expense of computing the line here, rather than just storing the instruction offset?
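To make that question concrete, here is a hypothetical sketch of the alternative being hinted at: record only the cheap instruction offset at collection time and resolve it to a line number lazily. The `record` struct and its fields are illustrative only, not from the PR:

```c
/* Hypothetical sketch: store the cheap instruction offset (plus the
 * code object, which is needed later) at collection time. */
record->code = (PyCodeObject *)Py_NewRef(frame->f_code);
record->addr = _PyInterpreterFrame_LASTI(frame) * sizeof(_Py_CODEUNIT);

/* Resolve to a line number only when a report is actually rendered: */
int line = PyCode_Addr2Line(record->code, record->addr);
```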

Python/ceval.c (outdated):

```diff
     OPCODE_EXE_INC(op); \
     _py_stats.opcode_stats[lastopcode].pair_count[op]++; \
     lastopcode = op; \
 } while (0)
 #else
-#define INSTRUCTION_START(op) frame->f_lasti = INSTR_OFFSET(); next_instr++
+#define INSTRUCTION_START(op) (frame->next_instr = ++next_instr)
```
@markshannon:

Here's something to consider.

Rather than storing next_instr, we could store last_instr (strictly, next_instr_less_one), as it would break the dependency of the store on the addition:

```c
frame->next_instr_less_one = next_instr; next_instr++;
```

The compiler can then merge the next_instr++ with any subsequent JUMPBY(INLINE_CACHE_ENTRIES_...).

It would also simplify the changes, as you wouldn't need to change all the offsets by one.

Thoughts?
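To make the trade-off concrete, here is a sketch of the variants under discussion, shown side by side for comparison (only one would be defined at a time; the old macro is parenthesized here for uniformity, and `prev_instr` is the name the PR eventually settled on, per the title change below):

```c
/* Old scheme: compute and store an instruction offset on every
 * instruction start; the store must wait for INSTR_OFFSET(). */
#define INSTRUCTION_START(op) (frame->f_lasti = INSTR_OFFSET(), next_instr++)

/* This PR's first version: store the already-incremented pointer.
 * The store depends on the increment completing first. */
#define INSTRUCTION_START(op) (frame->next_instr = ++next_instr)

/* Suggested variant: store the pre-increment value. The store and the
 * increment are independent, and the compiler is free to fold the ++
 * into a later JUMPBY(INLINE_CACHE_ENTRIES_...). */
#define INSTRUCTION_START(op) (frame->prev_instr = next_instr++)
```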

@brandtbucher (Author):

Interesting. I'll try that out in another branch.


@brandtbucher (Author):

The performance numbers show no difference between that branch and this one. The other branch does result in a net reduction of "adjust by 1" moves, so I think I slightly prefer it.

Thoughts?

@markshannon commented Apr 4, 2022:

I prefer the other version. It should be quicker, and feels less intrusive. I'm not surprised that there is no measurable difference in performance; I would expect it to be very slight.

@bedevere-bot:
When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@markshannon (Member) commented Mar 31, 2022:

My performance test shows the same 2% speedup but with less variation, and no 6% slowdown for telco.

@brandtbucher (Author) commented:

> My performance test shows the same 2% but with less variation, and no 6% slowdown for telco.

I was initially suspicious of the broad improvement too, but I was able to reproduce it with two fresh runs:

Slower (5):
- pidigits: 1.14 sec +- 0.01 sec -> 1.21 sec +- 0.01 sec: 1.07x slower
- nbody: 552 ms +- 15 ms -> 577 ms +- 22 ms: 1.05x slower
- pycparser: 7.12 sec +- 0.10 sec -> 7.41 sec +- 0.09 sec: 1.04x slower
- pickle_dict: 166 us +- 8 us -> 169 us +- 9 us: 1.02x slower
- xml_etree_parse: 927 ms +- 15 ms -> 942 ms +- 19 ms: 1.02x slower

Faster (31):
- richards: 301 ms +- 11 ms -> 276 ms +- 12 ms: 1.09x faster
- scimark_sor: 739 ms +- 13 ms -> 682 ms +- 14 ms: 1.08x faster
- nqueens: 523 ms +- 14 ms -> 484 ms +- 12 ms: 1.08x faster
- logging_silent: 647 ns +- 25 ns -> 604 ns +- 40 ns: 1.07x faster
- spectral_norm: 633 ms +- 14 ms -> 592 ms +- 14 ms: 1.07x faster
- unpack_sequence: 338 ns +- 28 ns -> 318 ns +- 28 ns: 1.06x faster
- go: 873 ms +- 14 ms -> 827 ms +- 13 ms: 1.05x faster
- deltablue: 22.7 ms +- 1.1 ms -> 21.6 ms +- 1.3 ms: 1.05x faster
- raytrace: 1.85 sec +- 0.01 sec -> 1.77 sec +- 0.02 sec: 1.05x faster
- html5lib: 415 ms +- 31 ms -> 396 ms +- 30 ms: 1.05x faster
- pickle_pure_python: 1.90 ms +- 0.13 ms -> 1.82 ms +- 0.15 ms: 1.05x faster
- pyflate: 2.65 sec +- 0.02 sec -> 2.54 sec +- 0.02 sec: 1.05x faster
- scimark_fft: 2.03 sec +- 0.02 sec -> 1.95 sec +- 0.04 sec: 1.04x faster
- chaos: 423 ms +- 10 ms -> 407 ms +- 10 ms: 1.04x faster
- unpickle_pure_python: 1.42 ms +- 0.07 ms -> 1.37 ms +- 0.06 ms: 1.04x faster
- json_loads: 172 us +- 19 us -> 166 us +- 8 us: 1.04x faster
- hexiom: 41.3 ms +- 2.7 ms -> 39.8 ms +- 2.5 ms: 1.04x faster
- scimark_lu: 672 ms +- 15 ms -> 649 ms +- 15 ms: 1.03x faster
- float: 460 ms +- 11 ms -> 446 ms +- 10 ms: 1.03x faster
- scimark_monte_carlo: 417 ms +- 12 ms -> 404 ms +- 12 ms: 1.03x faster
- chameleon: 38.8 ms +- 2.6 ms -> 37.6 ms +- 3.1 ms: 1.03x faster
- regex_compile: 824 ms +- 13 ms -> 800 ms +- 14 ms: 1.03x faster
- xml_etree_process: 333 ms +- 14 ms -> 323 ms +- 14 ms: 1.03x faster
- sympy_expand: 2.88 sec +- 0.02 sec -> 2.81 sec +- 0.04 sec: 1.02x faster
- sympy_str: 1.72 sec +- 0.01 sec -> 1.69 sec +- 0.02 sec: 1.02x faster
- sympy_sum: 969 ms +- 13 ms -> 951 ms +- 14 ms: 1.02x faster
- dulwich_log: 383 ms +- 10 ms -> 377 ms +- 10 ms: 1.02x faster
- meteor_contest: 641 ms +- 14 ms -> 634 ms +- 16 ms: 1.01x faster
- 2to3: 1.60 sec +- 0.01 sec -> 1.58 sec +- 0.01 sec: 1.01x faster
- crypto_pyaes: 505 ms +- 14 ms -> 500 ms +- 11 ms: 1.01x faster
- sqlalchemy_declarative: 841 ms +- 15 ms -> 834 ms +- 23 ms: 1.01x faster

Benchmark hidden because not significant (26): django_template, fannkuch, json, json_dumps, logging_format, logging_simple, mako, pathlib, pickle, pickle_list, python_startup, python_startup_no_site, regex_dna, regex_effbot, regex_v8, scimark_sparse_mat_mult, sqlalchemy_imperative, sqlite_synth, sympy_integrate, telco, thrift, tornado_http, unpickle, unpickle_list, xml_etree_iterparse, xml_etree_generate

Geometric mean: 1.02x faster

So now we're triply sure.

@brandtbucher requested a review from @markshannon on April 1, 2022.
@markshannon (Member):
Did you get the version with prev_instr ready for review/merging?

@brandtbucher (Author):
It's ready, just slipped off my radar. Would you prefer a separate PR, or just to bring those commits over to this branch?

@markshannon (Member):

Whatever is easier, I don't mind.

@brandtbucher changed the title from "bpo-47177: Replace f_lasti with next_instr" to "bpo-47177: Replace f_lasti with prev_instr" on Apr 6, 2022.
@brandtbucher (Author):
I brought the commits over here.

@markshannon added the 🔨 test-with-buildbots label on Apr 7, 2022.
@bedevere-bot:
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 670b94b 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot removed the 🔨 test-with-buildbots label on Apr 7, 2022.
@markshannon (Member):
Looks good.
This is quite an intrusive change, so I'm running the buildbots before merging.

@stratakis (Contributor):
It seems there are many reference leaks caught by various buildbots. e.g.: https://buildbot.python.org/all/#/builders/766/builds/117

@brandtbucher (Author):
Looking into it.

@brandtbucher (Author):
Oh wait, it looks like those are failing on other PRs, too.

@markshannon (Member):
Feel free to merge once you are confident that all the failures are unrelated. I think they are unrelated, but I've not checked them individually.

@brandtbucher (Author):
The refleaks disappear when I apply the fix in #32403. So not my fault. 🙂

@vstinner (Member) commented Apr 7, 2022:

The coverage project uses the PyFrameObject.f_frame.f_lasti field and so is broken by this PR. I wrote nedbat/coveragepy#1331 to read the Python attribute instead, but the coverage maintainer is worried about worse performance. Maybe adding a new PyFrame_GetLastInstr() getter function would help.
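As an illustration only, here is a minimal sketch of what such a getter might look like under this PR's internals. PyFrame_GetLastInstr is vstinner's proposed name, not an existing CPython API, and the body assumes this PR's layout (a prev_instr pointer inside _PyInterpreterFrame, reachable via PyFrameObject.f_frame):

```c
/* Hypothetical sketch of the proposed getter; not the actual CPython API. */
int
PyFrame_GetLastInstr(PyFrameObject *frame)
{
    assert(frame != NULL);
    int lasti = _PyInterpreterFrame_LASTI(frame->f_frame);
    if (lasti < 0) {
        /* No instruction has started yet (e.g. a freshly created frame). */
        return -1;
    }
    /* Scale the instruction index to a byte offset, matching the
     * Python-level frame.f_lasti attribute. */
    return lasti * (int)sizeof(_Py_CODEUNIT);
}
```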

@brandtbucher (Author):
The PR adds a _PyInterpreterFrame_LASTI macro. Does that help?
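For context, the idea behind that macro is to recover the old f_lasti index from the new prev_instr pointer via pointer arithmetic against the start of the bytecode. A sketch under this PR's design (the exact definition in the PR may differ):

```c
/* Sketch: recover the last instruction index from prev_instr.
 * _PyCode_CODE() is assumed to yield the first _Py_CODEUNIT of the
 * code object's bytecode; prev_instr points at the most recently
 * started instruction, so the difference is its index. */
static inline int
_PyInterpreterFrame_LASTI(_PyInterpreterFrame *frame)
{
    return (int)(frame->prev_instr - _PyCode_CODE(frame->f_code));
}
```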

@markshannon (Member):
#32413

@brandtbucher deleted the lasti branch on July 21, 2022.
Labels: performance (Performance or resource usage) · 9 participants