
bpo-43684: Add ADD_INT opcode #25090

Closed (wants to merge 18 commits)

Conversation

@gvanrossum (Member) commented Mar 30, 2021:

ADD_INT combines LOAD_CONST with BINARY_ADD, but only when the constant is an integer in range(256). It adds a new internal function to longobject.c that is called from ceval.c to deal with the PyLong internals. This optimizes relatively common code expressions like x + 1.
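
For illustration, here is a sketch of what the new opcode's ceval.c case could look like, based on the helper's signature and the shape of neighboring opcode cases (the actual code in this PR may differ):

case TARGET(ADD_INT): {
    PyObject *left = TOP();
    PyObject *result;
    if (PyLong_CheckExact(left)) {
        /* Fast path: add the small constant via the new
           longobject.c helper, skipping PyNumber_Add(). */
        result = _PyLong_AddInt((PyLongObject *)left, oparg);
        Py_DECREF(left);
        SET_TOP(result);
        if (result == NULL) {
            goto error;
        }
        DISPATCH();
    }
    /* Fallback: behave exactly like LOAD_CONST + BINARY_ADD. */
    PyObject *right = PyLong_FromLong(oparg);
    if (right == NULL) {
        goto error;
    }
    result = PyNumber_Add(left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(result);
    if (result == NULL) {
        goto error;
    }
    DISPATCH();
}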

I'm creating this PR in draft mode because I haven't validated the speedup yet (there definitely is some in a micro-benchmark) and because I have ideas for more opcodes that could be added this way. In particular, the following seem promising:

  • LOAD_CONST(None) + IS_OP + POP_JUMP_IF_{TRUE,FALSE}
  • (LOAD_CONST or LOAD_FAST) + RETURN_VALUE
  • (CALL_METHOD or CALL_FUNCTION) + (POP_TOP or RETURN_VALUE)
  • GET_ITER + FOR_ITER

All these occur somewhat commonly as opcode pairs in an app I ran with DXPAIRS enabled, and have the following desirable properties:

  • At most one of the opcodes has an argument. (For this purpose, we treat IS_OP(0) and IS_OP(1) as different opcodes.)
  • Neither opcode seems a good candidate for type specialization (e.g. inline caching).

We also need to take care that we don't combine instructions on different line numbers.

TODO

  • Get benchmark numbers
  • Fix failure in test_buffer (items[i] = items[i+1] # IndexError: list index out of range)
  • Rename INT_ADD to ADD_INT
  • Adopt Mark's suggestion in longobject.c
  • Fix other test failures (e.g. test_dis)
  • Create bpo issue and add here
  • Add news blurb
  • Decide which other opcodes to do

https://bugs.python.org/issue43684

@Fidget-Spinner (Member) commented Mar 30, 2021:

Hi Guido, I ran some preliminary benchmarks comparing the main branch against this PR's implementation. (Some tests errored out with an error message that looks like self[args[i]] = args[i + 1] # IndexError: list index out of range; this looks like the same error raised in test_buffer in the test suite.)

pyperformance on average sees no change (please take these benchmarks with a grain of salt; I'm quite unhappy with how much variation there is for benchmarks which shouldn't even be affected :( ):

pyperformance compare output

### 2to3 ###
Mean +- std dev: 974 ms +- 9 ms -> 968 ms +- 9 ms: 1.01x faster
Not significant

### chameleon ###
Mean +- std dev: 28.3 ms +- 0.5 ms -> 28.2 ms +- 0.4 ms: 1.00x faster
Not significant

### chaos ###
Mean +- std dev: 330 ms +- 3 ms -> 323 ms +- 3 ms: 1.02x faster
Significant (t=11.94)

### crypto_pyaes ###
Mean +- std dev: 349 ms +- 4 ms -> 344 ms +- 3 ms: 1.01x faster
Not significant

### deltablue ###
Mean +- std dev: 24.6 ms +- 0.4 ms -> 24.6 ms +- 0.5 ms: 1.00x slower
Not significant

### dulwich_log ###
Mean +- std dev: 199 ms +- 6 ms -> 196 ms +- 2 ms: 1.01x faster
Not significant

### fannkuch ###
Mean +- std dev: 1.43 sec +- 0.01 sec -> 1.42 sec +- 0.01 sec: 1.01x faster
Not significant

### float ###
Mean +- std dev: 333 ms +- 6 ms -> 330 ms +- 4 ms: 1.01x faster
Not significant

### genshi_text ###
Mean +- std dev: 84.8 ms +- 1.4 ms -> 85.3 ms +- 1.2 ms: 1.01x slower
Not significant

### genshi_xml ###
Mean +- std dev: 188 ms +- 3 ms -> 184 ms +- 3 ms: 1.02x faster
Not significant

### go ###
Mean +- std dev: 705 ms +- 11 ms -> 704 ms +- 7 ms: 1.00x faster
Not significant

### hexiom ###
Mean +- std dev: 29.8 ms +- 0.4 ms -> 29.6 ms +- 0.4 ms: 1.01x faster
Not significant

### json_dumps ###
Mean +- std dev: 39.3 ms +- 0.7 ms -> 40.2 ms +- 1.8 ms: 1.02x slower
Significant (t=-3.34)

### json_loads ###
Mean +- std dev: 75.9 us +- 0.9 us -> 76.8 us +- 2.0 us: 1.01x slower
Not significant

### logging_format ###
Mean +- std dev: 30.8 us +- 0.5 us -> 30.5 us +- 0.4 us: 1.01x faster
Not significant

### logging_silent ###
Mean +- std dev: 582 ns +- 16 ns -> 573 ns +- 16 ns: 1.02x faster
Not significant

### logging_simple ###
Mean +- std dev: 28.0 us +- 0.3 us -> 27.6 us +- 0.4 us: 1.01x faster
Not significant

### mako ###
Mean +- std dev: 48.8 ms +- 0.7 ms -> 48.7 ms +- 0.5 ms: 1.00x faster
Not significant

### meteor_contest ###
Mean +- std dev: 318 ms +- 3 ms -> 311 ms +- 5 ms: 1.02x faster
Significant (t=8.49)

### nbody ###
Mean +- std dev: 413 ms +- 6 ms -> 420 ms +- 7 ms: 1.02x slower
Not significant

### nqueens ###
Mean +- std dev: 307 ms +- 3 ms -> 305 ms +- 4 ms: 1.00x faster
Not significant

### pathlib ###
Mean +- std dev: 57.8 ms +- 1.0 ms -> 58.1 ms +- 1.5 ms: 1.01x slower
Not significant

### pickle ###
Mean +- std dev: 31.7 us +- 0.6 us -> 31.4 us +- 0.7 us: 1.01x faster
Not significant

### pickle_dict ###
Mean +- std dev: 74.1 us +- 0.7 us -> 73.6 us +- 0.7 us: 1.01x faster
Not significant

### pickle_list ###
Mean +- std dev: 11.3 us +- 1.0 us -> 10.9 us +- 0.2 us: 1.03x faster
Significant (t=2.82)

### pickle_pure_python ###
Mean +- std dev: 1.41 ms +- 0.02 ms -> 1.41 ms +- 0.02 ms: 1.00x faster
Not significant

### pidigits ###
Mean +- std dev: 477 ms +- 4 ms -> 476 ms +- 3 ms: 1.00x faster
Not significant

### pyflate ###
Mean +- std dev: 2.05 sec +- 0.02 sec -> 2.01 sec +- 0.02 sec: 1.02x faster
Not significant

### python_startup ###
Mean +- std dev: 23.3 ms +- 0.2 ms -> 23.3 ms +- 0.2 ms: 1.00x slower
Not significant

### python_startup_no_site ###
Mean +- std dev: 15.5 ms +- 0.3 ms -> 15.6 ms +- 0.2 ms: 1.00x slower
Not significant

### raytrace ###
Mean +- std dev: 1.59 sec +- 0.03 sec -> 1.60 sec +- 0.02 sec: 1.00x slower
Not significant

### regex_compile ###
Mean +- std dev: 520 ms +- 6 ms -> 509 ms +- 8 ms: 1.02x faster
Significant (t=7.76)

### regex_dna ###
Mean +- std dev: 500 ms +- 5 ms -> 505 ms +- 4 ms: 1.01x slower
Not significant

### regex_effbot ###
Mean +- std dev: 8.42 ms +- 0.10 ms -> 8.57 ms +- 0.20 ms: 1.02x slower
Not significant

### regex_v8 ###
Mean +- std dev: 68.5 ms +- 1.0 ms -> 67.6 ms +- 1.0 ms: 1.01x faster
Not significant

### richards ###
Mean +- std dev: 244 ms +- 2 ms -> 241 ms +- 3 ms: 1.01x faster
Not significant

### scimark_fft ###
Mean +- std dev: 1.27 sec +- 0.01 sec -> 1.26 sec +- 0.01 sec: 1.00x faster
Not significant

### scimark_lu ###
Mean +- std dev: 539 ms +- 10 ms -> 551 ms +- 35 ms: 1.02x slower
Significant (t=-2.78)

### scimark_monte_carlo ###
Mean +- std dev: 340 ms +- 4 ms -> 335 ms +- 6 ms: 1.02x faster
Not significant

### scimark_sor ###
Mean +- std dev: 642 ms +- 15 ms -> 628 ms +- 11 ms: 1.02x faster
Significant (t=5.63)

### scimark_sparse_mat_mult ###
Mean +- std dev: 18.0 ms +- 0.2 ms -> 18.1 ms +- 0.2 ms: 1.01x slower
Not significant

### spectral_norm ###
Mean +- std dev: 447 ms +- 3 ms -> 429 ms +- 4 ms: 1.04x faster
Significant (t=26.99)

### sqlalchemy_declarative ###
Mean +- std dev: 459 ms +- 10 ms -> 459 ms +- 9 ms: 1.00x slower
Not significant

### sqlalchemy_imperative ###
Mean +- std dev: 85.9 ms +- 2.0 ms -> 84.1 ms +- 1.8 ms: 1.02x faster
Significant (t=5.21)

### sqlite_synth ###
Mean +- std dev: 8.17 us +- 0.20 us -> 8.46 us +- 0.25 us: 1.04x slower
Significant (t=-7.08)

### telco ###
Mean +- std dev: 21.0 ms +- 0.8 ms -> 21.6 ms +- 0.9 ms: 1.03x slower
Significant (t=-3.59)

### tornado_http ###
Mean +- std dev: 439 ms +- 6 ms -> 439 ms +- 7 ms: 1.00x faster
Not significant

### unpack_sequence ###
Mean +- std dev: 148 ns +- 3 ns -> 147 ns +- 3 ns: 1.01x faster
Not significant

### unpickle ###
Mean +- std dev: 42.1 us +- 1.3 us -> 41.6 us +- 0.7 us: 1.01x faster
Not significant

### unpickle_list ###
Mean +- std dev: 12.8 us +- 0.1 us -> 13.0 us +- 0.2 us: 1.02x slower
Not significant

### unpickle_pure_python ###
Mean +- std dev: 1.00 ms +- 0.02 ms -> 1.01 ms +- 0.02 ms: 1.01x slower
Not significant

### xml_etree_generate ###
Mean +- std dev: 295 ms +- 4 ms -> 293 ms +- 4 ms: 1.01x faster
Not significant

### xml_etree_iterparse ###
Mean +- std dev: 314 ms +- 4 ms -> 311 ms +- 4 ms: 1.01x faster
Not significant

### xml_etree_parse ###
Mean +- std dev: 428 ms +- 5 ms -> 432 ms +- 6 ms: 1.01x slower
Not significant

### xml_etree_process ###
Mean +- std dev: 237 ms +- 4 ms -> 236 ms +- 4 ms: 1.00x faster
Not significant

Skipped 5 benchmarks only in master_pyperformance.json: django_template, sympy_expand, sympy_integrate, sympy_str, sympy_sum

Microbenchmarks show some noticeable speedups:

# pyperf timeit -s "x = 1" "x + 1"
Mean +- std dev: [master_x+1] 46.4 ns +- 1.0 ns -> [addint_x+1] 35.2 ns +- 1.2 ns: 1.32x faster

# pyperf timeit "for x in range(255): x + 1"
Mean +- std dev: [master_range255] 13.8 us +- 0.3 us -> [addint_range255] 11.8 us +- 0.3 us: 1.16x faster

@gvanrossum (Member, Author) replied:

> Hi Guido, I ran some preliminary benchmarks comparing the main branch against this PR's implementation

Thanks @Fidget-Spinner!

> (Some tests errored out with an error message that looks like self[args[i]] = args[i + 1] # IndexError: list index out of range; this looks like the same error raised in test_buffer in the test suite.)

That's disturbing, I will look into that.

> pyperformance on average sees no change (please take these benchmarks with a grain of salt; I'm quite unhappy with how much variation there is for benchmarks which shouldn't even be affected :( ):

Yeah, this is what I've experienced running pyperformance as well. It's really hard to move the needle, and there's so much noise that, for changes expected to give less than a 5% improvement in the general case, the best we can hope for from the benchmarks is confidence that we haven't accidentally pessimized things. Your timeit-based microbenchmarks look like what I expected.

Maybe we'll be able to move the needle by adding a few other similar changes to this same PR (that saves bumps in the pyc magic number as well :-).

@gvanrossum changed the title from "Add specialized INT_ADD opcode" to "Add specialized opcodes" on Mar 30, 2021
@gvanrossum (Member, Author) commented:

(FWIW the reason I didn't finish generating bytecode using RETURN_CONST and RETURN_NONE is that this causes a large number of failures in test_dis.py. I need to strategize on what to do about those.)

Python/compile.c (outdated):

@@ -6910,14 +6910,14 @@ optimize_basic_block(basicblock *bb, PyObject *consts)
             }
         }
         break;
-#if 0
+#if 1
@gvanrossum (Member, Author) commented on the diff:

I get a scary error in test_sys_settrace.py:

PS C:\Users\gvanrossum\cpython> .\PCbuild\amd64\python.exe -m test test_sys_settrace
0:00:00 Run tests sequentially
0:00:00 [1/1] test_sys_settrace
Fatal Python error: PyFrame_BlockPop: block stack underflow
Python runtime state: initialized

Current thread 0x00005704 (most recent call first):
  File "C:\Users\gvanrossum\cpython\lib\test\test_sys_settrace.py", line 1550 in test_jump_over_return_in_try_finally_block
  File "C:\Users\gvanrossum\cpython\lib\test\test_sys_settrace.py", line 1179 in run_test
  File "C:\Users\gvanrossum\cpython\lib\test\test_sys_settrace.py", line 1207 in test
  File "C:\Users\gvanrossum\cpython\lib\unittest\case.py", line 549 in _callTestMethod
  File "C:\Users\gvanrossum\cpython\lib\unittest\case.py", line 592 in run
  File "C:\Users\gvanrossum\cpython\lib\unittest\case.py", line 652 in __call__
  File "C:\Users\gvanrossum\cpython\lib\unittest\suite.py", line 122 in run
  File "C:\Users\gvanrossum\cpython\lib\unittest\suite.py", line 84 in __call__
  File "C:\Users\gvanrossum\cpython\lib\unittest\suite.py", line 122 in run
  File "C:\Users\gvanrossum\cpython\lib\unittest\suite.py", line 84 in __call__
  File "C:\Users\gvanrossum\cpython\lib\unittest\suite.py", line 122 in run
  File "C:\Users\gvanrossum\cpython\lib\unittest\suite.py", line 84 in __call__
  File "C:\Users\gvanrossum\cpython\lib\test\support\testresult.py", line 169 in run
  File "C:\Users\gvanrossum\cpython\lib\test\support\__init__.py", line 959 in _run_suite
  File "C:\Users\gvanrossum\cpython\lib\test\support\__init__.py", line 1082 in run_unittest
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\runtest.py", line 210 in _test_module
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\runtest.py", line 246 in _runtest_inner2
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\runtest.py", line 282 in _runtest_inner
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\runtest.py", line 154 in _runtest
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\runtest.py", line 194 in runtest
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\main.py", line 423 in run_tests_sequential
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\main.py", line 521 in run_tests
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\main.py", line 694 in _main
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\main.py", line 641 in main
  File "C:\Users\gvanrossum\cpython\lib\test\libregrtest\main.py", line 719 in main
  File "C:\Users\gvanrossum\cpython\lib\test\__main__.py", line 2 in <module>
  File "C:\Users\gvanrossum\cpython\lib\runpy.py", line 86 in _run_code
  File "C:\Users\gvanrossum\cpython\lib\runpy.py", line 196 in _run_module_as_main

Extension modules: _testcapi, _overlapped (total: 2)

@markshannon You changed that file last. Should I just give up on RETURN_CONST/RETURN_NONE for now?

A member commented:

Note: _overlapped is listed as a 3rd party C extension, whereas it's part of the stdlib. I forgot this one and I created #25122 to add it to sys.stdlib_module_names :-)

@markshannon (Member) commented:

I don't think adding a bunch of new opcodes to one PR is a good idea. It makes it difficult to review and get merged.

Python/compile.c (outdated):

    cnt = PyList_GET_ITEM(consts, oparg);
    inst->i_opcode = RETURN_CONST;
    // cnt == Py_None ? RETURN_NONE : RETURN_CONST;
    bb->b_instr[i+1].i_opcode = NOP;
A member commented:

This makes the last instruction of the BB a non-terminator. Either put the NOP first or shorten the BB by one.
You'll need to add RETURN_CONST as a BB terminator to all the code that cares about such things.
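
Concretely, the suggested reordering might look like this (a sketch only; the variable names follow the surrounding peephole code):

/* Put the NOP first so the basic block still ends in a return. */
inst->i_opcode = NOP;                       /* was LOAD_CONST */
bb->b_instr[i+1].i_opcode = RETURN_CONST;   /* was RETURN_VALUE */
bb->b_instr[i+1].i_oparg = oparg;           /* index of the constant */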

@gvanrossum (Member, Author) commented:

I was hoping to get a bunch of opcodes in so that together they would show an improvement in the benchmark, but I think I've bitten off more than I could chew in one PR, so I'll remove the new opcodes other than ADD_INT later today.

@smontanaro (Contributor) commented:

How many potential new opcodes are in the pipeline? My register VM squished the stack opcodes down towards zero, with my new stuff above that. Obviously, if you get much past 128 opcodes I will have some extra thinking to do. :-)

gvanrossum added a commit to faster-cpython/cpython that referenced this pull request Mar 31, 2021
@gvanrossum (Member, Author) replied:

> How many potential new opcodes are in the pipeline? My register VM squished the stack opcodes down towards zero, with my new stuff above that. Obviously, if you get much past 128 opcodes I will have some extra thinking to do. :-)

I've got 6 in the works so far, but potentially dozens more. However, for 3.10 I don't expect to be doing more than 4-6.

@gvanrossum (Member, Author) commented:

Hm. GitHub shows that I pushed e8e979f, but that's in a different branch (trying to save some of the work I rolled back here). Looking at the File Changes tab, none of the changes in e8e979f are included in this PR. It seems to be a GitHub bug that it shows up on the Conversation page here. Please ignore it.

Include/longobject.h:

@@ -212,6 +212,7 @@ PyAPI_FUNC(PyObject *) _PyLong_GCD(PyObject *, PyObject *);
 #ifndef Py_LIMITED_API
 PyAPI_FUNC(PyObject *) _PyLong_Rshift(PyObject *, size_t);
 PyAPI_FUNC(PyObject *) _PyLong_Lshift(PyObject *, size_t);
+PyAPI_FUNC(PyObject *) _PyLong_AddInt(PyLongObject *, int);
A member commented:
Should this API be public?
If not, please move it to Include/internal.
If yes, how about using Py_ssize_t instead of int?

@gvanrossum (Member, Author) replied:

It should not be public. I will move it. I am still not used to what goes where now that we've got several different subdirectories of header files, plus "LIMITED_API" defines.
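
For reference, a simplified sketch of the helper using only public APIs; the actual version in this PR works on the PyLong digit internals directly, and the int argument is assumed to be in range(256) per the opcode:

PyObject *
_PyLong_AddInt(PyLongObject *a, int b)
{
    int overflow = 0;
    long v = PyLong_AsLongAndOverflow((PyObject *)a, &overflow);
    if (!overflow && v < LONG_MAX - 256) {
        /* Common case: fits in a C long and v + b cannot overflow. */
        return PyLong_FromLong(v + b);
    }
    /* Rare case: a huge int; fall back to the general path. */
    PyObject *pyb = PyLong_FromLong((long)b);
    if (pyb == NULL) {
        return NULL;
    }
    PyObject *res = PyNumber_Add((PyObject *)a, pyb);
    Py_DECREF(pyb);
    return res;
}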

@Fidget-Spinner (Member) commented:

Guido (and anyone else on this thread), do you think emitting ADD_INT for not just x + 1 but also 1 + x sounds reasonable? It may help reach more code, and also it seems a little funny that x + 1 is now magically way faster than 1 + x 😉 . I made a small patch to add that in.

I'm a little wary of this though, because it combines across instructions (though not lines!) and slightly changes the instruction order. With the patch, 1 + x and x + 1 produce exactly the same bytecode.

Running the count_opcodes script against ./Lib brings the number from 529 to only 531. I'm honestly surprised at how little that number increased :(. I don't think this will increase the performance of long addition chains by much either; the current implementation already does a good job of capturing those.

patch
diff --git a/Python/compile.c b/Python/compile.c
index b6fbb500d2..b7f44654ae 100644
--- a/Python/compile.c
+++ b/Python/compile.c
@@ -6835,6 +6835,7 @@ optimize_basic_block(basicblock *bb, PyObject *consts)
         struct instr *inst = &bb->b_instr[i];
         int oparg = inst->i_oparg;
         int nextop = i+1 < bb->b_iused ? bb->b_instr[i+1].i_opcode : 0;
+        int nextnextop = i+2 < bb->b_iused ? bb->b_instr[i+2].i_opcode : 0;
         if (is_jump(inst)) {
             /* Skip over empty basic blocks. */
             while (inst->i_target->b_iused == 0) {
@@ -6848,6 +6849,7 @@ optimize_basic_block(basicblock *bb, PyObject *consts)
         switch (inst->i_opcode) {
             /* Remove LOAD_CONST const; conditional jump */
             /* Also optimize LOAD_CONST(small_int) + BINARY_ADD */
+            /* Also optimize LOAD_CONST(small_int) + LOAD_NAME + BINARY_ADD*/
             case LOAD_CONST:
             {
                 PyObject* cnt;
@@ -6904,6 +6906,23 @@ optimize_basic_block(basicblock *bb, PyObject *consts)
                             }
                         }
                         break;
+                    case LOAD_NAME:
+                        if (nextnextop != BINARY_ADD) {
+                            break;
+                        }
+                        cnt = PyList_GET_ITEM(consts, oparg);
+                        if (PyLong_CheckExact(cnt) &&
+                            inst->i_lineno == bb->b_instr[i+2].i_lineno) {
+                            int ovf = 0;
+                            long val = PyLong_AsLongAndOverflow(cnt, &ovf);
+                            if (ovf == 0 && val >= 0 && val < 256) {
+                                bb->b_instr[i+2].i_opcode = ADD_INT;
+                                bb->b_instr[i+2].i_oparg = val;
+                                inst->i_opcode = NOP;
+                                break;
+                            }
+                        }
+                        break;
                 }
                 break;
             }
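
For concreteness, the rewrite this patch performs on 1 + x (hypothetical disassembly, assuming a later pass drops the NOP):

    before:  LOAD_CONST 1;  LOAD_NAME x;  BINARY_ADD
    after:   NOP;           LOAD_NAME x;  ADD_INT 1

which, modulo the NOP, is the same LOAD_NAME x; ADD_INT 1 sequence that x + 1 already compiles to.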

gvanrossum and others added 5 commits April 3, 2021 13:40
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
@sweeneyde (Member) commented Apr 4, 2021:

Idea: interpret the ADD_INT oparg as a signed int in [-128..127] to get a two-behaviors-for-one-opcode deal, with an equally-fast fast-path:

PyObject *left = TOP();
PyObject *result, *right;
int signed_oparg = (int8_t)oparg;
if (PyLong_CheckExact(left)) {
    result = _PyLong_AddInt((PyLongObject *)left, signed_oparg);
    Py_DECREF(left);
    SET_TOP(result);
    if (result == NULL) {
        goto error;
    }
    DISPATCH();
}
if (signed_oparg >= 0) {
    right = PyLong_FromLongLong(signed_oparg);
    if (right == NULL) {
        goto error;
    }
    result = PyNumber_Add(left, right);
}
else {
    right = PyLong_FromLongLong(-signed_oparg);
    if (right == NULL) {
        goto error;
    }
    result = PyNumber_Subtract(left, right);
}
Py_DECREF(left);
Py_DECREF(right);
SET_TOP(result);
if (result == NULL) {
    goto error;
}
DISPATCH();

Then you could compile x - 10 as ADD_INT -10. Technically you couldn't optimize x - 0 (there's no distinct encoding for subtracting zero, and the non-int fallback must call PyNumber_Subtract rather than PyNumber_Add), but I would imagine that is uncommon.

But I also imagine that being able to optimize things like while i <= n - 1: could be nice.

@gvanrossum (Member, Author) replied:

Clever; I will look into whether this catches more occurrences.

@gvanrossum (Member, Author) commented:

Well, I think I'm going to close this PR. I haven't found any example programs that execute a significant number of ADD_INT operations (measured at runtime this time). The typical percentage seems around 0.5%. Thanks to everyone who thought of refinements:

  • (@Fidget-Spinner) Also do this for "1 + x" -- that would not be correct if x isn't an int (or float), since the order in which __add__ and __radd__ are tried matters. For a crude example, the error messages for ""+1 and 1+"" are different.
  • (@sweeneyde) Not a bad idea (and you handle the fallback case correctly), but I doubt it would pull this over the line.

I have some similar ideas for if x is None and if x is not None but I'll do some more research on how many of these I can expect to execute first...
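
(Roughly, such a fused opcode could look like the hypothetical sketch below, replacing LOAD_CONST(None) + IS_OP + POP_JUMP_IF_TRUE; the name and details are made up:)

case TARGET(POP_JUMP_IF_NONE): {
    PyObject *value = POP();
    int is_none = (value == Py_None);
    Py_DECREF(value);
    if (is_none) {
        JUMPTO(oparg);   /* taken branch: x is None */
    }
    DISPATCH();
}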

@vstinner (Member) commented Apr 6, 2021:

> Well, I think I'm going to close this PR. I haven't found any example programs that execute a significant number of ADD_INT operations (measured at runtime this time). The typical percentage seems around 0.5%.

Sorry for you. So my comment remains relevant :-)

A good approach to optimize int+int is to use Cython or anything else to specialize a function for int+int operations, and implement the addition in C rather than using Python bytecode. The bytecode evaluation loop cost is too high compared to the cost of doing int+int. PyPy is able to specialize the code using C number types and handles integer overflow (the tricky part).

Another idea to experiment with is adding a cache for the "virtual table lookup", something like https://bugs.python.org/issue14757 "INCA: Inline Caching meets Quickening in Python 3.3". One part of the PyNumber_Add() cost comes from binary_op1(), which needs to look at the types of both objects to decide which function should be called. The ceval.c "OPCACHE" (per-code-object opcode cache) might be used, but I don't know if it can be adapted for this specific optimization.
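
As a rough illustration of the caching idea (hypothetical code, not from any existing patch): remember the last pair of operand types and the resolved nb_add slot, and skip the generic lookup on a hit. A real cache would have to mirror all of binary_op1()'s rules (subclass priority, __radd__, etc.).

typedef struct {
    PyTypeObject *left_type;
    PyTypeObject *right_type;
    binaryfunc add;                 /* cached nb_add slot */
} add_cache;

static PyObject *
cached_add(add_cache *cache, PyObject *left, PyObject *right)
{
    if (cache->add != NULL &&
        Py_TYPE(left) == cache->left_type &&
        Py_TYPE(right) == cache->right_type)
    {
        PyObject *res = cache->add(left, right);
        if (res != Py_NotImplemented) {
            return res;             /* cache hit (or error: res == NULL) */
        }
        Py_DECREF(res);             /* rare: retry via the generic path */
    }
    /* Miss: do the generic lookup and remember the slot for next time. */
    cache->left_type = Py_TYPE(left);
    cache->right_type = Py_TYPE(right);
    cache->add = Py_TYPE(left)->tp_as_number
                     ? Py_TYPE(left)->tp_as_number->nb_add : NULL;
    return PyNumber_Add(left, right);
}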

@gvanrossum (Member, Author) replied:

> A good approach to optimize int+int is to use Cython or anything else to specialize a function for int+int operations, and implement the addition in C rather than using Python bytecode.

I don't understand what you're saying. Are you saying that if you have a lot of additions you're better off using Cython etc.? I can't argue with that (and numpy comes to mind :-).

> The bytecode evaluation loop cost is too high compared to the cost of doing int+int. PyPy is able to specialize the code using C number types and handles integer overflow (the tricky part).

It wasn't the eval loop cost. In a micro-benchmark I saw a considerable speedup for ADD_INT compared to LOAD_CONST + BINARY_ADD. It's just that in most code x+1 just doesn't occur frequently enough to make a difference.

I'm aware of the Quickening approach and we'll probably do something along these lines eventually.

@vstinner (Member) commented Apr 6, 2021:

> Are you saying that if you have a lot of additions you're better off using Cython etc.?

Yes, it's an existing solution until CPython itself is optimized. Sadly, using Cython requires changing the code.

@github-actions (bot) commented Jun 3, 2021:

This PR is stale because it has been open for 30 days with no activity.

@github-actions bot added the "stale" label (Stale PR or inactive for long period of time) on Jun 3, 2021
@gvanrossum (Member, Author) commented:

This PR was abandoned but not closed. Hopefully @brandtbucher can sort out what we should do -- close it, merge it, or do something else.

@kumaraditya303 (Contributor) commented:

ceval.c has specialized bytecode for int operations now, so this isn't needed anymore. Can we close it?

@kumaraditya303 added the "pending" label (The issue will be closed if no feedback is provided) and removed the "stale" label on Aug 1, 2022
@gvanrossum (Member, Author) replied:

Yeah, this is superseded by more recent changes.

@gvanrossum closed this Aug 1, 2022
@gvanrossum deleted the int-add branch Aug 1, 2022 20:37
Labels: awaiting core review, pending