Single instruction "superinstructions" #585

markshannon · 2023-05-19T10:49:09Z

Superinstructions can be problematic, as they break the one-instruction-at-a-time model that instrumentation relies on, and are likely to cause problems in optimizers that make the same one-instruction-at-a-time assumption.

Instrumentation rewrites all superinstructions back to the simpler form.
Super instructions also prevent specialization, so superinstructions cannot include any specializable instruction.
#584 proposes a specialization of LOAD_CONST, so that leaves only LOAD_FAST and STORE_FAST to be combined.

If we remove LOAD_CONST we have only:

super(LOAD_FAST__LOAD_FAST) = LOAD_FAST + LOAD_FAST;
super(STORE_FAST__LOAD_FAST)  = STORE_FAST + LOAD_FAST;
super(STORE_FAST__STORE_FAST) = STORE_FAST + STORE_FAST;

That is still quite a lot of instructions, dynamically, so we don't want to just remove them.
Instead of removing them, we can combine them into a single instruction.
Given that most locals will have a index in range(16) we combine the operations into a single instruction.

inst(LOAD_FAST_LOAD_FAST_COMPACT, ( -- val1, val2)) {
    val1 = GETLOCAL(oparg&15);
    val1 = GETLOCAL(oparg>>4);
    Py_INCREF(val1);
    Py_INCREF(val2);
}

There will be fewer LOAD_FAST_LOAD_FAST_COMPACT than LOAD_FAST__LOAD_FAST, but they should be a little faster (fewer memory reads). I would expect the performance to be about the same.

The text was updated successfully, but these errors were encountered:

brandtbucher · 2023-05-19T19:13:51Z

Just a note from our previous discussions for anyone else reading: this is only legal if both instructions are on the same line. When I tried a similar idea out before, this caused almost all of the STORE_FAST__LOAD_FAST superinstructions to disappear (since they basically always span more than one line).

markshannon · 2023-05-22T10:43:04Z

Instructions don't really have lines, effects do. Since LOAD_FAST has no observable side effects, it can be moved from line to line.
Consider

a = 1
return b

which compiles to:

1     LOAD_CONST               1 (1)
      STORE_FAST               0 (a)

2     LOAD_FAST                1 (b)
      RETURN_VALUE

Which implies that we cannot merge the STORE_FAST and LOAD_FAST. But as LOAD_FAST has no observable side effects, this is a legal translation:

1     LOAD_CONST               1 (1)
      STORE_FAST               0 (a)

      LOAD_FAST                1 (b)
2     RETURN_VALUE

Which can be transformed to:

1     LOAD_CONST               1 (1)
      STORE_FAST_LOAD_FAST_COMPACT  0 (a), 1 (b)

2     RETURN_VALUE

markshannon · 2023-06-02T09:28:46Z

Unfortunately, having superinstructions straddling lines prevents jumping to the line in question in a debugger, as the stack is inconsistent, which rules out the transformation above.

However, it appears that the reduction in code size more than compensates.

carljm · 2023-06-02T20:49:43Z

which rules out the transformation above.

Does this mean that in general we have to consider the state of the value stack as an "observable side effect" when making optimizations in the compiler? That seems new (relative to my previous understanding of what was safe to optimize), and unfortunate.

brandtbucher · 2023-06-02T20:56:24Z

Well, I would assume that jumping to a line would jump just prior to execution of every instruction on that line. It would be weird, I think, if a local variable load or store didn't happen after jumping to a line and and stepping through it.

brandtbucher · 2023-06-02T21:00:10Z

In the example above, I would expect that if I had two lines foo = 42 and return foo, jumping over the first line would return the prior value of foo (or None if unbound). You think that frame_setlineno should be allowed to refuse the jump?

carljm · 2023-06-02T21:27:35Z

I'm not proposing anything in particular; frame_setlineno is code I hadn't looked at before and didn't realize existed :) I've never used the capability to jump directly to a line in a debugger. So I'm just newly wrapping my head around the implications for optimization (and thinking through whether I know of any existing compiler optimizations that could already break this.)

It seems this does mean that LOAD_FAST (or anything that pushes to stack) isn't generally safe to shift to a different line. So for example this seems to invalidate the proposed superoptimizer design in python/cpython#102869

brandtbucher · 2023-06-02T21:42:17Z

frame_setlineno is code I hadn't looked at before and didn't realize existed :)

Lucky! ;)

iritkatriel · 2023-06-02T21:51:24Z

Do we know anyone who uses this evil function?

JelleZijlstra · 2023-06-02T21:59:03Z

Here's a third-party debugger that uses it: https://github.com/web2py/web2py/blob/7685d373474378e93132f8916145fb11f84cec71/gluon/contrib/dbg.py#L307. pdb also supports it (https://github.com/python/cpython/blob/e01b04c9075c6468ed57bc883693ec2a06a6dd8e/Lib/pdb.py#L1202).

Seems like we should seriously consider deprecating support for this feature though; it feels very rarely useful and very hard to support.

brandtbucher · 2023-06-02T21:59:08Z

I mean, I'm pretty sure every Python debugger has a jump command, so we'd probably hear from them first if we removed it.

If we convinced them to drop the functionality too, then maybe we'd find out how many users are really relying on it. There are at least a handful that have filed various bug reports for CPython over the years.

I'm stuck somewhere between "I can see how it could be useful" and "it's too much pain to maintain and doesn't always work"... though, I have admittedly used jump commands in pdb a few times since learning about them (mostly if I mess up my debugging session somehow and don't want to re-run everything, but also sometimes to take a different branch or something).

iritkatriel · 2023-06-02T22:02:22Z

I know debuggers support it. But do we know of any people who use it?

brandtbucher · 2023-06-02T22:27:44Z

Not counting people who use the feature via debuggers?

I mean, you can only do it from within a trace function, so probably just a handful of people writing toys like this:

import sys

def goto(line: int) -> None:
    def jumper(frame, event, arg) -> None:
        if event == "line":
            frame.f_lineno = line
            frame.f_trace = None
    frame = sys._getframe(1)
    frame.f_trace = jumper
    sys.settrace(jumper)

# Prints "B" in an infinite loop:
goto(15)    # 13
print("A")  # 14
print("B")  # 15
goto(13)    # 16
print("C")  # 17

gvanrossum · 2023-06-02T23:20:02Z

There are already many complicated prohibitions in the f_lineno setattr -- there is no guarantee that it will work. So I don't believe it ought to be used as a reason to skip line-fusing optimizations.

Additionally, stepping through code already frequently jumps back and forth in ways that are hard to grasp intuitively. Again, if optimizations cause the sequence of observed "line events" to change, that feels like a minor inconvenience at best. (And even though PEP 626 claims otherwise, its motivation feels arbitrary to me and its specification is rather vague: "Line events and the f_lineno attribute should act as an experienced Python user would expect.")

markshannon · 2023-06-13T10:35:04Z

Let's not remove jumping in debuggers, or other features, that make our work more ~~difficult~~ interesting.

Many features that prevent optimizations in the bytecode compiler can be handled in the tier 1 optimizer.
Many features that prevent optimizations in the tier 1 optimizer, can be handled in the tier 2 optimizer.
In the tier 2 optimizer any call to a builtin function is a potential deoptimization. A single call to PyObject_GetAttr can do anything, so it makes almost no difference what features are available.

markshannon mentioned this issue Jun 2, 2023

Remove superinstructions python/cpython#105229

Closed

markshannon mentioned this issue Jun 2, 2023

GH-105229: Replace some superinstructions with single instruction equivalent. python/cpython#105230

Merged

carljm mentioned this issue Jun 2, 2023

Superoptimize sequences of instructions without observable side-effects in the bytecode compiler. python/cpython#102869

Open

markshannon closed this as completed Jun 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Single instruction "superinstructions" #585

Single instruction "superinstructions" #585

markshannon commented May 19, 2023 •

edited

Loading

brandtbucher commented May 19, 2023

markshannon commented May 22, 2023

markshannon commented Jun 2, 2023

carljm commented Jun 2, 2023

brandtbucher commented Jun 2, 2023 •

edited

Loading

brandtbucher commented Jun 2, 2023

carljm commented Jun 2, 2023 •

edited

Loading

brandtbucher commented Jun 2, 2023

iritkatriel commented Jun 2, 2023

JelleZijlstra commented Jun 2, 2023

brandtbucher commented Jun 2, 2023 •

edited

Loading

iritkatriel commented Jun 2, 2023

brandtbucher commented Jun 2, 2023

gvanrossum commented Jun 2, 2023

markshannon commented Jun 13, 2023

Single instruction "superinstructions" #585

Single instruction "superinstructions" #585

Comments

markshannon commented May 19, 2023 • edited Loading

brandtbucher commented May 19, 2023

markshannon commented May 22, 2023

markshannon commented Jun 2, 2023

carljm commented Jun 2, 2023

brandtbucher commented Jun 2, 2023 • edited Loading

brandtbucher commented Jun 2, 2023

carljm commented Jun 2, 2023 • edited Loading

brandtbucher commented Jun 2, 2023

iritkatriel commented Jun 2, 2023

JelleZijlstra commented Jun 2, 2023

brandtbucher commented Jun 2, 2023 • edited Loading

iritkatriel commented Jun 2, 2023

brandtbucher commented Jun 2, 2023

gvanrossum commented Jun 2, 2023

markshannon commented Jun 13, 2023

markshannon commented May 19, 2023 •

edited

Loading

brandtbucher commented Jun 2, 2023 •

edited

Loading

carljm commented Jun 2, 2023 •

edited

Loading

brandtbucher commented Jun 2, 2023 •

edited

Loading