Skip to content

Move const folding to the peephole optimizer #126835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Eclips4 opened this issue Nov 14, 2024 · 122 comments
Open

Move const folding to the peephole optimizer #126835

Eclips4 opened this issue Nov 14, 2024 · 122 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@Eclips4
Copy link
Member

Eclips4 commented Nov 14, 2024

Feature or enhancement

Proposal:

For additional context see #126830 (comment).

Flow graph optimizer has more information and can do better job.

The problem is that we need to convert from UNARY_OP(-, CONST(1)) to CONST(-1), still before the code generation phase, because this leads to a few problems, one of which is shown below.

x = 1

match x:
    case -0:
        y = 0
eclips4@nixos ~/p/p/cpython (remove-ast-optimizer)> ./python example.py
  File "/home/eclips4/programming/programming-languages/cpython/example.py", line 4
    case -0:
         ^^
SyntaxError: patterns may only match literals and attribute lookups

cc @markshannon

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

@Eclips4 Eclips4 added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Nov 14, 2024
@Eclips4 Eclips4 self-assigned this Nov 15, 2024
@Eclips4
Copy link
Member Author

Eclips4 commented Dec 21, 2024

We had a discussion with @tomasr8 and decided that splitting this into parts would be good. Initially, I tried to do it in one PR, but realized that it would be impossible to review, as it would do a lot of unrelated things.

To begin, we decided to start by moving tuple folding from AST optimizer to CFG.

Problem: Currently compiler emits a warning if is/is not operators are used with constants1 (because it can evaluate them to True which is an implementation detail of CPython, other implementations might evaluate const is const to False).
So, the compiler (especially the codegen part) assumes that tuples are already folded, and has (in AST terms) Constant_kind. This breaks when we fold tuples later and they have a Tuple_kind at the codegen phase.

Possible solution: Perform this in a CFG after folding. This will be slower (I have no information on how slower it will be, but I assume it will) than performing this in codegen phase.

Footnotes

  1. https://github.com/python/cpython/blob/2a66dd33dfc0b845042da9bb54aaa4e890733f54/Python/codegen.c#L1678-L1691

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 1, 2025

Is there a plan what parts/in what order should be moved from ast optimizer to cfg? Does it make sense to create sub issues to make it more granular (instead of just PRs linked to this issue)? This issue Remove the AST optimizer seems too general and broad IMO.

Eclips4 added a commit that referenced this issue Feb 1, 2025
…en to CFG (#129426)

Codegen phase has an optimization that transforms
```
LOAD_CONST x
LOAD_CONST y
LOAD_CONXT z
BUILD_LIST/BUILD_SET (3)
```
->
```
BUILD_LIST/BUILD_SET (0)
LOAD_CONST (x, y, z)
LIST_EXTEND/SET_UPDATE 1
```
This optimization has now been moved to CFG phase to make #128802 work.


Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Co-authored-by: Yan Yanchii <yyanchiy@gmail.com>
@tomasr8
Copy link
Member

tomasr8 commented Feb 1, 2025

I'm currently looking into moving unaryop/binop folding (though I'm at Fosdem atm so It might take a bit longer) Anything else is up for grabs I'd say, unless Kirill is working on something currently?

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 1, 2025

I am looking into binop folding as well (almost done). Do you want to pass it or should I take something else?

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 1, 2025

Problem with binop I discovered is that after moving it to cfg, we, for instance, run into problem with case matching which @Eclips4 mentioned in issue description. For example:

match x:
    case 0 + 0j:
        pass

raises

File "/Users/yyanchii/Desktop/cpython/internal/t.py", line 2
    case 0 + 0j:
         ^^^^^^
SyntaxError: patterns may only match literals and attribute lookups

It is raised from:

cpython/Python/codegen.c

Lines 6022 to 6025 in 89fe067

if (!MATCH_VALUE_EXPR(value)) {
const char *e = "patterns may only match literals and attribute lookups";
return _PyCompile_Error(c, LOC(p), e);
}

So at codegen stage, it already expects (in this case) binops to be folded. @tomasr8 did you run into this already? We would need to add more logic in here which doesn't look good to me. On the other hand, how would we move binop folding to CFG then?

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 2, 2025

@tomasr8 @WolframAlph

I think the current priority is to move unaryop folding from AST optimizer to codegen phase.
I'm looking into it.

Then we should have to think about binaryop folding :)

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 2, 2025

Placing this here, not to forget in future: #129568 (comment). Maybe in future we get rid of emitting LOAD_SMALL_INT in codegen and let CFG do it instead

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 3, 2025

cc @iritkatriel
We need to decide on what to do with the optimize parameter of ast.parse and the PyCF_OPTIMIZED_AST flag.
If we remove the AST optimizer, these two will become obsolete.
Should we deprecate them in 3.14 (after the AST optimizer is removed) and remove them in 3.19?

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 3, 2025

@Eclips4 as I am reading https://docs.python.org/3/library/ast.html#ast.parse, it is not clear to me whether feature_version also works for constant folding case.

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 3, 2025

But I guess Best-effort part makes it clear that one cannot rely on the consistent output.

@WolframAlph
Copy link
Contributor

Sorry, ignore previous comments. I just realized it is controlled by optimize argument.

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 3, 2025

Sorry, ignore previous comments. I just realized it is controlled by optimize argument.

I was also confused after reading "best effort", but I strongly feel that it does not have anything to do with constant folding.

@iritkatriel
Copy link
Member

cc @iritkatriel We need to decide on what to do with the optimize parameter of ast.parse and the PyCF_OPTIMIZED_AST flag. If we remove the AST optimizer, these two will become obsolete. Should we deprecate them in 3.14 (after the AST optimizer is removed) and remove them in 3.19?

Yes, or just deprecate now and not worry about scheduling removal. We'll just document that they don't do anything anymore.

Eclips4 pushed a commit that referenced this issue Feb 4, 2025
Move folding of constant subscription from AST optimizer to CFG.

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
@tomasr8
Copy link
Member

tomasr8 commented Feb 4, 2025

I'm just now reading your messages - @WolframAlph I checked your binop PR and it's pretty much identical to what I had 👍
I don't know what to do about the problem with match..case though :/

@WolframAlph
Copy link
Contributor

WolframAlph commented Feb 4, 2025

@Eclips4 @tomasr8 I have noticed that CFG already does tuple folding:

cpython/Python/flowgraph.c

Lines 1357 to 1395 in 0664c1a

static int
fold_tuple_on_constants(PyObject *const_cache,
cfg_instr *inst,
int n, PyObject *consts)
{
/* Pre-conditions */
assert(PyDict_CheckExact(const_cache));
assert(PyList_CheckExact(consts));
assert(inst[n].i_opcode == BUILD_TUPLE);
assert(inst[n].i_oparg == n);
if (!is_constant_sequence(inst, n)) {
return SUCCESS;
}
/* Buildup new tuple of constants */
PyObject *newconst = PyTuple_New(n);
if (newconst == NULL) {
return ERROR;
}
for (int i = 0; i < n; i++) {
int op = inst[i].i_opcode;
int arg = inst[i].i_oparg;
PyObject *constant = get_const_value(op, arg, consts);
if (constant == NULL) {
return ERROR;
}
PyTuple_SET_ITEM(newconst, i, constant);
}
int index = add_const(newconst, consts, const_cache);
if (index < 0) {
return ERROR;
}
for (int i = 0; i < n; i++) {
INSTR_SET_OP0(&inst[i], NOP);
}
INSTR_SET_OP1(&inst[n], LOAD_CONST, index);
return SUCCESS;
}

So folding it in ast optimizer:

cpython/Python/ast_opt.c

Lines 558 to 568 in 0664c1a

static int
fold_tuple(expr_ty node, PyArena *arena, _PyASTOptimizeState *state)
{
PyObject *newval;
if (node->v.Tuple.ctx != Load)
return 1;
newval = make_const_tuple(node->v.Tuple.elts);
return make_const(node, newval, arena);
}

can be just removed and CFG is already prepared for that. Indeed, when I remove tuple folding from ast optimizer, I still get folded tuple when I dis some constant tuples. So CFG already handles that.

@Eclips4
Copy link
Member Author

Eclips4 commented Feb 4, 2025

@Eclips4 @tomasr8 I have noticed that CFG already does tuple folding

Yes, that's true. You can check #128802, which essentially removes the fold_tuple from AST optimizer. However, there are still a couple of issues that I'll resolve soon.

@tomasr8
Copy link
Member

tomasr8 commented Feb 16, 2025

We'd need to be able to dynamically resize the basic block while running the optimizations

Or creating new block with enough space to optimize and then re-link old and new. Also ugly IMO.

Yeah either way I think it adds a lot of complexity for little benefit.. but I guess that's fine, not everything has to necessarily go
to flowgraph

@WolframAlph
Copy link
Contributor

WolframAlph commented Mar 10, 2025

I would like to address how we are searching & constructing constant sequences for constant foldings. Currently we collect constant sequence with:

cpython/Python/flowgraph.c

Lines 1354 to 1386 in 990ad27

static int
get_constant_sequence(basicblock *bb, int start, int size,
PyObject *consts, PyObject **seq)
{
assert(start < bb->b_iused);
*seq = NULL;
PyObject *res = PyTuple_New((Py_ssize_t)size);
if (res == NULL) {
return ERROR;
}
for (; start >= 0 && size > 0; start--) {
cfg_instr *instr = &bb->b_instr[start];
if (instr->i_opcode == NOP) {
continue;
}
if (!loads_const(instr->i_opcode)) {
break;
}
PyObject *constant = get_const_value(instr->i_opcode, instr->i_oparg, consts);
if (constant == NULL) {
Py_DECREF(res);
return ERROR;
}
PyTuple_SET_ITEM(res, --size, constant);
}
if (size > 0) {
Py_DECREF(res);
}
else {
*seq = res;
}
return SUCCESS;
}

Benefit is (arguably) that we need single pass and the code is simpler, but the drawback is that regardless of whether it will or it will not be a constant sequence, we always pre-allocate tuple of requested size. Considering that most of the cases it won't be a constant sequence, this seems to be quite wasteful.

This becomes even more concerning when it comes to (for example) binary operations. Now, this would pre-allocate tuple for every single binary operation existing, regardless of whether it is constant or not. Splitting this logic into 2 parts for 1: counting and 2: constructing resulting tuple would improve the situation a lot I believe. But there is another thing to consider.

Some of the foldings directly use resulting tuple as final constant, but some (binop, unaryop) use it only to get their operands and then deallocate it anyway, which also seems to be quite unnecessary (even if the event of folding constant is much less likely than otherwise).

Therefore I would like to propose to change the way how we search for constant sequence. Instead, we could do single pass to collect indices of instructions that load constants, something like this:

static bool
get_const_instruction_sequence(basicblock *bb, int start, int size, int *indices)
{
    assert(start < bb->b_iused);
    assert(size <= STACK_USE_GUIDLINE);

    for (; start >= 0 && size > 0; start--) {
        cfg_instr *instr = &bb->b_instr[start];
        if (instr->i_opcode == NOP) {
            continue;
        }
        if (!loads_const(instr->i_opcode)) {
            return false;
        }
        indices[--size] = start;
    }

    return size == 0;
}

And calling function would need to do something like this (binop for example):

int indices[2];
if (!get_const_instruction_sequence(bb, i-1, 2, indices)) {
    return SUCCESS;
}

Calling function would be responsible to make sure it calls get_const_instruction_sequence with correct size and enough space for indices. The obvious problem with this is that for tuples/lists/sets we don't know how many consts it would consume. But considering that codegen stage limits subsequent stack pushes to STACK_USE_GUIDLINE, we know that constant sequences beyond this length are impossible. Therefore all variable length constant foldings would need to return early if sequence_length > STACK_USE_GUIDLINE and create int indices[STACK_USE_GUIDLINE] to make sure all constants fit in if sequence_length < STACK_USE_GUIDLINE.

As a result, we would construct constant tuples only when necessary. This would also make noping out instructions faster as we already know indices and don't have to make unnecessary iterations as we do now. WDYT @iritkatriel @Eclips4 @tomasr8 ?

@iritkatriel
Copy link
Member

Makes sense to me.

@WolframAlph
Copy link
Contributor

Little update. All expressions foldings (except for string formatting and __debug__) are now happening in the peepholer.

@WolframAlph
Copy link
Contributor

WolframAlph commented Mar 25, 2025

@iritkatriel what is left to be done on this? One thing coming to my mind is cleaning up test cases, as there are cases IIRC when different tests test the same thing/some outdated tests etc... But not sure if it is related to this task.

@iritkatriel
Copy link
Member

Cleaning up the tests is a good idea, we can do it on this issue.

Another thing is renaming ast_opt.c and _PyAST_Optimize, because they are not really about optimization.
Maybe something like _PyAST_PreCompile or _PyAST_PostProcess?

@WolframAlph
Copy link
Contributor

Tests cleanup PR: #131826

@WolframAlph
Copy link
Contributor

Another thing is renaming ast_opt.c and _PyAST_Optimize, because they are not really about optimization.
Maybe something like _PyAST_PreCompile or _PyAST_PostProcess?

@iritkatriel I suppose all structs/functions using ast optimize naming should be renamed as well then?

@iritkatriel
Copy link
Member

Another thing is renaming ast_opt.c and _PyAST_Optimize, because they are not really about optimization.
Maybe something like _PyAST_PreCompile or _PyAST_PostProcess?

@iritkatriel I suppose all structs/functions using ast optimize naming should be renamed as well then?

Which ones? There is some stuff that is about optimization, so case by case basis.

@WolframAlph
Copy link
Contributor

Another thing is renaming ast_opt.c and _PyAST_Optimize, because they are not really about optimization.
Maybe something like _PyAST_PreCompile or _PyAST_PostProcess?

@iritkatriel I suppose all structs/functions using ast optimize naming should be renamed as well then?

Which ones? There is some stuff that is about optimization, so case by case basis.

For instance:

cpython/Python/ast_opt.c

Lines 18 to 26 in 3796884

typedef struct {
PyObject *filename;
int optimize;
int ff_features;
int syntax_check_only;
_Py_c_array_t cf_finally; /* context for PEP 678 check */
int cf_finally_used;
} _PyASTOptimizeState;

/* AST optimizations */
extern int _PyCompile_AstOptimize(
struct _mod *mod,
PyObject *filename,
PyCompilerFlags *flags,
int optimize,
struct _arena *arena,
int syntax_check_only);
extern int _PyAST_Optimize(
struct _mod *,
struct _arena *arena,
PyObject *filename,
int optimize,
int ff_features,
int syntax_check_only);

@iritkatriel
Copy link
Member

Those can be renamed.

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 1, 2025
seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
Eclips4 pushed a commit that referenced this issue May 4, 2025
… after moving const folding to the peephole optimizier (#131830)
diegorusso added a commit to diegorusso/cpython that referenced this issue May 4, 2025
* origin/main: (111 commits)
  pythongh-91048: Add filename and line number to external inspection routines (pythonGH-133385)
  pythongh-131178: Add tests for `ast` command-line interface (python#133329)
  Regenerate pcbuild.sln in Visual Studio 2022 (python#133394)
  pythongh-133042: disable HACL* HMAC on Emscripten (python#133064)
  pythongh-133351: Fix remote PDB's multi-line block tab completion (python#133387)
  pythongh-109700: Improve stress tests for interpreter creation (pythonGH-109946)
  pythongh-81793: Skip tests for os.link() to symlink on Android (pythonGH-133388)
  pythongh-126835: Rename `ast_opt.c` to `ast_preprocess.c` and related stuff after moving const folding to the peephole optimizier (python#131830)
  pythongh-91048: Relax test_async_global_awaited_by to fix flakyness (python#133368)
  pythongh-132457: make staticmethod and classmethod generic (python#132460)
  pythongh-132805: annotationlib: Fix handling of non-constant values in FORWARDREF (python#132812)
  pythongh-132426: Add get_annotate_from_class_namespace replacing get_annotate_function (python#132490)
  pythongh-81793: Always call linkat() from os.link(), if available (pythonGH-132517)
  pythongh-122559: Synchronize C and Python implementation of the io module about pickling (pythonGH-122628)
  pythongh-69605: Add PyREPL import autocomplete feature to 'What's New' (python#133358)
  bpo-44172: Keep reference to original window in curses subwindow objects (pythonGH-26226)
  pythonGH-133231: Changes to executor management to support proposed `sys._jit` module (pythonGH-133287)
  pythongh-133363: Fix Cmd completion for lines beginning with `! ` (python#133364)
  pythongh-132983: Introduce `_zstd` bindings module (pythonGH-133027)
  pythonGH-91048: Add utils for printing the call stack for asyncio tasks (python#133284)
  ...
@Eclips4
Copy link
Member Author

Eclips4 commented May 5, 2025

I guess the last thing we should do is add a "What's New" entry about this and then we can close this. 🚀

@WolframAlph
Copy link
Contributor

@Eclips4 does this change deserve having "What's New" entry? Seems like not very interesting to the end user. Or am I wrong?

@Eclips4
Copy link
Member Author

Eclips4 commented May 5, 2025

@Eclips4 does this change deserve having "What's New" entry? Seems like not very interesting to the end user. Or am I wrong?

I guess some users may have been relying on the behavior of the previous ast optimizer, but now it's gone.
There was no explanation of exactly what ast.parse(optimize=1) does, but even with that some users may still have used it.
So, I think we should at least "warn" them in "What's new".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants