
gh-106812: Refactor to allow uops with array stack effects #107564

Merged
merged 39 commits into python:main from the stacking branch on Aug 4, 2023

Conversation

gvanrossum
Member

@gvanrossum gvanrossum commented Aug 2, 2023

This adds a new file, stacking.py, which tracks pushes and pops across the uops comprising a macro. Instruction writing for non-macro instructions is also unified with this.

The generated files look quite different, but I have carefully verified that everything works. (And usually if it doesn't, it won't even build. :-)
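
For intuition, here is a rough sketch of the bookkeeping idea, under assumed names (this is not the actual stacking.py code): walk the uops of a macro in order, accumulating a symbolic stack-pointer offset from each uop's pops and pushes, so every effect variable can be addressed relative to stack_pointer.

    def plan_macro(parts):
        # `parts` is assumed to be a list of uop-like objects, each with
        # input_effects and output_effects; an effect's `size` is a C
        # expression string for array effects, or "" for a single item.
        offset_terms: list[str] = []
        for uop in parts:
            for eff in uop.input_effects:
                # A pop moves the notional stack pointer down.
                offset_terms.append(f"-({eff.size or '1'})")
            for eff in uop.output_effects:
                # A push moves it back up.
                offset_terms.append(f"+({eff.size or '1'})")
            # ... emit the uop body here, addressing its variables
            # relative to the offset accumulated so far ...
        return " ".join(offset_terms) or "0"  # net stack effect as a C expression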

TODO:

  • (If possible) Clean up circular import between instructions.py and stacking.py
  • Pass Analyzer around and turn a few asserts into error messages
  • Remove redundant Analyzer.check_macro_consistency (fold into write_components)
  • Remove errors from Analyzer.stack_analysis that are no longer errors
  • Remove commented-out lines from `Formatter.assign`
  • Change StackItem so the effect itself is included in deep/high
  • (Maybe) Change StackItem to have a StackOffset member instead of inheriting it
  • (Probably not) Change StackOffset operations to use __add__, __sub__ etc.

@gvanrossum gvanrossum changed the title from "Refactor to allow uops with array stack effects" to "gh-106812: Refactor to allow uops with array stack effects" on Aug 2, 2023
@gvanrossum
Member Author

gvanrossum commented Aug 3, 2023

@Fidget-Spinner I am tempted to just merge this without fixing everything I listed above -- we can improve things iteratively, and I feel it's more important to go back to gh-106581 (this started as a giant yak to shave for that). What do you think?

@Fidget-Spinner
Member

I am tempted to just merge this without fixing everything I listed above -- we can improve things iteratively, and I feel it's more important to go back to gh-106581 (this started as a giant yak to shave for that). What do you think?

I'd prefer we work on this iteratively as well. It's easier to review that way too.

Also, surprisingly, kenjin is an actual taken GH username; it isn't me, but bears the same name.

Member

@Fidget-Spinner Fidget-Spinner left a comment


looks good in general

if vars[eff.name] != eff:
    self.error(
        f"Instruction {instr.name!r} has "
        f"inconsistent types for variable {eff.name!r}: "
Member


Should this be inconsistent types or just inconsistent in general?

Member Author


Yeah, it could mean that either type, cond or size is inconsistent. I'll change it.

Comment on lines 319 to 329
for name, eff in vars.items():
    if name in all_vars:
        if all_vars[name] != eff:
            self.warning(
                f"Macro {mac.name!r} has"
                f"inconsistent types for variable {name!r}: "
                f"{all_vars[name]} vs {eff} in {part.instr.name!r}",
                mac.macro,
            )
    else:
        all_vars[name] = eff
Member


Will this block ever warn? Wouldn't it all be consistent as it's already checked in get_var_names?

Member Author


This checks for inconsistency across different instructions. E.g.

op(A, (args[oparg] --)) { ... }
op(B, (args[oparg+1] --)) { ... }
macro(M) = A + B;



@dataclasses.dataclass
class StackOffset:
Member


Could this be represented instead as an index from the TOS, then a capture of the "stack"?

E.g. for a PEEK(1) it would be index=-2, stack=[item1, item2, item3, item4].

Member Author


Hm... That's closer to the old way of doing this, where the stack was represented (implicitly) by variables _tmp_1, _tmp_2, etc., and the stack offset of an effect was represented as being mapped to one of those variables (using the old input_mapping and output_mapping members of Component). I ran into some problems there when an effect is conditional, and managed to hack that in (but only for the output effects of the last component). But with array effects for guard instructions like we will need for CALL guards I just couldn't hack it any more, and instead I came up with this abstraction, which can handle conditional and array effects anywhere in a macro.
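
To make that concrete, here is a minimal sketch of what such an offset abstraction could look like (hypothetical; the real StackOffset in stacking.py differs in detail). Each pop or push contributes a symbolic term, so array sizes like oparg and conditional effects are handled uniformly as strings that end up in the generated index expression.

    import dataclasses

    @dataclasses.dataclass
    class OffsetSketch:
        # Symbolic terms for everything popped (deep) and pushed (high) so far.
        deep: list[str] = dataclasses.field(default_factory=list)
        high: list[str] = dataclasses.field(default_factory=list)

        def deeper(self, size: str = "1") -> None:
            # Record a pop; `size` may be "1", "oparg", or a condition term.
            self.deep.append(size)

        def higher(self, size: str = "1") -> None:
            # Record a push.
            self.high.append(size)

        def as_index(self) -> str:
            # Render a C index expression relative to stack_pointer,
            # e.g. "-(1) -(oparg) +(1)".
            terms = [f"-({t})" for t in self.deep] + [f"+({t})" for t in self.high]
            return " ".join(terms) if terms else "0"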

The only thing this cannot handle is a situation where a value is temporarily pushed onto the stack, stays there for the next uop, and then is popped later:

op(A, (-- temp)) { ... }  // stack: [] -> [temp]
op(B, (--)) { ... }       // stack: [temp] -> [temp]
op(C, (temp --)) { ... }  // stack: [temp] -> []
macro(M1) = A + C;
macro(M2) = A + B + C;

Here both M1 and M2 have a net stack effect of 0, n_popped is 0, and n_pushed is 0, and we cannot express using n_popped and n_pushed that this uses one temporary stack item. For M1 that's not a problem, because the algorithm translates the push in A and the pop in C into a copy, which doesn't require stack space. (Also, the copy disappears because the variable name is the same.) But for M2 the push and pop are not adjacent so they are not optimized away like that.

I could improve the algorithm to recognize this situation and use a copy for M2, but it's more complicated and it's unlikely that we'll need this. (Most likely in a real case there would be a result pushed onto the stack at the end, so the problem of phantom stack space wouldn't occur.) I ought to at least detect it and warn, but I'd rather do that in a future PR, since this one is complex enough as it is.

Note: the generated code for M2 clearly shows the problem:

        TARGET(M2) {
            PyObject *temp;
            // A
            {
                 ...
            }
            stack_pointer[0] = temp;
            // B
            {
                 ...
            }
            // C
            temp = stack_pointer[0];
            {
                 ...
            }
            DISPATCH();
        }

Note that stack_pointer[0] is an invalid stack item, pointing just above the current stack top. (The actual top is stack_pointer[-1].)

(stack_analysis.py was no longer being called!)

@@ -371,7 +439,7 @@ def stack_analysis(
        eff.size for eff in instr.input_effects + instr.output_effects
    ):
        # TODO: Eventually this will be needed, at least for macros.
-       self.error(
+       self.warning(
Member Author


FWIW, the checks in this function are no longer needed, and in fact it's not called any more, so I'm deleting it. :-)



@gvanrossum gvanrossum marked this pull request as ready for review August 3, 2023 20:50
Member Author

@gvanrossum gvanrossum left a comment


(Just me mumbling to myself.)

res = f"stack_pointer[{index}]"
if not lax:
# Check that we're not reading or writing above stack top.
# Skip this for output variable initialization (lax=True).
Member Author


I'm still pondering if I can add a working check for writing arrays above stack level. This is tricky because output array variables are initialized before the body of the opcode, at a point where stack_pointer is still low. E.g.

        inst(UNPACK_SEQUENCE_TUPLE, (unused/1, seq -- values[oparg])) {
            ...
        }

which produces output like this:

        TARGET(UNPACK_SEQUENCE_TUPLE) {
            PyObject *seq;
            PyObject **values;
            seq = stack_pointer[-1];
            values = stack_pointer - 1;
            ...
            STACK_SHRINK(1);
            STACK_GROW(oparg);
            next_instr += 1;
            DISPATCH();
        }

The assert would trigger, because the code (...) may well write above stack_pointer, but all is well because of the following STACK_GROW(oparg). Hence the lax flag, which disables the assert for output array variables (L372 below). It's hard to conceive of a realistic example where the failing assert would not be a false positive. A theoretical example would be:

op(A, (-- temp[oparg])) { ... }
op(B, (temp[oparg] --)) { ... }
macro(M) = A + B;

But I don't expect we'll ever write such code.
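
For illustration, a hypothetical completion of the helper quoted at the top of this thread (assumed names like stack_slot and self.emit; not the actual generator code) would emit the bounds check only when lax is false:

    def stack_slot(self, index: str, lax: bool) -> str:
        # Produce the C expression for a stack slot at the given symbolic index.
        res = f"stack_pointer[{index}]"
        if not lax:
            # Check that we're not reading or writing above stack top:
            # stack_pointer points one past the top, so the index must be
            # negative at runtime. Skipped for output variable
            # initialization (lax=True), where the STACK_GROW comes later.
            self.emit(f"assert(({index}) < 0);")
        return res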

@gvanrossum gvanrossum merged commit 400835e into python:main Aug 4, 2023
15 checks passed
@gvanrossum gvanrossum deleted the stacking branch August 5, 2023 04:21
gvanrossum added a commit that referenced this pull request Aug 5, 2023
This fixes two tiny defects in analysis.py that I didn't catch in time for #107564:

- `get_var_names` in `check_macro_consistency` should skip `UNUSED` names.
- Fix an occurrence of `is UNUSED` (should be `==`).