Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[draft] register machine #100276

Closed
wants to merge 77 commits into from
Closed

[draft] register machine #100276

wants to merge 77 commits into from

Conversation

iritkatriel
Copy link
Member

@iritkatriel iritkatriel commented Dec 15, 2022

  • Each instruction is written to the bytecode as two code units, one for opcode + oparg1, the second for oparg2 + oparg3. (oparg2,3 are 0 for now).
  • Add a copy of the consts to the end of the fast locals array, which now consists of [localsplus, the stack, consts].
  • Register versions of the UNARY_* ops, with tmp registers added to the localsplus as "$i".

@iritkatriel
Copy link
Member Author

I have a test failing (locally, will probably fail here too) due to the per-instruction stack depth calculating dipping below 0 (there's an assertion that it doesn't).

I think it could be due to the stack effect of CALL being off by 1 - it depends on the value of is_meth which is not known to stack_effect().

@iritkatriel
Copy link
Member Author

I have a test failing (locally, will probably fail here too) due to the per-instruction stack depth calculating dipping below 0 (there's an assertion that it doesn't).

I think it could be due to the stack effect of CALL being off by 1 - it depends on the value of is_meth which is not known to stack_effect().

That's not what it was - I saw a test pass the first time and then fail the stackdepth>0 assertion only second time I ran it. Now my local build is broken and nothing seems to help (I tried distclean/clean/touching all the stdlib/bumping magic number).

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found my first bug. :-)

@iritkatriel
Copy link
Member Author

The tests all pass on Mac now. On windows there's a problem with test_import's test_unencodable_filename, which runs in a subprocess.

@iritkatriel
Copy link
Member Author

The tests all pass on Mac now. On windows there's a problem with test_import's test_unencodable_filename, which runs in a subprocess.

I can reproduce this on windows, it does:

Fatal Python error: _Py_CheckSlotResult: Slot |= of type set failed without setting an exception
Python runtime state: initialized

I managed to get it to print:
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 71: surrogates not allowed

@iritkatriel
Copy link
Member Author

It enters set_ior with an exception set. I don't know if that's supposed to happen.

@gvanrossum
Copy link
Member

I'm sure it's not supposed to.

@@ -2172,7 +2172,7 @@ def test_patma_204(self):
def f(w):
match w:
case 42:
out = locals()
out = {k : v for k,v in locals().items() if not k.startswith('$')}
Copy link
Member

@gvanrossum gvanrossum Dec 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the exception for $ should be implemented in locals() rather than in its callers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t know if we want to keep the $ vars. we did this because I couldn’t get it to work with a dedicated range for temps, but now we fixed it for consts so it should work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want these for debugging purposes. Perhaps locals/frame.f_locals should omit them, and we add frame._f_registers (or something) which is a list of temporaries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe relevant: the unnamed iterator used in comprehensions is currently visible:

>>> [locals() for _ in [None]]
[{'.0': <tuple_iterator object at 0x7f1a741370a0>, '_': None}]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $ registers will be visible in a lot more cases than the .0 iterator, and they are a detail of the implementation that will mostly just confuse users. If we're going to pull the consistency card I'd vote for not revealing that iterator either.

I could live with the registers showing in f_locals but not in locals() -- frame attributes are more of an implementation detail anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change to filter locals(), but that's no good it breaks locals()["x"] = 43.

(see test.test_scope.ScopeTests.testClassNamespaceOverridesClosure)

Copy link
Member

@markshannon markshannon Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we adding all these "$" variables to the locals?
We don't expose the stack, why expose the temporaries?

It looks like a considerable amount of the changes to both code and tests are supporting/working around these additional, artificial local variables.

@iritkatriel
Copy link
Member Author

iritkatriel commented Dec 30, 2022

def fac(n):
    if n == 0:
        return 1
    prod = n * fac(n-1)
    return prod

This currently becomes the code below (CALL is not a register op yet). I got this with simple peephole style cleanup of the registers (we could probably still reduce the number of register used, my peephole is just about eliminating LOAD/STORE/COPY instructions). I think this could end up being a simpler approach than re-architecting the code-gen visitors to pass inputs and outputs up and down the recursion.

  2           0 RESUME                   0

  3           4 COMPARE_OP_R             2 (==)
             14 JUMP_IF_FALSE_R          2 (to 22)

  4          18 RETURN_VALUE_R          20

  5     >>   22 LOAD_FAST_R              0 (n)
             26 LOAD_GLOBAL              1 (NULL + fac)
             40 BINARY_OP_R             10 (-)
             48 LOAD_FAST_R              9 ($7)
             52 CALL                     1                       <--- need to implement CALL_R
             64 STORE_FAST_R            11
             68 STORE_FAST_R            10
             72 BINARY_OP_R              5 (*)
             80 LOAD_FAST_R             12 ($10)                   <--- need to cancel out this with the next STORE_FAST
             84 STORE_FAST               1 (prod)

  6          88 RETURN_VALUE_R           1

@iritkatriel
Copy link
Member Author

To summarise today's adventures:

When can optimize away a STORE_FAST_R whose target is a tmp, but not a local. My last commit did this, but I reverted it because it has issues with refcounts. Basically, a COPY_R does an INCREF, and when we optimise one away we lose a reference, so this also needs to be done selectively. I'll figure out when it does or doesn't work tomorrow.

@iritkatriel
Copy link
Member Author

Starting again at #101078.

@iritkatriel
Copy link
Member Author

Starting again at #101078.

Actually probably here #101208.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants