Lazily create code object co_code attribute. #85

Closed
markshannon opened this issue Sep 15, 2021 · 3 comments

markshannon commented Sep 15, 2021

When code is quickened we have two copies of the bytecode instructions.

  1. code->co_code (a bytes object)
  2. code->co_firstinstr (an array of instructions).

Apart from its length (which would be trivial to add to the code object), code->co_code can be reconstructed from code->co_firstinstr.

I propose changing the code object as follows:

Current:

    ...
    _Py_CODEUNIT *co_firstinstr;
    ...
    PyObject *co_code; /* Points to bytes object */
    ...
    union _cache_or_instruction *co_quickened;
}

Proposed:

    int code_length;
    ...
    /* Remove pointer to first instruction */
    ...
    PyObject *co_code; /* Initially NULL, lazily initialized */
    ...
    _Py_CODEUNIT co_instructions[1];
}

Apart from the obvious memory saving, this also allows us to streamline the interpreter a bit.

Pseudo code for generating co_code:

co_code = bytearray(len=co.code_length*2)
index = 0
for inst in co.co_instructions:
    opcode = OPCODE_TO_BASE_MAP[inst.opcode]
    if has_specialized_cache(inst.opcode):
        oparg = get_specialized_cache(inst).original_oparg
    elif has_oparg(inst.opcode):
        oparg = inst.oparg
    else:
        oparg = 0
    co_code[index] = opcode
    co_code[index+1] = oparg
    index += 1
return bytes(co_code)
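
A runnable take on the pseudo code, as a minimal sketch only. The opcode numbers, the `Instruction` class, and the `HAS_OPARG` / `HAS_SPECIALIZED_CACHE` sets are all hypothetical stand-ins invented for illustration; they are not CPython's real tables or layout. The write index here advances by two bytes per instruction, since each code unit is an (opcode, oparg) pair.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical opcode numbers, chosen only for this sketch.
LOAD_FAST = 1          # base opcode, takes an oparg
LOAD_ATTR = 2          # base opcode, takes an oparg
RETURN_VALUE = 3       # base opcode, no oparg
LOAD_ATTR_SLOT = 100   # specialized form of LOAD_ATTR

# Maps every opcode (base or specialized) back to its base opcode.
OPCODE_TO_BASE_MAP = {
    LOAD_FAST: LOAD_FAST,
    LOAD_ATTR: LOAD_ATTR,
    RETURN_VALUE: RETURN_VALUE,
    LOAD_ATTR_SLOT: LOAD_ATTR,
}
HAS_OPARG = {LOAD_FAST, LOAD_ATTR}
HAS_SPECIALIZED_CACHE = {LOAD_ATTR_SLOT}


@dataclass
class Instruction:
    opcode: int
    oparg: int = 0
    # Stands in for the cache entry that remembers the original oparg.
    original_oparg: Optional[int] = None


def materialize_co_code(instructions):
    """Rebuild the original bytecode from (possibly quickened) instructions."""
    co_code = bytearray(len(instructions) * 2)
    index = 0
    for inst in instructions:
        if inst.opcode in HAS_SPECIALIZED_CACHE:
            oparg = inst.original_oparg
        elif inst.opcode in HAS_OPARG:
            oparg = inst.oparg
        else:
            oparg = 0
        co_code[index] = OPCODE_TO_BASE_MAP[inst.opcode]
        co_code[index + 1] = oparg
        index += 2  # two bytes per instruction
    return bytes(co_code)


quickened = [
    Instruction(LOAD_FAST, 0),
    Instruction(LOAD_ATTR_SLOT, oparg=7, original_oparg=2),
    Instruction(RETURN_VALUE),
]
original = materialize_co_code(quickened)
```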

markshannon commented

I suspect that it might be quite a lot of work to get code.replace() to work. The marshal format will also need to change.

gvanrossum commented

I think you meant index += 2 at the end of the loop (bytes vs. instruction offsets bite again ;-).

If we expect that most code is never specialized, we could avoid reconstructing co_code from the specialized instructions and instead just materialize co_code when we first specialize a code object. This would cost the same as specialization does today (it has to copy the instructions to a new array which it can modify), and would slightly reduce deallocation cost compared to today (since we wouldn't have to separately free the new instructions array).
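
To make the shortcut concrete, here is a minimal sketch under stated assumptions: `CodeSketch`, `co_code_cache`, and `specialize()` are hypothetical names, and a `bytearray` stands in for the in-object instruction array. The point it illustrates is that the unmodified bytes are snapshotted the first time an instruction is rewritten, so nothing ever has to be reconstructed from specialized instructions.

```python
class CodeSketch:
    """Hypothetical stand-in for a code object; not CPython's real layout."""

    def __init__(self, raw: bytes):
        self.instructions = bytearray(raw)  # the single mutable instruction array
        self.co_code_cache = None           # filled lazily

    def specialize(self, offset: int, new_opcode: int) -> None:
        # Snapshot the pristine bytecode before the first in-place rewrite.
        if self.co_code_cache is None:
            self.co_code_cache = bytes(self.instructions)
        self.instructions[offset] = new_opcode

    @property
    def co_code(self) -> bytes:
        # If nothing was specialized yet, the instructions are still the
        # original bytecode, so copying them is enough.
        if self.co_code_cache is None:
            self.co_code_cache = bytes(self.instructions)
        return self.co_code_cache


code = CodeSketch(bytes([1, 0, 2, 2, 3, 0]))
code.specialize(2, 100)  # rewrite the second instruction's opcode in place
```

The trade-off is the one raised in the discussion: specialized code objects still carry two copies of the instructions, but unspecialized ones carry only one.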

I don't see big problems for code.replace() -- if the co_code key is present we'll have to do it one way, otherwise we'll have to do it another way, no big deal. Similarly for marshal, we can change the format easily to support this.
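
For reference, the two cases look like this from Python, using the public `code.replace()` method (available since Python 3.8). This only shows the caller's side; how the implementation would branch internally is not shown here.

```python
def f():
    return 1

code = f.__code__

# Case 1: the co_code key is present, so the bytecode is supplied explicitly.
with_code = code.replace(co_code=code.co_code)

# Case 2: co_code is absent, so the bytecode must be carried over as-is.
without_code = code.replace(co_name="g")
```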

The key API to change is actually _PyCode_New() -- the simplest thing may just be to give it two different ways to pass in the instructions array (either as a bytes object or as a raw pointer + length). Alternatively we could change it so that its PyObject *code is allowed to be any object supporting the buffer API, and we'll pull the data out of the buffer. If it's an actual bytes object we can use that to pre-populate co_code.
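
A rough Python analogue of the buffer-API variant, as a sketch only: `new_code_sketch` is a hypothetical function, not the real `_PyCode_New()` signature. It accepts any object supporting the buffer protocol, copies the data out, and keeps the original object as co_code only when it is an actual bytes object.

```python
def new_code_sketch(code):
    """Hypothetical analogue of the proposed behaviour; returns (instructions, co_code)."""
    view = memoryview(code).cast("B")  # view the buffer as raw bytes
    if len(view) % 2:
        raise ValueError("instruction data must be a whole number of 2-byte code units")
    instructions = bytearray(view)     # private, mutable copy of the instructions
    # Only a real bytes object can be reused directly to pre-populate co_code.
    co_code = code if type(code) is bytes else None
    return instructions, co_code


from_bytes = new_code_sketch(b"\x01\x00\x03\x00")
from_buffer = new_code_sketch(bytearray(b"\x01\x00\x03\x00"))
```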

What do you see as the biggest wins here?

  • We could save memory for a second copy of the instructions array for code objects that are specialized. But then we would have to materialize co_code on request per your pseudo code, not use my shortcut above, and this only matters if we expect to be specializing a lot of code objects.
  • We could make storing the last instruction (pointer/offset) faster by storing a pointer rather than computing the offset (which requires keeping the start pointer in the L1 cache, e.g. in an extra local variable).
  • Any other wins I missed?

markshannon commented

Implemented in python/cpython#31888

Repository owner moved this from Todo to Done in Fancy CPython Board Mar 16, 2022