Skip to content

bpo-43693: Add _PyCode_New() and do some related cleanup. #26258

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
dce5124
Re-organize the _PyCodeObject fields.
ericsnowcurrently May 18, 2021
e416918
Spell out the PyCode_New*() signatures.
ericsnowcurrently May 18, 2021
6740572
Add a TODO to code.h.
ericsnowcurrently May 18, 2021
4381869
Fix a memory leak.
ericsnowcurrently May 18, 2021
19cfb8b
Add _PyCode_New().
ericsnowcurrently May 18, 2021
8f10f5d
Factor out _PyCode_GetFastlocalOffsetId().
ericsnowcurrently May 18, 2021
0c42221
Factor out _PyCode_HasFastlocals().
ericsnowcurrently May 18, 2021
3315b28
Factor out _PyCode_CellForLocal().
ericsnowcurrently May 18, 2021
2ac48ba
Add macros to hide away some of the PyCodeObject fields.
ericsnowcurrently May 19, 2021
02e15a4
Move the fast locals section down.
ericsnowcurrently May 19, 2021
ad70f99
Store more counts on PyCodeObject.
ericsnowcurrently May 19, 2021
e38643f
Update a comment.
ericsnowcurrently May 19, 2021
09fbe71
Factor out macros related to co_code.
ericsnowcurrently May 14, 2021
9b4ca77
Fix some smelly names.
ericsnowcurrently May 20, 2021
f00133f
Fix whitespace.
ericsnowcurrently May 20, 2021
acef742
Fix co_nlocals in code.replace().
ericsnowcurrently May 20, 2021
559e408
Stop using statics in PyCode_NewEmpty().
ericsnowcurrently May 20, 2021
a923b5b
Actually fix whitespace.
ericsnowcurrently May 20, 2021
afcc944
Fix a typo.
ericsnowcurrently May 20, 2021
d304aba
Replace macros with inline functions.
ericsnowcurrently May 20, 2021
23e5485
Unify all the PyCodeObject validation code.
ericsnowcurrently May 20, 2021
81c4a49
Fix the logged function name.
ericsnowcurrently May 20, 2021
af51553
Fix the GDB script.
ericsnowcurrently May 21, 2021
cc3f435
-_PyCode_GetFastlocalOffsetId -> _PyCode_FastOffsetFromId().
ericsnowcurrently May 21, 2021
fcf4ea3
Fix CO_FAST_KWONLY.
ericsnowcurrently May 21, 2021
cb82a6a
Fix a typo.
ericsnowcurrently May 21, 2021
672a75f
Make sure int is used for fast offsets and PyCodeObject counts.
ericsnowcurrently May 22, 2021
32907b1
Fix the comment about fields in hash/comparision.
ericsnowcurrently May 22, 2021
aee3640
Drop co_nlocals from marshaled code objects.
ericsnowcurrently May 22, 2021
fd50f62
Regen a couple more files.
ericsnowcurrently May 22, 2021
9736ff5
Fix test_ctypes.
ericsnowcurrently May 22, 2021
c95314c
Factor out CO_FAST_ARG.
ericsnowcurrently May 24, 2021
b419060
Fix a typo.
ericsnowcurrently May 24, 2021
6195184
Add an assert for the return value of frame_nslots().
ericsnowcurrently May 24, 2021
3c4ec94
Add a comment about using an arguments struct for _PyCode_New().
ericsnowcurrently May 24, 2021
ffc4b54
Drop some outdated TODO comments.
ericsnowcurrently May 24, 2021
292b646
Fix a typo.
ericsnowcurrently May 24, 2021
cd9e34a
Add _PyCode_FastInfoFromOffset().
ericsnowcurrently May 24, 2021
cd4f587
Add _PyCode_GetFastFreevar().
ericsnowcurrently May 24, 2021
a4e6dd3
Use _PyCode_FastInfoFromOffset() elsewhere in ceval.c.
ericsnowcurrently May 24, 2021
f93b6df
Add _PyCode_OffsetFromIndex().
ericsnowcurrently May 24, 2021
f779627
_PyCode_FastInfoFromOffset() -> _PyCode_FastInfoFromOparg().
ericsnowcurrently May 24, 2021
292e457
Add _PyCode_FastInfoFromOffset() and use it in frameobject.c.
ericsnowcurrently May 24, 2021
f6bafb4
Fix warnings.
ericsnowcurrently May 24, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 100 additions & 30 deletions Include/cpython/code.h
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@
# error "this header file must not be included directly"
#endif

// XXX Much of the code here should move to pycore_code.h.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be moved? Is this just about the contents of the PyCodeObject struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should move anything that folks shouldn't even think about if they aren't dealing with the internals. I'd say nearly everything there falls in that category. I'd especially make PyCodeObject opaque in the public API, like it is in the limited API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, anybody looking at code objects from C code is probably dealing with internals... What would be legitimate reasons to look at some fields here? I guess Pyjion and PyDev?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. I'm not going to tackle this part right now though. It's going to take at least some discussion on python-dev.


typedef uint16_t _Py_CODEUNIT;

#ifdef WORDS_BIGENDIAN
Expand All @@ -17,51 +19,89 @@ typedef struct _PyOpcache _PyOpcache;
/* Bytecode object */
struct PyCodeObject {
PyObject_HEAD
int co_argcount; /* #arguments, except *args */
int co_posonlyargcount; /* #positional only arguments */
int co_kwonlyargcount; /* #keyword only arguments */
int co_nlocals; /* #local variables */
int co_stacksize; /* #entries needed for evaluation stack */

/* Note only the following fields are used in hash and/or comparisons
*
* - co_name
* - co_argcount
* - co_posonlyargcount
* - co_kwonlyargcount
* - co_nlocals
* - co_stacksize
* - co_flags
* - co_firstlineno
* - co_code
* - co_consts
* - co_names
* - co_varnames
* - co_freevars
* - co_cellvars
*
* This is done to preserve the name and line number for tracebacks
* and debuggers; otherwise, constant de-duplication would collapse
* identical functions/lambdas defined on different lines.
*/

/* metadata */
PyObject *co_filename; /* unicode (where it was loaded from) */
PyObject *co_name; /* unicode (name, for reference) */
int co_flags; /* CO_..., see below */
int co_firstlineno; /* first source line number */

/* the code */
PyObject *co_code; /* instruction opcodes */
int co_firstlineno; /* first source line number */
PyObject *co_linetable; /* string (encoding addr<->lineno mapping) See
Objects/lnotab_notes.txt for details. */

/* used by the code */
PyObject *co_consts; /* list (constants used) */
PyObject *co_names; /* list of strings (names used) */

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

co_code, co_consts and co_names are the hottest fields in the code object. Could make sure that they are near the start of the object. Moving the metadata section after this section should do.

/* mapping frame offsets to information */
PyObject *co_varnames; /* tuple of strings (local variable names) */
PyObject *co_freevars; /* tuple of strings (free variable names) */
PyObject *co_cellvars; /* tuple of strings (cell variable names) */
/* The rest aren't used in either hash or comparisons, except for co_name,
used in both. This is done to preserve the name and line number
for tracebacks and debuggers; otherwise, constant de-duplication
would collapse identical functions/lambdas defined on different lines.
*/
Py_ssize_t *co_cell2arg; /* Maps cell vars which are arguments. */
PyObject *co_filename; /* unicode (where it was loaded from) */
PyObject *co_name; /* unicode (name, for reference) */
PyObject *co_linetable; /* string (encoding addr<->lineno mapping) See
Objects/lnotab_notes.txt for details. */
int co_nlocalsplus; /* Number of locals + free + cell variables */
PyObject *co_freevars; /* tuple of strings (free variable names) */

/* args (within co_varnames) */
int co_argcount; /* #arguments, except *args */
int co_posonlyargcount; /* #positional only arguments */
int co_kwonlyargcount; /* #keyword only arguments */

/* needed to create the frame */
int co_stacksize; /* #entries needed for evaluation stack */

/* used by the eval loop */
PyObject *co_exceptiontable; /* Byte string encoding exception handling table */

/* other */
PyObject *co_weakreflist; /* to support weakrefs to code objects */
/* Scratch space for extra data relating to the code object.
Type is a void* to keep the format private in codeobject.c to force
people to go through the proper APIs. */
void *co_extra;

/* internal */
int *co_cell2arg; /* Maps cell vars which are arguments. */
/* Per opcodes just-in-time cache
*
* To reduce cache size, we use indirect mapping from opcode index to
* cache object:
* cache = co_opcache[co_opcache_map[next_instr - first_instr] - 1]
*/

// co_opcache_map is indexed by (next_instr - first_instr).
_PyOpcache *co_opcache;
// co_opcache.map is indexed by (next_instr - first_instr).
// * 0 means there is no cache for this opcode.
// * n > 0 means there is cache in co_opcache[n-1].
// * n > 0 means there is cache in co_opcache.cache[n-1].
unsigned char *co_opcache_map;
_PyOpcache *co_opcache;
int co_opcache_flag; // used to determine when create a cache.
unsigned char co_opcache_size; // length of co_opcache.

/* redundant data */
int co_nfastlocals; /* #entries needed for locals
(nlocals + ncellvars + nfreevars) */
int co_nlocals; /* #local variables */
int co_ncellvars; /* #escaping local variables */
int co_nfreevars; /* #closed variables (escaping or not) */
};

/* Masks for co_flags above */
Expand Down Expand Up @@ -112,19 +152,49 @@ struct PyCodeObject {
PyAPI_DATA(PyTypeObject) PyCode_Type;

#define PyCode_Check(op) Py_IS_TYPE(op, &PyCode_Type)
#define PyCode_GetNumFree(op) (PyTuple_GET_SIZE((op)->co_freevars))
#define PyCode_GetNumFree(op) ((op)->co_nfreevars)


/* Public interface */

PyAPI_FUNC(PyCodeObject *) PyCode_New(
int, int, int, int, int, PyObject *, PyObject *,
PyObject *, PyObject *, PyObject *, PyObject *,
PyObject *, PyObject *, int, PyObject *, PyObject *);
int argcount,
int kwonlyargcount,
int nlocals,
int stacksize,
int flags,
PyObject *code,
PyObject *consts,
PyObject *names,
PyObject *varnames,
PyObject *freevars,
PyObject *cellvars,
PyObject *filename,
PyObject *name,
int firstlineno,
PyObject *linetable,
PyObject *exceptiontable
);

PyAPI_FUNC(PyCodeObject *) PyCode_NewWithPosOnlyArgs(
int, int, int, int, int, int, PyObject *, PyObject *,
PyObject *, PyObject *, PyObject *, PyObject *,
PyObject *, PyObject *, int, PyObject *, PyObject *);
/* same as struct above */
int argcount,
int posonlyargcount,
int kwonlyargcount,
int nlocals,
int stacksize,
int flags,
PyObject *code,
PyObject *consts,
PyObject *names,
PyObject *varnames,
PyObject *freevars,
PyObject *cellvars,
PyObject *filename,
PyObject *name,
int firstlineno,
PyObject *linetable,
PyObject *exceptiontable
);

/* Creates a new empty code object with the specified source location. */
PyAPI_FUNC(PyCodeObject *)
Expand Down
143 changes: 142 additions & 1 deletion Include/internal/pycore_code.h
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,10 @@
#ifdef __cplusplus
extern "C" {
#endif


#include <stdbool.h>


typedef struct {
PyObject *ptr; /* Cached pointer (borrowed reference) */
uint64_t globals_ver; /* ma_version of global dict */
Expand All @@ -24,9 +27,147 @@ struct _PyOpcache {
char optimized;
};


// We would use an enum if C let us specify the storage type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can still use an enum to define the values though, even if you store them in a char.

typedef unsigned char _PyFastLocalKind;
// Note that these all fit within _PyFastLocalKind, as do combinations.
#define CO_FAST_POSONLY 0x01
#define CO_FAST_POSORKW 0x02
#define CO_FAST_VARARGS 0x04
#define CO_FAST_KWONLY 0x08
#define CO_FAST_VARKWARGS 0x10
#define CO_FAST_LOCALONLY 0x20
#define CO_FAST_CELL 0x40
#define CO_FAST_FREE 0x80

#define CO_FAST_ARG (CO_FAST_POSONLY | CO_FAST_POSORKW | CO_FAST_VARARGS | \
CO_FAST_KWONLY | CO_FAST_VARKWARGS)
#define CO_FAST_LOCAL (CO_FAST_ARG | CO_FAST_LOCALONLY)
#define CO_FAST_ANY (CO_FAST_LOCAL | CO_FAST_CELL | CO_FAST_FREE)


struct _PyCodeConstructor {
/* metadata */
PyObject *filename;
PyObject *name;
int flags;

/* the code */
PyObject *code;
int firstlineno;
PyObject *linetable;

/* used by the code */
PyObject *consts;
PyObject *names;

/* mapping frame offsets to information */
PyObject *varnames;
PyObject *cellvars;
PyObject *freevars;

/* args (within varnames) */
int argcount;
int posonlyargcount;
int kwonlyargcount;

/* needed to create the frame */
int stacksize;

/* used by the eval loop */
PyObject *exceptiontable;
};

// Using an "arguments struct" like this is helpful for maintainability
// in a case such as this with many parameters. It does bear a risk:
// if the struct changes and callers are not updated properly then the
// compiler will not catch problems (like a missing argument). This can
// cause hard-to-debug problems. The risk is mitigated by the use of
// check_code() in codeobject.c. However, we may decide to switch
// back to a regular function signature. Regardless, this approach
// wouldn't be appropriate if this weren't a strictly internal API.
// (See the comments in https://github.com/python/cpython/pull/26258.)
PyAPI_FUNC(PyCodeObject *) _PyCode_New(struct _PyCodeConstructor *);


/* Private API */

int _PyCode_InitOpcache(PyCodeObject *co);

PyAPI_FUNC(bool) _PyCode_HasFastlocals(PyCodeObject *, _PyFastLocalKind);
PyAPI_FUNC(void) _PyCode_FastInfoFromOffset(PyCodeObject *, int,
PyObject **, _PyFastLocalKind *);
PyAPI_FUNC(void) _PyCode_FastInfoFromOparg(PyCodeObject *, int,
_PyFastLocalKind,
PyObject **, _PyFastLocalKind *);
PyAPI_FUNC(int) _PyCode_OffsetFromIndex(PyCodeObject *, int, _PyFastLocalKind);
PyAPI_FUNC(int) _PyCode_CellForLocal(PyCodeObject *, int);

/* This does not fail. A negative result means "no match". */
PyAPI_FUNC(int) _PyCode_FastOffsetFromId(PyCodeObject *,
_Py_Identifier *,
_PyFastLocalKind);

static inline PyObject *
_PyCode_GetLocalvar(PyCodeObject *co, int index)
{
assert(index >= 0 && index < co->co_nlocals);
return PyTuple_GetItem(co->co_varnames, index);
}

static inline PyObject *
_PyCode_GetCellvar(PyCodeObject *co, int index)
{
assert(index >= 0 && index < co->co_ncellvars);
return PyTuple_GetItem(co->co_cellvars, index);
}

static inline PyObject *
_PyCode_GetFreevar(PyCodeObject *co, int index)
{
assert(index >= 0 && index < co->co_nfreevars);
return PyTuple_GetItem(co->co_freevars, index);
}

static inline bool
_PyCode_CodeIsValid(PyCodeObject *co)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't check for validity, at all. A full validity check would be large and slow. Maybe rename to reflect what it does?

{
return PyBytes_Check(co->co_code) &&
PyBytes_GET_SIZE(co->co_code) <= INT_MAX &&
PyBytes_GET_SIZE(co->co_code) % sizeof(_Py_CODEUNIT) == 0 &&
_Py_IS_ALIGNED(PyBytes_AS_STRING(co->co_code),
sizeof(_Py_CODEUNIT));
}

static inline _Py_CODEUNIT *
_PyCode_GetInstructions(PyCodeObject *co)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be misleading with quickening, so could you drop it for now.
Plus these accessor functions have an annoying tendency to end up in the public API, historically.

{
return (_Py_CODEUNIT *)PyBytes_AS_STRING(co->co_code);
}

static inline Py_ssize_t
_PyCode_NumInstructions(PyCodeObject *co)
{
return PyBytes_Size(co->co_code) / sizeof(_Py_CODEUNIT);
}

/* Speed hacks for very specific performance-sensitive locations. */

// For _PyEval_MakeFrameVector() in ceval.c.
static inline PyObject **
_PyCode_LocalvarsArray(PyCodeObject *co)
{
return ((PyTupleObject *)(co->co_varnames))->ob_item;
}

// For LOAD_CLASSDEREF in ceval.c.
// Normally we'd just use _PyCode_FastInfoFromOffset().
static inline PyObject *
_PyCode_GetFastFreevar(PyCodeObject *co, int offset)
{
return _PyCode_GetFreevar(co, offset - co->co_ncellvars);
}


#ifdef __cplusplus
}
Expand Down
6 changes: 3 additions & 3 deletions Lib/ctypes/test/test_values.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,9 +80,9 @@ class struct_frozen(Structure):
continue
items.append((entry.name.decode("ascii"), entry.size))

expected = [("__hello__", 142),
("__phello__", -142),
("__phello__.spam", 142),
expected = [("__hello__", 138),
("__phello__", -138),
("__phello__.spam", 138),
]
self.assertEqual(items, expected, "PyImport_FrozenModules example "
"in Doc/library/ctypes.rst may be out of date")
Expand Down
3 changes: 2 additions & 1 deletion Lib/importlib/_bootstrap_external.py
Original file line number Diff line number Diff line change
Expand Up @@ -354,6 +354,7 @@ def _write_atomic(path, data, mode=0o666):
# Python 3.10b1 3439 (Add ROT_N)
# Python 3.11a1 3450 Use exception table for unwinding ("zero cost" exception handling)
# Python 3.11a1 3451 (Add CALL_METHOD_KW)
# Python 3.11a1 3452 (drop co_nlocals from marshaled code objects)

#
# MAGIC must change whenever the bytecode emitted by the compiler may no
Expand All @@ -363,7 +364,7 @@ def _write_atomic(path, data, mode=0o666):
# Whenever MAGIC_NUMBER is changed, the ranges in the magic_values array
# in PC/launcher.c must also be updated.

MAGIC_NUMBER = (3451).to_bytes(2, 'little') + b'\r\n'
MAGIC_NUMBER = (3452).to_bytes(2, 'little') + b'\r\n'
_RAW_MAGIC_NUMBER = int.from_bytes(MAGIC_NUMBER, 'little') # For import.c

_PYCACHE = '__pycache__'
Expand Down
Loading