Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-89279: In ceval.c, redefine some macros for speed #32387

Merged
merged 27 commits into from
Apr 22, 2022
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
0be9d62
TEMPORARY hacks to show inline decisions
gvanrossum Apr 6, 2022
abd9118
In ceval.c, redefine some macros for speed
gvanrossum Apr 7, 2022
1724056
Merge remote-tracking branch 'origin/main' into inline
gvanrossum Apr 7, 2022
50e3d7b
Victor's review
gvanrossum Apr 7, 2022
59da3bf
Merge branch 'python:main' into inline
gvanrossum Apr 7, 2022
ab6660e
Add _Py_atomic_load_32bit_impl and use cast macro
gvanrossum Apr 7, 2022
ae6b53c
Move _Py_atomic_load_32bit_impl out of '#if'
gvanrossum Apr 7, 2022
01cff81
Go back to full PGO test suite
gvanrossum Apr 8, 2022
69da380
Longer comment; improve _Py_atomic_load_relaxed
gvanrossum Apr 8, 2022
2894f6e
Only specialize atomic-load-relaxed if MSC
gvanrossum Apr 9, 2022
4b70976
Try _Py_atomic_load_relaxed_int32
gvanrossum Apr 9, 2022
5617fcd
MSC is better off with the default Py_UNREACHABLE()
gvanrossum Apr 11, 2022
a901b91
Update bpo numbers to GH issue numbers
gvanrossum Apr 11, 2022
cceb531
Split accidentally merged lines
gvanrossum Apr 13, 2022
26e8107
Delete temporary /d2inlinelogfull option
gvanrossum Apr 13, 2022
11084c0
Merge branch 'main' into inline
gvanrossum Apr 13, 2022
b7dfcdc
📜🤖 Added by blurb_it.
blurb-it[bot] Apr 13, 2022
9a5c57c
Use static_assert() in _Py_atomic_load_relaxed_int32
gvanrossum Apr 19, 2022
a8cba6e
Revert "MSC is better off with the default Py_UNREACHABLE()"
gvanrossum Apr 19, 2022
c43ec56
Add comment referring to GH-89279 for is_method()
gvanrossum Apr 19, 2022
9a15194
static_assert() is only a C++ feature
gvanrossum Apr 19, 2022
5c992d2
Merge remote-tracking branch 'origin/main' into inline
gvanrossum Apr 20, 2022
7b342e4
Also redefine _Py_DECREF_SPECIALIZED as a macro
gvanrossum Apr 20, 2022
3b21c52
Reformat macros for readability
gvanrossum Apr 21, 2022
41cb067
Try to silence warnings
gvanrossum Apr 21, 2022
87aaf81
Update Python/ceval.c
vstinner Apr 21, 2022
3508f45
Update Python/ceval.c
vstinner Apr 21, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 0 additions & 2 deletions Include/pymacro.h
Original file line number Diff line number Diff line change
Expand Up @@ -122,8 +122,6 @@
# define Py_UNREACHABLE() __builtin_unreachable()
#elif defined(__clang__) || defined(__INTEL_COMPILER)
# define Py_UNREACHABLE() __builtin_unreachable()
#elif defined(_MSC_VER)
# define Py_UNREACHABLE() __assume(0)
gvanrossum marked this conversation as resolved.
Show resolved Hide resolved
#else
# define Py_UNREACHABLE() \
Py_FatalError("Unreachable C code path reached")
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Improve interpreter performance on Windows by inlining a few specific macros.
49 changes: 36 additions & 13 deletions Python/ceval.c
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,32 @@
# error "ceval.c must be build with Py_BUILD_CORE define for best performance"
#endif

#ifndef Py_DEBUG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if only MSVC is affected, only redefine these macros for MSVC, no?

Suggested change
#ifndef Py_DEBUG
#if defined(_MSC_VER) && !defined(Py_DEBUG)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer us running the same code on all platforms if possible, so that it's less likely that someone accidentally breaks the Windows version by want appears to be an innocent change on Linux.

(I have to make an exception for load_relaxed because it is heavily platform-specific.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be extra safe, you may also check Py_REF_DEBUG, even if Py_REF_DEBUG should not be defined if Py_DEBUG is defined.

Suggested change
#ifndef Py_DEBUG
#if !defined(Py_DEBUG) && !defined(Py_REF_DEBUG)

Or maybe object.h should fail with #error with Py_REF_DEBUG is defined whereas Py_DEBUG is not.

// GH-89279: The MSVC compiler does not inline these static inline functions
// in PGO build in _PyEval_EvalFrameDefault(), because this function is over
// the limit of PGO, and that limit cannot be configured.
// Define them as macros to make sure that they are always inlined by the
// preprocessor.

#undef Py_DECREF
#define Py_DECREF(arg) do { PyObject *op = _PyObject_CAST(arg); if (--op->ob_refcnt == 0) { destructor d = Py_TYPE(op)->tp_dealloc; (*d)(op); } } while (0)
Copy link
Member

@vstinner vstinner Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such long line is one of the reason why I wrote PEP 670 :-) I propose to make the code more readable by putting one statement per line, as done in static inline functions:

#ifndef Py_DEBUG
// GH-89279: The MSVC compiler does not inline these static inline functions
// in PGO build in _PyEval_EvalFrameDefault(), because this function is over
// the limit of PGO, and that limit cannot be configured.
// Define them as macros to make sure that they are always inlined by the
// preprocessor.

#undef Py_DECREF
#define Py_DECREF(arg) \
    do { \
        PyObject *op = _PyObject_CAST(arg); \
        if (--op->ob_refcnt == 0) { \
            destructor dealloc = Py_TYPE(op)->tp_dealloc; \
            (*dealloc)(op); \
        } \
    } while (0)

#undef Py_XDECREF
#define Py_XDECREF(arg) \
    do { \
        PyObject *xop = _PyObject_CAST(arg); \
        if (xop != NULL) { \
            Py_DECREF(xop); \
        } \
    } while (0)

#undef Py_IS_TYPE
#define Py_IS_TYPE(ob, type) \
    (_PyObject_CAST_CONST(ob)->ob_type == (type))

#endif

// GH-89279: Similar to above, force inlining by using a macro.
#if defined(_MSC_VER) && SIZEOF_INT == 4
#define _Py_atomic_load_relaxed_int32(ATOMIC_VAL) \
    (assert(sizeof((ATOMIC_VAL)->_value) == 4), \
     *((volatile int*)&((ATOMIC_VAL)->_value)))
#else
#define _Py_atomic_load_relaxed_int32(ATOMIC_VAL) \
    _Py_atomic_load_relaxed(ATOMIC_VAL)
#endif

I declared Py_XDECREF just after Py_DECREF. I also renamed "op1" to "xop" in Py_XDECREF.

UPDATE: Oh, and to copy/paste Py_IS_TYPE(), I used _PyObject_CAST_CONST().


#undef Py_IS_TYPE
#define Py_IS_TYPE(ob, type) (_PyObject_CAST(ob)->ob_type == (type))

#undef Py_XDECREF
#define Py_XDECREF(arg) do { PyObject *op1 = _PyObject_CAST(arg); if (op1 != NULL) { Py_DECREF(op1); } } while (0)

#endif

// GH-89279: Similar to above, force inlining by using a macro.
#if defined(_MSC_VER) && SIZEOF_INT == 4
#define _Py_atomic_load_relaxed_int32(ATOMIC_VAL) (assert(sizeof((ATOMIC_VAL)->_value) == 4), *((volatile int*)&((ATOMIC_VAL)->_value)))
gvanrossum marked this conversation as resolved.
Show resolved Hide resolved
#else
#define _Py_atomic_load_relaxed_int32(ATOMIC_VAL) _Py_atomic_load_relaxed(ATOMIC_VAL)
#endif


/* Forward declarations */
static PyObject *trace_call_function(
PyThreadState *tstate, PyObject *callable, PyObject **stack,
Expand Down Expand Up @@ -130,10 +156,10 @@ COMPUTE_EVAL_BREAKER(PyInterpreterState *interp,
struct _ceval_state *ceval2)
{
_Py_atomic_store_relaxed(&ceval2->eval_breaker,
_Py_atomic_load_relaxed(&ceval2->gil_drop_request)
| (_Py_atomic_load_relaxed(&ceval->signals_pending)
_Py_atomic_load_relaxed_int32(&ceval2->gil_drop_request)
| (_Py_atomic_load_relaxed_int32(&ceval->signals_pending)
&& _Py_ThreadCanHandleSignals(interp))
| (_Py_atomic_load_relaxed(&ceval2->pending.calls_to_do)
| (_Py_atomic_load_relaxed_int32(&ceval2->pending.calls_to_do)
&& _Py_ThreadCanHandlePendingCalls())
| ceval2->pending.async_exc);
}
Expand Down Expand Up @@ -678,7 +704,7 @@ _Py_FinishPendingCalls(PyThreadState *tstate)

struct _pending_calls *pending = &tstate->interp->ceval.pending;

if (!_Py_atomic_load_relaxed(&(pending->calls_to_do))) {
if (!_Py_atomic_load_relaxed_int32(&(pending->calls_to_do))) {
return;
}

Expand Down Expand Up @@ -1125,22 +1151,22 @@ eval_frame_handle_pending(PyThreadState *tstate)
struct _ceval_runtime_state *ceval = &runtime->ceval;

/* Pending signals */
if (_Py_atomic_load_relaxed(&ceval->signals_pending)) {
if (_Py_atomic_load_relaxed_int32(&ceval->signals_pending)) {
if (handle_signals(tstate) != 0) {
return -1;
}
}

/* Pending calls */
struct _ceval_state *ceval2 = &tstate->interp->ceval;
if (_Py_atomic_load_relaxed(&ceval2->pending.calls_to_do)) {
if (_Py_atomic_load_relaxed_int32(&ceval2->pending.calls_to_do)) {
if (make_pending_calls(tstate->interp) != 0) {
return -1;
}
}

/* GIL drop request */
if (_Py_atomic_load_relaxed(&ceval2->gil_drop_request)) {
if (_Py_atomic_load_relaxed_int32(&ceval2->gil_drop_request)) {
/* Give another thread a chance */
if (_PyThreadState_Swap(&runtime->gilstate, NULL) != tstate) {
Py_FatalError("tstate mix-up");
Expand Down Expand Up @@ -1290,7 +1316,7 @@ eval_frame_handle_pending(PyThreadState *tstate)

#define CHECK_EVAL_BREAKER() \
_Py_CHECK_EMSCRIPTEN_SIGNALS_PERIODICALLY(); \
if (_Py_atomic_load_relaxed(eval_breaker)) { \
if (_Py_atomic_load_relaxed_int32(eval_breaker)) { \
goto handle_eval_breaker; \
}

Expand Down Expand Up @@ -1571,10 +1597,7 @@ typedef struct {
PyObject *kwnames;
} CallShape;

static inline bool
is_method(PyObject **stack_pointer, int args) {
return PEEK(args+2) != NULL;
}
#define is_method(stack_pointer, args) (PEEK((args)+2) != NULL)
gvanrossum marked this conversation as resolved.
Show resolved Hide resolved
gvanrossum marked this conversation as resolved.
Show resolved Hide resolved

#define KWNAMES_LEN() \
(call_shape.kwnames == NULL ? 0 : ((int)PyTuple_GET_SIZE(call_shape.kwnames)))
Expand Down Expand Up @@ -1722,7 +1745,7 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
PREDICTED(RESUME_QUICK);
assert(tstate->cframe == &cframe);
assert(frame == cframe.current_frame);
if (_Py_atomic_load_relaxed(eval_breaker) && oparg < 2) {
if (_Py_atomic_load_relaxed_int32(eval_breaker) && oparg < 2) {
goto handle_eval_breaker;
}
DISPATCH();
Expand Down