Skip to content

gh-96143: Add some comments and minor fixes missed in the original PR #96433

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 30, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Doc/howto/perf_profiling.rst
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,9 @@ active since the start of the Python interpreter, you can use the `-Xperf` optio

$ python -Xperf my_script.py

You can also set the :envvar:`PYTHONPERFSUPPORT` to a nonzero value to actiavate perf
profiling mode globally.

There is also support for dynamically activating and deactivating the perf
profiling mode by using the APIs in the :mod:`sys` module:

Expand Down
2 changes: 2 additions & 0 deletions Doc/using/cmdline.rst
Original file line number Diff line number Diff line change
Expand Up @@ -582,6 +582,8 @@ Miscellaneous options
.. versionadded:: 3.11
The ``-X frozen_modules`` option.

.. versionadded:: 3.12
The ``-X perf`` option.


Options you shouldn't use
Expand Down
2 changes: 1 addition & 1 deletion Lib/test/test_perf_profiler.py
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ def baz():
script = make_script(script_dir, "perftest", code)
with subprocess.Popen(
[sys.executable, "-Xperf", script],
universal_newlines=True,
text=True,
stderr=subprocess.PIPE,
stdout=subprocess.PIPE,
) as process:
Expand Down
11 changes: 11 additions & 0 deletions Objects/perf_trampoline.c
Original file line number Diff line number Diff line change
Expand Up @@ -284,12 +284,23 @@ new_code_arena(void)
void *start = &_Py_trampoline_func_start;
void *end = &_Py_trampoline_func_end;
size_t code_size = end - start;
// TODO: Check the effect of alignment of the code chunks. Initial investigation
// showed that this has no effect on performance in x86-64 or aarch64 and the current
// version has the advantage that the unwinder in GDB can unwind across JIT-ed code.
//
// We should check the values in the future and see if there is a
// measurable performance improvement by rounding trampolines up to 32-bit
// or 64-bit alignment.

size_t n_copies = mem_size / code_size;
for (size_t i = 0; i < n_copies; i++) {
memcpy(memory + i * code_size, start, code_size * sizeof(char));
}
// Some systems may prevent us from creating executable code on the fly.
// TODO: Call icache invalidation intrinsics if available:
// __builtin___clear_cache/__clear_cache (depending if clang/gcc). This is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernel would clear the icache for us. Not sure if this would ever be a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not correct in the general case. You are supposed to call this in aarch64 if you modify the code in some executable segment after it was read at least once. This doesn't apply for us but is not true that in all systems the kernel will clear it for us

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the discussion in the previous pr

// technically not necessary but we could be missing something so better be
// safe.
int res = mprotect(memory, mem_size, PROT_READ | PROT_EXEC);
if (res == -1) {
PyErr_SetFromErrno(PyExc_OSError);
Expand Down