-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove inspect.stack() from @rule parsing to fix import time regression in py3 #7447
remove inspect.stack() from @rule parsing to fix import time regression in py3 #7447
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phenomenal! Great work figuring this out.
Some suggestions for PR description:
- Add command you ran to generate the snakeviz visualization. Thanks for including the diff already.
- With the proportional comment, reference the section "Foil in this hypothesis: importing directly is fine" from V2 rule resolution causing substantial startup performance hit at import time #7442. I think this proportional idea explains what is going on.
@Eric-Arellano I believe I have addressed both of your suggestions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…on in py3 (#7447) ### Problem Fixes #7373, fixes #7442. `inspect.stack()` in the `@rule` decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line: https://github.com/pantsbuild/pants/blob/6739a81b6008274bfb4abd3436d9adb543a19e5a/src/python/pants/engine/rules.py#L245 In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an `inspect.stack()` in the `@rule` decorator was taking up 66% of the import time: <img width="1744" alt="Screen Shot 2019-03-27 at 14 38 53 (2)" src="https://user-images.githubusercontent.com/1305167/55118058-3a0b6b00-50aa-11e9-9343-bcc2ca0e53d2.png"> This profile was generated via applying the following diff: ```diff diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py index 91a4ca1..7510e6e 100644 --- a/src/python/pants/bin/pants_loader.py +++ b/src/python/pants/bin/pants_loader.py @@ -101,7 +101,9 @@ class PantsLoader(object): def load_and_execute(entrypoint): assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`' module_path, func_name = entrypoint.split(':', 1) - module = importlib.import_module(module_path) + from pants.util.contextutil import maybe_profiled + with maybe_profiled('import_module_profile.prof'): + module = importlib.import_module(module_path) entrypoint_main = getattr(module, func_name) assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint) entrypoint_main() ``` then running `./pants goals` to produce `import_module_profile.prof`, then running `snakeviz import_module_profile.prof`. `snakeviz` must be installed with `pip3` for some reason. ### Solution - Instead of getting the current interpreter frame with `inspect.stack()`, get the decorated function's module from `sys.modules`, and use that to obtain the type-valued arguments of a `Get` call during `@rule` parsing. ### Result `./pants goals` went from ~4.8 seconds to ~2.3 seconds, while `./pants2 goals` went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for `Get()` arguments are resolved from the `@rule` function's module *should* be unchanged. ### Explanation @Eric-Arellano noted that the return type of `inspect.stack()` changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it. @jsirois ran a quick experiment: ``` $ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done python2.7: 2.63740515709 python3.4: 3.8067204039980425 python3.5: 4.471806422996451 python3.6: 4.527370049007004 python3.7: 3.3335486440046225 ``` Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing `native_toolchain.py` is significantly faster than when it is imported indirectly. A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889): > per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. *This is the expensive step.* Apparently, `inspect.stack(0)` might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change. ### TODO Make a python bug about this regression if it isn't already known.
This test no longer fails with Python 3. It was likely fixed by #7447, as the test was timing out originally. All Python 3 integration tests were ran three times to ensure we aren't adding a flaky test.
…ctor their BUILD entries to be more granular (#7463) ### Problem Currently, you can only run `./pants test tests/python/pants_test/backend/python/tasks:tasks` or `./pants test tests/python/pants_test/backend/python/tasks:integration`. This lack of granularity makes it difficult to develop in this folder, as the former has a test timeout of 10 minutes and the latter of 40 minutes. While you can use `./pants test.pytest -- -k test_name`, this does not get the specific file but rather the names of individual tests. It also makes it harder to tell which tests are causing the long timeouts for this folder. ### Solution Each test file gets its own BUILD entry, as we do in most folders. #### Also remove tests from Py3 blacklist All tests now pass with Python 3. This is hypothesized to be thanks to: 1) Performance improvement thanks to #7447. As we recall, some of the tests were timing out, and this performance fix reduces the risk of timeouts. 2) Skipping Pexrc-related tests thanks to #7285. Note all integration tests were ran four times in CI to check for flakiness. #### Also refactor `test_python_run_integration.py`'s interpreter selection Deduplicate the constraints and update the constraints to reflect our modern interpreter usage. For example, currently Py37 may not be chosen, which means our cron job would potentially fail.
Problem
Fixes #7373, fixes #7442.
inspect.stack()
in the@rule
decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line:pants/src/python/pants/engine/rules.py
Line 245 in 6739a81
In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an
inspect.stack()
in the@rule
decorator was taking up 66% of the import time:This profile was generated via applying the following diff:
then running
./pants goals
to produceimport_module_profile.prof
, then runningsnakeviz import_module_profile.prof
.snakeviz
must be installed withpip3
for some reason.Solution
inspect.stack()
, get the decorated function's module fromsys.modules
, and use that to obtain the type-valued arguments of aGet
call during@rule
parsing.Result
./pants goals
went from ~4.8 seconds to ~2.3 seconds, while./pants2 goals
went from ~2.3 seconds to ~2.2 seconds. The semantics of how types forGet()
arguments are resolved from the@rule
function's module should be unchanged.Explanation
@Eric-Arellano noted that the return type of
inspect.stack()
changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it.@jsirois ran a quick experiment:
Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing
native_toolchain.py
is significantly faster than when it is imported indirectly.A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889):
Apparently,
inspect.stack(0)
might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change.TODO
Make a python bug about this regression if it isn't already known.