-
-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V2 rule resolution causing substantial startup performance hit at import time #7442
Comments
I don't believe "rule resolution" is likely to be involved in the slowdown here -- pants/src/python/pants/engine/rules.py Lines 232 to 233 in 37654aa
The slowness of import when doing anything nontrivial at the top level e.g. parsing |
@cosmicexplorer see: #7373 (comment) which seems to definitely point to @rule or the typical code using @rule today. |
…on in py3 (#7447) ### Problem Fixes #7373, fixes #7442. `inspect.stack()` in the `@rule` decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line: https://github.com/pantsbuild/pants/blob/6739a81b6008274bfb4abd3436d9adb543a19e5a/src/python/pants/engine/rules.py#L245 In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an `inspect.stack()` in the `@rule` decorator was taking up 66% of the import time: <img width="1744" alt="Screen Shot 2019-03-27 at 14 38 53 (2)" src="https://user-images.githubusercontent.com/1305167/55118058-3a0b6b00-50aa-11e9-9343-bcc2ca0e53d2.png"> This profile was generated via applying the following diff: ```diff diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py index 91a4ca1..7510e6e 100644 --- a/src/python/pants/bin/pants_loader.py +++ b/src/python/pants/bin/pants_loader.py @@ -101,7 +101,9 @@ class PantsLoader(object): def load_and_execute(entrypoint): assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`' module_path, func_name = entrypoint.split(':', 1) - module = importlib.import_module(module_path) + from pants.util.contextutil import maybe_profiled + with maybe_profiled('import_module_profile.prof'): + module = importlib.import_module(module_path) entrypoint_main = getattr(module, func_name) assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint) entrypoint_main() ``` then running `./pants goals` to produce `import_module_profile.prof`, then running `snakeviz import_module_profile.prof`. `snakeviz` must be installed with `pip3` for some reason. ### Solution - Instead of getting the current interpreter frame with `inspect.stack()`, get the decorated function's module from `sys.modules`, and use that to obtain the type-valued arguments of a `Get` call during `@rule` parsing. ### Result `./pants goals` went from ~4.8 seconds to ~2.3 seconds, while `./pants2 goals` went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for `Get()` arguments are resolved from the `@rule` function's module *should* be unchanged. ### Explanation @Eric-Arellano noted that the return type of `inspect.stack()` changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it. @jsirois ran a quick experiment: ``` $ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done python2.7: 2.63740515709 python3.4: 3.8067204039980425 python3.5: 4.471806422996451 python3.6: 4.527370049007004 python3.7: 3.3335486440046225 ``` Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing `native_toolchain.py` is significantly faster than when it is imported indirectly. A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889): > per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. *This is the expensive step.* Apparently, `inspect.stack(0)` might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change. ### TODO Make a python bug about this regression if it isn't already known.
…on in py3 (#7447) ### Problem Fixes #7373, fixes #7442. `inspect.stack()` in the `@rule` decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line: https://github.com/pantsbuild/pants/blob/6739a81b6008274bfb4abd3436d9adb543a19e5a/src/python/pants/engine/rules.py#L245 In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an `inspect.stack()` in the `@rule` decorator was taking up 66% of the import time: <img width="1744" alt="Screen Shot 2019-03-27 at 14 38 53 (2)" src="https://user-images.githubusercontent.com/1305167/55118058-3a0b6b00-50aa-11e9-9343-bcc2ca0e53d2.png"> This profile was generated via applying the following diff: ```diff diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py index 91a4ca1..7510e6e 100644 --- a/src/python/pants/bin/pants_loader.py +++ b/src/python/pants/bin/pants_loader.py @@ -101,7 +101,9 @@ class PantsLoader(object): def load_and_execute(entrypoint): assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`' module_path, func_name = entrypoint.split(':', 1) - module = importlib.import_module(module_path) + from pants.util.contextutil import maybe_profiled + with maybe_profiled('import_module_profile.prof'): + module = importlib.import_module(module_path) entrypoint_main = getattr(module, func_name) assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint) entrypoint_main() ``` then running `./pants goals` to produce `import_module_profile.prof`, then running `snakeviz import_module_profile.prof`. `snakeviz` must be installed with `pip3` for some reason. ### Solution - Instead of getting the current interpreter frame with `inspect.stack()`, get the decorated function's module from `sys.modules`, and use that to obtain the type-valued arguments of a `Get` call during `@rule` parsing. ### Result `./pants goals` went from ~4.8 seconds to ~2.3 seconds, while `./pants2 goals` went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for `Get()` arguments are resolved from the `@rule` function's module *should* be unchanged. ### Explanation @Eric-Arellano noted that the return type of `inspect.stack()` changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it. @jsirois ran a quick experiment: ``` $ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done python2.7: 2.63740515709 python3.4: 3.8067204039980425 python3.5: 4.471806422996451 python3.6: 4.527370049007004 python3.7: 3.3335486440046225 ``` Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing `native_toolchain.py` is significantly faster than when it is imported indirectly. A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889): > per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. *This is the expensive step.* Apparently, `inspect.stack(0)` might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change. ### TODO Make a python bug about this regression if it isn't already known.
FYI - how to evaluate import time performance
Python 3.7 introduced a new performance profile for imports. Throughout this entire PR, to evaluate the performance, apply this diff:
Then, when running Pants, run
./pants 2> my_log_name.log
Finally, we can use the library
tuna
to analyze the results:pip3 install tuna
tuna my_log_name.log
We cannot use this profile with Python 2 or Python 3.6, but regardless we need to fix this slowdown.
Original issue - Python 3 performance regression
We identified Python 3 has a fixed slowdown of about 2 seconds when running any command, e.g.
./pants
vs../pants2
. This overhead appears because of the linemodule = importlib.import_module(module_path)
, which when running locally is approximately the same asimport pants.bin.pants_exe as module
. See #7373.This import results in a whole 2.8 seconds.
Notably, the import for
pants.backend.native.subsystems.native_toolchain
takes a whole 1.27 seconds!Issue appears to be
@rule
resolutionpants.backend.native.subsystems.native_toolchain
seems to take so long because it uses the V2 engine, and at import time all of its function signatures, including its@rule
decorators, must be resolved. Decorators are not lazily evaluated, and are evaluated at import time.As a simple test for this, removing the rules-related imports for does result in a substantial speedup.
Apply this diff:
Foil in this hypothesis: importing directly is fine
The above runs assume we are trying to run
bin/pants_loader.py
, which is what we normally do when running./pants
. Thebackend/native
code is only being imported as a dependency.We can apply this diff to directly import
backend/native
:When running
./pants
, we then get this perfectly reasonable import time. Why is this import 4 times faster than whenpants_loader.py
indirectly imports the same code?Other open questions
The text was updated successfully, but these errors were encountered: