Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2 rule resolution causing substantial startup performance hit at import time #7442

Closed
Eric-Arellano opened this issue Mar 27, 2019 · 2 comments · Fixed by #7447
Closed

V2 rule resolution causing substantial startup performance hit at import time #7442

Eric-Arellano opened this issue Mar 27, 2019 · 2 comments · Fixed by #7447

Comments

@Eric-Arellano
Copy link
Contributor

FYI - how to evaluate import time performance

Python 3.7 introduced a new performance profile for imports. Throughout this entire PR, to evaluate the performance, apply this diff:

diff --git a/pants b/pants
index 416d6e191..a2ed8d646 100755
--- a/pants
+++ b/pants
@@ -92,7 +92,7 @@ source ${HERE}/build-support/pants_venv
 source ${HERE}/build-support/bin/native/bootstrap_code.sh

 # Default to using Python 3 if not otherwise specified.
-export PY="${PY:-python3}"
+export PY="${PY:-python3.7}"

 # Set interpreter constraints to exactly match the interpreter used for the venv, if not already set.
 # Note that $PY only impacts which virtualenv we use for the parent Pants process; we must also set
@@ -133,7 +133,7 @@ function exec_pants_bare() {
   activate_pants_venv 1>&2
   bootstrap_native_code 1>&2
   PYTHONPATH="${PANTS_SRCPATH}:${PYTHONPATH}" \
-    exec python ${PANTS_EXE} "$@"
+    exec python -X importtime ${PANTS_EXE} "$@"
 }

 if [[ ! -z "${WRAPPER_REQUIREMENTS}" ]]; then

Then, when running Pants, run ./pants 2> my_log_name.log

Finally, we can use the library tuna to analyze the results:

  1. pip3 install tuna
  2. tuna my_log_name.log

We cannot use this profile with Python 2 or Python 3.6, but regardless we need to fix this slowdown.

Original issue - Python 3 performance regression

We identified Python 3 has a fixed slowdown of about 2 seconds when running any command, e.g. ./pants vs. ./pants2. This overhead appears because of the line module = importlib.import_module(module_path), which when running locally is approximately the same as import pants.bin.pants_exe as module. See #7373.

This import results in a whole 2.8 seconds.

performance_pants-loader

Notably, the import for pants.backend.native.subsystems.native_toolchain takes a whole 1.27 seconds!

Issue appears to be @rule resolution

pants.backend.native.subsystems.native_toolchain seems to take so long because it uses the V2 engine, and at import time all of its function signatures, including its @rule decorators, must be resolved. Decorators are not lazily evaluated, and are evaluated at import time.

As a simple test for this, removing the rules-related imports for does result in a substantial speedup.

Apply this diff:

diff --git a/src/python/pants/backend/native/subsystems/xcode_cli_tools.py b/src/python/pants/backend/native/subsystems/xcode_cli_tools.py
index 3c2e47278..d8eb191ea 100644
--- a/src/python/pants/backend/native/subsystems/xcode_cli_tools.py
+++ b/src/python/pants/backend/native/subsystems/xcode_cli_tools.py
@@ -6,7 +6,6 @@ from __future__ import absolute_import, division, print_function, unicode_litera
 
 import os
 
-from pants.backend.native.config.environment import Assembler, CCompiler, CppCompiler, Linker
 from pants.engine.rules import rule
 from pants.engine.selectors import Select
 from pants.subsystem.subsystem import Subsystem
@@ -131,6 +130,7 @@ class XCodeCLITools(Subsystem):
 
   @memoized_method
   def assembler(self):
+    from pants.backend.native.config.environment import Assembler, CCompiler, CppCompiler, Linker
     return Assembler(
       path_entries=self.path_entries(),
       exe_filename='as',
@@ -139,6 +139,7 @@ class XCodeCLITools(Subsystem):
 
   @memoized_method
   def linker(self):
+    from pants.backend.native.config.environment import Assembler, CCompiler, CppCompiler, Linker
     return Linker(
       path_entries=self.path_entries(),
       exe_filename='ld',
@@ -150,6 +151,7 @@ class XCodeCLITools(Subsystem):
 
   @memoized_method
   def c_compiler(self):
+    from pants.backend.native.config.environment import Assembler, CCompiler, CppCompiler, Linker
     return CCompiler(
       path_entries=self.path_entries(),
       exe_filename='clang',
@@ -159,6 +161,7 @@ class XCodeCLITools(Subsystem):
 
   @memoized_method
   def cpp_compiler(self):
+    from pants.backend.native.config.environment import Assembler, CCompiler, CppCompiler, Linker
     return CppCompiler(
       path_entries=self.path_entries(),
       exe_filename='clang++',
@@ -167,30 +170,6 @@ class XCodeCLITools(Subsystem):
       extra_args=[MIN_OSX_VERSION_ARG])
 
 
-@rule(Assembler, [Select(XCodeCLITools)])
-def get_assembler(xcode_cli_tools):
-  return xcode_cli_tools.assembler()
-
-
-@rule(Linker, [Select(XCodeCLITools)])
-def get_ld(xcode_cli_tools):
-  return xcode_cli_tools.linker()
-
-
-@rule(CCompiler, [Select(XCodeCLITools)])
-def get_clang(xcode_cli_tools):
-  return xcode_cli_tools.c_compiler()
-
-
-@rule(CppCompiler, [Select(XCodeCLITools)])
-def get_clang_plusplus(xcode_cli_tools):
-  return xcode_cli_tools.cpp_compiler()
-
-
 def create_xcode_cli_tools_rules():
   return [
-    get_assembler,
-    get_ld,
-    get_clang,
-    get_clang_plusplus,
   ]

Xcode import without rules resolution

Foil in this hypothesis: importing directly is fine

The above runs assume we are trying to run bin/pants_loader.py, which is what we normally do when running ./pants. The backend/native code is only being imported as a dependency.

We can apply this diff to directly import backend/native:

diff --git a/pants b/pants
index a2ed8d646..ef074a23f 100755
--- a/pants
+++ b/pants
@@ -108,7 +108,7 @@ if [[ "${ONLY_USING_SINGLE_PYTHON_VERSION:-false}" != 'true' ]]; then
   clean_if_interpreter_constraints_changed "${PANTS_PYTHON_SETUP_INTERPRETER_CONSTRAINTS}"
 fi
 
-PANTS_EXE="${HERE}/src/python/pants/bin/pants_loader.py"
+PANTS_EXE="${HERE}/src/python/pants/backend/native/subsystems/native_toolchain.py"
 
 if [[ ! -z "${WRAPPER_REQUIREMENTS}" ]]; then
   REQUIREMENTS=(

When running ./pants, we then get this perfectly reasonable import time. Why is this import 4 times faster than when pants_loader.py indirectly imports the same code?

Importing backend/native directly

Other open questions

  1. Why is this so much slower in Python 3 than Python 2. We know the difference is entirely from the speed of import time. Presumably, Python 2 is also evaluating the rules at import time - is there an even deeper issue that rules resolution is slower in Python 3?
  2. How do we actually fix this? I don't know of any pattern to lazily evaluate decorators. Perhaps we can lazily import the relevant modules? Is there some way to cache the import?
@Eric-Arellano Eric-Arellano changed the title V2 rule resolution causing startup performance regression at import time V2 rule resolution causing substantial startup performance hit at import time Mar 27, 2019
@cosmicexplorer
Copy link
Contributor

I don't believe "rule resolution" is likely to be involved in the slowdown here -- @rule is just doing ast parsing, see:

def _make_rule(output_type, input_selectors, for_goal=None, cacheable=True):
"""A @decorator that declares that a particular static function may be used as a TaskRule.
Actually compiling the rules and checking correctness happens after import time, here:
rule_index = RuleIndex.create(list(rules), union_rules)
and a bit in rust when creating the scheduler below that as well.

The slowness of import when doing anything nontrivial at the top level e.g. parsing @rules is part of why the daemon is such a necessary work stream, but the difference compared to py2 isn't. If the py3 import profiler also lets us drill down into individual methods, that would allow us to tell whether the ast.parse() call or the ast traversal is causing the majority of the import time issues (which would help us determine what changed from py2 to py3 and how to fix it). We may be able to do this with the normal pex profiler or something if we e.g. import pants.util.contextutil.maybe_profiled first, then wrap an import that pulls in all our rules with that.

@jsirois
Copy link
Contributor

jsirois commented Mar 27, 2019

@cosmicexplorer see: #7373 (comment) which seems to definitely point to @rule or the typical code using @rule today.

cosmicexplorer added a commit that referenced this issue Mar 28, 2019
…on in py3 (#7447)

### Problem

Fixes #7373, fixes #7442. `inspect.stack()` in the `@rule` decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line: https://github.com/pantsbuild/pants/blob/6739a81b6008274bfb4abd3436d9adb543a19e5a/src/python/pants/engine/rules.py#L245

In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an `inspect.stack()` in the `@rule` decorator was taking up 66% of the import time:

<img width="1744" alt="Screen Shot 2019-03-27 at 14 38 53 (2)" src="https://user-images.githubusercontent.com/1305167/55118058-3a0b6b00-50aa-11e9-9343-bcc2ca0e53d2.png">

This profile was generated via applying the following diff:
```diff
diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca1..7510e6e 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -101,7 +101,9 @@ class PantsLoader(object):
   def load_and_execute(entrypoint):
     assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
     module_path, func_name = entrypoint.split(':', 1)
-    module = importlib.import_module(module_path)
+    from pants.util.contextutil import maybe_profiled
+    with maybe_profiled('import_module_profile.prof'):
+      module = importlib.import_module(module_path)
     entrypoint_main = getattr(module, func_name)
     assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
     entrypoint_main()
```
then running `./pants goals` to produce `import_module_profile.prof`, then running `snakeviz import_module_profile.prof`. `snakeviz` must be installed with `pip3` for some reason.

### Solution

- Instead of getting the current interpreter frame with `inspect.stack()`, get the decorated function's module from `sys.modules`, and use that to obtain the type-valued arguments of a `Get` call during `@rule` parsing.

### Result

`./pants goals` went from ~4.8 seconds to ~2.3 seconds, while `./pants2 goals` went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for `Get()` arguments are resolved from the `@rule` function's module *should* be unchanged.

### Explanation
@Eric-Arellano noted that the return type of `inspect.stack()` changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it.

@jsirois ran a quick experiment:
```
$ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done
python2.7: 2.63740515709
python3.4: 3.8067204039980425
python3.5: 4.471806422996451
python3.6: 4.527370049007004
python3.7: 3.3335486440046225
```

Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing `native_toolchain.py` is significantly faster than when it is imported indirectly.

A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889):
> per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. *This is the expensive step.*

Apparently, `inspect.stack(0)` might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change.

### TODO

Make a python bug about this regression if it isn't already known.
stuhood pushed a commit that referenced this issue Mar 29, 2019
…on in py3 (#7447)

### Problem

Fixes #7373, fixes #7442. `inspect.stack()` in the `@rule` decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line: https://github.com/pantsbuild/pants/blob/6739a81b6008274bfb4abd3436d9adb543a19e5a/src/python/pants/engine/rules.py#L245

In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an `inspect.stack()` in the `@rule` decorator was taking up 66% of the import time:

<img width="1744" alt="Screen Shot 2019-03-27 at 14 38 53 (2)" src="https://user-images.githubusercontent.com/1305167/55118058-3a0b6b00-50aa-11e9-9343-bcc2ca0e53d2.png">

This profile was generated via applying the following diff:
```diff
diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca1..7510e6e 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -101,7 +101,9 @@ class PantsLoader(object):
   def load_and_execute(entrypoint):
     assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
     module_path, func_name = entrypoint.split(':', 1)
-    module = importlib.import_module(module_path)
+    from pants.util.contextutil import maybe_profiled
+    with maybe_profiled('import_module_profile.prof'):
+      module = importlib.import_module(module_path)
     entrypoint_main = getattr(module, func_name)
     assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
     entrypoint_main()
```
then running `./pants goals` to produce `import_module_profile.prof`, then running `snakeviz import_module_profile.prof`. `snakeviz` must be installed with `pip3` for some reason.

### Solution

- Instead of getting the current interpreter frame with `inspect.stack()`, get the decorated function's module from `sys.modules`, and use that to obtain the type-valued arguments of a `Get` call during `@rule` parsing.

### Result

`./pants goals` went from ~4.8 seconds to ~2.3 seconds, while `./pants2 goals` went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for `Get()` arguments are resolved from the `@rule` function's module *should* be unchanged.

### Explanation
@Eric-Arellano noted that the return type of `inspect.stack()` changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it.

@jsirois ran a quick experiment:
```
$ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done
python2.7: 2.63740515709
python3.4: 3.8067204039980425
python3.5: 4.471806422996451
python3.6: 4.527370049007004
python3.7: 3.3335486440046225
```

Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing `native_toolchain.py` is significantly faster than when it is imported indirectly.

A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889):
> per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. *This is the expensive step.*

Apparently, `inspect.stack(0)` might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change.

### TODO

Make a python bug about this regression if it isn't already known.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants