-
-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed overhead for python3 vs python2 in pantsbuild/pants #7373
Comments
Looking at https://speed.python.org/comparison/ and filtering down to the startup benchmarks. It would seem that startup time for python has gone up in python 3. Not sure that explains 100% of the regression. |
It looks like the scale on that graph is seconds, so I don't think that python 3's inherent startup time could be causing this. Given that https://www.pantsbuild.org/invoking.html#profiling-pants isn't showing anything useful, another possibility is that the overhead is outside of pants' own entrypoint somewhere... but it's likely that just adding some debug |
I've also updated the description: this repros for:
vs
|
Found one whole second of additional overhead due to this line: Apply this diff: diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca161..cd39da454 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -7,6 +7,7 @@ from __future__ import absolute_import, division, print_function, unicode_litera
import importlib
import locale
import os
+import time
import sys
import warnings
from builtins import object
@@ -99,11 +100,23 @@ class PantsLoader(object):
@staticmethod
def load_and_execute(entrypoint):
+ time_before_load = time.time()
+ print("time before load: {}".format(time_before_load))
+
assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
module_path, func_name = entrypoint.split(':', 1)
+ time_after_parsing_entrypoint = time.time()
+ print("time after parsing entrypoint: {} (elapsed since last {})".format(time_after_parsing_entrypoint, time_after_parsing_entrypoint - time_before_load))
+
module = importlib.import_module(module_path)
+ time_after_importing_module = time.time()
+ print("time after importing module: {} (elapsed since last {})".format(time_after_importing_module, time_after_importing_module - time_after_parsing_entrypoint))
+
entrypoint_main = getattr(module, func_name)
assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
+ time_after_getting_entrypoint = time.time()
+ print("time after getting entrypoint: {} (elapsed since last {})".format(time_after_getting_entrypoint, time_after_getting_entrypoint - time_after_importing_module))
+
entrypoint_main()
@classmethod Running
Running
-- Initial idea why this could be happening: as of Py3.3, Also, this only explains half of the slow down. |
I don't think this is the issue. Even with implementing our own import loader by following the tutorial from https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly and https://docs.python.org/3/library/importlib.html#approximating-importlib-import-module, and intentionally leaving out the code to resolve parents, we still get an execution of 1.5 - 1.8 seconds due to the line diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca161..048c9e5fa 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -7,6 +7,7 @@ from __future__ import absolute_import, division, print_function, unicode_litera
import importlib
import locale
import os
+import time
import sys
import warnings
from builtins import object
@@ -97,13 +98,35 @@ class PantsLoader(object):
def determine_entrypoint(env_var, default):
return os.environ.pop(env_var, default)
+ @staticmethod
+ def load_module(name):
+ """An approximate implementation of import."""
+ spec = importlib.util.find_spec(name)
+ module = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(module)
+ sys.modules[name] = module
+ return module
+
@staticmethod
def load_and_execute(entrypoint):
+ time_before_load = time.time()
+ print("time before load: {}".format(time_before_load))
+
assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
module_path, func_name = entrypoint.split(':', 1)
- module = importlib.import_module(module_path)
+ time_after_parsing_entrypoint = time.time()
+ print("time after parsing entrypoint: {} (elapsed since last {})".format(time_after_parsing_entrypoint, time_after_parsing_entrypoint - time_before_load))
+
+ # module = importlib.import_module(module_path)
+ module = PantsLoader.load_module(module_path)
+ time_after_importing_module = time.time()
+ print("time after importing module: {} (elapsed since last {})".format(time_after_importing_module, time_after_importing_module - time_after_parsing_entrypoint))
+
entrypoint_main = getattr(module, func_name)
assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
+ time_after_getting_entrypoint = time.time()
+ print("time after getting entrypoint: {} (elapsed since last {})".format(time_after_getting_entrypoint, time_after_getting_entrypoint - time_after_importing_module))
+
entrypoint_main()
@classmethod So, the issue is Going to take a break from the investigation. |
Also note this issue reproduces with simply
vs
@stuhood you may want to update the issue description. |
Distilling down observations a bit: Re: import / exec_module - these are the same thing for our purposes - a detail of importing is the fact that it means to execute a module in a new namespace. This is true in python2 and python3 and we really have no control over that - the language requires imports work this way. Narrowing the testing above and further eliminating the resolution of parents red-herring by switching to $ git diff
diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca161..c987228ec 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -4,7 +4,6 @@
from __future__ import absolute_import, division, print_function, unicode_literals
-import importlib
import locale
import os
import sys
@@ -101,7 +100,13 @@ class PantsLoader(object):
def load_and_execute(entrypoint):
assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
module_path, func_name = entrypoint.split(':', 1)
- module = importlib.import_module(module_path)
+
+ import time
+ start = time.time()
+ __import__(module_path)
+ module = sys.modules[module_path]
+ print('>>> import took {}'.format(time.time() - start))
+
entrypoint_main = getattr(module, func_name)
assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
entrypoint_main()
py3:
py2
-V executes as little code as possible so the overall times here represent almost pure importing. We measure one bit of that - the time it takes to import the pants entrypoint - and this takes 2x as long under python3. We already know from historical work that a large portion of pants start up time is consumed by gathering options from Tasks and Subsystems. This involves importing more code outside of what we measured here. I think it's safe to say then that import being 2x slower explains everything here. Figuring out why import is 2x slower remains. Is it something we have control over or not? |
Eeinteresting - helpful new python3.7 feature:
Yields: import-times.txt Via
The interesting times here are in the left-hand column - they give the time to import the module itself. The heaviest hitter takes 500ms and their are several 100ms+. Also interesting is comparing
The ratios seem to track the import self time ratios well. Perhaps putting an even finer point on things is looking at the tree output of the profile combined with the diff above for emphasis:
It really does look like something about the |
…on in py3 (#7447) ### Problem Fixes #7373, fixes #7442. `inspect.stack()` in the `@rule` decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line: https://github.com/pantsbuild/pants/blob/6739a81b6008274bfb4abd3436d9adb543a19e5a/src/python/pants/engine/rules.py#L245 In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an `inspect.stack()` in the `@rule` decorator was taking up 66% of the import time: <img width="1744" alt="Screen Shot 2019-03-27 at 14 38 53 (2)" src="https://user-images.githubusercontent.com/1305167/55118058-3a0b6b00-50aa-11e9-9343-bcc2ca0e53d2.png"> This profile was generated via applying the following diff: ```diff diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py index 91a4ca1..7510e6e 100644 --- a/src/python/pants/bin/pants_loader.py +++ b/src/python/pants/bin/pants_loader.py @@ -101,7 +101,9 @@ class PantsLoader(object): def load_and_execute(entrypoint): assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`' module_path, func_name = entrypoint.split(':', 1) - module = importlib.import_module(module_path) + from pants.util.contextutil import maybe_profiled + with maybe_profiled('import_module_profile.prof'): + module = importlib.import_module(module_path) entrypoint_main = getattr(module, func_name) assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint) entrypoint_main() ``` then running `./pants goals` to produce `import_module_profile.prof`, then running `snakeviz import_module_profile.prof`. `snakeviz` must be installed with `pip3` for some reason. ### Solution - Instead of getting the current interpreter frame with `inspect.stack()`, get the decorated function's module from `sys.modules`, and use that to obtain the type-valued arguments of a `Get` call during `@rule` parsing. ### Result `./pants goals` went from ~4.8 seconds to ~2.3 seconds, while `./pants2 goals` went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for `Get()` arguments are resolved from the `@rule` function's module *should* be unchanged. ### Explanation @Eric-Arellano noted that the return type of `inspect.stack()` changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it. @jsirois ran a quick experiment: ``` $ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done python2.7: 2.63740515709 python3.4: 3.8067204039980425 python3.5: 4.471806422996451 python3.6: 4.527370049007004 python3.7: 3.3335486440046225 ``` Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing `native_toolchain.py` is significantly faster than when it is imported indirectly. A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889): > per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. *This is the expensive step.* Apparently, `inspect.stack(0)` might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change. ### TODO Make a python bug about this regression if it isn't already known.
…on in py3 (#7447) ### Problem Fixes #7373, fixes #7442. `inspect.stack()` in the `@rule` decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line: https://github.com/pantsbuild/pants/blob/6739a81b6008274bfb4abd3436d9adb543a19e5a/src/python/pants/engine/rules.py#L245 In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an `inspect.stack()` in the `@rule` decorator was taking up 66% of the import time: <img width="1744" alt="Screen Shot 2019-03-27 at 14 38 53 (2)" src="https://user-images.githubusercontent.com/1305167/55118058-3a0b6b00-50aa-11e9-9343-bcc2ca0e53d2.png"> This profile was generated via applying the following diff: ```diff diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py index 91a4ca1..7510e6e 100644 --- a/src/python/pants/bin/pants_loader.py +++ b/src/python/pants/bin/pants_loader.py @@ -101,7 +101,9 @@ class PantsLoader(object): def load_and_execute(entrypoint): assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`' module_path, func_name = entrypoint.split(':', 1) - module = importlib.import_module(module_path) + from pants.util.contextutil import maybe_profiled + with maybe_profiled('import_module_profile.prof'): + module = importlib.import_module(module_path) entrypoint_main = getattr(module, func_name) assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint) entrypoint_main() ``` then running `./pants goals` to produce `import_module_profile.prof`, then running `snakeviz import_module_profile.prof`. `snakeviz` must be installed with `pip3` for some reason. ### Solution - Instead of getting the current interpreter frame with `inspect.stack()`, get the decorated function's module from `sys.modules`, and use that to obtain the type-valued arguments of a `Get` call during `@rule` parsing. ### Result `./pants goals` went from ~4.8 seconds to ~2.3 seconds, while `./pants2 goals` went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for `Get()` arguments are resolved from the `@rule` function's module *should* be unchanged. ### Explanation @Eric-Arellano noted that the return type of `inspect.stack()` changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it. @jsirois ran a quick experiment: ``` $ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done python2.7: 2.63740515709 python3.4: 3.8067204039980425 python3.5: 4.471806422996451 python3.6: 4.527370049007004 python3.7: 3.3335486440046225 ``` Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing `native_toolchain.py` is significantly faster than when it is imported indirectly. A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889): > per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. *This is the expensive step.* Apparently, `inspect.stack(0)` might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change. ### TODO Make a python bug about this regression if it isn't already known.
When comparing
./pants help
with/withoutpants2
, there appears to be a fixed ~2 second overhead on my machine for python3.Profiling using https://www.pantsbuild.org/invoking.html#profiling-pants doesn't show any significant difference between
pants2
andpants
, so it's possible that this overhead exists below/before pants' own entrypoint...?The text was updated successfully, but these errors were encountered: