Remove import re from script template #239
How much of a speed-up do you expect? Would the user of the CLI tool even notice? Even if the CLI tool is a no-op (where the extra time would be the largest, proportionately), how much time do you save? Have you benchmarked/measured it? |
FYI, looking at I wasn't able to construct an end to end benchmark where this is bigger than the margin of error on my own machine, but that doesn't exclude pathological systems and/or cases where it's being called programmatically where it might be noticeable. |
It doesn't seem like the extra time would be noticeable, given that the CLI tool would presumably be doing some useful work, taking perhaps hundreds of milliseconds or longer - the time for the run would presumably dwarf this time difference. |
@notatallshaw's estimate seems right to me. I also get ~8 ms difference with:
foo@bar:~$ hyperfine --warmup 3 'python3.13 -SsuB -c "import sys; import re"' 'python3.13 -SsuB -c "import sys"'
Benchmark 1: python3.13 -SsuB -c "import sys; import re"
Time (mean ± σ): 15.0 ms ± 0.8 ms [User: 13.2 ms, System: 2.4 ms]
Range (min … max): 14.1 ms … 17.9 ms 157 runs
Benchmark 2: python3.13 -SsuB -c "import sys"
Time (mean ± σ): 6.9 ms ± 0.4 ms [User: 6.3 ms, System: 1.4 ms]
Range (min … max): 6.4 ms … 8.5 ms 326 runs
Summary
'python3.13 -SsuB -c "import sys"' ran
2.18 ± 0.17 times faster than 'python3.13 -SsuB -c "import sys; import re"'
This is for import time.
foo@bar:~$ python3.13 -m timeit -n 1 -r 1 -s 'import re, sys' 'sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])'
1 loop, best of 1: 113 usec per loop
foo@bar:~$ python3.13 -m timeit -n 1 -r 1 -s 'import re, sys' 'if sys.argv[0].endswith("-script.pyw"): sys.argv[0] = sys.argv[0][: -11]' 'elif sys.argv[0].endswith(".exe"): sys.argv[0] = sys.argv[0][: -4]'
1 loop, best of 1: 2.57 usec per loop
I used timeit's
I agree it's not a lot. Though, with performance improvements, I find sometimes it's easier to come up with 10 ways to gain 10 ms rather than one 100 ms gain. |
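The two timed statements above are meant to do the same job. A minimal sketch that checks this on a few inputs follows; the function names and sample file names are hypothetical, and only the regex pattern and the slice lengths come from the commands quoted above.

```python
import re

# Pattern copied from the first timeit command above.
SUFFIX_RE = r"(-script\.pyw|\.exe)?$"


def strip_with_regex(argv0):
    """Regex form: drop a trailing '-script.pyw' or '.exe', if present."""
    return re.sub(SUFFIX_RE, "", argv0)


def strip_with_endswith(argv0):
    """Branch form: same intent, using only str methods."""
    if argv0.endswith("-script.pyw"):
        return argv0[:-11]  # len("-script.pyw") == 11
    elif argv0.endswith(".exe"):
        return argv0[:-4]  # len(".exe") == 4
    return argv0


# Hypothetical wrapper names, chosen only for illustration.
samples = ["tool", "tool-script.pyw", "tool.exe", "tool-script", "dir/tool.exe"]
for name in samples:
    assert strip_with_regex(name) == strip_with_endswith(name), name
```

The check covers only ordinary file names. Inputs ending in a newline would behave differently, because `$` in the regex also matches just before a trailing newline.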
Followed the discussion on discuss.python.org. Overall, I do think avoiding imports is good, to leave as much control in the hands of the application developer as possible, if they choose to shave load times to the bone. I'd also note that
Anyway, I wanted to see the speed of the argv[0] fixup using the original regex compared to the proposed change. It's an order of magnitude faster (58ns vs 540-1590ns) and much less dependent on the content of the
from timeit import timeit

# Shared setup: a stand-in `sys` object so the real sys.argv is left alone.
setup = """\
from types import SimpleNamespace
import re
sys = SimpleNamespace(argv=["{argv0}"])
re.purge()
"""

# Original approach: regex substitution.
regex = r"""\
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
"""

# Proposed approach: plain string checks.
branch = r"""\
if sys.argv[0].endswith('-script.pyw'):
    sys.argv[0] = sys.argv[0][:-11]
elif sys.argv[0].endswith('.exe'):
    sys.argv[0] = sys.argv[0][:-4]
"""

# Same as `branch`, caching the lookups with the walrus operator.
walrus = r"""\
if (arg0 := (argv := sys.argv)[0]).endswith('-script.pyw'):
    argv[0] = arg0[:-11]
elif arg0.endswith('.exe'):
    argv[0] = arg0[:-4]
"""

# Same as `walrus`, without the walrus operator.
nowalrus = r"""\
argv = sys.argv
arg0 = argv[0]
if arg0.endswith('-script.pyw'):
    argv[0] = arg0[:-11]
elif arg0.endswith('.exe'):
    argv[0] = arg0[:-4]
"""

for argv0 in ('test', 'test-script.pyw', 'test.exe', 'test-script', '/somewhat/long/absolute/path/to/test-script.py'):
    print(f"{argv0=}")
    # timeit returns total seconds for 1,000,000 runs, so res*1000 is ns per run.
    res = timeit(stmt=regex, setup=setup.format(argv0=argv0))
    print(f"regex: {res*1000:.3g}ns")
    res = timeit(stmt=branch, setup=setup.format(argv0=argv0))
    print(f"branch: {res*1000:.3g}ns")
    res = timeit(stmt=nowalrus, setup=setup.format(argv0=argv0))
    print(f"nowalrus: {res*1000:.3g}ns")
    res = timeit(stmt=walrus, setup=setup.format(argv0=argv0))
    print(f"walrus: {res*1000:.3g}ns\n")
|
I think the same might apply to
Interesting measurements. My understanding is, though, that this script must work with Python 2.7 and 3.6, so newer features such as walrus operator or
Proportionally, this can more than double the performance of a no-op:
foo@bar:~$ hyperfine --warmup 6 'python3.13 -SsuB -c "pass"' 'python3.13 -SsuB -c "import re"'
Benchmark 1: python3.13 -SsuB -c "pass"
Time (mean ± σ): 6.7 ms ± 0.4 ms [User: 5.8 ms, System: 1.5 ms]
Range (min … max): 6.2 ms … 8.7 ms 332 runs
Benchmark 2: python3.13 -SsuB -c "import re"
Time (mean ± σ): 14.2 ms ± 0.3 ms [User: 12.4 ms, System: 2.4 ms]
Range (min … max): 13.7 ms … 15.5 ms 176 runs
Summary
'python3.13 -SsuB -c "pass"' ran
2.13 ± 0.14 times faster than 'python3.13 -SsuB -c "import re"'
In a vacuum, the extra time may not be noticeable (on a typical modern system), but I think it could well be one piece in a puzzle of someone optimizing their CLI no-op from a slowish 200 ms to near instant 100 ms. There are also environments slower than a typical desktop/laptop. On my Raspberry Pi 4, for instance, Performance aside, this PR frees up the name |
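For anyone without hyperfine, the comparison above can be roughly approximated from Python itself. This is a sketch only: the helper name and the run count are assumptions, and the numbers include subprocess-launch overhead, so they will not match hyperfine's output.

```python
import subprocess
import sys
import time


def mean_startup_seconds(code, runs=5):
    """Mean wall-clock seconds to start the interpreter and run `code`."""
    # Same interpreter flags as the hyperfine commands above (-S -s -u -B).
    cmd = [sys.executable, "-S", "-s", "-u", "-B", "-c", code]
    subprocess.run(cmd, check=True)  # one warm-up run, not timed
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) / runs


baseline = mean_startup_seconds("pass")
with_re = mean_startup_seconds("import re")
print(f"pass:      {baseline * 1000:.1f} ms")
print(f"import re: {with_re * 1000:.1f} ms")
```

With so few runs the result is noisy, so only a difference that holds up over repeated invocations means anything.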
Doesn't that apply to an entry point function |
Yes it does. I think that could either be documented, or special cased by writing a script with something like from some.entrypoint.module import sys as _sys in case someone chooses to use EDIT: And same goes for the name |
Are the issues of name shadowing real, or just theoretical? It seems to me that they aren't likely to matter in real life. As far as the performance issue is concerned, it seems like an easy enough fix for a small but measurable gain. Not much to get worked up about, but equally not unreasonable to fix as people do complain about Python startup time, and every little improvement helps. Ultimately, though, it's @vsajip's call, and I don't think it's worth an extended debate either way. Users can always override the script template themselves if they want to. |
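To make the shadowing question concrete, here is a small sketch of what happens when an entry-point callable shares a name with a module the wrapper has already imported. The module name `fake_entrypoint` and both wrapper sources are hypothetical stand-ins, not the actual template. The first only mimics the regex line quoted earlier in this thread.

```python
import sys
import types

# Hypothetical entry-point module whose callable happens to be named `re`.
fake = types.ModuleType("fake_entrypoint")
fake.re = lambda: 0
sys.modules["fake_entrypoint"] = fake

# Stand-in for a wrapper that imports the `re` module, then the entry point.
shadowed_source = r'''
import re
from fake_entrypoint import re  # rebinds the name `re` to the function
cleaned = re.sub(r"(-script\.pyw|\.exe)?$", "", "tool.exe")
'''

try:
    exec(shadowed_source, {})
    outcome = "ok"
except AttributeError:
    # The function object has no `.sub` attribute.
    outcome = "AttributeError"

# Stand-in for a wrapper that never imports the `re` module.
free_source = '''
from fake_entrypoint import re
result = re()
'''
namespace = {}
exec(free_source, namespace)

print(outcome, namespace["result"])
```

This shows the failure mode in principle. How often an entry point is really named `re` or `sys` is a separate question that the sketch does not answer.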
Exactly, and I'm thinking that I'm not inclined to change this, because that opens things up to lots of other micro-optimisations being suggested because of the valid point that Paul made that "every little improvement helps", and in this case I don't think it's worth changing because the improvement is minuscule. I decided to actually create some executables and see for myself. Using this alternate template:
ALT_TEMPLATE = r'''# -*- coding: utf-8 -*-
import sys
from %(module)s import %(import_name)s
if __name__ == '__main__':
    arg = sys.argv[0]
    if arg.endswith('-script.pyw'):
        sys.argv[0] = arg[: -11]
    elif arg.endswith('.exe'):
        sys.argv[0] = arg[: -4]
    sys.exit(%(func)s())
'''
and this minimal script
def main():
    pass
I created some executables using both the standard script template (
~\Projects\scratch> python3 -m timeit -r 1000 -s "from os import system" "system('noop_re')"
5 loops, best of 1000: 64.7 msec per loop
~\Projects\scratch> python3 -m timeit -r 1000 -s "from os import system" "system('noop_nore')"
5 loops, best of 1000: 60.9 msec per loop
which, in this admittedly ad hoc, perhaps non-rigorous exercise, shows a difference of under 4 ms. For the tiny number of scenarios where the difference matters to someone, they can use |
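As a rough illustration of what the alternate template above produces once its placeholders are filled, the sketch below renders it with plain `%` formatting and executes the result in-process. The module name `noop_pkg`, the `.exe` file name, and rendering via `%` with a dict are all assumptions made for illustration. The real wrapper is generated by the installer, not like this.

```python
import sys
import types

ALT_TEMPLATE = r'''# -*- coding: utf-8 -*-
import sys
from %(module)s import %(import_name)s
if __name__ == '__main__':
    arg = sys.argv[0]
    if arg.endswith('-script.pyw'):
        sys.argv[0] = arg[: -11]
    elif arg.endswith('.exe'):
        sys.argv[0] = arg[: -4]
    sys.exit(%(func)s())
'''

# Hypothetical module standing in for the installed package.
noop = types.ModuleType("noop_pkg")
noop.main = lambda: None
sys.modules["noop_pkg"] = noop

# Fill in the placeholders (assumed to be ordinary %-style mapping keys).
source = ALT_TEMPLATE % {"module": "noop_pkg", "import_name": "main", "func": "main"}

saved_argv = sys.argv[:]
sys.argv = ["noop_nore.exe"]  # hypothetical wrapper file name
exit_code = "not raised"
try:
    exec(source, {"__name__": "__main__"})
except SystemExit as exc:
    exit_code = exc.code  # main() returned None, so this is None
finally:
    fixed_argv0 = sys.argv[0]
    sys.argv = saved_argv  # restore the caller's argv

print(fixed_argv0, exit_code)
```

Running it in-process avoids building real executables, so it says nothing about timing. It only shows the argv[0] fixup and the exit path.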
Too bad, it's a nice little gain, and we are making similar improvements in the stdlib.
As a library or CLI maintainer, where do I use this? |
You're one step removed from being able to do it yourself easily. It's pip that calls distlib (on behalf of the users installing your library) and pip doesn't override the template. You might want to look at the wrappers But the main point is that it's not you who has control here - it's your users (who run the installer that creates the wrapper). If you want that level of control yourself, you might be better building your own executable. I'm in the process of creating a section in the packaging user guide explaining the details here. But it may be more than you want to do, just for a small gain like this. |
This is an unfortunate conclusion to this thread, as the maintainer of the package usually is the one that is motivated to set a good default for how that package is run, not the users of the package. And so the suggestion to use
That guide seems Windows specific, but in my experience this kind of latency is far more measurable on non-Windows platforms. |
Well, that's because this entire PR only applies (has any useful effect) in Windows, having no applicability to non-Windows platforms because the wrapper is a Windows-specific thing. Non-Windows platforms might make use of zipapp-type functionality, but that's orthogonal to the code being discussed here, I think. Clarification: I'm on my phone and accidentally added the last sentence to the above comment by Damian - I didn't realise you could edit other people's comments! Rectified now. |
Ah, I wasn't aware, thanks for clarifying! |
I'll leave a note here since this confused me a bit: This PR does apply, and does have an effect, on non-Windows platforms. This script template, vendored by |
You're quite right - I can't explain why I said that, except for a brain freeze on my part 🙈 Sorry about that. |
Hello!
I was wondering if there is a downside to doing what's in this PR, that is, removing the re import from the executable wrapper? See Python discourse for context. This would allow simple CLI tools that never import re to load a little faster.
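Whether a given tool "never imports re" can be checked with the interpreter's `-X importtime` option (available from Python 3.7). The helper below is a hypothetical sketch that collects the module names reported on stderr. The parsing assumes the usual `import time: self | cumulative | name` line layout.

```python
import subprocess
import sys


def modules_imported_by(code):
    """Return the module names `-X importtime` reports while running `code`."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    names = set()
    for line in proc.stderr.splitlines():
        if line.startswith("import time:") and "|" in line:
            name = line.rsplit("|", 1)[1].strip()
            if name != "imported package":  # skip the header row
                names.add(name)
    return names


print("re" in modules_imported_by("import re"))
```

Replacing the code string with an import of one's own CLI module would show whether `re` is pulled in anyway by the tool or its dependencies, in which case dropping it from the wrapper saves nothing for that tool.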