Remove import re from script template #239

Closed
wants to merge 1 commit into from

hukkin

@hukkin hukkin commented Jan 10, 2025

Hello!

I was wondering if there is a downside to doing what's in this PR, that is, removing the re import from the executable wrapper?

See Python discourse for context. This would allow simple CLI tools that never import re to load a little faster.

@vsajip
Collaborator

vsajip commented Jan 10, 2025

How much of a speed-up do you expect? Would the user of the CLI tool even notice? Even if the CLI tool is a no-op (where the extra time would be the largest, proportionately), how much time do you save? Have you benchmarked/measured it?

@notatallshaw
Member

FYI, comparing python -SsuB -X importtime -c "import sys; import re" with python -SsuB -X importtime -c "import sys", which I think is the most optimistic situation and doesn't consider the performance of the logic, the difference is about 8 ms on my computer.

I wasn't able to construct an end to end benchmark where this is bigger than the margin of error on my own machine, but that doesn't exclude pathological systems and/or cases where it's being called programmatically where it might be noticeable.
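The import-time measurement described above can be reproduced with a short script. This is a minimal sketch, assuming a CPython interpreter that supports -X importtime (3.7 or later); the helper name import_cost_us is hypothetical and not part of any tool mentioned in this thread, and the number it reports will vary by machine.

```python
import subprocess
import sys


def import_cost_us(module="re"):
    """Return the cumulative import time of `module` in microseconds.

    Runs a fresh interpreter with -X importtime, which writes one line per
    imported module to stderr, and picks out the line for `module`.
    """
    cmd = [sys.executable, "-S", "-s", "-B", "-X", "importtime",
           "-c", "import " + module]
    proc = subprocess.run(cmd, stderr=subprocess.PIPE, text=True, check=True)
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        # Expected layout: "import time: <self us> | <cumulative us> | <name>"
        fields = line[len(prefix):].split("|")
        if len(fields) == 3 and fields[2].strip() == module:
            return int(fields[1])
    return None  # module already loaded at startup, or output not recognised


if __name__ == "__main__":
    print("import re:", import_cost_us("re"), "us (cumulative)")
```

A single run is noisy, so repeating the call a few times and taking the minimum gives a steadier figure.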

@vsajip
Collaborator

vsajip commented Jan 11, 2025

It doesn't seem like the extra time would be noticeable, given that the CLI tool would presumably be doing some useful work, taking perhaps hundreds of milliseconds or longer - the time for the run would presumably dwarf this time difference.

@hukkin
Author

hukkin commented Jan 11, 2025

@notatallshaw's estimate seems right to me. I also get ~8 ms difference with:

foo@bar:~$ hyperfine --warmup 3 'python3.13 -SsuB -c "import sys; import re"' 'python3.13 -SsuB -c "import sys"'
Benchmark 1: python3.13 -SsuB -c "import sys; import re"
  Time (mean ± σ):      15.0 ms ±   0.8 ms    [User: 13.2 ms, System: 2.4 ms]
  Range (min … max):    14.1 ms …  17.9 ms    157 runs
 
Benchmark 2: python3.13 -SsuB -c "import sys"
  Time (mean ± σ):       6.9 ms ±   0.4 ms    [User: 6.3 ms, System: 1.4 ms]
  Range (min … max):     6.4 ms …   8.5 ms    326 runs
 
Summary
  'python3.13 -SsuB -c "import sys"' ran
    2.18 ± 0.17 times faster than 'python3.13 -SsuB -c "import sys; import re"'

This is for import time.

Compiling the regex seems to take about 1 ms more than the str.endswith() checks.
EDIT: Compiling the regex seems to take about 0.1 ms more than the str.endswith() checks.

foo@bar:~$ python3.13 -m timeit -n 1 -r 1 -s 'import re, sys' 'sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])'
1 loop, best of 1: 113 usec per loop
foo@bar:~$ python3.13 -m timeit -n 1 -r 1 -s 'import re, sys' 'if sys.argv[0].endswith("-script.pyw"): sys.argv[0] = sys.argv[0][: -11]' 'elif sys.argv[0].endswith(".exe"): sys.argv[0] = sys.argv[0][: -4]'
1 loop, best of 1: 2.57 usec per loop

I used timeit's -n 1 -r 1 options here to avoid caching the regex, but the difference seems constant when I repeat the runs.

So about 10 ms difference in total.
EDIT: So about 8 ms difference in total. Regex compilation time seems insignificant here.

I agree it's not a lot. Though, with performance improvements, I find sometimes it's easier to come up with 10 ways to gain 10 ms rather than one 100 ms gain.

@effigies

Followed the discussion on discuss.python.org. Overall, I do think avoiding imports is good, to leave as much control in the hands of the application developer as possible, if they choose to shave load times to the bone. I'd also note that pipx run black will hit any penalty twice / benefit from any speedup twice over.

Anyway, I wanted to see the speed of the argv[0] fixup using the original regex compared to the proposed change. The proposed change is an order of magnitude faster (58ns vs 540-1590ns) and much less dependent on the content of the argv[0]. I also included a couple versions that do not repeatedly call sys.argv[0].

from timeit import timeit

setup = """\
from types import SimpleNamespace
import re
sys = SimpleNamespace(argv=["{argv0}"])
re.purge()
"""

regex = r"""\
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
"""

branch = r"""\
if sys.argv[0].endswith('-script.pyw'):
    sys.argv[0] = sys.argv[0][:-11]
elif sys.argv[0].endswith('.exe'):
    sys.argv[0] = sys.argv[0][:-4]
"""

walrus = r"""\
if (arg0 := (argv := sys.argv)[0]).endswith('-script.pyw'):
    argv[0] = arg0[:-11]
elif arg0.endswith('.exe'):
    argv[0] = arg0[:-4]
"""

nowalrus = r"""\
argv = sys.argv
arg0 = argv[0]
if arg0.endswith('-script.pyw'):
    argv[0] = arg0[:-11]
elif arg0.endswith('.exe'):
    argv[0] = arg0[:-4]
"""

# timeit runs each statement 1,000,000 times by default and returns the total
# in seconds, so res * 1000 is the time per loop in nanoseconds.
for argv0 in ('test', 'test-script.pyw', 'test.exe', 'test-script', '/somewhat/long/absolute/path/to/test-script.py'):
    print(f"{argv0=}")
    res = timeit(stmt=regex, setup=setup.format(argv0=argv0))
    print(f"regex: {res*1000:.3g}ns")
    res = timeit(stmt=branch, setup=setup.format(argv0=argv0))
    print(f"branch: {res*1000:.3g}ns")
    res = timeit(stmt=nowalrus, setup=setup.format(argv0=argv0))
    print(f"nowalrus: {res*1000:.3g}ns")
    res = timeit(stmt=walrus, setup=setup.format(argv0=argv0))
    print(f"walrus: {res*1000:.3g}ns\n")

argv0='test'
regex: 573ns
branch: 58.3ns
nowalrus: 54.4ns
walrus: 46.4ns

argv0='test-script.pyw'
regex: 543ns
branch: 59.6ns
nowalrus: 51ns
walrus: 50.5ns

argv0='test.exe'
regex: 562ns
branch: 56.4ns
nowalrus: 53.8ns
walrus: 50.2ns

argv0='test-script'
regex: 792ns
branch: 56.8ns
nowalrus: 50.7ns
walrus: 53ns

argv0='/somewhat/long/absolute/path/to/test-script.py'
regex: 1.59e+03ns
branch: 62.9ns
nowalrus: 51.7ns
walrus: 56.2ns
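Alongside the timings, it may help to check that the two fixups produce the same result. The following is a minimal sketch using the same sample values as the benchmark above; the function names strip_with_regex and strip_with_endswith are hypothetical and only serve the comparison.

```python
import re

_SUFFIX_RE = re.compile(r"(-script\.pyw|\.exe)?$")


def strip_with_regex(argv0):
    """Fixup as done with re.sub in the benchmark's `regex` statement."""
    return _SUFFIX_RE.sub("", argv0)


def strip_with_endswith(argv0):
    """Fixup as done in the `branch` and `nowalrus` statements, without re."""
    if argv0.endswith("-script.pyw"):
        return argv0[:-11]  # len("-script.pyw") == 11
    if argv0.endswith(".exe"):
        return argv0[:-4]  # len(".exe") == 4
    return argv0


SAMPLES = [
    "test",
    "test-script.pyw",
    "test.exe",
    "test-script",
    "/somewhat/long/absolute/path/to/test-script.py",
]

# Both implementations should agree on every sample value.
for sample in SAMPLES:
    assert strip_with_regex(sample) == strip_with_endswith(sample), sample
```

The two are not identical for every conceivable string: `$` can also match just before a trailing newline, so an argv[0] ending in "\n" would be handled differently. For the values a launcher realistically passes, they agree.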

@hukkin
Author

hukkin commented Jan 13, 2025

I'd also note that pipx run black will hit any penalty twice / benefit from any speedup twice over.

I think the same might apply to pre-commit, tox, nox, etc. There are many CLI runners/wrappers written in Python.

I also included a couple versions that do not repeatedly call sys.argv[0].

Interesting measurements. My understanding is, though, that this script must work with Python 2.7 and 3.6, so newer features such as walrus operator or str.removesuffix cannot be used.

Even if the CLI tool is a no-op (where the extra time would be the largest, proportionately), how much time do you save?

Proportionally, this can more than double the performance of a no-op:

foo@bar:~$ hyperfine --warmup 6 'python3.13 -SsuB -c "pass"' 'python3.13 -SsuB -c "import re"'
Benchmark 1: python3.13 -SsuB -c "pass"
  Time (mean ± σ):       6.7 ms ±   0.4 ms    [User: 5.8 ms, System: 1.5 ms]
  Range (min … max):     6.2 ms …   8.7 ms    332 runs
 
Benchmark 2: python3.13 -SsuB -c "import re"
  Time (mean ± σ):      14.2 ms ±   0.3 ms    [User: 12.4 ms, System: 2.4 ms]
  Range (min … max):    13.7 ms …  15.5 ms    176 runs
 
Summary
  'python3.13 -SsuB -c "pass"' ran
    2.13 ± 0.14 times faster than 'python3.13 -SsuB -c "import re"'

It doesn't seem like the extra time would be noticeable, given that the CLI tool would presumably be doing some useful work, taking perhaps hundreds of milliseconds or longer - the time for the run would presumably dwarf this time difference.

In a vacuum, the extra time may not be noticeable (on a typical modern system), but I think it could well be one piece in a puzzle of someone optimizing their CLI no-op from a slowish 200 ms to near instant 100 ms.
I encourage everyone to try running sleep 0, sleep .1 and sleep .2 on the command line. They all do feel very different, at least to me.

There are also environments slower than a typical desktop/laptop. On my Raspberry Pi 4, for instance, import re takes over 25 ms. And that's a relatively new model of the "flagship" series.


Performance aside, this PR frees up the name re in the wrapper script's namespace. Any names used in that script limit the naming of entry point functions. Currently, an uncaught AttributeError is raised by the wrapper script if an entrypoint function is named as re. The entrypoint spec doesn't seem to mention exceptional function names that cannot be used, so if not fixed, this limitation may be worth documenting.

@vsajip
Collaborator

vsajip commented Jan 13, 2025

Any names used in that script limit the naming of entry point functions. Currently, an uncaught AttributeError is raised by the wrapper script if an entrypoint function is named as re.

Doesn't that apply to an entry point function sys, too?

@hukkin
Author

hukkin commented Jan 13, 2025

Yes it does. I think that could either be documented, or special cased by writing a script with something like

from some.entrypoint.module import sys as _sys

in case someone chooses to use sys as function name.


EDIT: And same goes for the name __name__. That would also break the script when used as function name.
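The shadowing problem can be reproduced without installing anything. This is a minimal sketch under stated assumptions: fake_entrypoints is a hypothetical module created in memory, and the two source strings are simplified stand-ins for a generated wrapper, not the exact text any installer writes.

```python
import sys
import types

# Hypothetical entry point module exposing a function that is named "re".
entrypoints = types.ModuleType("fake_entrypoints")
entrypoints.re = lambda: 0
sys.modules["fake_entrypoints"] = entrypoints

# Stand-in for a wrapper that imports the re module and then the entry point.
SHADOWED = r'''
import re
from fake_entrypoints import re   # rebinds the name "re" to the function
argv0 = re.sub(r'(-script\.pyw|\.exe)?$', '', 'tool.exe')
'''

# Stand-in for a wrapper that never imports the re module.
WITHOUT_RE = r'''
from fake_entrypoints import re
argv0 = 'tool.exe'
if argv0.endswith('.exe'):
    argv0 = argv0[:-4]
result = re()
'''


def run(source):
    """Execute wrapper source; return (namespace, error message or None)."""
    namespace = {"__name__": "__main__"}
    try:
        exec(source, namespace)
    except AttributeError as exc:
        return namespace, str(exc)
    return namespace, None


_, shadow_error = run(SHADOWED)
fixed_ns, fixed_error = run(WITHOUT_RE)
print("with import re:   ", shadow_error)
print("without import re:", fixed_ns.get("argv0"), fixed_error)
```

The same collision would remain for a function named sys, which is where an alias such as the `import sys as _sys` idea above would come in.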

@pfmoore
Member

pfmoore commented Jan 13, 2025

Are the issues of name shadowing real, or just theoretical? It seems to me that they aren't likely to matter in real life.

As far as the performance issue is concerned, it seems like an easy enough fix for a small but measurable gain. Not much to get worked up about, but equally not unreasonable to fix as people do complain about Python startup time, and every little improvement helps.

Ultimately, though, it's @vsajip's call, and I don't think it's worth an extended debate either way. Users can always override the script template themselves if they want to.

@vsajip
Collaborator

vsajip commented Jan 14, 2025

Users can always override the script template themselves if they want to.

Exactly. I'm not inclined to change this, because it opens the door to lots of other micro-optimisations being suggested on the strength of Paul's valid point that "every little improvement helps". In this case I don't think it's worth changing, because the improvement is minuscule.

I decided to actually create some executables and see for myself. Using this alternate template:

ALT_TEMPLATE = r'''# -*- coding: utf-8 -*-
import sys
from %(module)s import %(import_name)s
if __name__ == '__main__':
    arg = sys.argv[0]
    if arg.endswith('-script.pyw'):
        sys.argv[0] = arg[: -11]
    elif arg.endswith('.exe'):
        sys.argv[0] = arg[: -4]
    sys.exit(%(func)s())
'''

and this minimal script noop.py:

def main():
    pass

I created some executables using both the standard script template (noop_re.exe) and the ALT_TEMPLATE shown above (noop_nore.exe). Then I ran some timings to see the difference in a scenario with an actual launcher executable:

~\Projects\scratch> python3 -m timeit -r 1000 -s "from os import system" "system('noop_re')"
5 loops, best of 1000: 64.7 msec per loop

~\Projects\scratch> python3 -m timeit -r 1000 -s "from os import system" "system('noop_nore')"
5 loops, best of 1000: 60.9 msec per loop

which, in this admittedly ad hoc, perhaps non-rigorous exercise, shows a difference of under 4 ms. For the tiny number of scenarios where the difference matters to someone, they can use script_maker.script_template = ALT_TEMPLATE logic to have exactly the launcher they need.
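To see what the alternate template does once its placeholders are filled in, it can be rendered and executed directly. This is a minimal sketch, assuming the placeholders are filled by plain %-formatting with module, import_name and func keys; the in-memory noop module and the helper run_rendered are hypothetical stand-ins, and no launcher executable is involved.

```python
import sys
import types

ALT_TEMPLATE = r'''# -*- coding: utf-8 -*-
import sys
from %(module)s import %(import_name)s
if __name__ == '__main__':
    arg = sys.argv[0]
    if arg.endswith('-script.pyw'):
        sys.argv[0] = arg[: -11]
    elif arg.endswith('.exe'):
        sys.argv[0] = arg[: -4]
    sys.exit(%(func)s())
'''

# In-memory stand-in for the minimal noop.py module shown above.
noop = types.ModuleType("noop")
noop.main = lambda: None
sys.modules["noop"] = noop


def run_rendered(argv0):
    """Render the template for noop:main, run it, return (argv[0], exit code)."""
    source = ALT_TEMPLATE % {"module": "noop", "import_name": "main", "func": "main"}
    saved = sys.argv
    sys.argv = [argv0]
    try:
        exec(source, {"__name__": "__main__"})
    except SystemExit as exc:
        # The rendered script always ends in sys.exit(); capture its result.
        return sys.argv[0], exc.code
    finally:
        sys.argv = saved


print(run_rendered("noop_nore.exe"))
print(run_rendered("noop-script.pyw"))
```

The rendered script imports only sys, so the re module is never loaded on its behalf.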

@vsajip vsajip closed this Jan 14, 2025
@hugovk

hugovk commented Jan 14, 2025

Too bad, it's a nice little gain, and we are making similar improvements in the stdlib.

For the tiny number of scenarios where the difference matters to someone, they can use script_maker.script_template = ALT_TEMPLATE logic to have exactly the launcher they need.

As a library or CLI maintainer, where do I use this?

@pfmoore
Member

pfmoore commented Jan 14, 2025

As a library or CLI maintainer, where do I use this?

You're one step removed from being able to do it yourself easily. It's pip that calls distlib (on behalf of the users installing your library) and pip doesn't override the template. You might want to look at the wrappers uv uses - they may be different to the ones pip uses (and uv may be more interested in raw speed improvements, because speed is one of their selling points).

But the main point is that it's not you who has control here - it's your users (who run the installer that creates the wrapper). If you want that level of control yourself, you might be better building your own executable. I'm in the process of creating a section in the packaging user guide explaining the details here. But it may be more than you want to do, just for a small gain like this.

@notatallshaw
Member

notatallshaw commented Jan 14, 2025

But the main point is that it's not you who has control here - it's your users (who run the installer that creates the wrapper).

This is an unfortunate conclusion to this thread, as the maintainer of the package usually is the one that is motivated to set a good default for how that package is run, not the users of the package.

And so the suggestion to use script_maker.script_template = ALT_TEMPLATE can't actually be achieved by any of the interested parties here.

I'm in the process of creating a section in the packaging user guide explaining the details pypa/packaging.python.org#1778.

That guide seems Windows specific, but in my experience this kind of latency is far more measurable on non-Windows platforms.

@vsajip
Collaborator

vsajip commented Jan 14, 2025

That guide seems Windows specific

Well, that's because this entire PR only applies (has any useful effect) in Windows, having no applicability to non-Windows platforms because the wrapper is a Windows-specific thing. Non-Windows platforms might make use of zipapp-type functionality, but that's orthogonal to the code being discussed here, I think.

Clarification: I'm on my phone and accidentally added the last sentence to the above comment by Damian - I didn't realise you could edit other people's comments! Rectified now.

@notatallshaw
Member

notatallshaw commented Jan 14, 2025

Non-Windows platforms might make use of zipapp-type functionality, but that's orthogonal to the code being discussed here, I think.

Ah, I wasn't aware, thanks for clarifying!

@hukkin
Author

hukkin commented Jan 14, 2025

Well, that's because this entire PR only applies (has any useful effect) in Windows, having no applicability to non-Windows platforms because the wrapper is a Windows-specific thing.

I'll leave a note here since this confused me a bit:

This PR does apply, and does have an effect, on non-Windows platforms. This script template, vendored by pip, is what pip install uses to wrap entry point functions on my Linux system. I understand if distlib doesn't intend this to be used on non-Windows platforms, but in practice that is what pip does.
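One way to check this on a given system is to look at the text of an installed console script. This is a minimal sketch, assuming a platform where the wrapper is a plain text file (as on Linux); the helper name wrapper_imports_re is hypothetical, and looking up pip is only an example of a script that an installer may have generated.

```python
import shutil


def wrapper_imports_re(path):
    """Return True if the text file at `path` has a top-level `import re` line."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return any(line.rstrip() == "import re" for line in fh)


if __name__ == "__main__":
    script = shutil.which("pip")  # any console script on PATH would do
    if script is not None:
        print(script, wrapper_imports_re(script))
```

A result of True suggests the wrapper was generated from a template that imports re; False says nothing about how the script was produced.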

@vsajip
Collaborator

vsajip commented Jan 15, 2025

This PR does apply, and does have an effect, on non-Windows platforms.

You're quite right - I can't explain why I said that, except for a brain freeze on my part 🙈 Sorry about that.


6 participants