Remove `import re` from script template #239

hukkin · 2025-01-10T17:18:26Z

Hello!

I was wondering if there is a downside to doing what's in this PR, that is, removing the re import from the executable wrapper?

See Python discourse for context. This would allow simple CLI tools that never import re to load a little faster.

vsajip · 2025-01-10T18:45:00Z

How much of a speed-up do you expect? Would the user of the CLI tool even notice? Even if the CLI tool is a no-op (where the extra time would be the largest, proportionately), how much time do you save? Have you benchmarked/measured it?

notatallshaw · 2025-01-10T19:00:23Z

FYI, comparing python -SsuB -X importtime -c "import sys; import re" with python -SsuB -X importtime -c "import sys", which I think is the most optimistic situation and doesn't consider the performance of the logic, the difference is about 8 ms on my computer.

I wasn't able to construct an end to end benchmark where this is bigger than the margin of error on my own machine, but that doesn't exclude pathological systems and/or cases where it's being called programmatically where it might be noticeable.
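A rough way to reproduce this kind of comparison from Python itself is sketched below. It starts a fresh interpreter with and without `import re` and keeps the fastest run of each. The helper name `best_startup_ms` and the repeat count are illustrative assumptions, and process-spawn overhead is included in both figures, so only the difference between them is meaningful.

```python
import subprocess
import sys
import time


def best_startup_ms(code, repeats=5):
    """Return the fastest wall-clock time (ms) to run `code` in a fresh interpreter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # -S skips the site module so its own imports do not mask the cost of `re`.
        subprocess.run([sys.executable, "-S", "-c", code], check=True)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


baseline_ms = best_startup_ms("import sys")
with_re_ms = best_startup_ms("import sys; import re")
print(f"baseline: {baseline_ms:.1f} ms, with re: {with_re_ms:.1f} ms")
```

The numbers will vary by machine and Python version; a dedicated tool such as hyperfine gives tighter statistics.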

vsajip · 2025-01-11T09:08:16Z

It doesn't seem like the extra time would be noticeable, given that the CLI tool would presumably be doing some useful work, taking perhaps hundreds of milliseconds or longer - the time for the run would presumably dwarf this time difference.

hukkin · 2025-01-11T13:25:25Z

@notatallshaw's estimate seems right to me. I also get ~8 ms difference with:

foo@bar:~$ hyperfine --warmup 3 'python3.13 -SsuB -c "import sys; import re"' 'python3.13 -SsuB -c "import sys"'
Benchmark 1: python3.13 -SsuB -c "import sys; import re"
  Time (mean ± σ):      15.0 ms ±   0.8 ms    [User: 13.2 ms, System: 2.4 ms]
  Range (min … max):    14.1 ms …  17.9 ms    157 runs
 
Benchmark 2: python3.13 -SsuB -c "import sys"
  Time (mean ± σ):       6.9 ms ±   0.4 ms    [User: 6.3 ms, System: 1.4 ms]
  Range (min … max):     6.4 ms …   8.5 ms    326 runs
 
Summary
  'python3.13 -SsuB -c "import sys"' ran
    2.18 ± 0.17 times faster than 'python3.13 -SsuB -c "import sys; import re"'

This is for import time.

~~Compiling the regex seems to take about 1 ms more than the str.endswith() checks.~~
EDIT: Compiling the regex seems to take about 0.1 ms more than the str.endswith() checks.

foo@bar:~$ python3.13 -m timeit -n 1 -r 1 -s 'import re, sys' 'sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])'
1 loop, best of 1: 113 usec per loop
foo@bar:~$ python3.13 -m timeit -n 1 -r 1 -s 'import re, sys' 'if sys.argv[0].endswith("-script.pyw"): sys.argv[0] = sys.argv[0][: -11]' 'elif sys.argv[0].endswith(".exe"): sys.argv[0] = sys.argv[0][: -4]'
1 loop, best of 1: 2.57 usec per loop

I used timeit's -n 1 -r 1 options here to avoid caching the regex, but the difference seems constant when I repeat the runs.
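The caching effect mentioned here can be seen directly with a small sketch. It clears the `re` module's pattern cache with `re.purge()`, times one cold call, then times a second call that reuses the cached pattern. The helper name `timed_sub` and the sample value `tool.exe` are illustrative assumptions.

```python
import re
import time

PATTERN = r"(-script\.pyw|\.exe)?$"


def timed_sub(argv0):
    """Return (result, elapsed nanoseconds) for a single re.sub call."""
    start = time.perf_counter_ns()
    result = re.sub(PATTERN, "", argv0)
    return result, time.perf_counter_ns() - start


re.purge()  # empty the pattern cache so the next call has to compile
cold_result, cold_ns = timed_sub("tool.exe")
warm_result, warm_ns = timed_sub("tool.exe")  # the compiled pattern is cached now
print(f"cold: {cold_ns} ns, warm: {warm_ns} ns")
```

A generated wrapper script only ever pays the cold cost, since it runs the substitution once per process.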

~~So about 10 ms difference in total.~~
EDIT: So about 8 ms difference in total. Regex compilation time seems insignificant here.

I agree it's not a lot. Though, with performance improvements, I find sometimes it's easier to come up with 10 ways to gain 10 ms rather than one 100 ms gain.

effigies · 2025-01-11T16:46:18Z

Followed the discussion on discuss.python.org. Overall, I do think avoiding imports is good, to leave as much control in the hands of the application developer as possible, if they choose to shave load times to the bone. I'd also note that pipx run black will hit any penalty twice / benefit from any speedup twice over.

Anyway, I wanted to see the speed of the argv[0] fixup using the original regex compared to the proposed change. The proposed change is an order of magnitude faster (58ns vs 540-1590ns) and much less dependent on the content of argv[0]. I also included a couple versions that do not repeatedly call sys.argv[0].

from timeit import timeit

setup = """\
from types import SimpleNamespace
import re
sys = SimpleNamespace(argv=["{argv0}"])
re.purge()
"""

regex = r"""\
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
"""

branch = r"""\
if sys.argv[0].endswith('-script.pyw'):
    sys.argv[0] = sys.argv[0][:-11]
elif sys.argv[0].endswith('.exe'):
    sys.argv[0] = sys.argv[0][:-4]
"""

walrus = r"""\
if (arg0 := (argv := sys.argv)[0]).endswith('-script.pyw'):
    argv[0] = arg0[:-11]
elif arg0.endswith('.exe'):
    argv[0] = arg0[:-4]
"""

nowalrus = r"""\
argv = sys.argv
arg0 = argv[0]
if arg0.endswith('-script.pyw'):
    argv[0] = arg0[:-11]
elif arg0.endswith('.exe'):
    argv[0] = arg0[:-4]
"""

for argv0 in ('test', 'test-script.pyw', 'test.exe', 'test-script', '/somewhat/long/absolute/path/to/test-script.py'):
    print(f"{argv0=}")
    res = timeit(stmt=regex, setup=setup.format(argv0=argv0))
    print(f"regex: {res*1000:.3g}ns")
    res = timeit(stmt=branch, setup=setup.format(argv0=argv0))
    print(f"branch: {res*1000:.3g}ns")
    res = timeit(stmt=nowalrus, setup=setup.format(argv0=argv0))
    print(f"nowalrus: {res*1000:.3g}ns")
    res = timeit(stmt=walrus, setup=setup.format(argv0=argv0))
    print(f"walrus: {res*1000:.3g}ns\n")

argv0='test'
regex: 573ns
branch: 58.3ns
nowalrus: 54.4ns
walrus: 46.4ns

argv0='test-script.pyw'
regex: 543ns
branch: 59.6ns
nowalrus: 51ns
walrus: 50.5ns

argv0='test.exe'
regex: 562ns
branch: 56.4ns
nowalrus: 53.8ns
walrus: 50.2ns

argv0='test-script'
regex: 792ns
branch: 56.8ns
nowalrus: 50.7ns
walrus: 53ns

argv0='/somewhat/long/absolute/path/to/test-script.py'
regex: 1.59e+03ns
branch: 62.9ns
nowalrus: 51.7ns
walrus: 56.2ns

hukkin · 2025-01-13T10:52:27Z

> I'd also note that pipx run black will hit any penalty twice / benefit from any speedup twice over.

I think the same might apply to pre-commit, tox, nox, etc. There are many CLI runners/wrappers written in Python.

> I also included a couple versions that do not repeatedly call sys.argv[0].

Interesting measurements. My understanding is, though, that this script must work with Python 2.7 and 3.6, so newer features such as walrus operator or str.removesuffix cannot be used.
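As a sketch of what a fixup restricted to old-style features could look like, the function below uses only `str.endswith` and slicing, both of which exist on Python 2.7 and 3.6. The function names and the sample list are illustrative assumptions, not code from the PR. The comparison against the regex covers only the listed samples.

```python
import re


def strip_launcher_suffix(argv0):
    """Drop a trailing '-script.pyw' or '.exe' using only str methods."""
    for suffix in ("-script.pyw", ".exe"):
        if argv0.endswith(suffix):
            return argv0[: -len(suffix)]
    return argv0


def strip_with_regex(argv0):
    """The regex-based fixup quoted earlier in the thread."""
    return re.sub(r"(-script\.pyw|\.exe)?$", "", argv0)


samples = ["test", "test-script.pyw", "test.exe", "test-script", "/path/to/test-script.py"]
results = {name: strip_launcher_suffix(name) for name in samples}
```

Using `len(suffix)` instead of the literal slice lengths 11 and 4 avoids keeping the numbers in sync with the strings by hand.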

> Even if the CLI tool is a no-op (where the extra time would be the largest, proportionately), how much time do you save?

Proportionally, this can more than double the performance of a no-op:

foo@bar:~$ hyperfine --warmup 6 'python3.13 -SsuB -c "pass"' 'python3.13 -SsuB -c "import re"'
Benchmark 1: python3.13 -SsuB -c "pass"
  Time (mean ± σ):       6.7 ms ±   0.4 ms    [User: 5.8 ms, System: 1.5 ms]
  Range (min … max):     6.2 ms …   8.7 ms    332 runs
 
Benchmark 2: python3.13 -SsuB -c "import re"
  Time (mean ± σ):      14.2 ms ±   0.3 ms    [User: 12.4 ms, System: 2.4 ms]
  Range (min … max):    13.7 ms …  15.5 ms    176 runs
 
Summary
  'python3.13 -SsuB -c "pass"' ran
    2.13 ± 0.14 times faster than 'python3.13 -SsuB -c "import re"'

> It doesn't seem like the extra time would be noticeable, given that the CLI tool would presumably be doing some useful work, taking perhaps hundreds of milliseconds or longer - the time for the run would presumably dwarf this time difference.

In a vacuum, the extra time may not be noticeable (on a typical modern system), but I think it could well be one piece in a puzzle of someone optimizing their CLI no-op from a slowish 200 ms to near instant 100 ms.
I encourage everyone to try running sleep 0, sleep .1 and sleep .2 on the command line. They all do feel very different, at least to me.

There are also environments slower than a typical desktop/laptop. On my Raspberry Pi 4, for instance, import re takes over 25 ms. And that's a relatively new model of the "flagship" series.

Performance aside, this PR frees up the name re in the wrapper script's namespace. Any names used in that script limit the naming of entry point functions. Currently, an uncaught AttributeError is raised by the wrapper script if an entrypoint function is named as re. The entrypoint spec doesn't seem to mention exceptional function names that cannot be used, so if not fixed, this limitation may be worth documenting.
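The shadowing problem can be reproduced without installing anything. The sketch below registers a stand-in module (the name `fake_entrypoint` is hypothetical) whose entry point function is called `re`, then executes a few lines shaped like the wrapper: the stdlib import first, the entry point import second, and the regex call last. The wrapper lines are an assumed reconstruction based on the regex quoted in this thread, not a verbatim copy of the template.

```python
import sys
import types

# Hypothetical entry point module whose function happens to be named `re`.
fake = types.ModuleType("fake_entrypoint")
exec("def re():\n    return 0\n", fake.__dict__)
sys.modules["fake_entrypoint"] = fake

# Assumed shape of the wrapper: the second import rebinds the name `re`.
wrapper_source = r'''
import re
import sys
from fake_entrypoint import re
cleaned = re.sub(r'(-script\.pyw|\.exe)?$', '', 'tool.exe')
'''

error = None
try:
    exec(wrapper_source, {"__name__": "__main__"})
except AttributeError as exc:
    # `re` now refers to the entry point function, which has no `sub`.
    error = exc

print(type(error).__name__, error)
```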

vsajip · 2025-01-13T18:35:54Z

> Any names used in that script limit the naming of entry point functions. Currently, an uncaught AttributeError is raised by the wrapper script if an entrypoint function is named as re.

Doesn't that apply to an entry point function sys, too?

hukkin · 2025-01-13T18:44:06Z

Yes it does. I think that could either be documented, or special cased by writing a script with something like

from some.entrypoint.module import sys as _sys

in case someone chooses to use sys as function name.

EDIT: And same goes for the name __name__. That would also break the script when used as function name.
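One way the special-casing could work is sketched below: a helper that emits an aliased import whenever the entry point name collides with a name the wrapper itself uses. The helper `render_import`, the set `TEMPLATE_NAMES`, and the `_entry_` alias prefix are all hypothetical and do not correspond to any existing distlib API.

```python
# Names the wrapper template relies on, per the discussion above (assumed set).
TEMPLATE_NAMES = {"re", "sys", "__name__"}


def render_import(module, func):
    """Return (import line, call expression) for an entry point `module:func`."""
    parts = func.split(".")
    import_name = parts[0]
    if import_name not in TEMPLATE_NAMES:
        return "from %s import %s" % (module, import_name), func
    # Bind the colliding name under a private alias instead.
    alias = "_entry_" + import_name.strip("_")
    call = ".".join([alias] + parts[1:])
    return "from %s import %s as %s" % (module, import_name, alias), call


colliding = render_import("some.entrypoint.module", "sys")
ordinary = render_import("some.entrypoint.module", "main")
```

The generated wrapper would then call the returned expression instead of the original function name.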

pfmoore · 2025-01-13T18:57:12Z

Are the issues of name shadowing real, or just theoretical? It seems to me that they aren't likely to matter in real life.

As far as the performance issue is concerned, it seems like an easy enough fix for a small but measurable gain. Not much to get worked up about, but equally not unreasonable to fix as people do complain about Python startup time, and every little improvement helps.

Ultimately, though, it's @vsajip's call, and I don't think it's worth an extended debate either way. Users can always override the script template themselves if they want to.

vsajip · 2025-01-14T15:56:26Z

> Users can always override the script template themselves if they want to.

Exactly, and I'm thinking that I'm not inclined to change this, because that opens things up to lots of other micro-optimisations being suggested because of the valid point that Paul made that "every little improvement helps", and in this case I don't think it's worth changing because the improvement is minuscule.

I decided to actually create some executables and see for myself. Using this alternate template:

ALT_TEMPLATE = r'''# -*- coding: utf-8 -*-
import sys
from %(module)s import %(import_name)s
if __name__ == '__main__':
    arg = sys.argv[0]
    if arg.endswith('-script.pyw'):
        sys.argv[0] = arg[: -11]
    elif arg.endswith('.exe'):
        sys.argv[0] = arg[: -4]
    sys.exit(%(func)s())
'''

and this minimal script noop.py:

def main():
    pass

I created some executables using both the standard script template (noop_re.exe) and the ALT_TEMPLATE shown above (noop_nore.exe). Then I ran some timings to see the difference in a scenario with an actual launcher executable:

~\Projects\scratch> python3 -m timeit -r 1000 -s "from os import system" "system('noop_re')"
5 loops, best of 1000: 64.7 msec per loop

~\Projects\scratch> python3 -m timeit -r 1000 -s "from os import system" "system('noop_nore')"
5 loops, best of 1000: 60.9 msec per loop

which, in this admittedly ad hoc, perhaps non-rigorous exercise, shows an under 4ms difference. For the tiny number of scenarios where the difference matters to someone, they can use script_maker.script_template = ALT_TEMPLATE logic to have exactly the launcher they need.
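To see what the alternate template expands to, the sketch below fills its three placeholders with `%` formatting and executes the result against a stand-in for noop.py. The placeholder values and the in-memory module are assumptions made for illustration. No launcher executable is built here, so this only exercises the Python side of the wrapper.

```python
import sys
import types

ALT_TEMPLATE = r'''# -*- coding: utf-8 -*-
import sys
from %(module)s import %(import_name)s
if __name__ == '__main__':
    arg = sys.argv[0]
    if arg.endswith('-script.pyw'):
        sys.argv[0] = arg[: -11]
    elif arg.endswith('.exe'):
        sys.argv[0] = arg[: -4]
    sys.exit(%(func)s())
'''

# Stand-in for noop.py so the rendered wrapper has something to import.
noop = types.ModuleType("noop")
exec("def main():\n    pass\n", noop.__dict__)
sys.modules["noop"] = noop

source = ALT_TEMPLATE % {"module": "noop", "import_name": "main", "func": "main"}

saved_argv = sys.argv
sys.argv = ["noop.exe"]  # pretend the launcher passed its own path
exit_code = "not set"
try:
    exec(source, {"__name__": "__main__"})
except SystemExit as exc:
    exit_code = exc.code  # main() returns None, so sys.exit(None) is raised
finally:
    fixed_argv0 = sys.argv[0]
    sys.argv = saved_argv  # restore the real argument list
```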

hugovk · 2025-01-14T17:16:01Z

Too bad, it's a nice little gain, and we are making similar improvements in the stdlib.

> For the tiny number of scenarios where the difference matters to someone, they can use script_maker.script_template = ALT_TEMPLATE logic to have exactly the launcher they need.

As a library or CLI maintainer, where do I use this?

pfmoore · 2025-01-14T17:35:14Z

> As a library or CLI maintainer, where do I use this?

You're one step removed from being able to do it yourself easily. It's pip that calls distlib (on behalf of the users installing your library) and pip doesn't override the template. You might want to look at the wrappers uv uses - they may be different to the ones pip uses (and uv may be more interested in raw speed improvements, because speed is one of their selling points).

But the main point is that it's not you who has control here - it's your users (who run the installer that creates the wrapper). If you want that level of control yourself, you might be better building your own executable. I'm in the process of creating a section in the packaging user guide explaining the details here. But it may be more than you want to do, just for a small gain like this.

notatallshaw · 2025-01-14T17:49:48Z

> But the main point is that it's not you who has control here - it's your users (who run the installer that creates the wrapper).

This is an unfortunate conclusion to this thread, as the maintainer of the package usually is the one that is motivated to set a good default for how that package is run, not the users of the package.

And so the suggestion to use script_maker.script_template = ALT_TEMPLATE can't actually be achieved by any of the interested parties here.

> I'm in the process of creating a section in the packaging user guide explaining the details pypa/packaging.python.org#1778.

That guide seems Windows specific, but in my experience this kind of latency is far more measurable on non-Windows platforms.

vsajip · 2025-01-14T17:59:44Z

> That guide seems Windows specific

Well, that's because this entire PR only applies (has any useful effect) in Windows, having no applicability to non-Windows platforms because the wrapper is a Windows-specific thing. Non-Windows platforms might make use of zipapp-type functionality, but that's orthogonal to the code being discussed here, I think.

Clarification: I'm on my phone and accidentally added the last sentence to the above comment by Damian - I didn't realise you could edit other people's comments! Rectified now.

notatallshaw · 2025-01-14T18:46:17Z

> Non-Windows platforms might make use of zipapp-type functionality, but that's orthogonal to the code being discussed here, I think.

Ah, I wasn't aware, thanks for clarifying!

hukkin · 2025-01-14T22:51:56Z

> Well, that's because this entire PR only applies (has any useful effect) in Windows, having no applicability to non-Windows platforms because the wrapper is a Windows-specific thing.

I'll leave a note here since this confused me a bit:

This PR does apply, and does have an effect, on non-Windows platforms. This script template, vendored by pip, is what pip install uses to wrap entry point functions on my Linux system. I understand if distlib doesn't intend this to be used on non-Windows platforms, but in practice that is what pip does.

vsajip · 2025-01-15T09:03:34Z

> This PR does apply, and does have an effect, on non-Windows platforms.

You're quite right - I can't explain why I said that, except for a brain freeze on my part 🙈 Sorry about that.

Remove import re from script template

74b7a3a

hukkin mentioned this pull request Jan 14, 2025

Fix generated script when entry point func name is used by the script template #240

Closed

vsajip closed this Jan 14, 2025

hukkin mentioned this pull request Jan 14, 2025

Speed up small CLI tools by removing import re from the excecutable template pypa/pip#13165

Open

1 task

hukkin mentioned this pull request Jan 15, 2025

Remove import re from entrypoint wrapper scripts astral-sh/uv#10627

Merged

ichard26 mentioned this pull request Apr 13, 2025

Guard script wrapper entrypoint import with if __name__ == "__main__" #242

Open
