Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up small CLI tools by removing import re from the excecutable template #13165

Open
1 task done
hukkin opened this issue Jan 14, 2025 · 10 comments · May be fixed by #13166
Open
1 task done

Speed up small CLI tools by removing import re from the excecutable template #13165

hukkin opened this issue Jan 14, 2025 · 10 comments · May be fixed by #13166
Labels
state: needs discussion This needs some more discussion type: feature request Request for a new feature

Comments

@hukkin
Copy link
Contributor

hukkin commented Jan 14, 2025

What's the problem this feature will solve?

This performance improvement was discussed in upstream distlib here, but was rejected: pypa/distlib#239. Performance measurements can be found in that discussion.

Describe the solution you'd like

Override distlib.scripts.ScriptMaker.script_template with a template that doesn't import re.

Alternative Solutions

Keep current state.

Additional context

Previous discussions:

Code of Conduct

@hukkin hukkin added S: needs triage Issues/PRs that need to be triaged type: feature request Request for a new feature labels Jan 14, 2025
@hukkin hukkin linked a pull request Jan 14, 2025 that will close this issue
@pfmoore
Copy link
Member

pfmoore commented Jan 14, 2025

I'm not particularly keen on having to maintain our own template. While I don't completely agree with the arguments in the distlib issue that it's not worth adding this fix there, I'm not convinced the gain is enough to warrant the extra maintenance cost for us.

(I feel as though we had a custom template at some point in the past, but I was unable to find it with git blame. If we did, it would be useful to know why we dropped it).

@vsajip
Copy link
Contributor

vsajip commented Jan 14, 2025

That "extra maintenance cost" argument applies to distlib too, of course. The script_template attribute was specifically provided so that people who care enough could use their own templates optimised for their own use cases. I feel this is a case of premature optimisation, as no one has actually shown any real-world performance problem - just a micro-benchmark showing a tiny speed-up. And micro-benchmarks are not a good basis for improving app performance.

@notatallshaw
Copy link
Member

That "extra maintenance cost" argument applies to distlib too, of course

I'm not sure it's equivalent here? distlib presumably has it's own test cases it runs for itself, but pip does not run distlib test cases. And, at a glance, and I may be wrong, the non-re implementation seems simpler than the re implementation.

I feel this is a case of premature optimisation, as no one has actually shown any real-world performance problem - just a micro-benchmark showing a tiny speed-up. And micro-benchmarks are not a good basis for improving app performance.

The import of re does concretely require an extra IO operation to read the re file, and the context is running CLI apps that themselves are already well optimized and short lived, I don't think almost any of the arguments linked apply here.

Anecdotally, I have worked in environments where each IO operation can, seemingly randomly, take up to an extra few seconds at the discretion of some security tool. As is often the case with such environments, when I worked in it I was not permitted to communicate anything about it, and I have no way to reproduce it outside that environment anyway. I imagine there are many other pathological cases out there where the performance difference is occasionally noticeable.

Ultimately though, even if I don't agree with the reasoning, I defer to you as the distlib maintainer for what should be the distlib default. Which is why I didn't try to make the case beyond stating the import timing on pypa/distlib#239).

Anyway, perhaps a better solution for a Python app focused on fast startup time and low overhead is to do some kind of Python -> C transpiling and compile to an executable in C land.

@vsajip
Copy link
Contributor

vsajip commented Jan 15, 2025

I'm not sure it's equivalent here?

What I was referring to was that there is nothing to stop people suggesting lots of micro-optimisations, each one shaving a millisecond here or there. While each such optimisation might give a tiny improvement, in practice, I aver that 99.9% of the time, nobody will notice (in this scenario) because it's lost in the overall time taken by a script doing any real work. So my point is that working on this sort of thing is not time well spent, even if theoretically valid, because you're shaving a few milliseconds off something that realistically takes hundreds or thousands of milliseconds, in the main.

Did you read the above link about micro-benchmarks? Do you have any measurements on meaningful time differences in a realistic Python CLI script because of the use of re in the wrapper? Note that loads of scripts doing real work will import re anyway, as it's so useful. You'd gain nothing in those cases, right?

I defer to you as the distlib maintainer for what should be the distlib default

I gave a default and a fairly easy way of overriding it - it's specifically there for that purpose. The fact that pip chooses not to override that default suggests they don't have a sufficient problem with it to change it. In the scenarios that might require optimal performance, to the level of caring about milliseconds as much as this, maybe Python isn't even the right tool, and C, Rust or Go should be used! Note how one good answer in the Python packaging and development tool space is to use such other tools (ruff, uv).

@pfmoore
Copy link
Member

pfmoore commented Jan 15, 2025

The fact that pip chooses not to override that default suggests they don't have a sufficient problem with it to change it.

I think there's a valid argument that there could be a way in pip to allow overriding the default script template. However, the problem is that any such control would be in the hands of the end user, not the application author, making it less useful. IMO, this is a systemic issue with the "entry points" mechanism for creating executables, and is more a reflection of the poor range of options for people wanting to deploy applications written in Python, than a matter of performance.

I would like to see better ways of writing efficient1 applications in Python. But the community seems to have little or no appetite for exploring options in that area, so we have to do the best with what we have.

Footnotes

  1. Within the efficiency constraints imposed by writing your code in Python

@notatallshaw
Copy link
Member

notatallshaw commented Jan 15, 2025

Did you read the above link about micro-benchmarks? Do you have any measurements on meaningful time differences in a realistic Python CLI script because of the use of re in the wrapper? Note that loads of scripts doing real work will import re anyway, as it's so useful. You'd gain nothing in those cases, right?

I read it, I disagree with your reasoning, I don't think it's theoretical, I don't think most of the points apply, it's a provable op vs. no-op (a read, and sometimes compilation, of a large Python file).

But that's discussion for the distlib repo, and as I said, I ultimately defer to your judgement as a distlib maintainer, even if I don't agree with your reasoning.

I gave a default and a fairly easy way of overriding it - it's specifically there for that purpose. The fact that pip chooses not to override that default suggests they don't have a sufficient problem with it to change it

I think this strongly misunderstands pip's incentive structure. If this logic was maintained by pip I would accept it immediately, but it isn't.

Pip is a critical piece of open source infrastructure that has very few development resources, and pip has to vendor various libraries without always having a strong expertise in them, therefore there is a strong incentive to use defaults where possible on the assumption those libraries know their subject matter better than the pip maintainers.

@hukkin
Copy link
Contributor Author

hukkin commented Jan 15, 2025

(I feel as though we had a custom template at some point in the past, but I was unable to find it with git blame. If we did, it would be useful to know why we dropped it).

Here's the PR that dropped the custom template: #6763
It seems to me what happened was that distlib had a complex script template, distlib simplified it, and pip no longer felt the need to override it.

And micro-benchmarks are not a good basis for improving app performance.

Just like notatallshaw, I don't feel this applies very well. We're not optimizing a Django app, but a wrapper that launches console scripts. An improvement will not help a single app, but thousands of tools. Ideally the wrapper should do as little as possible, and if it does more, there's unfortunately no reasonable way for a CLI tool developer to fix that themself.

In the context of this wrapper, this is not a micro-optimization, but a major one. Without the optimization it takes at least 70% more time to run the wrapper than an empty python file (or a file that only says pass). With the optimization the wrapper takes about 1% more time. Almost all time is spent doing import re. I notice you fear people will suggest more optimizations, but to me it seems there isn't much to optimize after this, so maybe simply reject them (yeah, I realize this may be how you feel already now ;) ).

@pfmoore
Copy link
Member

pfmoore commented Jan 15, 2025

Here's the PR that dropped the custom template: #6763

Thanks for finding this! I'm glad to know I wasn't hallucinating it 🙂

I'll be blunt, though, at this point I feel like we've probably spent more time debating this change than will ever be gained by implementing it. Maybe it's time to just drop the issue - @vsajip has made his decision as distlib maintainer, let's accept it.

@notatallshaw
Copy link
Member

One note, the equivelant PR was accepted on uv and has been released: astral-sh/uv#10627. uv has quite a broad user base at this point, so it should surface any issues with it.

@eli-schwartz
Copy link
Contributor

One of the major interesting cases for speeding up barebones startup times is for cases such as progname --version where there is incentive to make the program return a version number and exit as fast as possible. No meaningful work is being done there, spending time on anything is essentially wasteful.

The trick is that it is very challenging to actually make this faster. If you handroll your own argv parser you can do it... but if you use any of:

  • getopt
  • optparse
  • argparse
  • click

then at minimum you are importing gettext for reasons that are very relevant for the purpose of --help text. And gettext imports re. Actually, argparse also directly imports re, and optparse imports it indirectly again via textwrap, and click imports too many things to count including also a direct import of re.

So if you care about early startup performance you are probably using one of the stdlib options. But you aren't avoiding re, at least... not right now. It looks easy to remove it. python/cpython#128899 / python/cpython#128898

@ichard26 ichard26 added state: needs discussion This needs some more discussion and removed S: needs triage Issues/PRs that need to be triaged labels Jan 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state: needs discussion This needs some more discussion type: feature request Request for a new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants