Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pex CLI doesn't offer as much control over pex building process as the PEXBuilder #1907

Closed
hrfuller opened this issue Sep 16, 2022 · 6 comments · Fixed by #2512
Closed

Pex CLI doesn't offer as much control over pex building process as the PEXBuilder #1907

hrfuller opened this issue Sep 16, 2022 · 6 comments · Fixed by #2512

Comments

@hrfuller
Copy link
Contributor

As a major PEX user twitter leverages the PEXBuilder class to build pexes instead of shelling out to the PEX cli for a few reasons.

  1. The --sources-directory option to add sources to a pex doesn't offer fine enough control over the final import paths of source files. In order to make --sources-directory work we would have to prepare all the sources in merged directories. E.g Often multiple directories need to be merged into the same Chroot directory.
  2. We make use of PEXBuilder.add_dist_location to add pre-installed wheels (unzipped wheels) to pexes. We have found that the overhead of using direct file URLs via --requirement on the CLI and even the main PEXBuilder.add_distribution methods to add significant overhead by invoking pip.pex for each distribution. We can improve build times significantly by relying on the existence of a mechanism for fetching 3rdparties that is outside of pex itself.
  3. ... will update if I think of more ...

These may not be things that the Pex CLI wants to support but figured I would list them out based on the fact that PEXBuilder isn't under a stable API contract with users.

@jsirois
Copy link
Member

jsirois commented Sep 18, 2022

@hrfuller can you provide more detail on item 1? What would make sense? Right now you can only use multiple --sources-directory to point at multiple sys.path entries and everything under each pointed to is merged. Do you need to further exclude some files or subdirectories from --sources-directory entries or would a simpler explicit file list work?

On 2 can you provide more color? Pex already never installs a given wheel more than once for a given PEX_ROOT. If the wheel has already been installed there (in installed_wheels/<hash>/the.whl/) it will be re-used. Is it the case you have a cache of installed wheels that is resident but not a PEX_ROOT that is resident? Do you use pex3 lock create lock files? Those also cut out Pip overheads such that, given a --lock Pex just downloads wheels mentioned in the lock (if needed) and installs them (if needed) but never invokes Pip's resolution code after lock creation.

@jsirois
Copy link
Member

jsirois commented Sep 18, 2022

Basically I need a lot more detail about the environmental constraints you use Pex under I think as well as the perf improvements you get in % terms using your current techniques over some example PEX CLI baseline. With that info in hand, I can come up with real answers or ideas or infeasibles.

@jsirois
Copy link
Member

jsirois commented Aug 17, 2024

@hrfuller you were not able to provide more details and Pex has since grown even more ways to pick sources, now including:

  • -D / --sources-directory: Include everything under the given directory.
  • -P / --package: Add a package and all its sub-packages with @ support for locating where the package lives relative to the current directory; e.g.: -P foo.bar@src to select the foo.bar package under src/.
  • -M / --module: Like -P, but to select an individual module, also supporting the @ syntax.
  • --project: The directory of a project to add. Uses the pyproject.toml / setup.py / setup.cfg in that directory to collect sources (and dependencies).

I'm going to assume the details won't be forthcoming and close, but please re-open if you have more specifics about cases modern Pex does not handle well via the CLI that you think it should.

@jsirois jsirois closed this as completed Aug 17, 2024
@jsirois
Copy link
Member

jsirois commented Aug 17, 2024

I totally missed item 2. Although you can now pex --no-pre-install-wheels to get wheels stored in a PEX as is (no extra compression, no pre-installation in a chroot), you still must run a Pip resolve. I find a ~40% speed increase with a quick hack building a torch PEX:

# Existing perf for pre-resolved wheel house:
:; time python -mpex --pip-version latest --no-pre-install-wheels /tmp/wheels/*.whl --intransitive -otorch.pex
/home/jsirois/dev/pex-tool/pex/pex/pex_builder.py:105: PEXWarning: The PEX zip at torch.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
  pex_warnings.warn(message)

real    0m10.857s
user    0m6.478s
sys     0m3.873s

# A quick hack that allows adding all wheels in `PEX_WHEELS_DIR` to the PEXBuilder directly:
:; time PEX_WHEELS_DIR=/tmp/wheels python -mpex --pip-version latest --no-pre-install-wheels -otorch.pex
/home/jsirois/dev/pex-tool/pex/pex/pex_builder.py:105: PEXWarning: The PEX zip at torch.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
  pex_warnings.warn(message)

real    0m6.376s
user    0m3.977s
sys     0m2.451s

On the precedent front there is --requirement-pex:

  --requirements-pex FILE
                        Add requirements from the given .pex file. This option
                        can be used multiple times. (default: [])

So I'll re-open to fix item 2 in the OP.

@hrfuller
Copy link
Contributor Author

Thanks John! I haven't been involved with pex since some job changes but if you still feel like 2 would be a benefit to the project that's great. There has been talk of a pex_binary target in bazel rules_python for sometime. Based on my experience writing bazel rules for pex these changes would have given more control over the final pex that is created. Cheers.

As for details in number 1 I'm a bit hazy now but it had to do with merging directories that represented the same "source root" a pants concept into the pexes being created. This was using a much older version of pex v2 so that may have changed already

@jsirois
Copy link
Member

jsirois commented Aug 17, 2024

Hey Henry! Yeah, I think 2 makes sense as a feature on its own. As to 1, I'm still hazy too. you could definitely merge source roots just using -D source/root/1 -D source/root/2 even back when you filed this issue. So, at that point in time you must've wanted to be able to mask off portions of those source roots to not grab all of them, just some. AFAICT, you can achieve that now with -P package1@source/root/1 -P package2@source/root/1 -D package3@source/root/2. It may be the case that subtracting is easier in some cases than adding like in this example, but I'll definitely wait for specific pain reports to surface before delving into that.

jsirois added a commit to jsirois/pex that referenced this issue Sep 13, 2024
When building a PEX or creating a venv via `pex3 venv create` you can
now tell Pex to use a set of pre-resolved distributions. This is similar
to using `--no-pypi --find-links ...` except that:
1. Its roughly 3x faster since Pip is not asked to do any resolution.
2. It requires all the distributions specified form an already complete
   resolve.

One way to obtain distributions that meet criteria 2 is to use `pip
download -d ...` or `pip wheel -w ...` to pre-resolve the distributions
you need.

Closes pex-tool#1907
jsirois added a commit that referenced this issue Sep 15, 2024
When building a PEX or creating a venv via `pex3 venv create` you can
now tell Pex to use a set of pre-resolved distributions. This is similar
to using `--no-pypi --find-links ...` except that:
1. Its roughly 3x faster since Pip is not asked to do any resolution.
2. It requires all the distributions specified form an already complete
   resolve.

One way to obtain distributions that meet criteria 2 is to use
`pip download -d ...` or `pip wheel -w ...` to pre-resolve the
distributions you need.

Closes #1907
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants