BinaryPaths should fingerprint found paths. #10769
Comments
A BinaryPathRequest can now include a test to run against each found binary to validate that it works and, optionally, to fingerprint it. Eventually, fingerprinting the binary contents should be performed automatically, with the optional test fingerprint mixed in when present, as tracked by pantsbuild#10769. We leverage testing found binaries in PexEnvironment to ensure we find only interpreter paths compatible with Pex bootstrapping. This plugs an existing hole, not yet encountered in the wild, where a Python 2.6 binary (for example) could be chosen, causing PEX file bootstrapping to fail. We additionally fingerprint interpreters that pass the version range test to ensure we detect interpreter upgrades and pyenv shim switches. Even with the automatic hashing of binaries tracked in pantsbuild#10769 working, we'd still need to do this in the pyenv shim case, since the same shim script can redirect to different interpreters depending on configuration external to the shim script.
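To make the test-and-fingerprint idea above concrete, here is a minimal standalone sketch in plain Python. It is not the Pants `BinaryPathRequest` API: `probe_fingerprint`, `validated_paths`, `VERSION_PROBE`, and the `(3, 6)` version floor are hypothetical names and values chosen for illustration. The probe exits non-zero for unsupported interpreters and otherwise prints details of the interpreter that actually ran, which is what lets a pyenv shim switch show up as a changed fingerprint even though the shim script itself is unchanged.

```python
import hashlib
import subprocess
from typing import Dict, Iterable, Optional, Sequence


def probe_fingerprint(
    binary: str, test_args: Sequence[str], timeout: float = 30.0
) -> Optional[str]:
    """Run `binary` with `test_args`.

    Returns a SHA-256 hex digest of the test's output if it exits 0,
    or None if the binary cannot be run, times out, or exits non-zero.
    """
    try:
        result = subprocess.run(
            [binary, *test_args],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return hashlib.sha256(result.stdout + result.stderr).hexdigest()


def validated_paths(
    candidates: Iterable[str], test_args: Sequence[str]
) -> Dict[str, str]:
    """Keep only the candidates that pass the test, mapped to their fingerprint."""
    validated = {}
    for path in candidates:
        fingerprint = probe_fingerprint(path, test_args)
        if fingerprint is not None:
            validated[path] = fingerprint
    return validated


# Hypothetical probe for Python interpreters. The (3, 6) floor is an assumption
# made for this sketch, not the range Pex actually requires.
VERSION_PROBE = (
    "-c",
    "import sys; ok = sys.version_info[:2] >= (3, 6); "
    "sys.stdout.write(sys.version + sys.executable); "
    "sys.exit(0 if ok else 1)",
)
```

For example, `validated_paths(["/usr/bin/python3", "/usr/bin/python2.6"], VERSION_PROBE)` would be expected to drop the 2.6 interpreter and return a fingerprint for the other, assuming both paths exist on the machine.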
As discussed in Slack, this should likely use an absolute file watching facility in the engine, possibly by exposing intrinsics like I'm not sure whether it should be exposed to
Point two of #10526 will need to support something like the tests shown here if we're to do this in CommandRunners:
@jsirois: That makes sense, but it feels like it couples "discovering all entries on a PATH" with "filtering whether a particular discovered PATH entry is valid". The latter thing can easily be implemented in "user space" by
Sure, but if the filtering is separate and this is not an intrinsic but built into the CommandRunner as you suggested then the sequence becomes:
The current userspace implementation does:
I think we need to replace 1 with a native binary for OSX and a native binary for Linux (to replace the script + hashing of the binaries found pre-filter) since we need to support doing all this over remoting. IOW, even though the local CommandRunner or a local intrinsic could search the search path and fingerprint everything in-process, the remoting implementation cannot. For that we need the aforementioned platform-specific binaries to push into the CAS for remoting to execute. Since we need that, we might as well just head there first. Doing it in-process in the local CommandRunner becomes a performance optimization only.
I'm less sure about this. Yes, the uniformity between local and remote is good, but not being able to use our file watching facilities because the files that are accessed are hidden from the engine in a forked external binary will mean that we need to fork that binary on every run and have it re-fingerprint things repeatedly. The reason I mention
We appear to violently agree. As I said, a native binary is needed for remoting so tactically it seemed to make sense to do this first. All the latter will be a performance optimization for the local use case.
Got it, yes. I think that I misread the "native binary for OSX and a native binary for Linux" as "local binary and remote binary", which is clearly not what you meant on re-read. Thanks!
BinaryPaths will often find binaries in standard search paths like `/bin` and `/usr/bin`. These paths are subject to upgrades on user machines, where a given path like `/usr/bin/python` or `/usr/bin/bash` will stay constant over time but its contents will change in ways that can affect program output when upgraded or downgraded. Hashing the contents of the discovered binary paths will automatically invalidate downstream results in the large majority of cases, which is what we want.

See #10768 (comment) for motivating discussion of this.
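As a rough illustration of the proposal, the following sketch searches a list of directories for a named binary and hashes the contents of each match. It is plain standard-library Python rather than engine code, and the function names (`find_binary_paths`, `fingerprint_contents`, `fingerprinted_binary_paths`) are hypothetical. The point it shows is that the path can stay the same while the digest changes after an upgrade, so anything keyed on the digest is invalidated.

```python
import hashlib
import os
from typing import Dict, Iterable, List


def find_binary_paths(name: str, search_path: Iterable[str]) -> List[str]:
    """Return every executable file called `name` on `search_path`, in order."""
    found = []
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


def fingerprint_contents(path: str, chunk_size: int = 1 << 16) -> str:
    """SHA-256 hex digest of the file's bytes (symlinks are followed)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprinted_binary_paths(name: str, search_path: Iterable[str]) -> Dict[str, str]:
    """Map each discovered path to the digest of its current contents."""
    return {
        path: fingerprint_contents(path)
        for path in find_binary_paths(name, search_path)
    }
```

Calling `fingerprinted_binary_paths("bash", ["/bin", "/usr/bin"])` before and after a system upgrade would yield the same keys but, if the binary changed, different digests. Note that a content hash alone does not cover the pyenv shim case raised in the thread, where the shim's bytes are unchanged but it redirects to a different interpreter.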