Intermittent / nondeterministic failure building requirements.pex due to missing site-packages #2066
@danxmoran I assume you're still using the
@jsirois correct. Here's the script:
#!/bin/bash
# This command line should execute the same process as pants did internally.
export CPPFLAGS= LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LDFLAGS= PATH=$'/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tmp/buildevents/be/bin-linux:/tmp/be/bin-linux' PEX_IGNORE_RCFILES=true PEX_ROOT=.cache/pex_root _PEX_FILE_LOCK_STYLE=bsd
cd /tmp/pants/execution/pants-sandbox-rRJetG
/opt/python/3.9.16/bin/python3.9 ./pex --tmpdir .tmp --jobs 1 --python-path $'/opt/python/3.8.16/bin:/opt/python/3.9.16/bin' --output-file requirements.pex --no-emit-warnings --python /opt/python/3.9.16/bin/python3.9 $'--sources-directory=source_files' $'pantsbuild.pants<2.16,>=2.15.0a0' --lock 3rdparty/lockfiles/resolves/pants-plugins.lockfile --no-pypi $'--index=https://pypi.org/simple/' --manylinux manylinux2014 --layout packed
I mounted the failed sandbox into our CI image locally and re-executed
Another thing I notice: The error says the venv at
Ok, thanks @danxmoran. I'm completely baffled as to how this happens with a BSD lock. I'll dig back in this weekend to check my understanding again and report back.
@danxmoran can you remind me of your CI setup? As I recall, Docker / k8s is involved. Is
We're in the middle of switching from self-hosted GHA on k8s to Docker CircleCI executors; we've seen the issue on both, but more often on Circle. AFAIK
So, just to reiterate, and focusing on the k8s setup,
Double-checked and I was wrong,
Ok, thanks. I continue to be fully stumped. I have a few experiments left to try, though.
@danxmoran if you can run this in the CI container you use, substituting the full path of the Python interpreter your CI runs with, it will provide a sanity check that that Python thinks it has
It's easy to get the same process owning a lock twice under The other sanity check is to triple-confirm that no part of
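As a rough illustration of this kind of sanity check (not the script referenced in the thread), the following Python sketch probes whether `fcntl.flock` on a given path behaves like native BSD locks, which belong to an open file description, or like per-process fcntl-style locks, which one process can take twice. The function name is hypothetical. On a local Linux filesystem we would expect it to return `True`.

```python
import fcntl
import os


def flock_is_per_open_file_description(path: str) -> bool:
    """Hypothetical probe: do two independent opens of `path` in this one
    process contend for an exclusive flock, as native BSD locks should?"""
    fd1 = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    fd2 = os.open(path, os.O_RDWR)  # a second, independent open file description
    try:
        fcntl.flock(fd1, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Refused: the lock belongs to fd1's open file description (native flock).
            return True
        # Granted: the same process took the "exclusive" lock twice (per-process semantics).
        return False
    finally:
        # Closing both descriptors releases whatever locks were taken.
        os.close(fd2)
        os.close(fd1)
```

If this returns `False` on a CI machine, the same process can own the lock twice, which is exactly the situation a BSD-style lock is supposed to rule out.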
This is needed to have independent POSIX fcntl locks held by multiple threads in the same process, and it is also needed whenever BSD flock locks silently use fcntl emulation (exotic systems or NFS mounts). Pex is designed to avoid multi-threading when using POSIX locks, so this just serves as a design-error backstop for that style of lock. For silently emulated BSD locks, this provides correctness. Analysis of pex-tool#2066 and pex-tool#1969 does not point to this enhancement solving any existing problems, but it is an improvement for the cases mentioned should we hit them. Work regarding pex-tool#2066.
Prints Running
I don't see any mention of
Thanks @danxmoran. I'll be running some experiments using overlay2 today. The kernel notes indicate some POSIX non-compliance, the most interesting bit being how an inode can differ for a file if it starts in a lower layer and is copied up to the upper layer; this would allow the same path to be treated as two different paths and foil locking. That said, the lower layer is read-only and would not have a lockfile in it IIUC, i.e., I think Even though it should have no bearing on your issue, you might try upgrading to Pex 2.1.124 and reporting back in a few weeks if you're game:
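One generic defensive pattern against a lock path that ends up naming a different inode than the one actually locked (for example if the file is replaced or, per the overlayfs notes above, copied up) is to re-check identity after acquiring the lock and retry on a mismatch. This is a hedged Python sketch with hypothetical function names, not something Pex is known to do.

```python
import fcntl
import os
from typing import Tuple


def _identity(st: os.stat_result) -> Tuple[int, int]:
    return (st.st_dev, st.st_ino)


def lock_still_valid(fd: int, path: str) -> bool:
    """True if `path` still names the same file that `fd` has open."""
    try:
        return _identity(os.fstat(fd)) == _identity(os.stat(path))
    except FileNotFoundError:
        return False  # the path was removed after we opened it


def acquire_exclusive(path: str, max_attempts: int = 10) -> int:
    """Hypothetical helper: flock `path` and confirm the path still names the locked inode."""
    for _ in range(max_attempts):
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
        fcntl.flock(fd, fcntl.LOCK_EX)
        if lock_still_valid(fd, path):
            return fd
        # We locked an inode the path no longer points at; another process
        # could lock the new inode concurrently, so drop it and try again.
        os.close(fd)
    raise RuntimeError(f"lock file {path!r} kept changing identity")


def release(fd: int) -> None:
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

The check narrows the window rather than closing it entirely: it detects a path that has moved to a new inode, but it cannot help if two processes somehow see different inodes for the same path at the same time.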
@danxmoran within the container image you experimented in above, there are 38! layers. If you have a chance to check their contents for This should do the trick:
Here, the only layer with Just replace the image name with yours.
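As an alternative to a shell one-liner, the following Python sketch lists which layers of an image contain paths matching a substring. It assumes the image was exported with `docker save -o image.tar <image>`, whose tarball includes a `manifest.json` listing the layer tarballs; `layers_containing` and its `needle` parameter are hypothetical names, so pass whatever path fragment you are looking for.

```python
import json
import tarfile
from typing import Dict, List


def layers_containing(image_tar: str, needle: str) -> Dict[str, List[str]]:
    """Map each layer tarball in a `docker save` archive to its member
    paths that contain `needle` (layers with no match are omitted)."""
    hits: Dict[str, List[str]] = {}
    with tarfile.open(image_tar) as outer:
        manifest_file = outer.extractfile("manifest.json")
        if manifest_file is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(manifest_file)  # a JSON array, one entry per image
        for image in manifest:
            for layer_path in image["Layers"]:
                layer_file = outer.extractfile(layer_path)
                if layer_file is None:
                    continue  # not a regular file or link; nothing to scan
                # "r:*" also copes with compressed layer tarballs.
                with tarfile.open(fileobj=layer_file, mode="r:*") as layer:
                    matches = [name for name in layer.getnames() if needle in name]
                if matches:
                    hits[layer_path] = matches
    return hits
```

For example, `layers_containing("image.tar", "pex_root")` would return only the layers that ship something under a `pex_root` path.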
I had some trouble running that exact command because of the extra indirection in Docker Desktop for Mac's setup, but I was able to walk through our image's layers using
Ok, thanks. No
@danxmoran I have absolutely no ideas once again. I have not been able to repro in a container using overlay2 like this, and I have exhausted the logical possibilities I'm aware of. I'll leave this open, but I'm stopping work on it until new information becomes available.
@jsirois fair enough 😅 will let you know if we have any new insights
@danxmoran does this style of error still occur? It's not immediately clear to me whether a recent Pants sandbox cleanup fix could be related to this failure / backtrace.
We haven't seen it since pulling in the sandbox-cleanup fix
Ok, thanks. I'll do some analysis to see if that fix could conceivably be related here and close this out with a comment if I can point out the link.
Ok, it is the case that the OP error can be emitted in a racy sandbox scenario. Right here a venv (and its site-packages dir) is created on line 199: If the sandbox then gets nuked, the constructor code on 201 below will fail to find the just-created I'll close this as fixed / solved by the Pants racy sandbox fix; IOW, this is a won't fix here in Pex: if you externally mutate either the PEX_ROOT or a directory you tell Pex to create a venv in, it's on you.
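The race described in this comment can be reproduced deterministically in a single process. In this hedged Python sketch, `create_site_packages` and `find_site_packages` are hypothetical stand-ins for the venv creation and constructor steps (not Pex's actual code), and `shutil.rmtree` plays the role of the racing sandbox cleanup.

```python
import shutil
from pathlib import Path


def create_site_packages(venv_dir: Path, python_version: str = "3.9") -> Path:
    """Hypothetical stand-in for venv creation: lay down lib/pythonX.Y/site-packages."""
    site_packages = venv_dir / "lib" / f"python{python_version}" / "site-packages"
    site_packages.mkdir(parents=True)
    return site_packages


def find_site_packages(venv_dir: Path) -> Path:
    """Hypothetical stand-in for the constructor step that re-discovers site-packages."""
    if not venv_dir.is_dir():
        raise FileNotFoundError(f"venv directory {venv_dir} is missing")
    matches = sorted(venv_dir.glob("lib/python*/site-packages"))
    if not matches:
        raise FileNotFoundError(f"no site-packages directory under {venv_dir}")
    return matches[0]


def simulate_race(sandbox: Path) -> str:
    venv_dir = sandbox / "venv"
    create_site_packages(venv_dir)  # step 1: venv and site-packages are created
    shutil.rmtree(venv_dir)         # step 2: an external cleanup removes them
    try:
        find_site_packages(venv_dir)  # step 3: the follow-up lookup runs
    except FileNotFoundError as e:
        return f"failed as in the report: {e}"
    return "no failure"
```

Nothing inside the "Pex" half of this sketch is wrong; the failure comes entirely from step 2 mutating the directory from outside, which is why the fix belongs in the code doing the cleanup.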
Our CI will occasionally fail with an error like:
We've seen it hit across different resolves, Python interpreter versions, and Pex versions; the error above happened on Pex v2.1.122. It happens very infrequently (once every few weeks).
I captured the Pants caches & execution dir for the error above. The archive is too big to attach via GitHub, but I can share it via Slack / GDrive upload.