-
-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-work atomic_directory locking for faster / clearer failures. #1961
Conversation
The 1st commit is a pure re-name such that only the subsequent commits need be looked at to see changes to the locking code and call sites. |
I tested this over in Pants with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we can get rid of FileLockStyle.BSD
?
No. That's still needed in two cases where thread pools are used. Posix locks do not work on their own in the presence of threads for several reasons. You may enjoy bits of my reading list: Poettering (pulseaudio, systemd): Allison (Samba): SQLite source code: |
If it's not clear, this PR fixes nothing and I can still spot no bugs before or after this change. I'm only aiming to make what was, afaict, correct code, simpler. The additional change of not using a uuid suffix for the workdir just doubles down on the correctness assertion by asking for / arranging a clear error up front to prove something is somehow wrong with the locking code or its assumptions. |
pex/atomic_directory.py
Outdated
# If there is an error making the work_dir that means file-locking guarantees have failed | ||
# somehow and another process has the lock and has made the work_dir already. We let the error | ||
# from os.mkdir propagate in that case. | ||
os.mkdir(atomic_dir.work_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could also mean that something segfaulted or was sigkilled, such that it exited uncleanly without tearing down the working directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true. I'll step back and think about this a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I just warn now to ensure we get stderr data but clean up stale workdirs to prevent a stop-in-your-tracks bug that requires manual intervention.
This was broken by pex-tool#1961 which did not account for the two unlocked uses of AtomicDirectory by the pex.resolver for wheel building and installing. Although those directories deserve a second look as candidates for exclusive locking, we have seen no known issues with those directories since introduction of AtomicDirectory up until pex-tool#1961. As such, this change just restores the prior behavior for those two cases. Fixes pex-tool#1968
This was broken by #1961 which did not account for the two unlocked uses of AtomicDirectory by the pex.resolver for wheel building and installing. Although those directories deserve a second look as candidates for exclusive locking, we have seen no known issues with those directories since introduction of AtomicDirectory up until #1961. As such, this change just restores the prior behavior for those two cases. Fixes #1968
Change
atomic_directory
to always grab an exclusive lock and use astable work directory per target directory to surface multiple lock
owners as up-front warnings instead of possibly slow corruptions.