Re-work atomic_directory locking for faster / clearer failures. #1961

jsirois · 2022-10-20T05:26:55Z

Change atomic_directory to always grab an exclusive lock and to use a
stable work directory per target directory, so that multiple lock
owners surface as up-front warnings instead of as corruption that may
only show up much later.

jsirois · 2022-10-20T05:27:43Z

The first commit is a pure rename (extracting the atomic_directory module from common), so only the subsequent commits need to be read to see the changes to the locking code and its call sites.

jsirois · 2022-10-20T05:34:18Z

I tested this over in Pants with ./pants fmt lint check test --force src/python:: tests/python:: after removing the named caches dir. For what that's worth, given how elusive the missing-files bugs are to reproduce, I hit neither errors of that sort nor the new fail-fast errors that should occur in their place.

benjyw

Does this mean we can get rid of FileLockStyle.BSD?

jsirois · 2022-10-20T12:18:46Z

> Does this mean we can get rid of FileLockStyle.BSD?

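No. That's still needed in two cases where thread pools are used. POSIX locks do not work on their own in the presence of threads, for several reasons; chiefly, a POSIX record lock is owned by the whole process, so threads in the same process never exclude one another, and closing any descriptor for the file releases all of that process's locks on it.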
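To make the thread problem concrete, here is a small demo. The function name is hypothetical, it assumes Linux or macOS `fcntl` semantics, and it assumes (as the names suggest) that BSD-style locking means `flock(2)` while POSIX-style means `fcntl(2)` record locks. It takes an exclusive lock through one descriptor, then tries a non-blocking exclusive lock through a second descriptor opened by the same process, which is exactly the position two threads in a pool are in.

```python
import errno
import fcntl
import os
import tempfile


def second_lock_refused(use_flock: bool) -> bool:
    """Return True if a second exclusive lock via a fresh descriptor is refused.

    Both descriptors belong to the same process, just like two threads would.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    first = open(path, "a")   # Opened for writing, as POSIX write locks require.
    second = open(path, "a")  # A separate open file description for the same file.
    try:
        if use_flock:
            fcntl.flock(first.fileno(), fcntl.LOCK_EX)   # BSD-style lock.
        else:
            fcntl.lockf(first.fileno(), fcntl.LOCK_EX)   # POSIX record lock.
        try:
            if use_flock:
                fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            else:
                fcntl.lockf(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno in (errno.EACCES, errno.EAGAIN):
                return True  # The lock excluded the second "thread".
            raise
        return False  # The second acquisition silently succeeded.
    finally:
        first.close()
        second.close()
        os.unlink(path)
```

With `flock` the second attempt is refused; with `lockf` it succeeds, because the process already owns the POSIX lock. That is why POSIX locks alone cannot serialize threads.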

You may enjoy bits of my reading list:

Poettering (pulseaudio, systemd):

Allison (Samba):

https://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html

SQLite source code:

https://www.sqlite.org/src/artifact/c230a7a24?ln=994-1081

jsirois · 2022-10-20T13:49:53Z

To be clear, this PR fixes nothing: I still can't spot any bugs either before or after this change. I'm only aiming to simplify what was, as far as I can tell, already-correct code. The additional change of dropping the uuid suffix from the workdir name doubles down on that correctness assertion: if something is somehow wrong with the locking code or its assumptions, a fixed workdir name arranges for a clear error up front that proves it.

stuhood · 2022-10-20T22:02:43Z

pex/atomic_directory.py

+    # If there is an error making the work_dir that means file-locking guarantees have failed
+    # somehow and another process has the lock and has made the work_dir already. We let the error
+    # from os.mkdir propagate in that case.
+    os.mkdir(atomic_dir.work_dir)
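The diff excerpt above can be fleshed out into a minimal sketch of the scheme this PR describes: always take an exclusive lock, then use a fixed (not uuid-suffixed) work directory beside the target, so a second lock holder shows up immediately as a `FileExistsError` from `os.mkdir`. This is not pex's actual code; the function name and the `.lck` / `.workdir` suffixes are hypothetical, and `fcntl.lockf` makes it POSIX-only.

```python
import contextlib
import fcntl
import os
import shutil
from typing import Iterator, Optional


@contextlib.contextmanager
def exclusive_atomic_directory(target_dir: str) -> Iterator[Optional[str]]:
    """Yield a work dir to populate, or None if target_dir already exists.

    Hypothetical sketch: the ".lck" / ".workdir" naming is illustrative only.
    """
    if os.path.isdir(target_dir):
        yield None  # Already finalized by an earlier holder; nothing to do.
        return

    os.makedirs(os.path.dirname(os.path.abspath(target_dir)), exist_ok=True)
    lock_path = target_dir + ".lck"
    work_dir = target_dir + ".workdir"  # Stable name: no uuid suffix.

    with open(lock_path, "a") as lock_file:
        fcntl.lockf(lock_file.fileno(), fcntl.LOCK_EX)  # Block until we own it.
        try:
            if os.path.isdir(target_dir):
                yield None  # Another holder finished while we waited.
                return
            # Under a truly exclusive lock nobody else can have made work_dir,
            # so let os.mkdir raise FileExistsError if somebody did.
            os.mkdir(work_dir)
            try:
                yield work_dir
            except BaseException:
                shutil.rmtree(work_dir, ignore_errors=True)
                raise
            os.rename(work_dir, target_dir)  # Publish the finished directory.
        finally:
            fcntl.lockf(lock_file.fileno(), fcntl.LOCK_UN)
```

Note that a leftover `.workdir` from a process that died mid-build also trips the `FileExistsError`, which is the objection raised in the review reply that follows.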


It could also mean that something segfaulted or was sigkilled, such that it exited uncleanly without tearing down the working directory?

jsirois · Oct 20, 2022

Yes, that's true. I'll step back and think about this a bit.

jsirois · Oct 21, 2022

OK, I now just warn, so we still get the data on stderr, but I also clean up stale workdirs to prevent a stop-in-your-tracks bug that requires manual intervention.
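A minimal sketch of that revised behavior, roughly what the follow-up commit "Accept a pre-existing workdir, but warn about the circumstances." describes: instead of letting `os.mkdir` fail, the hypothetical `make_work_dir` helper reports the leftover directory on stderr, removes it, and retries. It is only safe while the caller holds the exclusive lock, since otherwise it could delete a live holder's work in progress.

```python
import os
import shutil
import sys


def make_work_dir(work_dir: str) -> None:
    """Create work_dir, warning about and clearing any stale leftover.

    Hypothetical helper; call it only while holding the exclusive lock.
    """
    try:
        os.mkdir(work_dir)
    except FileExistsError:
        # Either a previous holder died without cleaning up (e.g. SIGKILL or a
        # segfault) or the locking guarantees failed; surface it either way.
        print(
            f"WARNING: found a pre-existing work dir at {work_dir}; a previous "
            "holder may have exited uncleanly or locking may have failed. "
            "Removing it and continuing.",
            file=sys.stderr,
        )
        shutil.rmtree(work_dir)
        os.mkdir(work_dir)
```

Printing rather than raising keeps the diagnostic data while avoiding a failure that would otherwise need manual cleanup after an unclean exit.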

This was broken by pex-tool#1961, which did not account for the two unlocked uses of AtomicDirectory in pex.resolver for wheel building and installing. Although those directories deserve a second look as candidates for exclusive locking, we saw no issues with them from the introduction of AtomicDirectory up until pex-tool#1961. As such, this change just restores the prior behavior for those two cases. Fixes pex-tool#1968
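For contrast, the unlocked uses rely on a different pattern, sketched here under stated assumptions: each attempt builds into its own uuid-suffixed work directory and renames it into place, and a racer that loses the rename simply discards its copy. The function `populate_unlocked` is hypothetical, and the approach is only sound when every racer produces equivalent contents, as is presumably the case when building or installing the same wheel.

```python
import os
import shutil
import uuid
from typing import Callable


def populate_unlocked(target_dir: str, populate: Callable[[str], None]) -> None:
    """Build target_dir without a lock; each racer uses a private work dir.

    Hypothetical sketch: assumes all racers produce equivalent contents, so
    whichever rename lands first wins and the losers throw their copy away.
    """
    if os.path.isdir(target_dir):
        return  # Someone already finished.
    work_dir = f"{target_dir}.{uuid.uuid4().hex}"  # Unique per attempt.
    os.makedirs(work_dir)
    try:
        populate(work_dir)
        try:
            os.rename(work_dir, target_dir)
        except OSError:
            # Another racer renamed a populated dir into place first; keep theirs.
            if not os.path.isdir(target_dir):
                raise
    finally:
        # No-op after a successful rename; discards our copy if we lost.
        shutil.rmtree(work_dir, ignore_errors=True)
```

Because no two attempts ever share a work directory, this pattern needs no lock at all, which is why a fixed workdir name breaks it.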

jsirois added 4 commits October 19, 2022 21:19

Extract atomic_directory module from common.

d2ea1f2

Always exclusively lock.

5313be5

Since all locks are now exclusive, use a fixed workdir name.

18bdaf6

Doc improvements.

7c7ad0d

jsirois requested review from Eric-Arellano, benjyw and stuhood October 20, 2022 05:50

benjyw approved these changes Oct 20, 2022


This was referenced Oct 20, 2022

ModuleNotFoundError: No module named 'pex' pantsbuild/pants#17221

Closed

Nondeterministic ImportError: No module named __pants_df_parser while computing changed targets pantsbuild/pants#16778

Closed

stuhood reviewed Oct 20, 2022


Accept a pre-existing workdir, but warn about the circumstances.

f5c5bc8

benjyw approved these changes Oct 21, 2022


jsirois merged commit dbd4c13 into pex-tool:main Oct 21, 2022

jsirois deleted the locking/debug branch October 21, 2022 19:12

jsirois mentioned this pull request Oct 31, 2022

pex build fails due to existing work-directory #1969

Closed

jsirois mentioned this pull request Nov 3, 2022

Restore AtomicDirectory non-locked good behavior. #1974

Merged
