test_glob: test_selflink() fails randomly on Linux #109959
Comments
I haven't seen this failure recently, so I'm closing the issue. |
Seen recently in a PR CI run. |
Dumping the expected value and the set content from the failed run. The shortest value in the set has one more
|
I assume it gets up to a max path length, so the OS & CWD will affect that number. |
That was my first thought too, but it doesn't seem to be the case, since all the paths in the set were accessible on disk (the set being checked is derived from the glob result). The largest value in the
I also confirmed both tracebacks are pointing at the same assertion:
To get the reported failures:
|
I'm tempted to change the way this test works to build a full set of expected values and then do a set comparison rather than checking the entries one by one. Something like:
(and similarly for the other two search loops) It won't fix the problem, but it should make it clearer how many entries are actually missing when the test does fail (since it will only show the diff, rather than all the remaining entries after the first missing entry). |
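As an illustration of the set-comparison idea (not the snippet originally posted in this comment), here is a minimal sketch. The helper name `missing_and_unexpected` and the sample paths are hypothetical; the real test would build `expected` from its own depth loop and `results` from the glob call.

```python
def missing_and_unexpected(results, expected):
    """Return (missing, unexpected) entries as sorted lists."""
    results = set(results)
    expected = set(expected)
    # Set differences show every discrepancy at once, instead of
    # stopping at the first entry that is not found.
    return sorted(expected - results), sorted(results - expected)


# Hypothetical data: 'link/' chains of depth 1..4, with depth 3 missing
# from the (simulated) glob result.
expected = {"dir/" + "link/" * depth for depth in range(1, 5)}
results = expected - {"dir/" + "link/" * 3}

missing, unexpected = missing_and_unexpected(results, expected)
print("missing:", missing)
print("unexpected:", unexpected)
```

Inside a `unittest.TestCase`, `self.assertEqual(set(results), expected)` gives a similar diff-style failure message without a helper.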
It's hard to see any possible source of discrepancies between "directories scanned" and "paths reported" in a recursive glob, since The check for whether that recursion operation is valid then appears to just be whether |
What a mysterious bug! The pattern in question (" Lines 54 to 60 in 6b280a8
I suppose it's possible there's a bug in (edit: removing the |
Hypothesis tests on Ubuntu: https://github.com/python/cpython/actions/runs/12415820872/job/34663070210?pr=128097
|
I entirely removed the use of |
New theory: If the fd was closed prematurely, then |
Nevermind - the test doesn't use pass |
Debug logs (length of the current working directory) when the test fails. Logs from Tests / Hypothesis tests on Ubuntu CI.
|
If I run the test in a loop on a busy system, it fails randomly. The problem is that the ELOOP error is not deterministic. Sometimes, the issue occurs on strace:
I would expect that once we reach the maximum number of symlinks and stat() fails with ELOOP, any longer path would also fail with ELOOP, but that's not the case. The Linux tool |
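When checking which error a failing stat() actually reports, it can help to print the symbolic errno name rather than just "failed". The following is a small sketch, assuming a POSIX system where unprivileged symlink creation works; the helper name `stat_error_name` is made up for this example. It uses a symlink that points at itself, which is a deterministic way to get ELOOP, unlike the intermittent failures described above.

```python
import errno
import os
import tempfile


def stat_error_name(path):
    """Return None if os.stat() succeeds, else the symbolic errno name."""
    try:
        os.stat(path)
    except OSError as exc:
        # errno.errorcode maps numeric codes to names such as 'ELOOP'.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


with tempfile.TemporaryDirectory() as tmp:
    loop = os.path.join(tmp, "loop")
    os.symlink("loop", loop)  # a symlink that resolves to itself
    loop_result = stat_error_name(loop)
    missing_result = stat_error_name(os.path.join(tmp, "missing"))

print("self-referencing symlink:", loop_result)
print("nonexistent path:", missing_result)
```

Logging the errno name this way separates "too many levels of symbolic links" from unrelated failures such as a missing file.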
Simpler reproducer:

import os
import sys

MAX_LINKS = 40
TEMPDIR = 'tempdir'

def BUG():
    print()
    print("BUUUUUUUUUUUUUUUG")
    print("BUUUUUUUUUUUUUUUG")
    print("BUUUUUUUUUUUUUUUG")
    sys.exit(1)

def check_lexists_bug():
    bug = False
    for depth in range(1, MAX_LINKS+2):
        path = 'link/' * depth
        err = None
        try:
            st = os.stat(path, follow_symlinks=True)
        except OSError as exc:
            err = exc
            lexists = False
        else:
            lexists = True
        if not lexists:
            lexists = f'{lexists} <============ {err!r}'
            if depth < MAX_LINKS:
                bug = True
        print(f'Depth {depth}: lexists? {lexists}')
        if not lexists:
            break
    if bug:
        BUG()
    print()

run = 1
while True:
    print(f"=== Run {run} ===")
    old_dir = os.getcwd()
    os.mkdir(TEMPDIR)
    try:
        os.chdir(TEMPDIR)
        open('file', 'wb').close()
        os.symlink(os.path.join('..', TEMPDIR), 'link')
        check_lexists_bug()
    finally:
        os.unlink('file')
        os.unlink('link')
        os.chdir(old_dir)
        os.rmdir(TEMPDIR)
    run += 1

Example of bug: lexists() fails at depth 33 but works at depths before and after. The bug occurs randomly on a busy system. On an idle system, the bug doesn't show up. For example, I run
Another example where lexists() fails at multiple depths:
|
Does that imply that it's a glibc or kernel bug? |
I can see the behavior at the syscall level, so it's the behavior of the Linux kernel. I don't know if it's a bug. I don't know whether Linux guarantees that all path operations support up to 40 levels of links. |
My
I'm on 5.15.0-126-generic FWIW. Could SELinux somehow cause this? |
I saw the issue on the Ubuntu Hypothesis CI, and Ubuntu doesn't use SELinux (it uses AppArmor instead). |
The flaky test is now skipped. Since the root issue comes from the Linux kernel, maybe we should just remove the test. Or the |
FWIW this might be fixed by #116392, because it no longer |
The test is not reliable, it fails randomly on Linux: python#109959 (comment)
AMD64 RHEL8 Refleaks 3.x:
The tested path
dir/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link
has one less link/
than the first path of the test: 'dir/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/'.
build: https://buildbot.python.org/all/#/builders/259/builds/892
Linked PRs
test_glob
#128255