"mv: replace 'blah', overriding mode 0007?" hidden prompt seemingly hangs opam download #4516
Comments
Thank you for the extremely detailed log report! The detail and checking you've done leads very quickly to the problem. The issue is a race condition, combined with unusual permissions.
If I may be so bold as to offer debugging advice! Faced with this message:
There are only two possibilities:
The issue here is a race condition (the file isn't apparent afterwards because one of the racing processes has renamed it again). When opam downloads files, the downloader is instructed to download to a .tmp.part file (see https://github.com/kwshi/buildah-bug-reports/blob/main/opam-help/fail-log#L62). On completion, this is renamed to .tmp and the checksums are verified. On success, this file has the .tmp extension removed and is formally cached.
dune-configurator and dune (and menhir and menhirSdk - I don't think ocp-index is involved here, other than helping the timing of the race condition) come from the same upstream package. opam is attempting to do the same download (see https://github.com/kwshi/buildah-bug-reports/blob/main/opam-help/fail-log#L62 and https://github.com/kwshi/buildah-bug-reports/blob/main/opam-help/fail-log#L81) and the
You can see the mv effect fairly simply with:
touch foo bar
chmod 6 bar
mv foo bar
I haven't been able to persuade my system to reproduce the issue (but your logs mean I have no doubt that it's real!). opam 2.1 generalises the scheduler so that builds and downloads can take place at the same time - if you're able to experiment with the 2.1 beta4 binary that would be quite handy, although I expect that it may make the race condition even harder to surface.
This issue has always been present, but we wouldn't see this normally because with the expected 664 permissions on the files,
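The staged renames described in this comment can be sketched in shell. This is only an illustration of the flow as described here, not opam's actual implementation: the function name fetch_to_cache, its arguments, and the use of cp as a stand-in for the downloader are all hypothetical, and sha256sum is assumed to be available.

```shell
# Hypothetical sketch of the staged-rename flow; not opam's real code.
# Usage: fetch_to_cache SRC CACHE_DIR EXPECTED_SHA256
fetch_to_cache() {
    src=$1
    cache=$2
    sum=$3
    part="$cache/$sum.tmp.part"
    tmp="$cache/$sum.tmp"
    final="$cache/$sum"

    # Stage 1: "download" into the .tmp.part file (cp stands in for the downloader).
    cp "$src" "$part" || return 1

    # Stage 2: the download is complete, so rename .tmp.part to .tmp.
    # If another process has already created "$tmp", this mv sees an existing target.
    mv "$part" "$tmp" || return 1

    # Stage 3: verify the checksum; discard the file on mismatch.
    actual=$(sha256sum "$tmp" | cut -d' ' -f1)
    if [ "$actual" != "$sum" ]; then
        rm -f "$tmp"
        return 1
    fi

    # Stage 4: drop the .tmp extension; the file is now formally cached.
    mv "$tmp" "$final"
}
```

If two processes run this for the same checksum at the same time, both use the same .tmp name, which is presumably where a plain mv can find a destination that already exists.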
xref #3741
Thanks for the response!
On that note, it's worth mentioning that even when the file isn't apparent afterwards,
I do wonder if that teeters us a little bit closer to the "coreutils
Ah!! Your description, along with that example, is exactly what I needed to find a minimal example reproducing this bug, independent of opam. Here's how it works:
Here is a sequence of commands implementing the above:
touch a b
chmod 6 a
mv a x
ls -la
mv b a
and output:
I have written some containerfiles demonstrating this issue on various base systems. (To be clear, this is not a bug I can reproduce on my host system either--only from within a podman/buildah container. Which is why I'm led to believe there is an error in the underlying filesystem implementation.) What's especially interesting is that, among those examples, Alpine is the only OS not exhibiting the bug, and incidentally Alpine is the only OS not using coreutils
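The command sequence above can be wrapped in a small non-interactive check to run inside a container. This is a sketch under assumptions: check_stale_entry is a hypothetical helper, and whether a plain existence test observes the stale entry on an affected filesystem (as opposed to only mv noticing it) has not been verified here.

```shell
# Hypothetical helper: repeats the first steps of the reproduction in a
# scratch directory and reports whether "a" still appears to exist after
# it has been renamed away.
check_stale_entry() {
    dir=$(mktemp -d) || return 2
    (
        cd "$dir" || exit 2
        touch a b
        chmod 6 a
        mv a x
        # On a correct filesystem, "a" no longer exists at this point, so a
        # later "mv b a" has nothing to replace and nothing to ask about.
        if [ -e a ] || [ -L a ]; then
            echo "stale"
        else
            echo "ok"
        fi
    )
    status=$?
    rm -rf "$dir"
    return $status
}

check_stale_entry
```

A result of "stale" would point at the filesystem layer rather than at mv itself; a result of "ok" does not rule the bug out, since the two tools may check for existence differently.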
@rjbou will confirm, but I think that in 2.1 she already implemented that part:
which should avoid the issue.
@kwshi - how interesting that a bug in opam surfaced a bug in fuse-overlayfs! You ought to be seeing a similar message from busybox
Apart from alpine, what's another distro that uses busybox? Also, perhaps it has to do with differences in how the "already exists" check is done, namely
that enables one to be more robust against the bug than the other? (After all, the correct behavior is that it should not think the file already exists and therefore should not give the message.)
The multiple download issue still exists, but as this is tracked in another issue I'm closing this one |
I'll preface this with a concession that I am trying to use opam in a very non-standard setting, and that may be the reason behind this bug. But I wanted to open an issue anyway to see if people more familiar with the internal workings of opam have better insight on why this is happening, or at least what is happening under the hood to help me find a more direct way to reproduce this error.
The issue in question: while trying to build a Podman container containing opam and a few packages, opam appears to hang indefinitely on
until I press the Enter key, which promptly causes it to proceed and fail with
Per advice from #3970, I reran the install command with -vvv and found that the command causing the "hang" was a prompt from mv, saying:
Indeed, simply pressing Enter causes it to fail, but typing y and pressing Enter actually allows the install to succeed. But clearly this isn't intended behavior, since (1) I ran opam with -y and (2) this confirmation prompt is coming from mv, not opam.
It's worth noting that dune-configurator is not the only package that causes this issue; depending on the order in which I install packages, I've also encountered the same issue with menhirLib, menhirSdk, and ocp-index. Installing dune-configurator first just happens to be the easiest way I've found to reproduce this message. That's why I think this issue is not specific to a single package but might have something to do with opam in general.
There are two really strange things about this:
/home/opam/.opam/download-cache-sha256/e2/e2c4e8230f7c96236503fd75f22bdbc263639971bf104509e446855ded35ae1e.tmp, so it totally baffles me why mv thinks it's "replacing" it.
opam install -y ocp-index, and then run opam install -y dune-configurator, then the prompt/hang/error doesn't appear at all.
I am of the belief that the specific cause of this issue comes from Podman's rootless filesystem implementation. Nevertheless, my capacity to narrow down the exact source and find a clearer way to reproduce this bug is limited by the fact that I have no idea what exactly opam is doing under the hood to possibly induce this error, so I am seeking help here on figuring out exactly what is going on and why this error's appearance seems to vary depending on the order of package installations.
I also filed containers/buildah#2951 describing the same issue (describing observations on the Podman side of things).
opam config report
For reference, a complete log of the Podman build (together with the opam install -vvvy dune-configurator log) is available here: https://github.com/kwshi/buildah-bug-reports/blob/main/opam-help/fail-log