macOS: unexpected EOF reading a line #3605
Comments
This test has been randomly failing for a while. |
In case it's relevant, it also fails on bar:
|
#3137 looks quite relevant, there might be a bug in Nix. |
Possibly relevant: #1704 |
I'm not sure it's the same issue since the context is different, but I've run into some flaky instances of this message. Let me know if you think I should open this separately? I took a little time this week to pick at writing a plain bash script to set up Nix on travis-CI in a way that is hopefully more maintainable than the Ruby language integration, but as soon as I had the install working correctly I started noticing flaky failures with this message:
installing 'nix-2.3.7'
error: unexpected EOF reading a line
---- oh no! --------------------------------------------------------------------
Some notes:
Edit: I've forced a rebuild on 10.13 every time I could remember over the past 5 days, and I've finally strung together about 60 builds without a single 10.13 failure. Given how common the 10.14 and 10.15 failures have been, I'm feeling fairly confident at this point (again, assuming this is the same as the issue observed here) that this issue doesn't manifest on 10.13. I haven't quite decided yet if I should still see it as a blocker for my purpose or not 😬 Here are links to specific runs: |
I saw a fresh probable instance of this on CI today:
|
I asked @domenkozar about this on IRC and he confirmed that he has not seen this on github-actions macOS runners running either 10.14 or 10.15. |
I have been running Nix builds on a 2013 MBA running macOS 10.14 as I poke at Big Sur installer updates, and I've been seeing spotty EOF errors (on the scale of 0-4 per distinct attempt to build). Most of them have been in tests/user-envs.sh, but I've also seen tests/remote-store.sh come up at least once. Edit: It seems like these have been growing more common as I kept building. I've attached a log of a loop I did where it tried and failed more than 10 times. At some point it got stuck, so I killed it. After a reboot it built fine. |
The logging test is now also failing quite often with:
|
There are now debugging prints in Nix master that show there's no content where Nix expects it, which is why it fails. I'd like to offer $100 from the https://opencollective.com/nix-macos fund to anyone who fixes this. |
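To make the failure mode concrete, here is a minimal sketch of a line reader that fails the same way when the other end produces no content. It is written in Python for illustration only; the helper name `read_line` and the use of a plain pipe are assumptions, not Nix's actual C++ code.

```python
import os


def read_line(fd: int) -> str:
    """Read bytes from fd up to (not including) the next newline.

    Hypothetical helper: raises EOFError if the writer closes its end
    before a full line has arrived, which is the situation described above.
    """
    buf = bytearray()
    while True:
        chunk = os.read(fd, 1)  # returns b"" at end-of-file
        if not chunk:
            raise EOFError("unexpected EOF reading a line")
        if chunk == b"\n":
            return buf.decode()
        buf += chunk
```

If the child's output never reaches the reader, the reader only sees the end-of-file and reports it, even though the child itself may have succeeded.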
Offering $150 now to whomever fixes it. |
Note that this happens also in the wild:
The existing debugging doesn't yield much insight. I wonder what other information we could extract to see why setting up the build environment fails? |
|
It's not using sandbox in the tests, so the builder is actually just the derivation? |
Using -vvv:
|
That seems somewhat related, but the log using
It seems like that |
We have a heisenbug: as soon as I add debugging statements to buildenv, the issue disappears. |
Opened #4965 |
cc @lheckemann as you helped debug previous EOF :) |
i think i was getting a similar error when running |
I checked the various tests on master and ran them a bunch of times with no failures. I even checked out this PR #3605 (comment) and I can't seem to reproduce it. I also removed the snippet that retries on failure. The next time someone sees a failure, can you provide:
* How you got your Nix (checksum of the channel or of the flake).
* Whether you have the sandbox enabled (I have the sandbox enabled).
* Which command you used.
If you have a repro somewhere, snapshot it in a branch and don't touch it. |
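For anyone trying to capture a reproduction, a small loop that reruns a command and keeps the output of the failing runs may help. This is only a sketch: the function name `count_flaky_failures` is hypothetical, and it assumes the error text appears in the command's stdout or stderr.

```python
import subprocess

NEEDLE = "unexpected EOF reading a line"


def count_flaky_failures(cmd: list[str], runs: int) -> tuple[int, list[str]]:
    """Run cmd `runs` times; collect output of failed runs containing NEEDLE."""
    failures = []
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        # Only count runs that both failed and show the specific message.
        if proc.returncode != 0 and NEEDLE in output:
            failures.append(output)
    return len(failures), failures
```

For example, `count_flaky_failures(["make", "installcheck", "-j16"], runs=20)` would rerun the test suite under load and return how many runs hit the message, along with their logs.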
Can't test:
|
Possibly related: #7242 |
Discussed in the Nix team meeting:
|
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-03-13-nix-team-meeting-minutes-40/26309/1 |
Hopefully this fixes "unexpected EOF" failures on macOS (NixOS#3137, NixOS#3605, NixOS#7242, NixOS#7702).

The problem appears to be that under some circumstances, macOS discards the output written to the slave side of the pseudoterminal. Hence the parent never sees the "sandbox initialized" message from the child, even though it succeeded. The conditions are:

* The child finishes very quickly. That's why this bug is likely to trigger in nix-env tests, since that uses a builtin builder. Adding a short sleep before the child exits makes the problem go away.
* The parent has closed its duplicate of the slave file descriptor. This shouldn't matter, since the child has a duplicate as well, but it does. E.g. moving the close to the bottom of startBuilder() makes the problem go away. However, that's not a solution because it would make Nix hang if the child dies before sending the "sandbox initialized" message.
* The system is under high load. E.g. "make installcheck -j16" makes the issue pretty reproducible, while it's very rare under "make installcheck -j1".

As a fix/workaround, we now open the pseudoterminal slave in the child, rather than the parent. This removes the second condition (i.e. the parent no longer needs to close the slave fd) and I haven't been able to reproduce the "unexpected EOF" with this.
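The shape of that workaround can be sketched roughly as follows. This is an illustrative Python rendering, not the Nix implementation: the names `open_master` and `handshake`, the use of `ctypes` to reach the C library's `posix_openpt`/`grantpt`/`unlockpt`/`ptsname`, and the 5-second timeout are all assumptions made for the example.

```python
import ctypes
import os
import select
import tty

# Assumption: the C library exposes the POSIX pty functions via dlopen(NULL).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.ptsname.restype = ctypes.c_char_p
_libc.ptsname.argtypes = [ctypes.c_int]


def open_master():
    """Open only the master side and return (master_fd, slave_path)."""
    master = _libc.posix_openpt(os.O_RDWR | os.O_NOCTTY)
    if master < 0:
        raise OSError(ctypes.get_errno(), "posix_openpt failed")
    if _libc.grantpt(master) != 0 or _libc.unlockpt(master) != 0:
        err = ctypes.get_errno()
        os.close(master)
        raise OSError(err, "grantpt/unlockpt failed")
    name = _libc.ptsname(master)
    if name is None:
        os.close(master)
        raise OSError("ptsname failed")
    return master, name.decode()


def handshake(message: bytes = b"sandbox initialized\n", timeout: float = 5.0) -> bytes:
    """Fork a child that opens the slave itself and sends one line."""
    master, slave_path = open_master()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(master)
            # The child, not the parent, opens the slave side by name.
            slave = os.open(slave_path, os.O_RDWR | os.O_NOCTTY)
            tty.setraw(slave)  # avoid "\n" being rewritten to "\r\n"
            os.write(slave, message)
            status = 0
        finally:
            os._exit(status)
    try:
        line = b""
        while not line.endswith(b"\n"):
            # Guard against a child that dies before writing anything.
            ready, _, _ = select.select([master], [], [], timeout)
            if not ready:
                raise TimeoutError("no message from child")
            chunk = os.read(master, 1)
            if not chunk:
                raise EOFError("unexpected EOF reading a line")
            line += chunk
        return line
    finally:
        os.close(master)
        os.waitpid(pid, 0)
```

The point of the design is that the parent never holds a slave file descriptor, so there is nothing for it to close after the fork; the timeout stands in for whatever mechanism keeps the parent from hanging if the child dies early.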
Also seeing this on M1 mac when trying to use a flake for
in any version (2.11, 2.12, 2.13, 2.14)
|
Update: killing nix-daemon fixed it.
|
Hopefully fixed in #8049 |
Closing for now, if it comes back we can reopen it. |
Another one, from this CI run
|
The same commit message appears again on the cherry-picked copy of the fix above (cherry picked from commit c536e00).
I also ran into this on Darwin after updating Nix to 2.18.X (from 2.11.x). A reboot fixed the issue. Suspect that the daemon was simply incompatible with the client. |
Describe the bug
Nix has failed to build on darwin with this log.
https://hydra.nixos.org/build/119296096/nixlog/1
It did succeed afterwards, so unless someone recognizes what the problem is in the log, it's ok to close this and move on.
The failure happens in test case tests/user-envs.sh, with the last couple of lines being:
Steps To Reproduce
This may be a race condition or other intermittent bug.
The entire log of the build failure is available here:
https://hydra.nixos.org/build/119296096/nixlog/1
Expected behavior
Nix just builds on darwin without a test failure.
nix-env --version output
N/A
Additional context