Rebooting always unmount Nix volume in macOS Monterey 12.5 #6839

Open
fwten opened this issue Jul 26, 2022 · 14 comments

@fwten

fwten commented Jul 26, 2022

Describe the bug

The Nix Store volume is not mounted automatically after rebooting in macOS Monterey 12.5 (Intel).

Steps To Reproduce

  1. Restart macOS Monterey 12.5.
  2. Disk Utility shows that the Nix Store APFS volume is not mounted.
  3. /nix path is not available.

Expected behavior

The Nix Store APFS volume should be automatically mounted.

nix-env --version output

nix-env (Nix) 2.10.3

Additional context

Things were working fine under Monterey 12.4, and this issue popped up after upgrading to Monterey 12.5.

I tried uninstalling and re-installing Nix, but that did not seem to fix the issue.

To get nix working again, I would have to mount the Nix Store APFS volume in Disk Utility manually and run sudo launchctl kickstart -k system/org.nixos.activate-system.

/etc/fstab and /etc/synthetic.conf seemed fine:

$ cat /etc/fstab 
#
# Warning - this file should only be modified with vifs(8)
#
# Failure to do so is unsupported and may be destructive.
#
UUID=D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2 /nix apfs rw,noauto,nobrowse,suid,owners
$ cat /etc/synthetic.conf 
nix
run	private/var/run
$ diskutil info disk3s7
   Device Identifier:         disk3s7
   Device Node:               /dev/disk3s7
   Whole:                     No
   Part of Whole:             disk3

   Volume Name:               Nix Store
   Mounted:                   Yes
   Mount Point:               /nix

   Partition Type:            41504653-0000-11AA-AA11-00306543ECAC
   File System Personality:   APFS
   Type (Bundle):             apfs
   Name (User Visible):       APFS
   Owners:                    Enabled

   OS Can Be Installed:       Yes
   Booter Disk:               disk3s2
   Recovery Disk:             disk3s3
   Media Type:                Generic
   Protocol:                  USB
   SMART Status:              Not Supported
   Volume UUID:               D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2
   Disk / Partition UUID:     D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2

   Disk Size:                 2.0 TB (2000189177856 Bytes) (exactly 3906619488 512-Byte-Units)
   Device Block Size:         4096 Bytes

   Container Total Space:     2.0 TB (2000189177856 Bytes) (exactly 3906619488 512-Byte-Units)
   Container Free Space:      1.9 TB (1851595505664 Bytes) (exactly 3616397472 512-Byte-Units)
   Allocation Block Size:     4096 Bytes

   Media OS Use Only:         No
   Media Read-Only:           No
   Volume Read-Only:          No

   Device Location:           External
   Removable Media:           Fixed

   Solid State:               Yes

   This disk is an APFS Volume.  APFS Information:
   APFS Container:            disk3
   APFS Physical Store:       disk2s2
   Fusion Drive:              No
   Encrypted:                 No
   FileVault:                 No
   Sealed:                    No
   Locked:                    No
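
For anyone who wants to double-check the same thing, here is a minimal sketch that compares the UUID that /etc/fstab expects for /nix with the one diskutil reports. The helper names fstab_uuid_for and volume_uuid_from_info are made up for illustration, and the parsing assumes the file and output formats shown above.

```shell
#!/bin/sh
# Print the UUID that an fstab-formatted stream (stdin) assigns to a mount point.
# Comment lines and entries without a UUID= prefix are ignored.
fstab_uuid_for() {
  awk -v mp="$1" '
    /^[ \t]*#/ { next }
    $2 == mp && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1; exit }
  '
}

# Print the "Volume UUID" field from `diskutil info` output (stdin).
volume_uuid_from_info() {
  awk -F': *' '/^[ \t]*Volume UUID:/ { print $2; exit }'
}

# Usage on macOS (not run here; disk3s7 is the identifier from this report):
#   want=$(fstab_uuid_for /nix < /etc/fstab)
#   have=$(diskutil info disk3s7 | volume_uuid_from_info)
#   [ "$want" = "$have" ] && echo "fstab matches volume" || echo "mismatch"
```

A mismatch would point at a stale fstab entry rather than at the mounting service.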

In case this is relevant, I'm also running this Monterey from an external drive with the following structure:
image

@fwten fwten added the bug label Jul 26, 2022
@abathur
Member

abathur commented Jul 26, 2022

Thanks for the good report. Gets a lot of basics out of the way. Three (groups of) questions to start:

  1. When you say you uninstalled and reinstalled, did you DIY, or follow the instructions from the manual?

    (I suspect you followed them, since you didn't report any reinstall issues, but just in case...)

  2. Does /Library/LaunchDaemons/org.nixos.darwin-store.plist exist? Does it refer to the same volume UUID in your earlier output? Does it mention /usr/bin/security? If you reboot and run sudo launchctl kickstart -k system/org.nixos.darwin-store.plist, does it mount the volume?

  3. Can you elaborate on the statement below (lay out exactly what happened and what the symptoms were)?

    Things were working fine under Monterey 12.4, and this issue popped up after upgrading to Monterey 12.5.

    There's an ongoing known issue (macOS updates often break nix installation (updates replace path-hooks on multi-user install) #3616) that has been causing nix not to appear on your PATH after a macOS update (because the shell hook isn't being run). A little more information may help clarify whether this is a sign that updates are breaking things in a novel way (and may help others who'll presumably be running into the same issue find the thread easier).
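
The checks in question 2 can be sketched as a small helper. This assumes the plist is stored as XML text (as the installer writes it); describe_mounter_plist is a hypothetical name, not something shipped with Nix.

```shell
#!/bin/sh
# Report whether a launchd plist mentions the volume UUID and /usr/bin/security.
# $1: path to the plist, $2: volume UUID to look for
describe_mounter_plist() {
  plist=$1
  uuid=$2
  if grep -qF "$uuid" "$plist"; then
    echo "uuid: present"
  else
    echo "uuid: missing"
  fi
  if grep -qF "/usr/bin/security" "$plist"; then
    echo "security: present"
  else
    echo "security: absent"
  fi
}

# Usage on macOS (not run here):
#   describe_mounter_plist /Library/LaunchDaemons/org.nixos.darwin-store.plist "<volume-uuid>"
```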

@fwten
Author

fwten commented Jul 26, 2022

Hi @abathur, thank you for looking into this! I will try to clear up the questions you raised here, please do let me know if you need more information:

  1. I followed the instructions from the manual and there were no issues at all here :)

  2. The /Library/LaunchDaemons/org.nixos.darwin-store.plist exists and refers to the same UUID indeed. No mention of /usr/bin/security though. I tried running sudo launchctl kickstart -k system/org.nixos.darwin-store.plist, but this did not mount the volume. Interestingly, running the commands inside it manually in a terminal worked: /usr/sbin/diskutil mount -mountPoint /nix D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2. Here's the content of the file just in case:

$ cat /Library/LaunchDaemons/org.nixos.darwin-store.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>RunAtLoad</key>
  <true/>
  <key>Label</key>
  <string>org.nixos.darwin-store</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/sbin/diskutil</string>
    <string>mount</string>
    <string>-mountPoint</string>
    <string>/nix</string>
    <string>D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2</string>
  </array>
</dict>
</plist>
  3. Alright, I'll try to elaborate on this a little bit more as requested:

Can you elaborate on the statement below (lay out exactly what happened and what the symptoms were)?

What happened and the symptoms:
Right after the upgrade to 12.5 finished (after several automated reboots as well), I logged into my account and saw that my terminal shell and prompt were different. I thought I might have messed up the shell config file so I tried to look into it, but then noticed some of my cli tools were missing as well.

Troubleshooting and investigations:
That's how I realized my /nix was gone because the shell and the missing tools were all installed using nix. I already had some (very) brief experiences uninstalling and re-installing Nix before this and the experience was very straightforward and pleasant, so I figured I could do the same here instead of potentially messing up the system even further. The re-installation went well and Nix works again (albeit a fresh install), but the same issue re-surfaced after I restarted the system again. Since it clearly worked after a re-installation (and before a reboot), I studied the installation script and tried to narrow down potential solutions, and that's how I ended up with the information described above. Just to be clear, apart from not being mounted automatically, the Nix volume itself is seemingly fine. I just had to mount it and launch org.nixos.darwin-store.plist manually, all the packages and setup survived as far as I could tell.

Other remarks:
By the way, since you mentioned /usr/bin/security, I recalled having some issues as I was studying the installation script, in particular with these two commands:

$ sudo /usr/sbin/diskutil apfs unlockVolume disk3s7 -verify -stdinpassphrase -user D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2

<stuck here running/loading/waiting after inputting user password ...>

and

$ sudo security find-generic-password -s D6FA2E1D-00EC-4C24-236B-44D26FC3E8A2 -w

security: SecKeychainSearchCopyNext: The specified item could not be found in the keychain.

I'm not sure how relevant this is but just thought I should mention it. Since I managed to get my setup running again just by mounting /nix manually and running sudo launchctl kickstart -k system/org.nixos.activate-system, I decided to stop investigating further.

I also came across these similar issues (#3616 etc) during my investigations, but I thought it wasn't applicable here because my whole Nix volume was missing, hence I didn't expect something that sources from /nix would work at all. Nevertheless, I still gave it a try but it didn't solve anything as far as I could tell.

@abathur
Member

abathur commented Jul 26, 2022

The /usr/bin/security shouldn't be present in your case (it's just added when you have FileVault enabled on the volume). Just making sure the mounter sounds appropriate for your drive.

Unfortunately, the exact circumstances aren't something I think I've heard of. Two thoughts:

  1. You could open Console.app (this will be easier if you can close everything else on the system) after you reboot, try running the kickstart command again, and see if you can find anything in the console that might indicate why it's failing. (You can also do this with the log command if you happen to be familiar with its predicates or prefer searching around in the output with something other than Console.app's interface...)
  2. This is reminding me a little bit of MacOS /nix unmount when reboot. /nix ownership change to root #4640, though the context there was different (just cloud instances and local VMs as far as I know). It might be worth picking through some of the early information-gathering steps in there to see if anything leaps out.
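
For the log-command route in suggestion 1, a rough sketch: the helper below only builds a predicate string that matches messages mentioning a launchd label. The helper name and the choice of matching on eventMessage are assumptions; the relevant entries may well be filed under a different field or process.

```shell
#!/bin/sh
# Build a predicate string that matches log messages mentioning a launchd label.
# $1: the label to search for
log_predicate_for() {
  printf 'eventMessage CONTAINS "%s"' "$1"
}

# Usage on macOS, shortly after running the kickstart command (not run here):
#   log show --last 10m --predicate "$(log_predicate_for org.nixos.darwin-store)"
```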

@pejaab

pejaab commented Feb 19, 2023

I'm not sure whether this helps, but since I ran into the same issues, I'll share my observations and how I fixed it temporarily.

Basically the output of

sudo launchctl print system/org.nixos.darwin-store

shows error code 1. Error code 1 means operation not permitted.
This error only showed up after a reboot; manually triggering the start of the daemon never produced it and mounted /nix fine.
I figured this error was connected to the fact that the FileVault-encrypted volume needed to be unlocked and that the permission was lacking somewhere. I cannot tell where, though.
After disabling FileVault and changing system/org.nixos.darwin-store to just mount the /nix volume instead of needing to unlock it, the unmounting issue is gone, and /nix is mounted fine after every reboot.

I don't necessarily recommend disabling FileVault, but for a temporary solution on a Mac that never leaves home, I can live with it for the time being.
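
In case it helps others compare notes, here is a small sketch for pulling the exit code out of the print output. It assumes the output contains a line of the form "last exit code = N"; that format is my reading of the output and may differ between macOS versions. last_exit_code is a made-up helper name.

```shell
#!/bin/sh
# Read `launchctl print` output on stdin and print the numeric last exit code.
# Prints nothing if no numeric code is present (e.g. the service never exited).
last_exit_code() {
  sed -n 's/^[[:space:]]*last exit code = \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on macOS (not run here):
#   sudo launchctl print system/org.nixos.darwin-store | last_exit_code
```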

@abathur
Member

abathur commented Feb 19, 2023

@pejaab You might be able to disambiguate by looking for a credential with the same UUID in keychain.

Here's how it's formatted/named/described when it's added:

/usr/bin/security -i <<EOF
add-generic-password -a "$volume_label" -s "$volume_uuid" -l "$volume_label encryption password" -D "Encrypted volume password" -j "Added automatically by the Nix installer for use by $NIX_VOLUME_MOUNTD_DEST" -w "$password" -T /System/Library/CoreServices/APFSUserAgent -T /System/Library/CoreServices/CSUserAgent -T /usr/bin/security "/Library/Keychains/System.keychain"
EOF

If this credential goes missing for some reason, or if there is one but it doesn't correspond to your volume's UUID, the darwin-store daemon won't be able to unlock it.

Technically macOS itself can unlock this on mount--but it does it too late to prevent subtle race condition failures if your system needs to run executables or restore apps/window contents that are on the Nix Store volume.

The installer sets the volume up not to use the macOS built-in automounting to make sure problems like this have to get promptly and directly addressed instead of just causing people flaky hard-to-troubleshoot boot problems that might result in data loss and make them miserable for months or years.
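
A minimal sketch of the lookup described above, assuming the credential lives in the System keychain under the volume UUID as the service name. The function name and the SECURITY_BIN override (there only so the check can be exercised without a real keychain) are illustrative, and the lookup may need sudo on some setups.

```shell
#!/bin/sh
# Succeed if a generic password whose service name is the given UUID exists
# in the System keychain. $1: volume UUID
has_volume_credential() {
  uuid=$1
  "${SECURITY_BIN:-/usr/bin/security}" find-generic-password \
    -s "$uuid" /Library/Keychains/System.keychain >/dev/null 2>&1
}

# Usage on macOS (not run here):
#   if has_volume_credential "<volume-uuid>"; then
#     echo "credential found"
#   else
#     echo "credential missing"
#   fi
```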

@pejaab

pejaab commented Mar 1, 2023

@abathur thanks for the explanation. From the discussion above I was more or less able to extrapolate how this should work. The corresponding credential was persisted correctly in an entry in the Keychain App.
The behaviour persisted even after full removal of nix and new installation (I ended up with two entries in Keychain with the corresponding UUID of the newly added volume and the old one).

When manually starting the system/org.nixos.darwin-store daemon, the volume would mount successfully, but not on startup.
As mentioned, for the time being it's fine for me in this case to not have this volume encrypted.
I thought this may give you some further evidence in figuring out the root cause here or help the next person stumbling into this issue...

@abathur
Member

abathur commented Mar 1, 2023

I'm a bit stumped on why it would be failing on boot and not when you manually launch the daemon (and why your case differs from OP's on this point).

I agree that it does smell like there's some sort of permission difference.

Does your device happen to be an org device that may have an MDM profile? (I'm not sure why an org profile might appear to be restricting root more than your user, so I don't really think this would explain it--but we do have a small number of known problems associated with profile-enforced permissions/restrictions.)

@pejaab

pejaab commented Mar 2, 2023

No it's not an org device and doesn't have an MDM profile.

If there is anything you feel like worth trying, I'm happy to support in case that gets anyone further.
But as said you don't need to spend time on this for my sake.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/upgrading-to-macos-sonoma/33580/5

@Endle

Endle commented Nov 12, 2023

I ran into a similar problem after I moved my /nix to an external SSD, following p3 and p4

As @pejaab pointed out in #6839 (comment), I can manually execute sudo launchctl load -w /Library/LaunchDaemons/org.nixos.darwin-store.plist or /usr/sbin/diskutil mount -mountPoint /nix A5671.., but darwin-store.plist itself fails when booting

I guess this is related to a race condition for external storage: when darwin-store is invoked, the external USB drive is not ready yet, causing the failure.
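
If that guess is right, one possible workaround is to retry the mount for a while instead of trying once. Below is an untested sketch of a generic retry helper; pointing the daemon at a wrapper script that uses it is my assumption, not something the installer does.

```shell
#!/bin/sh
# Run a command until it succeeds, making at most $1 attempts and
# sleeping $2 seconds between attempts. Returns 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage on macOS (not run here): try for roughly 30 seconds.
#   retry 10 3 /usr/sbin/diskutil mount -mountPoint /nix "<volume-uuid>"
```

The attempt count and delay are arbitrary; how long an external drive takes to show up will vary.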

@emileindik

emileindik commented Aug 4, 2024

I am facing the same issue after doing a macOS upgrade to Sonoma. /etc/bashrc was overwritten, so I re-added this block to the file

# Nix
if [ -e '/nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh' ]; then
  source '/nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh'
fi
# End Nix

Now my remaining issue is the nix volume not mounting on startup (I'm not using external hard drives).
As a workaround, I first manually mount the nix volume in Disk Utility, then run

$ sudo launchctl load /Library/LaunchDaemons/org.nixos.darwin-store.plist

I am using FileVault and would like to continue using it. The volume UUID agrees with the UUID in org.nixos.darwin-store.plist.

Attaching both .plist files
org.nixos.darwin-store.txt
org.nixos.nix-daemon.txt

By the way, I've seen some threads suggesting to run sudo launchctl load /Library/LaunchDaemons/org.nixos.nix-daemon.plist and others sudo launchctl load /Library/LaunchDaemons/org.nixos.darwin-store.plist. How should I think about the difference between these?

@abathur
Member

abathur commented Aug 4, 2024

@emileindik I'm skeptical that this is caused (or solely caused) by the update because we don't have the volume of corroborating reports we'd expect if macOS updates were systematically overwriting /etc/bashrc or messing up the volume-mounting service (in contrast with updates overwriting /etc/zshrc which is extensively corroborated by reports).

I queried in the Nix on macOS room on Matrix and no one has corroborated seeing this on any Sonoma update so far (but I'll update if someone does).

Attaching both .plist files...

I don't see anything wrong with these at a glance.

How should I think about the difference between these?

They are separate services. darwin-store is only responsible for mounting the store at boot, and nix-daemon is only responsible for running the nix daemon. The latter only makes sense once the volume is mounted.

I'm not sure what's going on, so I'll just think out loud a bit...

  • If restarting the store would derail you:
    • Do you now or have you had nix-darwin installed? Does ls -la /etc/*rc indicate that bashrc is a symlink?

      (IIRC nix-darwin does or at least can replace the shell profiles with symlinks to /etc/static/*rc, and AFAIK these target files won't have the block quoted above like you would in a stock Nix install. I believe the equivalent statement there is in a __NIX_DARWIN_SET_ENVIRONMENT_DONE block. This wouldn't really help us figure out why the volume isn't mounting, but it may help us figure out if /etc/bashrc and the update are a red-herring.)

    • Is this a corporate/org device that may have an MDM profile?

    • Confirm you still see nix in /etc/fstab and /etc/synthetic.conf

    • If none of the above have yielded a cause, maybe check logs after boot for any clues. I'm not sure what we'd be looking for, but Console.app and /var/log/system.log are where I'd start. You can also modify the plist to add dedicated logfile paths for the services.

  • If restarting the store won't derail you much I would probably just uninstall (https://nixos.org/manual/nix/stable/installation/uninstall.html#macos) and reinstall, myself. If you're still seeing the problem afterwards, we'll have ~efficiently narrowed it down to something other than the Nix install or update.
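
For the dedicated-logfile idea above, a rough sketch using plutil to add the launchd StandardOutPath and StandardErrorPath keys. The function name, the log path prefix, and the PLUTIL_BIN override (handy for a dry run with echo) are illustrative assumptions. The service would presumably need to be reloaded, or the machine rebooted, before the change takes effect.

```shell
#!/bin/sh
# Add stdout/stderr log paths to a launchd plist.
# $1: path to the plist, $2: log path prefix (".out.log"/".err.log" get appended)
add_log_paths() {
  plist=$1
  prefix=$2
  plutil_bin=${PLUTIL_BIN:-/usr/bin/plutil}
  "$plutil_bin" -insert StandardOutPath -string "$prefix.out.log" "$plist" &&
    "$plutil_bin" -insert StandardErrorPath -string "$prefix.err.log" "$plist"
}

# Dry run: print the arguments that would be passed to plutil.
#   PLUTIL_BIN=echo add_log_paths /Library/LaunchDaemons/org.nixos.darwin-store.plist /var/log/nix-darwin-store
# Real run on macOS (not run here; back up the plist first):
#   sudo sh -c '. ./add_log_paths.sh; add_log_paths /Library/LaunchDaemons/org.nixos.darwin-store.plist /var/log/nix-darwin-store'
```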

@emileindik

emileindik commented Aug 7, 2024

Thanks for the quick response @abathur. I updated from Sonoma 14.6 to a developer beta version (14.6.x I think?). That borked my /etc/bashrc. I've since reverted back to 14.6 and this time /etc/bashrc stayed intact. Telling, perhaps.

  • I installed nix by way of Jetify's devbox, so I'm not sure about nix-darwin. ls -la /etc/*rc doesn't show any symlinks.
  • Not corporate device. Personal MBP.
  • /etc/fstab contents:
UUID=5D19FE81-5EBD-475F-B25A-07AE7AB4CC67 /nix apfs rw,noauto,nobrowse,suid,owners
  • /etc/synthetic.conf contents:
nix
  • nix-daemon.log has a bunch of accepted connection from pid <unknown>, user emileindik. This was perhaps me trying to run nix cmds before having run sudo launchctl load /Library/LaunchDaemons/org.nixos.darwin-store.plist. Immediately after boot, launchd doesn't have any logs pertaining to nix and the only darwin log is from Apple's diagnostic plist:
2024-08-07 14:57:29.767201 (system/com.apple.sysdiagnose) <Error>: Could not import service from caller: path = /System/Library/LaunchDaemons/com.apple.sysdiagnose.darwinos.plist, caller = launchd[1], error = 158: Service cannot be loaded on current os variant

I will report back if I uninstall/reinstall nix and still see the issue.

Many thanks!

@emileindik

Update: finally got it working after reading the last post in this thread: https://discourse.nixos.org/t/macos-upgrade-breakage/50691/7
Had to turn both of these on.
Screenshot 2024-09-08 at 8 59 37 PM
