
fix nixbld user name/uid for macOS/darwin #4532

Merged (1 commit) on Mar 25, 2021


abathur
Member

abathur commented Feb 7, 2021

I've been able to confirm the update problem reported in #4531 on a spare system. This PR will:

  1. Add a migration script people can run before taking the update to avoid the problem. (We probably don't need to carry this in the source for long, but it seems good to have a canonical source to link.)
  2. Update the installer to adopt the same values (just for macOS, as a hedge against this being an ongoing problem with every update).

Background

I'm enough of a masochist to run clean-install + upgrade cycles until I have a vague sense of the problem:

  1. It looks like the upgrade is doing some sort of user/account migration, and faceplants over something about our nixbld users (i.e., it runs clean if you remove them first).
  2. I get the impression, because of the report timing, that this is something new after Big Sur 11.0 and probably new in 11.2.
    • Reports in this thread have clarified that this has been an ongoing pain point on Big Sur.
    • I've since confirmed that I see this with the update to 11.3 beta--it definitely seems like this will be an ongoing problem.
  3. It doesn't seem to be a critical problem, as long as you ignore the error message that tells you to reinstall (i.e., users who jump the turnstile haven't reported ongoing trouble).
  4. I eventually found a sane way around it when I stumbled on some new sysadminctl -addUser flags ([-GID <group ID>], [-roleAccount]) and this usage note: "*Role accounts require name starting with _ and UID in 200-400 range."
    • I confirmed that the update runs clean if we change nixbld# usernames to _nixbld# and use UIDs in the 200-400 range.
    • The GID doesn't appear to matter, here.
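The role-account rule quoted above can be expressed as a small check. This is a hypothetical bash helper (is_role_account_ok is not part of Nix or macOS), and it assumes the 200-400 range is inclusive at both ends, which the usage note doesn't spell out.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the sysadminctl usage note: a role account
# needs a name starting with "_" and a UID in the 200-400 range.
# Treating both 200 and 400 as valid is an assumption.
is_role_account_ok() {
  local name=$1 uid=$2
  [[ "$name" == _* ]] || return 1          # name must start with an underscore
  [[ "$uid" =~ ^[0-9]+$ ]] || return 1     # UID must be a plain decimal number
  (( 10#$uid >= 200 && 10#$uid <= 400 ))   # force base 10 so "0301" isn't octal
}
```

For example, is_role_account_ok _nixbld1 301 succeeds, while the old nixbld1 name or a UID in the 30000s fails.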

Open questions

  1. I chose 301 as the new starting UID, but it's just a guess. Objections?
    • There are many macOS system role accounts in the 200-300 range (but none from 300-400), and at least on Catalina the only free contiguous block is from 271-299. I read this as 200-300 being Apple's territory, but if they go make a few dozen new role users I assume they'll just keep rolling into the 300s?
    • I don't know whether these are hard-set or dynamically allocated at update/install. In the latter case it'd be fine if they keep rolling into the 300s; in the former it might cause trouble.
  2. Taking up 32 UIDs (301-332) by default feels a bit greedy, though it may be wasted energy to sweat this. Is it feasible to make a more judicious choice, at least on macOS, perhaps based on sysctl -n hw.ncpu or sysctl -n hw.physicalcpu?
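A sketch of the UID arithmetic behind the proposal, assuming the whole block must stay inside the 200-400 role-account window. nixbld_accounts_up is a hypothetical helper that only prints name/UID pairs; it doesn't create any accounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name uid" pairs for COUNT build users,
# counting up from BASE (301 is the starting UID proposed here).
# Refuses any range that would leave the 200-400 role-account window.
nixbld_accounts_up() {
  local base=$1 count=$2 i
  if (( base < 200 || base + count - 1 > 400 )); then
    echo "range ${base}-$((base + count - 1)) leaves 200-400" >&2
    return 1
  fi
  for (( i = 1; i <= count; i++ )); do
    printf '_nixbld%d %d\n' "$i" "$((base + i - 1))"
  done
}
```

On macOS the count could come from sysctl, e.g. nixbld_accounts_up 301 "$(sysctl -n hw.ncpu)", but since each concurrent build needs its own build user, tying the count to cores also caps how high -j can go.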

@emilazy
Member

emilazy commented Feb 7, 2021

Maybe we could count down from 400 to minimize the risk of an update collision disaster down the line? The number of nixbld users limits how high a -j you can use, and there's frequently reason to want to build more jobs than you have cores: especially with hyperthreading, but even without it if most of your builds are short/IO-heavy and the compute-heavy stuff doesn't overload your CPU (e.g. Rust does its own concurrency-limiting logic anyway). So I'd be a bit hesitant about reducing it too far.
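Counting down reverses the assignment: _nixbld1 takes the top UID and each further user takes the next one below it, so the block grows away from the 200-300 range Apple already uses. A hypothetical sketch, again only printing name/UID pairs and assuming the same 200-400 bounds:

```shell
#!/usr/bin/env bash
# Hypothetical helper for the count-down idea: _nixbld1 gets TOP,
# _nixbld2 gets TOP-1, and so on. Refuses ranges outside 200-400.
nixbld_accounts_down() {
  local top=$1 count=$2 i
  if (( top > 400 || top - count + 1 < 200 )); then
    echo "range $((top - count + 1))-${top} leaves 200-400" >&2
    return 1
  fi
  for (( i = 1; i <= count; i++ )); do
    printf '_nixbld%d %d\n' "$i" "$((top - i + 1))"
  done
}
```

With 32 users, nixbld_accounts_down 400 32 assigns 400 down to 369, leaving the low 300s free if Apple's own accounts keep spilling upward.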

@kevingriffin

I'd like to add that I've actually seen this behavior in Big Sur ever since the betas, on both Intel and ARM. Our entire organization uses nix for development machines, and we've seen it consistently across updates.

If what we saw was indeed this issue, it seems pretty likely that this isn't limited to 11.2, but is just part of something new in Big Sur. I have a feedback ticket open with Apple about this issue, and I'll link to this PR as part of the dialogue there.

@dhess

dhess commented Feb 8, 2021

> I get the impression, because of the report timing, that this is something new after Big Sur 11.0 and probably new in 11.2.

It's not super important, but we run Nix on several different Macs of various makes and models, and we saw this issue on both the 11.0->11.1 and 11.1->11.2 updates. I didn't really think anything of it, or mention it in IRC, during the 11.0->11.1 update because I figured it was just a typical Apple one-off bug. But now I'm certain that it's been around since the 11.1 update, at least.

(We may have also seen it with the original 11.0 install, but I don't recall anymore.)

@abathur
Member Author

abathur commented Feb 8, 2021

@kevingriffin @dhess Thanks for clarifying this; updated the summary.

@matthewbauer
Member

Does the group id not matter? It's still 30000 here.

The migration script doesn't appear to be working, since the nixbld group still has references to nixbld1, nixbld2, nixbld3, nixbldN:

$ dscacheutil -q group -a name nixbld
name: nixbld
password: *
gid: 30000
users: nixbld1 nixbld2 nixbld3 nixbld4 nixbld5 nixbld6 nixbld7 nixbld8 nixbld9 nixbld10 nixbld11 nixbld12 nixbld13 nixbld14 nixbld15 nixbld16 nixbld17 nixbld18 nixbld19 nixbld20 nixbld21 nixbld22 nixbld23 nixbld24 nixbld25 nixbld26 nixbld27 nixbld28 nixbld29 nixbld30 nixbld31 nixbld32

I'm not sure how to fix this. I ran:

for i in $(seq 1 32); do sudo dscl . change /Users/_nixbld$i RecordName _nixbld$i nixbld$i; done

to reverse this.
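One way to spot this state is to filter the users: line of that dscacheutil output for names that still lack the leading underscore. stale_nixbld_members is a hypothetical bash helper that reads the output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dscacheutil -q group -a name nixbld` output
# on stdin and print every group member that still lacks a leading "_".
stale_nixbld_members() {
  local -a fields
  local member
  while read -r -a fields; do
    # Only the "users:" line lists group members; skip name/gid/etc.
    [[ "${fields[0]:-}" == "users:" ]] || continue
    for member in "${fields[@]:1}"; do
      [[ "$member" == _* ]] || printf '%s\n' "$member"
    done
  done
}
```

Running dscacheutil -q group -a name nixbld | stale_nixbld_members should print nothing once every member has been renamed.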

@abathur
Member Author

abathur commented Feb 8, 2021

> Does the group id not matter? It's still 30000 here.

It doesn't appear to, so I didn't touch it.

> The migrate script doesn't appear to be working since the nixbld group still has references to nixbld1, nixbld2, nixbld3, nixbldN

Good catch; it looks like I got caught up in optimizing for the smallest change that didn't drop us into recovery. I'll disable the recommendation in the other issue and take a look at it.

@abathur
Member Author

abathur commented Feb 8, 2021

@matthewbauer I've updated it to remove each account with dseditgroup before the change and re-add it after. I have force-pushed this change, but won't have a chance to test the whole script again until later today.
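The sequence described above (drop each user from the group, rename it, then re-add it) can be sketched as a dry run that only prints commands for review. This is a hypothetical illustration, not the migration script in this PR; it assumes the group is named nixbld, the users are nixbld1..nixbldN, and new UIDs start at whatever base you pass (301 in this PR).

```shell
#!/usr/bin/env bash
# Hypothetical dry-run planner (not the PR's actual script). For each build
# user it prints commands to: leave the nixbld group, rename nixbldN to
# _nixbldN, set a UID counting up from BASE, and rejoin the group.
plan_nixbld_migration() {
  local count=$1 base=$2 i old new
  for (( i = 1; i <= count; i++ )); do
    old="nixbld$i" new="_nixbld$i"
    printf 'dseditgroup -o edit -d %s -t user nixbld\n' "$old"
    printf 'dscl . change /Users/%s RecordName %s %s\n' "$old" "$old" "$new"
    # dscl's create replaces an existing attribute's value
    printf 'dscl . create /Users/%s UniqueID %d\n' "$new" "$((base + i - 1))"
    printf 'dseditgroup -o edit -a %s -t user nixbld\n' "$new"
  done
}
```

Usage: plan_nixbld_migration 32 301 > plan.sh, read plan.sh, then sudo sh plan.sh. Printing rather than executing keeps the account-modifying calls behind a manual review step.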

Updates:

  • First swing at this had a typo. I've force-pushed again, and will update when tested.
  • Second swing does appear to work. :)
  • If you've already run this script (and have not reverted it), you should be able to fix group membership with these commands:
    sudo dscl . delete /Groups/nixbld dsAttrTypeStandard:GroupMembership
    sudo dscl . append /Groups/nixbld dsAttrTypeStandard:GroupMembership _nixbld{1..32}
    • This assumes bash; you may need to translate it to the idioms of your shell. :)
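For shells without bash's {1..32} brace expansion, the member list for the append command can be built with a plain POSIX loop. nixbld_members is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical POSIX-sh helper: print "_nixbld1 _nixbld2 ... _nixbldN"
# on one line, without relying on brace expansion.
nixbld_members() {
  count=$1
  i=1
  out=""
  while [ "$i" -le "$count" ]; do
    out="$out${out:+ }_nixbld$i"   # add a separating space after the first name
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

Then: sudo dscl . append /Groups/nixbld dsAttrTypeStandard:GroupMembership $(nixbld_members 32). The substitution is left unquoted on purpose so each name becomes a separate argument.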

@abathur
Member Author

abathur commented Feb 9, 2021

I'll also update the main post, but just wanted to mention here that I've added some extra evidence that this will be an ongoing problem:

  1. Install macOS 11.0 (beta), and Nix
  2. Update to 11.2, observe boot to recovery as already described, and reboot into the OS.
  3. Update to 11.3 (beta), and get booted into a fresh hell. This time, after making me pass an activation lock with my Apple ID, it booted me to a recovery/reset screen I hadn't seen before, which was trying to goad me into resetting the password for all of my accounts (the list shows all of the nixbld accounts, with my real one mixed in). It never prompted me for an admin login before this. Despite all that, I was still able to simply boot into the OS.

Here's a picture of this:
[Photo of the reset screen: PXL_20210209_063133824 MP]

@abathur
Member Author

abathur commented Feb 9, 2021

If you've been affected by this and want to report it to Apple, you can refer to the feedback I filed last night: FB8997501

@dhess

dhess commented Feb 9, 2021

I ran the nixbld -> _nixbld conversion script in this PR on an 11.2 system that had previously had problems with System Update, then updated to 11.2.1, and everything went fine!

However, this system uses nix-darwin, and when I ran darwin-rebuild switch after the update, nix-darwin recreated the nixbld users, so there are more moving parts here, I'm afraid. Anyway, the System Update issue appears to be root-caused!

@abathur
Member Author

abathur commented Mar 11, 2021

I've rebased to re-run tests now that #4577 is merged. macOS install: https://github.com/abathur/nix/runs/2088031573

@robertoschwald

> However, this system uses nix-darwin and when I ran darwin-rebuild switch after the update, nix-darwin created these users again, so there are more moving parts here, I'm afraid. Anyway, the System Update issue appears to root-caused!

Does not happen here. Users stay _nixbldXX after darwin-rebuild switch on my systems.

@dhess

dhess commented Mar 15, 2021

> > However, this system uses nix-darwin and when I ran darwin-rebuild switch after the update, nix-darwin created these users again, so there are more moving parts here, I'm afraid. Anyway, the System Update issue appears to root-caused!
>
> Does not happen here. Users stay _nixbldXX after darwin-rebuild switch on my systems.

It's been fixed in nix-darwin since I wrote that comment on Feb 9. :)

@domenkozar domenkozar merged commit dc6a8f1 into NixOS:master Mar 25, 2021
@domenkozar
Member

@abathur should we backport this to 2.3?

@abathur
Copy link
Member Author

abathur commented Mar 25, 2021

@domenkozar I think so.

@domenkozar
Member

Pushed to 2.3-maintenance branch.

7 participants