Don't bring up networking in the initramfs on first boot by default #443

Closed
jlebon opened this issue Mar 31, 2020 · 8 comments · Fixed by coreos/fedora-coreos-config#426
Labels
jira for syncing to jira

Comments

@jlebon
Member

jlebon commented Mar 31, 2020

Currently, Fedora CoreOS and RHEL CoreOS always try to bring up networking on first boot in the initramfs: https://github.com/coreos/coreos-assembler/blob/87098329e7d3112c8544d3706de45581ce0c4d59/src/grub.cfg#L48.

The reason for this is that we can't be sure whether Ignition will require access to the network to fetch remote resources.

However, this causes a host of problems. Essentially, in any environment that strays from DHCP (e.g. static IP or no networking), it's a pain, and sometimes impossible, to change kernel arguments either at install time or during the first boot itself. The networking kargs are also not a nice interface.

One obvious example is the live ISO; if no Ignition config is provided (or if one is provided but doesn't require networking), there's no reason to bring up networking (#349).

We should make networking transparently optional by only bringing it up when Ignition needs it.
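To make "only when Ignition needs it" concrete, here is a minimal sketch of how one might decide whether a config requires the network. It is not Ignition's implementation. It assumes a spec-3 style JSON config in which remote resources are referenced through `source` keys, and the set of schemes treated as remote, the function names, and the example URLs are all assumptions made for illustration.

```python
import json
from urllib.parse import urlparse

# Schemes treated as "needs a network fetch". This set is an assumption of
# the sketch; inline `data:` URLs are deliberately not in it.
REMOTE_SCHEMES = {"http", "https", "tftp", "s3", "gs"}


def remote_sources(node):
    """Yield every `source` URL in the config tree that uses a remote scheme."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "source" and isinstance(value, str):
                if urlparse(value).scheme.lower() in REMOTE_SCHEMES:
                    yield value
            else:
                yield from remote_sources(value)
    elif isinstance(node, list):
        for item in node:
            yield from remote_sources(item)


def needs_network(config_text):
    """Return True if the config references at least one remote resource."""
    return next(remote_sources(json.loads(config_text)), None) is not None


# Hypothetical example configs (the URL is a placeholder).
OFFLINE_CONFIG = json.dumps({
    "ignition": {"version": "3.0.0"},
    "storage": {"files": [
        {"path": "/etc/motd", "contents": {"source": "data:,hello"}},
    ]},
})
REMOTE_CONFIG = json.dumps({
    "ignition": {
        "version": "3.0.0",
        "config": {"merge": [{"source": "https://example.com/base.ign"}]},
    },
})
```

With these inputs, `needs_network(OFFLINE_CONFIG)` is false and `needs_network(REMOTE_CONFIG)` is true, which is the signal the initramfs would need before deciding to bring up networking.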

@jlebon
Member Author

jlebon commented Mar 31, 2020

I call this the "conditional networking" issue. There's a proposal to solve this in Ignition itself: coreos/ignition#956. That patch and the related PRs in coreos/ignition#956 (comment) solve this issue.

@miabbott
Member

Notes from internal meeting - https://hackmd.io/LwAyi1T6SnKU_zX79DdHSg

@cgwalters
Member

cgwalters commented Mar 31, 2020

Don't bring up networking in the initramfs on first boot by default

AGREEMENT: simplified live ISO approach for the short term; explore generic approach for later releases

#443

Option 1: fetch-offline

(+) Solves the general case
(-) Runs systemctl start network-online.target

Option 2: Special case only ignition.config.url and /config.ign cases

(+) Solves the primary ISO case that we really need right now
(-) Doesn't solve the general case, so we will have to keep explaining the special cases
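Option 1 hinges on a handoff: an offline fetch stage records whether networking is needed, and the initramfs checks for that record before starting the network. The sketch below simulates that handoff in a temporary directory standing in for `/run`. The flag file name and all function names are hypothetical; the notes only describe "a tmpfile dropped in /run".

```python
import tempfile
from pathlib import Path

# Hypothetical flag name, chosen for this sketch only.
FLAG_NAME = "neednet"


def offline_fetch(run_dir: Path, config_needs_network: bool) -> None:
    """Offline stage: leave a flag file behind only if networking is required."""
    flag = run_dir / FLAG_NAME
    if config_needs_network:
        flag.touch()
    else:
        flag.unlink(missing_ok=True)


def should_start_network(run_dir: Path) -> bool:
    """Initramfs side: the same existence check that a systemd
    ConditionPathExists= directive would perform on the flag file."""
    return (run_dir / FLAG_NAME).exists()


def simulate(config_needs_network: bool) -> bool:
    """Run both halves against a throwaway directory standing in for /run."""
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        offline_fetch(run_dir, config_needs_network)
        return should_start_network(run_dir)
```

The default is "off": with no flag file, `should_start_network` returns false, so a case where networking is needed but not detected shows up as a bug to fix rather than as networking silently staying on everywhere.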

Notes:

  • [jlebon] problem: live ISO requires networking, need to be able to boot ISO w/o network
  • [jlebon] do we fix just the live ISO? or fix the more generic case for all platforms?
  • [jlebon] we have WIP for the latter case that the live ISO case can be folded into
  • [walters] we have multiple overlapping PRs that address this
  • [jlebon] proposal: separate the fetch stage into an "offline" fetch stage and a "normal" fetch stage. Ignition should be able to tell us when it needs networking. The offline stage would do its best to get the config from the provider/platform; if Ignition requires networking, a tmpfile is dropped in /run that the initramfs can key off of to bring up the required networking
  • [jlebon] alternative: make ignition smarter or bind it closer to the OS
  • [bgilbert] this wouldn't work with systemd units? would need a generator
  • [jlebon] work around with systemctl start network-online.target; it is not pretty
  • [bgilbert] oof
  • [walters] running ignition from generator via karg or file seems sane; if the generator is doing block device probing (e.g.), it gets ugly
  • [walters] do we need to solve the problem of conditional networking now? or just solve it for live ISO case
  • [imcleod] is the elegance of solving the problem generically worth the ugliness of the solution?
  • [jlebon] there are probably use cases we have not considered yet.
  • [dustymabe] when does ConditionPathExists evaluate? (A: when the unit runs)
  • [slowrie] the solution should default to off, when networking is needed, we drop the file, etc. if we enter a case where we need networking and can't tell, it is a bug and we should fix it.
  • [slowrie] when would the detection of networking happen? at the initial run of ignition or later when we need it in a ignition stage?
  • [jlebon] probably the latter
  • [slowrie] if we go down this route, we need a lot more testing, especially in scenarios where network could take longer to come up. we don't want to fail boots that we are not failing currently because of slow network.
  • [jlebon] networking is brought up between runs, determine if need net, turn it on, otherwise skip
  • [slowrie] start the networking means just kick it off...may not be fully up before the unit finishes
  • [jlebon] we're doing network-online.target to be more sure things are up
  • [imcleod] what about the proposal makes the situation worse than it currently is?
  • [slowrie] networking is now starting later than it currently is, need to retry a few times; timeouts might be more likely to be tripped (dusty +1 to concern)
  • [dustymabe] assuming we can do something more elegant than systemctl start, is this reasonable?
  • [bgilbert] ok in principle; solving completely is difficult. we don't have to solve it completely. need clear docs about the rules around this. remaining question is work vs. value
  • [imcleod] we don't want to change how network is handled in things like AWS, to fix the live ISO. do we have to?
  • [jlebon] path would be same, but the timing would be different when the networking would be brought up. code should be agnostic to platform.
  • [slowrie] RHCOS would give us a lot of test coverage since OCP always requires networking for Ignition
  • [lucab] did we consider plugging it in via kargs?
  • [dustymabe] i have a PR for that; remove default networking kargs for ISO, if a user provides ignition URL on karg, we bring up networking. in any other case, we do not. allows user to boot ISO and get to prompt w/o networking. uses dracut initqueue method instead of systemd.
  • [walters] we should summarize pros/cons and get a path forward
  • [jlebon] two parts: how to signal we need networking and how to bring up networking. we can judge each on its own.
  • [lucab] with VMware backchannel, we can provide both kargs ip= and rd.neednet=1. i.e. way to provide network and do we need network
  • [walters] we could use coreos-installer embed iso to stuff things into /etc/cmdline.d too
  • [walters] this crosses over to static ip discussion.
  • [lucab] do we need static IP on the live ISO?
  • [dustymabe] we likely have users providing static ip kargs to ISO installer now.
  • [imcleod] we have agreed that the best way to configure static ip networking is to use the live ISO. the only hard requirement is that the live ISO does not require network to boot.
  • [slowrie] will the use case be booting the live ISO w/o Ignition?
  • [dustymabe] we tell people to boot the live ISO, curl the Ignition config, and then run coreos-installer. there is autologin on the FCOS ISO
  • [imcleod] we are being told that users want a solution to boot into a live ISO and then configure the networking. we should also support the kargs to the ISO use case
  • [dustymabe] we should recommend PXE too
  • [slowrie] booting a live ISO w/o Ignition config is a special case that we can detect; wildly different than any other use case
  • [slowrie] during the embed iso we can append rd.neednet=1
  • [dustymabe] do we agree that focusing on the live ISO case is the thing to focus on Right Now? this will solve the problem for 4.5
  • [jlebon] yes, we can work on the live ISO case to unblock us. the subtlety is the embed iso case because users are going to want to automate it.
  • [walters] precisely we are talking about turning on DHCP in initrd
  • [walters] i don't want to do interactive, i have the iDRAC thing, if i want to do install, i can just create an ISO with Ignition embedded, and attach it to iDRAC to boot from.
  • [jlebon] once osmet is in the live ISO, we have no need for network. default path is zero networking.
  • [walters] the MCS serves Ignition config now; installer generates pointer config. unless we create a flow in OCP for users to get the full rendered config...
  • [jlebon] install boot...not initial boot of image
  • [slowrie] use the current dusty PR and change coreos-installer so that, if an ignition config is provided, it adds rd.neednet=1 to the kargs?
  • [imcleod] suitable approach +1
  • [dustymabe] if someone embeds an ignition config, we assume they want networking. they don't want a bash prompt.
  • [walters] big difference between need network in initrd vs need network for anything else. i.e. callbacks only network in the real root, not initrd
  • [dustymabe] NM defaults to DHCP, if we get to real root and has no networking config, NM tries DHCP. proposal to quiet network-online.target?
  • [walters] want to be ergonomic for offline installs
  • [slowrie] let's go as simple as possible to solve the problem; don't worry about it being horrible under the covers. scope it down to live ISO and figure out niceties later.
  • [jlebon] +1, but don't turn on networking if ignition is embedded. need to provide another arg to turn on networking. they are orthogonal.
  • [slowrie] +1; maybe coreos-installer has option not to turn on networking?
  • [dustymabe] currently have ip= and rd.neednet=1 as defaults. PR removes those, but adds them back if Ignition URL is provided.
  • [dustymabe] use case: on laptop, have ISO + coreos-installer binary + ignition config. at time of embed iso, i need to also specify turning on networking?
  • [slowrie] default: if ignition config is embedded, turn on networking. but have an option to turn off networking.
  • [jlebon] we need a path for someone who embeds an ignition config but doesn't want to turn on networking.
  • [dustymabe] will explore that
  • [slowrie] we should be able to write a /etc/cmdline.d extension to cpio archive
  • [slowrie] let's stick to the simple approach for 4.5 and explore the more general approach later. lots of soak time in FCOS needed.
  • [slowrie] can we use testiso to handle these new use cases?
  • [walters] for sure. want to extend it as a platform where external tests can be executed. nothing should be blocking making more of a matrix for it.
  • AGREEMENT: simplified live ISO approach for the short term; explore generic approach for later releases
  • [dustymabe] need to enumerate use cases for more generic cases
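The short-term approach agreed on in these notes can be summarized as a small decision over the kernel command line: no networking kargs by default, add them back when an Ignition URL is given, and treat an embedded config as wanting networking unless the user opts out. The sketch below illustrates that rule. It is not the code from the PR or from coreos-installer. The function names, the `embedded_config` and `network_opt_out` parameters, and the `ip=dhcp` value are assumptions; `rd.neednet=1`, `ip=`, and `ignition.config.url` are the kargs named in the discussion.

```python
# Kargs added when initrd networking is wanted. The `ip=dhcp` value is an
# assumption of this sketch; the notes only mention `ip=` and `rd.neednet=1`.
NETWORK_KARGS = ["ip=dhcp", "rd.neednet=1"]


def has_karg(kargs, name):
    """True if `name` appears bare or as `name=value` in the karg list."""
    return any(k == name or k.startswith(name + "=") for k in kargs)


def live_iso_kargs(cmdline, embedded_config=False, network_opt_out=False):
    """Return the command line with networking kargs added only when needed.

    Networking is wanted when an Ignition URL is given on the command line,
    or when a config is embedded and the user has not opted out.
    """
    kargs = cmdline.split()
    wants_network = has_karg(kargs, "ignition.config.url") or (
        embedded_config and not network_opt_out
    )
    if wants_network:
        for karg in NETWORK_KARGS:
            name = karg.split("=", 1)[0]
            # Never override a value the user already supplied, e.g. a static ip=.
            if not has_karg(kargs, name):
                kargs.append(karg)
    return " ".join(kargs)
```

Keeping a user-supplied `ip=` untouched matters for the static IP users mentioned above: the sketch only adds `rd.neednet=1` in that case.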

@dustymabe dustymabe added the meeting topics for meetings label Apr 1, 2020
@dustymabe
Member

FYI we discussed this at the community meeting today. Minutes and links to the logs are here: https://lists.fedoraproject.org/archives/list/coreos@lists.fedoraproject.org/message/NA7CMN5HZWKUVVQNG4XPJDX5226BBNGG/

@dustymabe dustymabe removed the meeting topics for meetings label Apr 1, 2020
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue May 25, 2020
This is part of moving to conditional networking
(coreos/fedora-coreos-tracker#443).

Let's move the firstboot kargs here as prep for dropping them entirely.
See also: coreos/coreos-assembler#1373
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue May 25, 2020
We have all the pieces in place now to move to conditional networking. So
let's drop the networking-related `firstboot-kargs`, as well as
coreos-liveiso-network-kargs.service, which is no longer needed (i.e.
the live ISO will now enable initrd networking as required given the
embedded Ignition config).

Fixes: coreos/fedora-coreos-tracker#443
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue May 25, 2020
We have all the pieces in place now to move to conditional networking. So
let's drop the firstboot kargs, as well as
coreos-liveiso-network-kargs.service, which is no longer needed (i.e.
the live ISO will now enable initrd networking as required given the
embedded Ignition config).

Fixes: coreos/fedora-coreos-tracker#443
@jlebon
Member Author

jlebon commented May 25, 2020

Fix for this in coreos/fedora-coreos-config#426.

@jlebon jlebon added the jira for syncing to jira label May 25, 2020
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue May 26, 2020
We have all the pieces in place now to move to conditional networking. So
let's drop the firstboot kargs, as well as
coreos-liveiso-network-kargs.service, which is no longer needed (i.e.
the live ISO will now enable initrd networking as required given the
embedded Ignition config).

Fixes: coreos/fedora-coreos-tracker#443
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Jul 15, 2020
We have all the pieces in place now to move to conditional networking. So
let's drop the `rd.neednet=1` firstboot karg.

Also don't enable coreos-liveiso-network-kargs.service on FCOS since
it's no longer needed (i.e.  the live ISO will now enable initrd
networking as required given the embedded Ignition config).

On RHCOS, we still need it for now until we move to spec3. Then we can
remove the service and script completely.

Fixes: coreos/fedora-coreos-tracker#443
jlebon added a commit to coreos/fedora-coreos-config that referenced this issue Jul 15, 2020
We have all the pieces in place now to move to conditional networking. So
let's drop the `rd.neednet=1` firstboot karg.

Also don't enable coreos-liveiso-network-kargs.service on FCOS since
it's no longer needed (i.e.  the live ISO will now enable initrd
networking as required given the embedded Ignition config).

On RHCOS, we still need it for now until we move to spec3. Then we can
remove the service and script completely.

Fixes: coreos/fedora-coreos-tracker#443
@dustymabe dustymabe added the status/pending-testing-release Fixed upstream. Waiting on a testing release. label Jul 16, 2020
@dustymabe
Member

The fix for this went into testing stream release 32.20200715.2.2. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels Jul 17, 2020
dustymabe added a commit to dustymabe/fedora-coreos-docs that referenced this issue Jul 26, 2020
We no longer bring up the network in the initramfs if it's not needed.
See coreos/fedora-coreos-tracker#443.
lucab pushed a commit to coreos/fedora-coreos-docs that referenced this issue Jul 27, 2020
…on (#111)

* pages/static-ip-config: move around useful information

Move the persistent NIC naming and the link to NetworkManager
documentation to more appropriate places higher up in the text.

* pages/static-ip-config: remove reference to bug 358

coreos/fedora-coreos-tracker#358 is fixed now
so we shouldn't need a whole section devoted to troubleshooting the
issue.

* pages/static-ip-config: Remove note about initramfs network bringup

We no longer bring up the network in the initramfs if it's not needed.
See coreos/fedora-coreos-tracker#443.
@dustymabe
Member

The fix for this went into stable stream release 32.20200715.3.0.
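For anyone checking whether a given release already has this change, here is a rough sketch that compares a version string against the two releases named in this thread. It assumes the four dot-separated fields are major version, build date, stream code, and revision, and that later releases in the same stream keep the fix. Only the stream codes that appear here (2 for testing, 3 for stable) are mapped; anything else is reported as unknown. The function and constant names are made up for the example.

```python
from typing import NamedTuple, Optional

# Only the stream codes that appear in this thread are mapped.
STREAMS = {2: "testing", 3: "stable"}

# First releases carrying the fix, as announced in this thread.
FIRST_FIXED = {"testing": "32.20200715.2.2", "stable": "32.20200715.3.0"}


class Version(NamedTuple):
    major: int
    date: int
    stream: int
    revision: int


def parse_version(text: str) -> Version:
    """Split a four-field version string into integers."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {text!r}")
    return Version(*(int(p) for p in parts))


def has_conditional_networking(text: str) -> Optional[bool]:
    """True/False for known streams, None when the stream is not mapped."""
    version = parse_version(text)
    stream = STREAMS.get(version.stream)
    if stream is None:
        return None
    # Tuples compare field by field; within one stream this orders by
    # major version, then date, then revision.
    return version >= parse_version(FIRST_FIXED[stream])
```

Returning `None` for unmapped streams avoids guessing about releases this thread says nothing about.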

@dustymabe dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label Jul 29, 2020

4 participants