Coding ip=169.254.3.14 hangs during boot if there's no cable connected #589

DougieLawson · 2014-05-09T22:36:39Z

I quite often connect my RPi to my Windows system with a direct cable using the 169.254.xxx.xxx address scheme. By assigning 169.254.3.14 I can easily find my RPi. It's a convenient way to get connected when working away from home, before the WiFi (which often needs a password or web page interaction) is running.

If I bring the machine home and don't connect a cat5 cable to my home router (or connect to my laptop) then the kernel hangs during boot. I get the splash screen and the Raspberry logo and nothing more.

Looking in the code and /proc/config.gz I think I've found the cause.

The kernel config has CONFIG_ROOT_NFS=y, so when we run in net/ipv4/ipconfig.c the retries count is ignored and we loop round:

#ifdef CONFIG_ROOT_NFS
                        if (ROOT_DEV ==  Root_NFS) {
                                pr_err("IP-Config: Retrying forever (NFS root)...\n");
                                goto try_try_again;
                        }
#endif

which causes the boot to hang.

The quick resolution is simple; either:

- Pull the card, edit cmdline.txt to drop the ip=169.254.3.14 and reboot, or
- Wire the ethernet to my laptop

The permanent fix is to unset CONFIG_ROOT_NFS and rebuild the kernel.


asb · 2014-05-09T22:45:19Z

But all the ip setting stuff is part of CONFIG_ROOT_NFS, e.g. the ip kernel command line parameter is documented as part of nfsroot: https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt It is (ab)used for other purposes, though, e.g. 9P2000 root.

Ultimately, you should be assigning an IP with userspace configuration (e.g. /etc/network/interfaces). I personally wouldn't consider the behaviour you've encountered a bug.

DougieLawson · 2014-05-09T23:05:29Z

It's a bug because it's well documented that you can assign an IP using the cmdline.txt ip=vvv.xxx.yyy.zzz and there's no checking that it's being used for an NFS rootfs. If the parm is only to be used with NFS then it should barf earlier or (probably not a good idea) silently ignore it.

I agree that if the rootfs was on an NFS device there's no point continuing if the connection doesn't become active, but that should bail out with an oops after a few more turns round the loop (with retries set to a higher value). It's never sensible to solidly hang the boot in an endless loop.
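For illustration, the "bail out after a few more turns" behaviour could be sketched as below. This is a user-space simulation, not the code in net/ipv4/ipconfig.c; `try_configure`, `configure_with_retries` and the `succeeds_on` parameter are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one autoconfiguration attempt.
 * Succeeds on attempt number `succeeds_on` (1-based), or never if it is 0. */
bool try_configure(int attempt, int succeeds_on)
{
    return succeeds_on != 0 && attempt == succeeds_on;
}

/* Bounded retry loop: returns the number of attempts used on success,
 * or -1 once the retry budget is exhausted, instead of looping forever. */
int configure_with_retries(int max_retries, int succeeds_on)
{
    assert(max_retries > 0);
    for (int attempt = 1; attempt <= max_retries; attempt++) {
        if (try_configure(attempt, succeeds_on))
            return attempt;
    }
    return -1; /* give up and let the caller report the failure */
}
```

A caller would treat -1 as "report the failure and stop", which is the bounded alternative to jumping back to the retry label unconditionally.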

It's a convenience thing to be able to set an IP address for the ethernet interface when you can't access the ext4 filesystem. My RPis normally run with fixed addresses from the 10.1.1.0/24 block so I could fiddle with the Windows side to fix the IP address there, but popping the SD card and updating cmdline.txt is easier (and if it didn't hang I could set it and forget it).

asb · 2014-05-09T23:28:25Z

Wait a minute, the code you post doesn't really explain the issue you're seeing. Why would ROOT_DEV == Root_NFS be true?

DougieLawson · 2014-05-09T23:36:07Z

That appears to be the only place in the code where we loop back to try_try_again without decrementing retries.

asb · 2014-05-09T23:41:40Z

Sure, though if ROOT_DEV is set to Root_NFS without passing root=/dev/nfs on the kernel command line it seems that's where the bug really is (and it seems it would be an upstream bug).
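The condition being discussed, an ip= parameter present while root=/dev/nfs is absent, can be expressed as a small user-space check. This is only a sketch of the idea and not how the kernel parses its command line; `cmdline_has` and `ip_without_nfs_root` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `cmdline` contains a space-separated token starting with `param`,
 * e.g. "ip=" matches "ip=169.254.3.14" but not "skip=1". */
bool cmdline_has(const char *cmdline, const char *param)
{
    size_t len = strlen(param);
    const char *p = cmdline;

    assert(len > 0);
    while ((p = strstr(p, param)) != NULL) {
        if (p == cmdline || p[-1] == ' ')
            return true;
        p += len; /* not at a token start, keep searching */
    }
    return false;
}

/* True when a static ip= is given but the root device is not NFS. */
bool ip_without_nfs_root(const char *cmdline)
{
    return cmdline_has(cmdline, "ip=") &&
           !cmdline_has(cmdline, "root=/dev/nfs");
}
```

With a command line such as "root=/dev/mmcblk0p2 ip=169.254.3.14 rootwait" the check returns true, which is the case where retrying forever would make no sense.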

popcornmix · 2014-05-10T11:43:37Z

Removing CONFIG_ROOT_NFS is not an option. I do all my development with an nfs mounted rootfs, and it appears a very common configuration.

I assume (from your description) you are not seeing the message "IP-Config: Retrying forever (NFS root)..."? Are you sure that is where it gets stuck?

DougieLawson · 2014-05-10T15:07:17Z

I'm going to build a new kernel with debugging set (I may even put some extra messages in). I thought I'd found the hang from reading the code (which handles the ip=vvv.xxx.yyy.zzz parm).

I'd have thought giving it five minutes (rather than endless) before doing something to tell the user the boot isn't going to complete would have been a better design. Perhaps I'm too used to seeing IBM mainframe operating systems set disabled wait states when their initial program load can't continue.

DougieLawson · 2014-05-11T10:08:08Z

It's amazing what you see when you add some debugging with

#define IPCONFIG_DEBUG

There's a delay loop (120 seconds) before ipconfig.c gives up the ghost and carries on. I guess I've never been patient enough to wait that long before pulling the power and giving up.
I could petition for

#define CONF_CARRIER_TIMEOUT    120000  /* Wait for carrier timeout */

to be made smaller or I could just accept that boot is going to hang for two minutes when I'm stupid enough to define ip=vvv.xxx.yyy.zzz but not connect a wire.
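A rough model of that wait, assuming a simple poll-until-timeout loop, is sketched below. It runs on a simulated clock in user space; the one-second polling step, `carrier_up` and `wait_for_carrier` are assumptions made for the example and are not taken from ipconfig.c.

```c
#include <assert.h>
#include <stdbool.h>

#define CARRIER_TIMEOUT_MS 120000 /* same value as CONF_CARRIER_TIMEOUT above */
#define POLL_INTERVAL_MS   1000   /* assumed polling step, illustration only */

/* Hypothetical link model: carrier appears at `up_at_ms`, never if negative. */
bool carrier_up(long now_ms, long up_at_ms)
{
    return up_at_ms >= 0 && now_ms >= up_at_ms;
}

/* Simulated wait: returns the time in ms at which carrier was first seen,
 * or -1 if the timeout expired. No real sleeping; time is just a counter. */
long wait_for_carrier(long timeout_ms, long up_at_ms)
{
    assert(timeout_ms >= 0);
    for (long now = 0; now <= timeout_ms; now += POLL_INTERVAL_MS) {
        if (carrier_up(now, up_at_ms))
            return now;
    }
    return -1; /* no cable: the caller carries on without a link */
}
```

With no cable attached the loop only returns after the full timeout, which matches the two-minute stall described above; a smaller timeout shortens the stall at the cost of giving slow links less time to come up.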



DougieLawson closed this as completed May 11, 2014
