Kernel oops adding eth0 to bridge on Pi3+ #2437

Closed
burtyb opened this issue Mar 15, 2018 · 23 comments

@burtyb

burtyb commented Mar 15, 2018

When adding eth0 to a bridge with a Pi3+ (Raspbian 2018-03-13) I get the following Oops on boot.

[ 5.451470] Unable to handle kernel NULL pointer dereference at virtual address 000001f0
[ 5.464412] pgd = b72e4000
[ 5.469282] [000001f0] *pgd=372e8835, *pte=00000000, *ppte=00000000
[ 5.478303] Internal error: Oops: 17 [#1] SMP ARM
[ 5.485414] Modules linked in: bridge stp llc brcmfmac brcmutil cfg80211 rfkill snd_bcm2835 snd_pcm snd_timer snd bcm2835_gpiomem uio_pdrv_genirq uio fixed ip_tables x_tables ipv6
[ 5.505733] CPU: 2 PID: 286 Comm: brctl Not tainted 4.9.80-v7+ #1098
[ 5.505737] Hardware name: BCM2835
[ 5.505745] task: b97ae740 task.stack: b6578000
[ 5.505763] PC is at phy_ethtool_gset+0x14/0x88
[ 5.505774] LR is at lan78xx_get_settings+0x3c/0xcc
[ 5.505783] pc : [<8054a3c8>] lr : [<80550df8>] psr: 60000013
[ 5.505783] sp : b6579d18 ip : b6579d28 fp : b6579d24
[ 5.505787] r10: 00000080 r9 : b9057500 r8 : b6579d54
[ 5.505793] r7 : 00000001 r6 : b9050000 r5 : b6579d98 r4 : 00000000
[ 5.505800] r3 : 80550dbc r2 : 00000001 r1 : b6579d54 r0 : 00000000
[ 5.505807] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 5.505813] Control: 10c5383d Table: 372e406a DAC: 00000055
[ 5.505818] Process brctl (pid: 286, stack limit = 0xb6578210)
[ 5.505825] Stack: (0xb6579d18 to 0xb657a000)
[ 5.505835] 9d00: b6579d4c b6579d28
[ 5.505845] 9d20: 80550df8 8054a3c0 80550dbc b9050000 b6579d98 b716ef80 00000001 80d04850
[ 5.505856] 9d40: b6579d94 b6579d50 8063d4ac 80550dc8 b9b39590 00000001 00000000 00000000
[ 5.505865] 9d60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 5.505876] 9d80: b9050000 b9050000 b6579df4 b6579d98 7f4a148c 8063d420 b6579dc4 8025ec74
[ 5.505907] 9da0: b6579df4 b6579db0 8025ec74 8025ece0 b9b384b8 b9154090 a0000013 000003e5
[ 5.505924] 9dc0: b91d9400 a0000013 b9154084 b90574fc b9050000 b716ef80 00000001 80d04850
[ 5.505934] 9de0: b9057500 b91d8c00 b6579e1c b6579df8 7f4a1c98 7f4a1474 80c65740 b9057500
[ 5.505952] 9e00: 00000002 00000001 00000000 00000000 b6579e3c b6579e20 7f4a2af0 7f4a1b2c
[ 5.505971] 9e20: b6579e90 b9057000 b6579e90 7f4aeb54 b6579e54 b6579e40 7f4a34c0 7f4a2a9c
[ 5.505984] 9e40: 000089a2 b9057000 b6579e84 b6579e58 80653300 7f4a3464 80724638 80722d6c
[ 5.505995] 9e60: 00400100 000089a2 000089a2 80c65740 7ef69c14 b6579e90 b6579edc b6579e88
[ 5.506010] 9e80: 80653720 80653048 00000000 b746a220 00307262 00000000 00000000 00000000
[ 5.506021] 9ea0: 00000002 00000000 000260bc 00000000 b6579ed4 000089a2 fffffdfd 7ef69c14
[ 5.506037] 9ec0: 80c65740 00000003 b6578000 00000080 b6579efc b6579ee0 80616aa8 806533c0
[ 5.506056] 9ee0: 7ef69c14 b9b6b520 b9639300 000089a2 b6579f7c b6579f00 80283f68 806168b4
[ 5.506067] 9f00: 00025f08 00000002 00000036 40000028 76f0c80c 00000000 00000003 00000000
[ 5.506076] 9f20: 000089a2 00000000 7ef69c14 00000000 00000062 00000000 00025f08 00000000
[ 5.506089] 9f40: 00000002 00000000 b6578000 00025f08 00000002 b9639300 7ef69c14 b9639300
[ 5.506120] 9f60: 000089a2 00000003 b6578000 00000080 b6579fa4 b6579f80 80284720 80283ec8
[ 5.506134] 9f80: 00000062 00025f08 00000002 7ef69ee0 00000036 80108244 00000000 b6579fa8
[ 5.506144] 9fa0: 80108220 802846e8 00025f08 00000002 00000003 000089a2 7ef69c14 00000062
[ 5.506173] 9fc0: 00025f08 00000002 7ef69ee0 00000036 00000000 00000000 76fc9000 00000000
[ 5.506184] 9fe0: 76f0c800 7ef69bfc 00013ae0 76f0c80c 60000010 00000003 00000000 00000000
[ 5.506208] [<8054a3c8>] (phy_ethtool_gset) from [<80550df8>] (lan78xx_get_settings+0x3c/0xcc)
[ 5.506226] [<80550df8>] (lan78xx_get_settings) from [<8063d4ac>] (__ethtool_get_link_ksettings+0x98/0xe0)
[ 5.506308] [<8063d4ac>] (__ethtool_get_link_ksettings) from [<7f4a148c>] (port_cost+0x24/0x80 [bridge])
[ 5.506448] [<7f4a148c>] (port_cost [bridge]) from [<7f4a1c98>] (br_add_if+0x178/0x4c8 [bridge])
[ 5.506548] [<7f4a1c98>] (br_add_if [bridge]) from [<7f4a2af0>] (add_del_if+0x60/0x7c [bridge])
[ 5.506649] [<7f4a2af0>] (add_del_if [bridge]) from [<7f4a34c0>] (br_dev_ioctl+0x68/0x6c [bridge])
[ 5.506729] [<7f4a34c0>] (br_dev_ioctl [bridge]) from [<80653300>] (dev_ifsioc+0x2c4/0x2fc)
[ 5.506747] [<80653300>] (dev_ifsioc) from [<80653720>] (dev_ioctl+0x36c/0x858)
[ 5.506765] [<80653720>] (dev_ioctl) from [<80616aa8>] (sock_ioctl+0x200/0x2ac)
[ 5.506784] [<80616aa8>] (sock_ioctl) from [<80283f68>] (do_vfs_ioctl+0xac/0x820)
[ 5.506801] [<80283f68>] (do_vfs_ioctl) from [<80284720>] (SyS_ioctl+0x44/0x6c)
[ 5.506819] [<80284720>] (SyS_ioctl) from [<80108220>] (__sys_trace_return+0x0/0x10)
[ 5.506831] Code: e92dd800 e24cb004 e52de004 e8bd4000 (e59021f0)
[ 5.506922] ---[ end trace a6824d7f1c5a36d1 ]---
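
As a reading aid (not part of the original report): the word in parentheses on the "Code:" line is the instruction that faulted. The sketch below decodes it, assuming it is an ARM-mode word load with an immediate offset. `decode_ldr` is a hypothetical helper name and it handles only that one encoding.

```shell
#!/bin/sh
# decode_ldr HEXWORD
# Decode an ARM "LDR Rt, [Rn, #imm12]" word (immediate offset, no writeback).
# Hypothetical helper for illustration; any other encoding is rejected.
decode_ldr() {
    insn=$((0x$1))
    # Bits 27-20 must read 0101U001: load/store with immediate offset,
    # pre-indexed, word-sized, no writeback, load. Bit 23 (U) is masked out.
    if [ $(( (insn >> 20) & 0xf7 )) -ne $((0x51)) ]; then
        echo "not an LDR (immediate offset) word: $1" >&2
        return 1
    fi
    rn=$(( (insn >> 16) & 15 ))    # base register
    rt=$(( (insn >> 12) & 15 ))    # destination register
    imm=$(( insn & 0xfff ))        # 12-bit offset
    # U bit: 1 adds the offset to the base, 0 subtracts it.
    if [ $(( (insn >> 23) & 1 )) -eq 1 ]; then sign=""; else sign="-"; fi
    printf 'ldr r%d, [r%d, #%s0x%x]\n' "$rt" "$rn" "$sign" "$imm"
}

decode_ldr e59021f0    # prints: ldr r2, [r0, #0x1f0]
```

The register dump shows r0 as 00000000, so a load at offset 0x1f0 from r0 lands on the reported fault address 000001f0. That is consistent with a NULL pointer being dereferenced inside phy_ethtool_gset; which structure the pointer refers to is not stated in the log.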

(full console output attached as startup.txt)

After the Oops, Raspbian continues to boot but, even after many minutes, doesn't get to the login prompt.

When the same SD card is used in a Pi3 (non-plus) it starts correctly and adds eth0 to br0.

To replicate - I'm using a fresh 2018-03-13-raspbian-stretch-lite.img with the following changes.

Add "enable_uart=1" to the end of /boot/config.txt

Boot & login

sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y install bridge-utils
sudo cat << EOF | sudo tee -a /etc/network/interfaces

auto br0

iface br0 inet manual
bridge_ports eth0
bridge_stp off
bridge_waitport 0
bridge_fd 0
EOF

echo "denyinterfaces eth0" | sudo tee -a /etc/dhcpcd.conf
sudo reboot

On boot I get the Oops above.

@pelwell
Contributor

pelwell commented Mar 16, 2018

Bridging is an advanced use case that hasn't been tested here. The code path in question doesn't exist in the 4.14 kernel we will be switching to imminently. Can you run sudo rpi-update to get the 4.14 kernel and see if the problem still occurs?

@burtyb
Author

burtyb commented Mar 16, 2018

Still get an Oops with Linux version 4.14.26-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1099 SMP Wed Mar 14 14:59:28 GMT 2018

[ 6.660958] Unable to handle kernel NULL pointer dereference at virtual address 00000208
[ 6.673725] pgd = b6a74000
[ 6.678702] [00000208] *pgd=36a67835, *pte=00000000, *ppte=00000000
[ 6.687436] Internal error: Oops: 17 [#1] SMP ARM
[ 6.694485] Modules linked in: bridge stp llc brcmfmac brcmutil cfg80211 rfkill snd_bcm2835(C) snd_pcm snd_timer snd fixed uio_pdrv_genirq uio ip_tables x_tables ipv6
[ 6.708876] random: crng init done
[ 6.722741] CPU: 2 PID: 269 Comm: brctl Tainted: G C 4.14.26-v7+ #1099
[ 6.735513] Hardware name: BCM2835
[ 6.741447] task: b7800f00 task.stack: b6a6a000
[ 6.748537] PC is at phy_ethtool_ksettings_get+0x1c/0x94
[ 6.756496] LR is at lan78xx_get_link_ksettings+0x3c/0x4c
[ 6.764559] pc : [<8057fe94>] lr : [<805877e8>] psr: 60000013
[ 6.773621] sp : b6a6bd28 ip : b6a6bd40 fp : b6a6bd3c
[ 6.781589] r10: 00000080 r9 : 00000001 r8 : 80d0a040
[ 6.789512] r7 : 00000000 r6 : b6a6bda8 r5 : 00000000 r4 : b6a6bda8
[ 6.798775] r3 : 805877ac r2 : 00000001 r1 : b6a6bda8 r0 : 00000000
[ 6.807980] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 6.817799] Control: 10c5383d Table: 36a7406a DAC: 00000055
[ 6.826155] Process brctl (pid: 269, stack limit = 0xb6a6a210)
[ 6.834656] Stack: (0xb6a6bd28 to 0xb6a6c000)
[ 6.841616] bd20: b9242000 00000000 b6a6bd5c b6a6bd40 805877e8 8057fe84
[ 6.854981] bd40: b9242000 b6a6bda8 b9242000 b93b6480 b6a6bda4 b6a6bd60 806805b0 805877b8
[ 6.868317] bd60: b6a6bd8c b6a6bd70 802a3034 802a225c b88539a8 b88c62a8 b9df3e80 b9bf3018
[ 6.881870] bd80: b6a6bdac b6a6bd90 802fc4dc 802a2f5c b9242000 b7886500 b6a6be04 b6a6bda8
[ 6.895609] bda0: 7f4e273c 80680560 00000000 00000000 00000000 00000000 00000000 00000000
[ 6.909655] bdc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 6.923943] bde0: 00000000 00000000 00000000 00000000 00000001 b792c000 b6a6be2c b6a6be08
[ 6.938436] be00: 7f4e2f44 7f4e2724 80c74600 b7886500 00000002 00000001 00000000 b6a6a000
[ 6.953230] be20: b6a6be4c b6a6be30 7f4e3d98 7f4e2ddc b6a6bea0 b7886000 b6a6bea0 7f4f10ac
[ 6.968146] be40: b6a6be64 b6a6be50 7f4e475c 7f4e3d44 000089a2 b7886000 b6a6be94 b6a6be68
[ 6.983225] be60: 8069aa70 7f4e4700 80786970 807849d0 00400100 000089a2 000089a2 80c74600
[ 6.998433] be80: b6a6bea0 00000000 b6a6beec b6a6be98 8069aef0 8069a784 00000002 b6a6bea0
[ 7.013672] bea0: 00307262 00000000 00000000 00000000 00000002 00000000 000260bc 00000000
[ 7.028913] bec0: 801cbf6c 000089a2 fffffdfd 7eab5c14 80c74600 00000003 b6a6a000 00000080
[ 7.044154] bee0: b6a6bf0c b6a6bef0 80656a68 8069ab38 7eab5c14 b9b7efe0 b925ad80 000089a2
[ 7.059395] bf00: b6a6bf7c b6a6bf10 8029dadc 80656858 8028ba08 802ad608 00000020 00000000
[ 7.074636] bf20: 8028bafc b7801444 b7801428 b7800f00 b7801444 80c91e04 00000000 00000080
[ 7.089875] bf40: b6a6bf64 00000000 b7801428 b7800f00 b7801444 b925ad80 7eab5c14 b925ad80
[ 7.105114] bf60: 000089a2 00000003 b6a6a000 00000080 b6a6bfa4 b6a6bf80 8029e238 8029da3c
[ 7.120356] bf80: 00000062 00025f08 00000002 7eab5ee0 00000036 80108224 00000000 b6a6bfa8
[ 7.135601] bfa0: 80108200 8029e200 00025f08 00000002 00000003 000089a2 7eab5c14 00000062
[ 7.150853] bfc0: 00025f08 00000002 7eab5ee0 00000036 00000000 00000000 76fb7000 00000000
[ 7.166104] bfe0: 76efa800 7eab5bfc 00013ae0 76efa80c 60000010 00000003 e3500000 ba0005f4
[ 7.181373] [<8057fe94>] (phy_ethtool_ksettings_get) from [<805877e8>] (lan78xx_get_link_ksettings+0x3c/0x4c)
[ 7.198458] [<805877e8>] (lan78xx_get_link_ksettings) from [<806805b0>] (__ethtool_get_link_ksettings+0x5c/0xe0)
[ 7.215883] [<806805b0>] (__ethtool_get_link_ksettings) from [<7f4e273c>] (port_cost+0x24/0x80 [bridge])
[ 7.232647] [<7f4e273c>] (port_cost [bridge]) from [<7f4e2f44>] (br_add_if+0x174/0x4c4 [bridge])
[ 7.248699] [<7f4e2f44>] (br_add_if [bridge]) from [<7f4e3d98>] (add_del_if+0x60/0x7c [bridge])
[ 7.264681] [<7f4e3d98>] (add_del_if [bridge]) from [<7f4e475c>] (br_dev_ioctl+0x68/0x6c [bridge])
[ 7.280907] [<7f4e475c>] (br_dev_ioctl [bridge]) from [<8069aa70>] (dev_ifsioc+0x2f8/0x338)
[ 7.296465] [<8069aa70>] (dev_ifsioc) from [<8069aef0>] (dev_ioctl+0x3c4/0x8e8)
[ 7.307514] [<8069aef0>] (dev_ioctl) from [<80656a68>] (sock_ioctl+0x21c/0x2d8)
[ 7.318494] [<80656a68>] (sock_ioctl) from [<8029dadc>] (do_vfs_ioctl+0xac/0x7c4)
[ 7.332957] [<8029dadc>] (do_vfs_ioctl) from [<8029e238>] (SyS_ioctl+0x44/0x6c)
[ 7.343903] [<8029e238>] (SyS_ioctl) from [<80108200>] (__sys_trace_return+0x0/0x10)
[ 7.358568] Code: e52de004 e8bd4000 e1a04001 e1a05000 (e5901208)
[ 7.368265] ---[ end trace 0b722967cb3efe56 ]---
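
To compare this backtrace with the 4.9 one, it can help to reduce each to a list of function names. The following is a minimal sketch, assuming the "[<addr>] (callee) from [<addr>] (caller+off/len)" line format seen in both logs; `oops_frames` is a hypothetical helper name.

```shell
#!/bin/sh
# oops_frames: read ARM backtrace lines on stdin and print the callee of each
# frame, innermost first. Module tags such as "[bridge]" are dropped.
oops_frames() {
    sed -n 's/^.*\[<[0-9a-f]*>] (\([A-Za-z0-9_]*\)[^)]*) from .*$/\1/p'
}

# Four lines copied from the 4.14.26 trace above.
oops_frames <<'EOF'
[ 7.181373] [<8057fe94>] (phy_ethtool_ksettings_get) from [<805877e8>] (lan78xx_get_link_ksettings+0x3c/0x4c)
[ 7.198458] [<805877e8>] (lan78xx_get_link_ksettings) from [<806805b0>] (__ethtool_get_link_ksettings+0x5c/0xe0)
[ 7.215883] [<806805b0>] (__ethtool_get_link_ksettings) from [<7f4e273c>] (port_cost+0x24/0x80 [bridge])
[ 7.232647] [<7f4e273c>] (port_cost [bridge]) from [<7f4e2f44>] (br_add_if+0x174/0x4c4 [bridge])
EOF
```

Run over both logs, the frames from __ethtool_get_link_ksettings outward have the same names. Only the two innermost functions differ: phy_ethtool_gset and lan78xx_get_settings on 4.9, phy_ethtool_ksettings_get and lan78xx_get_link_ksettings on 4.14.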

@graysky2

graysky2 commented Mar 20, 2018

I too am experiencing this on an RPi3 B-Plus (the RPi3 B works fine with the identical setup). I actually opened a ticket against systemd (I am using systemd-networkd to set up the bridge), systemd/systemd#8503, thinking it was to blame.

Distro: Arch ARM (armv7h)
Kernel version: 4.14.27-1-ARCH

@pelwell
Contributor

pelwell commented Mar 20, 2018

This may have the same root cause as #2442, for which we are waiting for a patch to be completed and accepted upstream.

@graysky2

@pelwell - Sounds good, thanks for the link.

pelwell pushed a commit that referenced this issue Apr 4, 2018
With Alexander Graf's patch ("lan78xx: Connect phy early") applied,
the call to lan78xx_reset within lan78xx_open prevents the phy
interrupt from being generated (even though the link is up).

Avoid this issue by removing the lan78xx_reset call.

See: #2437
     #2442
     #2457
popcornmix added a commit to raspberrypi/firmware that referenced this issue Apr 4, 2018
See: raspberrypi/linux#2458

kernel: Revert lan78xx: Simple patch to prevent some crashes
kernel: lan78xx: Connect phy early
kernel: lan78xx: Don't reset the interface on open
See: raspberrypi/linux#2437
See: raspberrypi/linux#2442
See: raspberrypi/linux#2457

firmware: clockman: Don't use OSC for pixel clock
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=24679&start=150#p1297298
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Apr 4, 2018
See: raspberrypi/linux#2458

kernel: Revert lan78xx: Simple patch to prevent some crashes
kernel: lan78xx: Connect phy early
kernel: lan78xx: Don't reset the interface on open
See: raspberrypi/linux#2437
See: raspberrypi/linux#2442
See: raspberrypi/linux#2457

firmware: clockman: Don't use OSC for pixel clock
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=24679&start=150#p1297298
@popcornmix
Collaborator

Latest rpi-update kernel has a potential fix for this issue. Please test.

@graysky2

graysky2 commented Apr 4, 2018

@popcornmix - Arch ARM doesn't use that script but I am building:

Will try and report back, thank you!

@burtyb
Author

burtyb commented Apr 4, 2018

Doesn't look like eth0 works at all now on the 3b+ even before being added to a bridge.

Using a fresh 2018-03-13-raspbian-stretch-lite.img (with "enable_uart=1" added to config.txt)

On first boot dhcpcd gets an IP on eth0 and it works OK.

Run "sudo rpi-update"

pi@raspberrypi:~$ sudo rpi-update
*** Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
*** Performing self-update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13403  100 13403    0     0  25693      0 --:--:-- --:--:-- --:--:-- 25676
*** Relaunching after update
*** Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
*** We're running for the first time
*** Backing up files (this will take a few minutes)
*** Backing up firmware
*** Backing up modules 4.9.80-v7+
#############################################################
This update bumps to rpi-4.14.y linux tree
Be aware there could be compatibility issues with some drivers
Discussion here:
https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=197689
##############################################################
*** Downloading specific firmware revision (this will take a few minutes)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   168    0   168    0     0    246      0 --:--:-- --:--:-- --:--:--   246
100 55.5M  100 55.5M    0     0   356k      0  0:02:39  0:02:39 --:--:--  476k
*** Updating firmware
*** Updating kernel modules
*** depmod 4.14.32-v7+
*** depmod 4.14.32+
*** Updating VideoCore libraries
*** Using HardFP libraries
*** Updating SDK
*** Running ldconfig
*** Storing current firmware revision
*** Deleting downloaded files
*** Syncing changes to disk
*** If no errors appeared, your firmware was successfully updated to 0382fa9aae15a1641e4c52ba2ffdaf2cfbe0c4f8
*** A reboot is needed to activate the new firmware

And reboot

eth0 no longer gets an IP address

Even after manually assigning an IP/netmask to eth0, it's unable to ping the gateway or any other IP in the subnet.

@graysky2

graysky2 commented Apr 4, 2018

*** If no errors appeared, your firmware was successfully updated to 0382fa9aae15a1641e4c52ba2ffdaf2cfbe0c4f8

...does this correspond to a commit in https://github.com/raspberrypi/firmware/commits/master ? I cannot find it.

@popcornmix
Collaborator

@graysky2 rpi-update gets firmware from https://github.com/Hexxeh/rpi-firmware

@graysky2

graysky2 commented Apr 4, 2018

@popcornmix - Thanks. To keep things simple, I will limit my replies to #2442 rather than mirroring them here.

@burtyb
Author

burtyb commented Apr 5, 2018

Looking at the output of ethtool before and after updating, the only difference I could see was:
before running rpi-update it said "Transceiver: external"
after running rpi-update, where eth0 no longer works, it says "Transceiver: internal"

Trying to set it back to external with "sudo ethtool -s eth0 xcvr external" fails with the error

Cannot set new settings: Success
not setting transceiver

BUT from a fresh boot if I run "sudo ethtool -s eth0 xcvr internal" (even though it already says internal) I get an IPv6 address via NDP and DHCP obtains an IPv4 address a few seconds later and things work again.
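
The before/after comparison described above can be scripted. This is a minimal sketch, assuming `ethtool eth0` prints the field as an indented "Transceiver: <value>" line as quoted in this comment; `xcvr_field` is a hypothetical helper name.

```shell
#!/bin/sh
# xcvr_field: read `ethtool <iface>` output on stdin and print only the
# value of its "Transceiver:" line.
xcvr_field() {
    sed -n 's/^[[:space:]]*Transceiver:[[:space:]]*//p'
}

# Sample input built from the value quoted above. On the Pi itself:
#   ethtool eth0 | xcvr_field
printf 'Settings for eth0:\n\tTransceiver: internal\n' | xcvr_field

# Workaround reported above. It needs root and the real interface, so it is
# left commented out here:
#   sudo ethtool -s eth0 xcvr internal
```

Saving the value once before and once after rpi-update gives a one-word comparison instead of a full ethtool diff.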

@pelwell
Contributor

pelwell commented Apr 5, 2018

Thanks for the update. I've been investigating and the failure seems to be timing dependent - adding logging makes it go away (there may be a minimal useful set), and adding dtparam=eee=off avoids the problem (can you confirm?).
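
For anyone wanting to try the dtparam=eee=off suggestion, the line can be added without duplicating it on repeated runs. This is a minimal sketch; `ensure_line` is a hypothetical helper name, and the demo uses a temporary file rather than the real /boot/config.txt, which would need root to edit.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Append LINE to FILE unless an identical whole line is already present.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demo on a scratch copy. On the Pi the target would be /boot/config.txt.
cfg=$(mktemp)
printf 'enable_uart=1\n' > "$cfg"
ensure_line "$cfg" 'dtparam=eee=off'
ensure_line "$cfg" 'dtparam=eee=off'    # second call changes nothing
cat "$cfg"
rm -f "$cfg"
```

A reboot is needed before the setting takes effect, as with the enable_uart=1 change in the original report.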

pelwell pushed a commit that referenced this issue Apr 5, 2018
Enable EEE mode as soon as possible after connecting to the PHY, and
before phy_start. This avoids a second link negotiation, which speeds
up booting and stops the interface failing to become ready.

See: #2437

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
@pelwell
Contributor

pelwell commented Apr 5, 2018

The code that enabled EEE mode was running at the point the interface was being opened. This forced a second round of negotiation, delaying readiness (sometimes permanently, depending on the timing).

Moving the EEE enabling into the PHY initialisation function, immediately after the connection to the PHY is established, which is long before phy_start is called, prevents the renegotiation and avoids the slowdown (and occasional failure).

@graysky2

graysky2 commented Apr 5, 2018

@pelwell - For the record, the bridge not coming up at boot time has been fixed with b5b6bb9 on my system. Perhaps @burtyb has some other issue at work.

@pelwell
Contributor

pelwell commented Apr 5, 2018

Yes, there was another issue, but by moving some of the initialisation code (c2eb306) I believe that issue has been resolved.

@burtyb
Author

burtyb commented Apr 5, 2018

With the rpi-update kernel, after adding "dtparam=eee=off", eth0 links and works OK on fresh Raspbian, and adding it to a bridge also works OK.

Recompiling with c2eb306, eth0 links OK (without "dtparam=eee=off") and can also be added to a bridge without an Oops.

@pelwell
Contributor

pelwell commented Apr 5, 2018

Cool. I'll close the issue after the next firmware release.

@alexreinert

@pelwell Do you have a rough idea when a fixed kernel and firmware will be released as a deb package?

@pelwell
Contributor

pelwell commented Apr 6, 2018

The firmware could be released today (@popcornmix), and the .deb depends on @XECDesign.

@pelwell
Contributor

pelwell commented Apr 6, 2018

Given that there are several stability investigations ongoing, releasing a new .deb now would be premature.

popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Apr 6, 2018
See: raspberrypi/linux#2437

kernel: config: Add BT_HCIUART_BCM=y and SERIAL_DEV_BUS=m
See: raspberrypi/linux#2479

firmware: config: gpio - Allow pn (pull none) as alternative to np (no pull)
popcornmix added a commit to raspberrypi/firmware that referenced this issue Apr 6, 2018
See: raspberrypi/linux#2437

kernel: config: Add BT_HCIUART_BCM=y and SERIAL_DEV_BUS=m
See: raspberrypi/linux#2479

firmware: config: gpio - Allow pn (pull none) as alternative to np (no pull)