SD card boot gets stuck often resulting in board not booting. #15

kraj · 2021-05-12T03:14:50Z

I am using starfive_vic7100_evb_sd_net_defconfig for kernel and it results in kernel hangs during boot at times, almost everytime the system is rebooted. Following message is seen when the kernel gets stuck

[    2.216987] sdhci: Secure Digital Host Controller Interface driver
[    2.223206] sdhci: Copyright(c) Pierre Ossman
[    2.227616] Synopsys Designware Multimedia Card Interface Driver
[    2.233929] sdhci-pltfm: SDHCI platform and OF driver helper
[    2.240680] dw_mmc 10000000.sdio0: IDMAC supports 32-bit address mode.
[    2.241157] NET: Registered protocol family 10
[    2.247368] dw_mmc 10000000.sdio0: Using internal DMA controller.
[    2.257870] dw_mmc 10000000.sdio0: Version ID is 290a
[    2.263093] dw_mmc 10000000.sdio0: DW MMC controller at irq 32,32 bit host data width,32 deep fifo
[    2.272303] Segment Routing with IPv6
[    2.272351] mmc_host mmc0: card is polling.
[    2.276110] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    2.280826] dw_mmc 10010000.sdio1: IDMAC supports 32-bit address mode.
[    2.287217] NET: Registered protocol family 17
[    2.292709] dw_mmc 10010000.sdio1: Using internal DMA controller.
[    2.297470] 9pnet: Installing 9P2000 support
[    2.303231] dw_mmc 10010000.sdio1: Version ID is 290a
[    2.307588] Key type dns_resolver registered
[    2.312602] dw_mmc 10010000.sdio1: DW MMC controller at irq 33,32 bit host data width,32 deep fifo
[    2.316984] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
[    2.326026] mmc_host mmc1: card is non-removable.
[    2.340468] dw-apb-uart 12440000.serial: forbid DMA for kernel console
[    2.491434] mmc_host mmc0: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div = 125)
[    2.551443] mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div = 125)
[    2.771949] Waiting for root device /dev/mmcblk0p2...
[    2.811404] dw_mmc 10010000.sdio1: Unexpected interrupt latency

I am now building with https://github.com/starfive-tech/freelight-u-sdk/blob/starfive/conf/sdk_210209_defconfig and that seems to result in better behavior where it does not get stuck in SD card issues as much.

but I do see another issue

[ 1059.839470] Showing busy workqueues and worker pools:
[ 1059.844509] workqueue events: flags=0x0
[ 1059.848329]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256 refcnt=6
[ 1059.855104]     in-flight: 50:request_firmware_work_func
[ 1059.860413]     pending: delayed_fput, do_free_init, deferred_probe_work_func, rht_deferred_worker
[ 1059.869397] workqueue events_unbound: flags=0x2
[ 1059.873912]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=2/512 refcnt=5
[ 1059.880250]     in-flight: 43:fsnotify_mark_destroy_workfn BAR(107), 7:fsnotify_connector_destroy_workfn
[ 1059.889726] workqueue events_freezable: flags=0x4
[ 1059.894409]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 1059.901184]     pending: mmc_rescan
[ 1059.904685] workqueue events_power_efficient: flags=0x80
[ 1059.909976]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[ 1059.916748]     pending: gc_worker, reg_check_chans_work [cfg80211]
[ 1059.923849] workqueue rcu_gp: flags=0x8
[ 1059.927671]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 1059.934449]     pending: process_srcu
[ 1059.938121] workqueue mm_percpu_wq: flags=0x8
[ 1059.942462]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 1059.949235]     pending: vmstat_update
[ 1059.953005] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=1054s workers=3 idle: 35 17
[ 1059.960659] pool 4: cpus=0-1 flags=0x4 nice=0 hung=20s workers=4 idle: 119 49
[ 1028.471352] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 1023s!

The text was updated successfully, but these errors were encountered:

pdp7 · 2021-05-12T03:22:52Z

@kraj thank you for opening this issue, are you using the Fedora branch in this repo that Wei Fu (@tekkamanninja) created? Wei Fu told me he is staying on 5.10 as @rwmjones is using that for Fedora/RISC-V however the 5.12 branch from @esmil does seem to be working well for some of the beta developers that have tried it/

kraj · 2021-05-12T03:31:49Z

yes I am on latest on fedora branch of 5.10 kernel

pdp7 · 2021-05-20T02:30:53Z

@kraj I am discussing with @tekkamanninja and @MichaelZhuxx, could you please add the full boot log from the console the next time this happens?

kraj · 2021-05-25T02:07:01Z

here is full log

Starting kernel ...

Linux version 5.13.0-rc3-starlight (oe-user@oe-host) (riscv64-yoe-linux-gcc (GCC) 11.1.1 20210523, GNU ld (GNU Binutils) 2.36.1.20210209) #1 SMP Tue May 25 01:38:19 UTC 2021
OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
Machine model: BeagleV Starlight Beta
earlycon: uart0 at MMIO32 0x0000000012440000 (options '115200n8')
printk: bootconsole [uart0] enabled
efi: UEFI not found.
Reserved memory: created CMA memory pool at 0x00000000a0000000, size 640 MiB
OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
Reserved memory: created DMA memory pool at 0x00000000f9000000, size 16 MiB
OF: reserved mem: initialized node framebuffer@f9000000, compatible id shared-dma-pool
Reserved memory: created DMA memory pool at 0x00000000fb000000, size 32 MiB
OF: reserved mem: initialized node framebuffer@fb000000, compatible id shared-dma-pool
Zone ranges:
  DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
  Normal   [mem 0x0000000100000000-0x000000027fffffff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000080200000-0x00000000f8ffffff]
  node   0: [mem 0x00000000f9000000-0x00000000f9ffffff]
  node   0: [mem 0x00000000fa000000-0x00000000faffffff]
  node   0: [mem 0x00000000fb000000-0x00000000fcffffff]
  node   0: [mem 0x00000000fd000000-0x000000027fffffff]
Initmem setup node 0 [mem 0x0000000080200000-0x000000027fffffff]
SBI specification v0.3 detected
SBI implementation ID=0x1 Version=0x9
SBI TIME extension detected
SBI IPI extension detected
SBI RFENCE extension detected
software IO TLB: mapped [mem 0x000000009c000000-0x00000000a0000000] (64MB)
SBI v0.2 HSM extension detected
riscv: ISA extensions acdfim
riscv: ELF capabilities acdfim
percpu: Embedded 18 pages/cpu s34200 r8192 d31336 u73728
Built 1 zonelists, mobility grouping on.  Total pages: 2067975
Kernel command line: earlycon root=/dev/mmcblk0p2 rootwait console=ttyS0,115200 stmmac.chain_mode=1
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
Sorting __ex_table...
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 6747088K/8386560K available (6826K kernel code, 4815K rwdata, 4096K rodata, 2138K init, 324K bss, 984112K reserved, 655360K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
rcu: Hierarchical RCU implementation.
rcu:    RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
        Tracing variant of Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
riscv-intc: 64 local interrupts mapped
plic: disable L2 cache irq 128 in plic
plic: disable L2 cache irq 131 in plic
plic: disable L2 cache irq 129 in plic
plic: disable L2 cache irq 130 in plic
plic: disable L2 cache irq 128 in plic
plic: disable L2 cache irq 131 in plic
plic: disable L2 cache irq 129 in plic
plic: disable L2 cache irq 130 in plic
plic: interrupt-controller@c000000: mapped 127 interrupts with 2 handlers for 4 contexts.
riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x171024e6b, max_idle_ns: 440795202301 ns
sched_clock: 64 bits at 6MHz, resolution 160ns, wraps every 4398046511040ns
Console: colour dummy device 80x25
Calibrating delay loop (skipped), value calculated using timer frequency.. 12.50 BogoMIPS (lpj=25000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
ASID allocator disabled
rcu: Hierarchical SRCU implementation.
EFI services will not be available.
smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 2 CPUs
devtmpfs: initialized
random: get_random_u32 called from bucket_table_alloc.isra.0+0x50/0x140 with crng_init=0
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 512 (order: 3, 32768 bytes, linear)
NET: Registered protocol family 16
thermal_sys: Registered thermal governor 'step_wise'
raid6: skip pq benchmark and using algorithm int64x8
raid6: using intx1 recovery algorithm
starfive_gpio 11910000.gpio: StarFive GPIO chip registered 0 GPIOs
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
i2c_designware 118b0000.i2c: running with gpio recovery mode! scl,sda
i2c_designware 118c0000.i2c: running with gpio recovery mode! scl,sda
i2c_designware 12450000.i2c: running with gpio recovery mode! scl,sda
clocksource: Switched to clocksource riscv_clocksource
NET: Registered protocol family 2
IP idents hash table entries: 131072 (order: 8, 1048576 bytes, linear)
tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear)
TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)
TCP: Hash tables configured (established 65536 bind 65536)
UDP hash table entries: 4096 (order: 5, 131072 bytes, linear)
UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Initialise system trusted keyrings
workingset: timestamp_bits=62 max_order=21 bucket_order=0
xor: measuring software checksum speed
   8regs           :  2185 MB/sec
   8regs_prefetch  :  2181 MB/sec
   32regs          :  2343 MB/sec
   32regs_prefetch :  2337 MB/sec
xor: using function: 32regs (2343 MB/sec)
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler bfq registered
starfive,vpp-lcdc 12000000.sfivefb: starfive_fb_probe
panel_name: tda_998x_1080p
of_parse_rd_cmd_info: failed, key=id_read_cmd_info
of_parse_rd_cmd_info: failed, key=esd_read_cmd_info
of_parse_wr_cmd: failed, key=panel-on-command
of_parse_wr_cmd: failed, key=panel-off-command
of_parse_reset_seq:707, error reading reset-sequence, length found = 0
panel_name: seeed_5_inch
of_parse_rd_cmd_info: failed, key=id_read_cmd_info
of_parse_rd_cmd_info: failed, key=esd_read_cmd_info
of_parse_wr_cmd: failed, key=panel-on-command
of_parse_wr_cmd: failed, key=panel-off-command
of_parse_reset_seq:707, error reading reset-sequence, length found = 0
starfive,vpp-lcdc 12000000.sfivefb: select displayer: tda_998x_1080p
starfive,vpp-lcdc 12000000.sfivefb: DDR to LCDC
starfive,vpp-lcdc 12000000.sfivefb: sf_fb_set_addr NO use PPx
Console: switching to colour frame buffer device 240x67
starfive,vpp-lcdc 12000000.sfivefb: sf_fb_set_addr NO use PPx
select displayer: tda_998x_1080p
i2c_designware 118b0000.i2c: controller timed out
tda998x 0-0070: Error -110 writing to cec:0xff
tda998x 0-0070: tda998x is not ready!
dw_axi_dmac_platform 100b0000.dma-controller: DesignWare AXI DMA Controller, 4 channels
dw_axi_dmac_platform 10500000.dma-controller: DesignWare AXI DMA Controller, 16 channels
L2CACHE: No. of Banks in the cache: 2                                                                                                                                                                  [39/1991]
L2CACHE: No. of ways per bank: 16
L2CACHE: Sets per bank: 1024
L2CACHE: Bytes per cache block: 64
L2CACHE: Index of the largest way enabled: 0
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
dw-apb-uart 11870000.serial: detected caps 00000000 should be 00000100
11870000.serial: ttyS1 at MMIO 0x11870000 (irq = 8, base_baud = 4640625) is a 16550
12440000.serial: ttyS0 at MMIO 0x12440000 (irq = 10, base_baud = 6250000) is a 16550A
printk: console [ttyS0] enabled
printk: console [ttyS0] enabled
printk: bootconsole [uart0] disabled
printk: bootconsole [uart0] disabled
loop: module loaded
zram: Added device: zram0
i2c_designware 118b0000.i2c: controller timed out
tps65086 0-005e: Failed to read revision register
tps65086: probe of 0-005e failed with error -110
dw_spi_mmio 12410000.spi: DMA init failed
wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
libphy: Fixed MDIO Bus: probed
tun: Universal TUN/TAP device driver, 1.6
stmmaceth 10020000.ethernet: IRQ eth_lpi not found
stmmaceth 10020000.ethernet: force_sf_dma_mode is ignored if force_thresh_dma_mode is set.
stmmaceth 10020000.ethernet: PTP uses main clock
stmmaceth 10020000.ethernet: User ID: 0x59, Synopsys ID: 0x37
stmmaceth 10020000.ethernet:    DWMAC1000
stmmaceth 10020000.ethernet: DMA HW capability register supported
stmmaceth 10020000.ethernet: RX Checksum Offload Engine supported
stmmaceth 10020000.ethernet: COE Type 2
stmmaceth 10020000.ethernet: Wake-Up On Lan supported
stmmaceth 10020000.ethernet: Enhanced/Alternate descriptors
stmmaceth 10020000.ethernet: Enabled extended descriptors
stmmaceth 10020000.ethernet: Chain mode enabled
stmmaceth 10020000.ethernet: Enable RX Mitigation via HW Watchdog Timer
stmmaceth 10020000.ethernet: device MAC address 2c:f7:f1:1b:e3:33
libphy: stmmac: probed
Micrel KSZ9031 Gigabit PHY stmmac-0:07: attached PHY driver (mii_bus:phy_addr=stmmac-0:07, irq=POLL)
xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
xhci-hcd xhci-hcd.0.auto: hcc params 0x200073c8 hci version 0x100 quirks 0x0000002000018010
xhci-hcd xhci-hcd.0.auto: irq 14, io mem 0x104d0000
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.13
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: xHCI Host Controller
usb usb1: Manufacturer: Linux 5.13.0-rc3-starlight xhci-hcd
usb usb1: SerialNumber: xhci-hcd.0.auto
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.13
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: xHCI Host Controller
usb usb2: Manufacturer: Linux 5.13.0-rc3-starlight xhci-hcd
usb usb2: SerialNumber: xhci-hcd.0.auto
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 1 port detected
usbcore: registered new interface driver uas
usbcore: registered new interface driver usb-storage
i2c /dev entries driver
sfctemp 124a0000.tmon: hwmon0: sensor '124a0000.tmon'
Synopsys Designware Multimedia Card Interface Driver
dw_mmc 10000000.mmc: IDMAC supports 32-bit address mode.
usbcore: registered new interface driver usbhid
dw_mmc 10000000.mmc: Using internal DMA controller.
usbhid: USB HID core driver
dw_mmc 10000000.mmc: Version ID is 290a
NET: Registered protocol family 10
dw_mmc 10000000.mmc: DW MMC controller at irq 31,32 bit host data width,32 deep fifo
mmc_host mmc0: card is polling.
dw_mmc 10010000.mmc: IDMAC supports 32-bit address mode.
dw_mmc 10010000.mmc: Using internal DMA controller.
dw_mmc 10010000.mmc: Version ID is 290a
mmc_host mmc0: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div = 125)
dw_mmc 10010000.mmc: DW MMC controller at irq 32,32 bit host data width,32 deep fifo
dw_mmc 10010000.mmc: allocated mmc-pwrseq
mmc_host mmc1: card is non-removable.
Segment Routing with IPv6
NET: Registered protocol family 17
Loading compiled-in X.509 certificates
mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div = 125)
Btrfs loaded, crc32c=crc32c-generic, zoned=no
dw_mmc 10000000.mmc: Unexpected interrupt latency
cfg80211: Loading compiled-in X.509 certificates for regulatory database
cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
dw-apb-uart 12440000.serial: forbid DMA for kernel console
Waiting for root device /dev/mmcblk0p2...
dw_mmc 10010000.mmc: Unexpected interrupt latency
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 32.
usb usb1-port1: couldn't allocate usb_device
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 32.
usb usb2-port1: couldn't allocate usb_device

Paul E. McKenney reported [1] that commit 1f0723a ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags") results in the lockdep complaint: ====================================================== WARNING: possible circular locking dependency detected 5.12.0+ #15 Not tainted ------------------------------------------------------ rcu_torture_sta/109 is trying to acquire lock: ffffffff96063cd0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x9/0x20 but task is already holding lock: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (slab_mutex){+.+.}-{3:3}: lock_acquire+0xb9/0x3a0 __mutex_lock+0x8d/0x920 slub_cpu_dead+0x15/0xf0 cpuhp_invoke_callback+0x17a/0x7c0 cpuhp_invoke_callback_range+0x3b/0x80 _cpu_down+0xdf/0x2a0 cpu_down+0x2c/0x50 device_offline+0x82/0xb0 remove_cpu+0x1a/0x30 torture_offline+0x80/0x140 torture_onoff+0x147/0x260 kthread+0x10a/0x140 ret_from_fork+0x22/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: check_prev_add+0x8f/0xbf0 __lock_acquire+0x13f0/0x1d80 lock_acquire+0xb9/0x3a0 cpus_read_lock+0x21/0xa0 static_key_enable+0x9/0x20 __kmem_cache_create+0x38d/0x430 kmem_cache_create_usercopy+0x146/0x250 kmem_cache_create+0xd/0x10 rcu_torture_stats+0x79/0x280 kthread+0x10a/0x140 ret_from_fork+0x22/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slab_mutex); lock(cpu_hotplug_lock); lock(slab_mutex); lock(cpu_hotplug_lock); *** DEADLOCK *** 1 lock held by rcu_torture_sta/109: #0: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250 stack backtrace: CPU: 3 PID: 109 Comm: rcu_torture_sta Not tainted 5.12.0+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: dump_stack+0x6d/0x89 check_noncircular+0xfe/0x110 ? lock_is_held_type+0x98/0x110 check_prev_add+0x8f/0xbf0 __lock_acquire+0x13f0/0x1d80 lock_acquire+0xb9/0x3a0 ? static_key_enable+0x9/0x20 ? mark_held_locks+0x49/0x70 cpus_read_lock+0x21/0xa0 ? static_key_enable+0x9/0x20 static_key_enable+0x9/0x20 __kmem_cache_create+0x38d/0x430 kmem_cache_create_usercopy+0x146/0x250 ? rcu_torture_stats_print+0xd0/0xd0 kmem_cache_create+0xd/0x10 rcu_torture_stats+0x79/0x280 ? rcu_torture_stats_print+0xd0/0xd0 kthread+0x10a/0x140 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 This is because there's one order of locking from the hotplug callbacks: lock(cpu_hotplug_lock); // from hotplug machinery itself lock(slab_mutex); // in e.g. slab_mem_going_offline_callback() And commit 1f0723a made the reverse sequence possible: lock(slab_mutex); // in kmem_cache_create_usercopy() lock(cpu_hotplug_lock); // kmem_cache_open() -> static_key_enable() The simplest fix is to move static_key_enable() to a place before slab_mutex is taken. That means kmem_cache_create_usercopy() in mm/slab_common.c which is not ideal for SLUB-specific code, but the #ifdef CONFIG_SLUB_DEBUG makes it at least self-contained and obvious. [1] https://lore.kernel.org/lkml/20210502171827.GA3670492@paulmck-ThinkPad-P17-Gen-1/ Link: https://lkml.kernel.org/r/20210504120019.26791-1-vbabka@suse.cz Fixes: 1f0723a ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MichaelZhuxx · 2021-05-26T13:08:12Z

I got an interesting finds that hartid will be zero in opensbi booting infomation and I2C and on-chip temperature sensor work not normally in kernel log when every time system get stuck.

here is findings when get stuck

Domain0 Boot HART         : 0
Boot HART ID              : 0

[    4.003090] i2c_designware 118b0000.i2c: controller timed out
[    4.008908] tda998x 0-0070: Error -110 writing to cec:0xff
[    4.014439] tda998x 0-0070: tda998x is not ready!
[    4.726129] temp(c): -81.-10

here is the information that the system started successfully

Domain0 Boot HART         : 1
Boot HART ID              : 1

[    3.121276] tda998x 0-0070: found TDA19988
[    3.867158] temp(c): 38.69
[    6.177588]  mmcblk0: p1 p2

Heard from our SoC team, the hartid is zero as default before running into kernel.
maybe there are some potential issues in opensbi, I guess.
@tekkamanninja

esmil · 2021-05-26T16:12:46Z

@MichaelZhuxx Very nice find! Is hart 0 the E24 co-processor, because then it's no wonder everything is broken when that is chosen as boot hart.

kraj · 2021-05-26T16:29:14Z

I am using fedora branch of starfive fork of openSBI

https://github.com/riscv/meta-riscv/blob/master/recipes-bsp/opensbi/opensbi_starfive-0.8.0.bb#L12

I wonder if we should switch to using upstream version now that support has landed there ?

pdp7 · 2021-05-26T19:16:37Z

I am using fedora branch of starfive fork of openSBI

https://github.com/riscv/meta-riscv/blob/master/recipes-bsp/opensbi/opensbi_starfive-0.8.0.bb#L12

I wonder if we should switch to using upstream version now that support has landed there ?

I opened starfive-tech/opensbi#4 to see if we can switch. Hopefully @lbmeng and @tekkamanninja can comment

pdp7 · 2021-05-29T01:38:27Z

@MichaelZhuxx any updates from the investigation you have been doing into this issue?

mcd500 · 2021-05-31T01:07:39Z

I realized that this is the same issue. starfive-tech/Fedora_on_StarFive#40
Closed it at the Fedora_on_StarFive/issues/40.

mcd500 · 2021-05-31T04:59:41Z

@pdp7 @esmil @davidlt @geertu I made the PR to fix the boot hang. Please see the
esmil/linux#15 <- found later, this is not needed.
and
davidlt/opensbi#1

It prevent choosing hart0 which is e24 and booting from hart1 or hart2 from u74 in the opensbi lottery selection.
Which makes a bit complicate it is that the fix in opensbi require the dtsi fix in linux at the same time.
This is the log of opensbi.

OpenSBI v0.5-407-g7e1952c
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : StarFive VIC7100
Platform Features         : timer,mfdeleg
Platform HART Count       : 2
Firmware Base             : 0x80000000
Firmware Size             : 112 KB
Runtime SBI Version       : 0.3

Domain0 Name              : root
Domain0 Boot HART         : 1
Domain0 HARTs             : 1*,2*
Domain0 Region00          : 0x0000000080000000-0x000000008001ffff ()
Domain0 Region01          : 0x0000000000000000-0x0000007fffffffff (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x0000000088000000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes

Boot HART ID              : 1
Boot HART Domain          : root
Boot HART ISA             : rv64imafdcsux
Boot HART Features        : scounteren,mcounteren
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 4096
Boot HART PMP Address Bits: 36
Boot HART MHPM Count      : 0
Boot HART MHPM Count      : 0
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109


U-Boot 2021.04-13662-ga18449847e (May 26 2021 - 20:54:21 +0900)

CPU:   rv64imafdc
DRAM:  8 GiB
MMC:   sdio0@10000000: 0, sdio1@10010000: 1
Net:   dwmac.10020000
Autoboot in 2 seconds
MMC CD is 0x0, force to True.
MMC CD is 0x0, force to True.
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found /extlinux/extlinux.conf
Retrieving file: /extlinux/extlinux.conf
1964 bytes read in 9 ms (212.9 KiB/s)
Fedora-riscv64-vic7100-xfce-dev-Rawhide-20210516233526.n.0 Boot Options.
1:      debug
2:      Custom-kernel aligh
3:      Custom-kernel sd-aligh
4:      Custom-kernel sd
5:      Fedora-riscv64-vic7100-xfce-dev-Rawhide-20210516233526.n.0 (5.10.6-210.0.riscv64.)
Enter choice: 1:        debug
Retrieving file: /initramfs-5.10.6-210.0.riscv64.fc33.riscv64.img
16475287 bytes read in 3446 ms (4.6 MiB/s)
Retrieving file: /Image.gz
6294544 bytes read in 1319 ms (4.6 MiB/s)
append: ro root=UUID=ae1e722a-d01b-4cdc-ab56-7b68abcdd0fe rhgb console=tty0 console=ttyS08
Retrieving file: /jh7100-starlight.dtb
17366 bytes read in 10 ms (1.7 MiB/s)
   Uncompressing Kernel Image
Moving Image from 0x84000000 to 0x80200000, end=817c8000
## Flattened Device Tree blob at 88000000
   Booting using the fdt blob at 0x88000000
   Using Device Tree in place at 0000000088000000, end 00000000880073d5

Starting kernel ...

Linux version 5.13.0-rc3-starlight-201461-gce0d090ee714 (akirat@ryzen-micro) (riscv64-unk1
OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
Machine model: BeagleV Starlight Beta
efi: UEFI not found.
Reserved memory: created CMA memory pool at 0x00000000a0000000, size 640 MiB
OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
Reserved memory: created DMA memory pool at 0x00000000f9000000, size 16 MiB
OF: reserved mem: initialized node framebuffer@f9000000, compatible id shared-dma-pool
Reserved memory: created DMA memory pool at 0x00000000fb000000, size 32 MiB
OF: reserved mem: initialized node framebuffer@fb000000, compatible id shared-dma-pool
Zone ranges:
  DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
  Normal   [mem 0x0000000100000000-0x000000027fffffff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000080200000-0x00000000f8ffffff]
  node   0: [mem 0x00000000f9000000-0x00000000f9ffffff]
  node   0: [mem 0x00000000fa000000-0x00000000faffffff]
  node   0: [mem 0x00000000fb000000-0x00000000fcffffff]
  node   0: [mem 0x00000000fd000000-0x000000027fffffff]
Initmem setup node 0 [mem 0x0000000080200000-0x000000027fffffff]
SBI specification v0.3 detected
SBI implementation ID=0x1 Version=0x9
SBI TIME extension detected
SBI IPI extension detected
SBI RFENCE extension detected
software IO TLB: mapped [mem 0x000000009c000000-0x00000000a0000000] (64MB)
SBI v0.2 HSM extension detected
riscv: ISA extensions acdfim
riscv: ELF capabilities acdfim
percpu: Embedded 25 pages/cpu s70808 r0 d31592 u102400
Built 1 zonelists, mobility grouping on.  Total pages: 2067975
Kernel command line: ro root=UUID=ae1e722a-d01b-4cdc-ab56-7b68abcdd0fe rhgb console=tty0 8
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
Sorting __ex_table...
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 6728892K/8386560K available (8281K kernel code, 5427K rwdata, 4096K rodata, 2197K)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
trace event string verifier disabled
rcu: Hierarchical RCU implementation.
rcu:    RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
        Tracing variant of Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
riscv-intc: 64 local interrupts mapped
plic: disable L2 cache irq 128 in plic
plic: disable L2 cache irq 131 in plic
plic: disable L2 cache irq 129 in plic
plic: disable L2 cache irq 130 in plic
plic: disable L2 cache irq 128 in plic
plic: disable L2 cache irq 131 in plic
plic: disable L2 cache irq 129 in plic
plic: disable L2 cache irq 130 in plic
plic: interrupt-controller@c000000: mapped 127 interrupts with 2 handlers for 4 contexts.
riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [1]
clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x171024e6b, max_idls
sched_clock: 64 bits at 6MHz, resolution 160ns, wraps every 4398046511040ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 12.50 BogoMIPS)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
ASID allocator disabled
rcu: Hierarchical SRCU implementation.
EFI services will not be available.
smp: Bringing up secondary CPUs ...
CPU1: failed to start
smp: Brought up 1 node, 1 CPU
devtmpfs: initialized
random: get_random_u32 called from bucket_table_alloc.isra.0+0x50/0x180 with crng_init=0
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100s
futex hash table entries: 512 (order: 3, 32768 bytes, linear)
NET: Registered protocol family 16
thermal_sys: Registered thermal governor 'step_wise'
raid6: skip pq benchmark and using algorithm int64x8
raid6: using intx1 recovery algorithm
starfive_gpio 11910000.gpio: StarFive GPIO chip registered 0 GPIOs
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
i2c_designware 118b0000.i2c: running with gpio recovery mode! scl,sda
i2c_designware 118c0000.i2c: running with gpio recovery mode! scl,sda
i2c_designware 12450000.i2c: running with gpio recovery mode! scl,sda
clocksource: Switched to clocksource riscv_clocksource
NET: Registered protocol family 2
IP idents hash table entries: 131072 (order: 8, 1048576 bytes, linear)
tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear)
TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)
TCP: Hash tables configured (established 65536 bind 65536)
UDP hash table entries: 4096 (order: 5, 131072 bytes, linear)
UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Initialise system trusted keyrings
Unpacking initramfs...
workingset: timestamp_bits=62 max_order=21 bucket_order=0
xor: measuring software checksum speed
   8regs           :  2187 MB/sec
   8regs_prefetch  :  2183 MB/sec
   32regs          :  2343 MB/sec
   32regs_prefetch :  2337 MB/sec
xor: using function: 32regs (2343 MB/sec)
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler bfq registered
starfive,vpp-lcdc 12000000.sfivefb: starfive_fb_probe
panel_name: tda_998x_1080p
of_parse_rd_cmd_info: failed, key=id_read_cmd_info
of_parse_rd_cmd_info: failed, key=esd_read_cmd_info
of_parse_wr_cmd: failed, key=panel-on-command
of_parse_wr_cmd: failed, key=panel-off-command
of_parse_reset_seq:707, error reading reset-sequence, length found = 0
panel_name: seeed_5_inch
of_parse_rd_cmd_info: failed, key=id_read_cmd_info
of_parse_rd_cmd_info: failed, key=esd_read_cmd_info
of_parse_wr_cmd: failed, key=panel-on-command
of_parse_wr_cmd: failed, key=panel-off-command
of_parse_reset_seq:707, error reading reset-sequence, length found = 0
starfive,vpp-lcdc 12000000.sfivefb: select displayer: tda_998x_1080p
starfive,vpp-lcdc 12000000.sfivefb: DDR to LCDC
starfive,vpp-lcdc 12000000.sfivefb: sf_fb_set_addr NO use PPx
Console: switching to colour frame buffer device 240x67
starfive,vpp-lcdc 12000000.sfivefb: sf_fb_set_addr NO use PPx
select displayer: tda_998x_1080p
tda998x 0-0070: found TDA19988
random: fast init done
dw_axi_dmac_platform 100b0000.dma-controller: DesignWare AXI DMA Controller, 4 channels
dw_axi_dmac_platform 10500000.dma-controller: DesignWare AXI DMA Controller, 16 channels
L2CACHE: No. of Banks in the cache: 2
L2CACHE: No. of ways per bank: 16
L2CACHE: Sets per bank: 1024
L2CACHE: Bytes per cache block: 64
L2CACHE: Index of the largest way enabled: 15
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
dw-apb-uart 11870000.serial: detected caps 00000000 should be 00000100
11870000.serial: ttyS1 at MMIO 0x11870000 (irq = 6, base_baud = 4640625) is a 16550
12440000.serial: ttyS0 at MMIO 0x12440000 (irq = 7, base_baud = 6250000) is a 16550A
Freeing initrd memory: 16088K
printk: console [ttyS0] enabled
vic-rng 118d0000.trng: Initialized
loop: module loaded
zram: Added device: zram0
tps65086 0-005e: Device: TPS650861, OTP: A, Rev: 0
dw_spi_mmio 12410000.spi: DMA init failed
wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reser.
libphy: Fixed MDIO Bus: probed
tun: Universal TUN/TAP device driver, 1.6
stmmaceth 10020000.ethernet: IRQ eth_lpi not found
stmmaceth 10020000.ethernet: force_sf_dma_mode is ignored if force_thresh_dma_mode is set.
stmmaceth 10020000.ethernet: PTP uses main clock
stmmaceth 10020000.ethernet: User ID: 0x59, Synopsys ID: 0x37
stmmaceth 10020000.ethernet:    DWMAC1000
stmmaceth 10020000.ethernet: DMA HW capability register supported
stmmaceth 10020000.ethernet: RX Checksum Offload Engine supported
stmmaceth 10020000.ethernet: COE Type 2
stmmaceth 10020000.ethernet: Wake-Up On Lan supported
stmmaceth 10020000.ethernet: Enhanced/Alternate descriptors
stmmaceth 10020000.ethernet: Enabled extended descriptors
stmmaceth 10020000.ethernet: Chain mode enabled
stmmaceth 10020000.ethernet: Enable RX Mitigation via HW Watchdog Timer
stmmaceth 10020000.ethernet: device MAC address 2c:f7:f1:1b:e4:44
libphy: stmmac: probed
Micrel KSZ9031 Gigabit PHY stmmac-0:07: attached PHY driver (mii_bus:phy_addr=stmmac-0:07)
xhci-hcd xhci-hcd.3.auto: xHCI Host Controller
xhci-hcd xhci-hcd.3.auto: new USB bus registered, assigned bus number 1
xhci-hcd xhci-hcd.3.auto: hcc params 0x200073c8 hci version 0x100 quirks 0x000000200001800
xhci-hcd xhci-hcd.3.auto: irq 11, io mem 0x104d0000
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.13
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: xHCI Host Controller
usb usb1: Manufacturer: Linux 5.13.0-rc3-starlight-201461-gce0d090ee714 xhci-hcd
usb usb1: SerialNumber: xhci-hcd.3.auto
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
xhci-hcd xhci-hcd.3.auto: xHCI Host Controller
xhci-hcd xhci-hcd.3.auto: new USB bus registered, assigned bus number 2
xhci-hcd xhci-hcd.3.auto: Host supports USB 3.0 SuperSpeed
usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.13
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: xHCI Host Controller
usb usb2: Manufacturer: Linux 5.13.0-rc3-starlight-201461-gce0d090ee714 xhci-hcd
usb usb2: SerialNumber: xhci-hcd.3.auto
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 1 port detected
usbcore: registered new interface driver uas
usbcore: registered new interface driver usb-storage
i2c /dev entries driver
sfctemp 124a0000.tmon: hwmon0: sensor '124a0000.tmon'
Synopsys Designware Multimedia Card Interface Driver
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
NET: Registered protocol family 10
usb 1-1: new high-speed USB device number 2 using xhci-hcd
dw_mmc 10000000.mmc: IDMAC supports 32-bit address mode.
dw_mmc 10000000.mmc: Using internal DMA controller.
dw_mmc 10000000.mmc: Version ID is 290a
dw_mmc 10000000.mmc: DW MMC controller at irq 29,32 bit host data width,32 deep fifo
mmc_host mmc0: card is polling.
dw_mmc 10010000.mmc: IDMAC supports 32-bit address mode.
dw_mmc 10010000.mmc: Using internal DMA controller.
dw_mmc 10010000.mmc: Version ID is 290a
dw_mmc 10010000.mmc: DW MMC controller at irq 30,32 bit host data width,32 deep fifo
dw_mmc 10010000.mmc: allocated mmc-pwrseq
mmc_host mmc1: card is non-removable.
mmc_host mmc0: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div =)
Segment Routing with IPv6
NET: Registered protocol family 17
Loading compiled-in X.509 certificates
mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div =)
Btrfs loaded, crc32c=crc32c-generic, zoned=no
cfg80211: Loading compiled-in X.509 certificates for regulatory database
cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
cfg80211: failed to load regulatory.db
dw-apb-uart 12440000.serial: forbid DMA for kernel console
Freeing unused kernel memory: 2196K
Checked W+X mappings: passed, no W+X pages found
rodata_test: all tests were successful
Run /init as init process
usb 1-1: New USB device found, idVendor=05e3, idProduct=0610, bcdDevice= 4.02
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-1: Product: USB2.0 Hub
usb 1-1: Manufacturer: GenesysLogic
mmc1: queuing unknown CIS tuple 0x80 (2 bytes)
mmc_host mmc0: Bus speed (slot 0) = 100000000Hz (slot req 50000000Hz, actual 50000000HZ d)
usb 2-1: new SuperSpeed USB device number 2 using xhci-hcd
mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
mmc0: new high speed SDHC card at address 1234
usb 2-1: New USB device found, idVendor=05e3, idProduct=0616, bcdDevice= 4.02
usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 2-1: Product: USB3.0 Hub
usb 2-1: Manufacturer: GenesysLogic
systemd[1]: System time before build time, advancing clock.
mmcblk0: mmc0:1234 SA16G 14.4 GiB 
mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
systemd[1]: systemd v246.6-3.0.riscv64.fc33 running in system mode. (+PAM +AUDIT +SELINUX)
systemd[1]: Detected architecture riscv64.
hub 1-1:1.0: USB hub found
 mmcblk0: p1 p2
mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
hub 1-1:1.0: 4 ports detected
hub 2-1:1.0: USB hub found
systemd[1]: Running in initial RAM disk.
hub 2-1:1.0: 4 ports detected

Welcome to Fedora 33 (Rawhide) dracut-050-26.git20200316.mmc1: queuing unknown CIS tuple )
0.riscv64.fc33 (Initramfs)!

mcd500 · 2021-05-31T12:08:09Z

It turns out that it is not that simple.
Only davidlt/opensbi#2 is required to make the opensbi not to select hart0.
The esmil/linux#15 was not needed.
However, with the patch davidlt/opensbi#2, I found that it will reduce the performance of the network.
This is the logs.

with the patch on opensbi

--------------------------
OpenSBI v0.5-408-gb846c80
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : StarFive VIC7100
Platform Features         : timer,mfdeleg
Platform HART Count       : 2
Firmware Base             : 0x80000000
Firmware Size             : 108 KB
Runtime SBI Version       : 0.3

Domain0 Name              : root
Domain0 Boot HART         : 1
Domain0 HARTs             : 1*,2*
Domain0 Region00          : 0x0000000080000000-0x000000008001ffff ()
Domain0 Region01          : 0x0000000000000000-0x0000007fffffffff (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x0000000088000000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes

Boot HART ID              : 1
Boot HART Domain          : root
Boot HART ISA             : rv64imafdcsux
Boot HART Features        : scounteren,mcounteren
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 4096
Boot HART PMP Address Bits: 36
Boot HART MHPM Count      : 0
Boot HART MHPM Count      : 0
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109


U-Boot 2021.04-13662-ga18449847e (May 31 2021 - 19:31:31 +0900)

CPU:   rv64imafdc
DRAM:  8 GiB
MMC:   sdio0@10000000: 0, sdio1@10010000: 1
Net:   dwmac.10020000
Autoboot in 2 seconds
MMC CD is 0x0, force to True.
MMC CD is 0x0, force to True.
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found /extlinux/extlinux.conf
Retrieving file: /extlinux/extlinux.conf
1964 bytes read in 9 ms (212.9 KiB/s)

but this is the network performance

[root@fedora-starfive ~]# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.153, port 44560
[  5] local 192.168.1.112 port 5201 connected to 192.168.1.153 port 44562
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  72.6 MBytes   606 Mbits/sec                  
[  5]   1.01-2.00   sec  74.8 MBytes   631 Mbits/sec                  
[  5]   2.00-3.00   sec  74.1 MBytes   622 Mbits/sec                  
[  5]   3.00-4.01   sec  53.7 MBytes   449 Mbits/sec                  
[  5]   4.01-5.00   sec  74.7 MBytes   630 Mbits/sec                  
[  5]   5.00-6.00   sec  74.8 MBytes   627 Mbits/sec                  
[  5]   6.00-7.00   sec  74.4 MBytes   623 Mbits/sec                  
[  5]   7.00-8.00   sec  74.6 MBytes   626 Mbits/sec                  
[  5]   8.00-9.01   sec  75.1 MBytes   627 Mbits/sec                  
[  5]   9.01-10.00  sec  72.3 MBytes   609 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   721 MBytes   605 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
stmmaceth 10020000.ethernet eth0: Link is Down
stmmaceth 10020000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Accepted connection from 192.168.1.153, port 44564
[  5] local 192.168.1.112 port 5201 connected to 192.168.1.153 port 44566
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  72.8 MBytes   610 Mbits/sec                  
[  5]   1.00-2.00   sec  75.3 MBytes   633 Mbits/sec                  
[  5]   2.00-3.00   sec  73.3 MBytes   615 Mbits/sec                  
[  5]   3.00-4.01   sec  74.9 MBytes   625 Mbits/sec                  
[  5]   4.01-5.00   sec  48.6 MBytes   409 Mbits/sec                  
[  5]   5.00-6.00   sec  50.0 MBytes   420 Mbits/sec                  
[  5]   6.00-7.00   sec  51.9 MBytes   436 Mbits/sec                  
[  5]   7.00-8.00   sec  59.2 MBytes   496 Mbits/sec                  
[  5]   8.00-9.00   sec  59.0 MBytes   495 Mbits/sec                  
[  5]   9.00-10.00  sec  59.1 MBytes   495 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   624 MBytes   523 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.153, port 44568
[  5] local 192.168.1.112 port 5201 connected to 192.168.1.153 port 44570
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  72.9 MBytes   611 Mbits/sec                  
[  5]   1.00-2.00   sec  74.8 MBytes   627 Mbits/sec                  
[  5]   2.00-3.00   sec  74.5 MBytes   626 Mbits/sec                  
[  5]   3.00-4.00   sec  75.2 MBytes   631 Mbits/sec                  
[  5]   4.00-5.00   sec  75.0 MBytes   629 Mbits/sec                  
[  5]   5.00-6.00   sec  73.3 MBytes   616 Mbits/sec                  
[  5]   6.00-7.00   sec  74.7 MBytes   626 Mbits/sec                  
[  5]   7.00-8.00   sec  74.6 MBytes   626 Mbits/sec                  
[  5]   8.00-9.00   sec  74.1 MBytes   621 Mbits/sec                  
[  5]   9.00-10.00  sec  74.9 MBytes   628 Mbits/sec                  
[  5]  10.00-10.00  sec   506 KBytes  1.43 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   745 MBytes   624 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.153, port 44572
[  5] local 192.168.1.112 port 5201 connected to 192.168.1.153 port 44574
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  74.0 MBytes   621 Mbits/sec                  
[  5]   1.00-2.01   sec  75.1 MBytes   626 Mbits/sec                  
[  5]   2.01-3.00   sec  74.7 MBytes   629 Mbits/sec                  
[  5]   3.00-4.00   sec  67.0 MBytes   561 Mbits/sec                  
[  5]   4.00-5.00   sec  55.9 MBytes   469 Mbits/sec                  
[  5]   5.00-6.00   sec  59.0 MBytes   495 Mbits/sec                  
[  5]   6.00-7.00   sec  59.0 MBytes   495 Mbits/sec                  
[  5]   7.00-8.00   sec  59.1 MBytes   495 Mbits/sec                  
[  5]   8.00-9.00   sec  71.5 MBytes   602 Mbits/sec                  
[  5]   9.00-10.01  sec  74.4 MBytes   621 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec   670 MBytes   561 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.153, port 44576
[  5] local 192.168.1.112 port 5201 connected to 192.168.1.153 port 44578
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  72.8 MBytes   611 Mbits/sec                  
[  5]   1.00-2.00   sec  74.1 MBytes   621 Mbits/sec                  
[  5]   2.00-3.00   sec  74.3 MBytes   623 Mbits/sec                  
[  5]   3.00-4.00   sec  74.3 MBytes   624 Mbits/sec                  
[  5]   4.00-5.00   sec  74.6 MBytes   626 Mbits/sec                  
[  5]   5.00-6.00   sec  74.4 MBytes   624 Mbits/sec                  
[  5]   6.00-7.00   sec  74.4 MBytes   625 Mbits/sec                  
[  5]   7.00-8.00   sec  74.7 MBytes   624 Mbits/sec                  
[  5]   8.00-9.00   sec  74.3 MBytes   623 Mbits/sec                  
[  5]   9.00-10.00  sec  74.6 MBytes   626 Mbits/sec                  
[  5]  10.00-10.01  sec   506 KBytes  1.06 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec   743 MBytes   623 Mbits/sec                  receiver
-----------------------------------------------------------

Before patching the opensbi:

--------------------
OpenSBI v0.5-407-g7e1952c
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : StarFive VIC7100
Platform Features         : timer,mfdeleg
Platform HART Count       : 2
Firmware Base             : 0x80000000
Firmware Size             : 108 KB
Runtime SBI Version       : 0.3

Domain0 Name              : root
Domain0 Boot HART         : 1
Domain0 HARTs             : 0*,1*
Domain0 Region00          : 0x0000000080000000-0x000000008001ffff ()
Domain0 Region01          : 0x0000000000000000-0x0000007fffffffff (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x0000000088000000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes

Boot HART ID              : 1
Boot HART Domain          : root
Boot HART ISA             : rv64imafdcsux
Boot HART Features        : scounteren,mcounteren
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 4096
Boot HART PMP Address Bits: 36
Boot HART MHPM Count      : 0
Boot HART MHPM Count      : 0
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109


U-Boot 2021.04-13662-ga18449847e (May 31 2021 - 20:11:58 +0900)

CPU:   rv64imafdc
DRAM:  8 GiB
MMC:   sdio0@10000000: 0, sdio1@10010000: 1
Net:   dwmac.10020000
Autoboot in 2 seconds
MMC CD is 0x0, force to True.
MMC CD is 0x0, force to True.
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found /extlinux/extlinux.conf
Retrieving file: /extlinux/extlinux.conf
1964 bytes read in 9 ms (212.9 KiB/s)

The network performance is good.

---------------------
root@fedora-starfive ~]# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.153, port 44608
[  5] local 192.168.1.112 port 5201 connected to 192.168.1.153 port 44610
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   100 MBytes   841 Mbits/sec                  
[  5]   1.00-2.00   sec   107 MBytes   894 Mbits/sec                  
[  5]   2.00-3.00   sec   102 MBytes   854 Mbits/sec                  
[  5]   3.00-4.00   sec   103 MBytes   860 Mbits/sec                  
[  5]   4.00-5.00   sec   107 MBytes   894 Mbits/sec                  
[  5]   5.00-6.00   sec   108 MBytes   903 Mbits/sec                  
[  5]   6.00-7.00   sec   107 MBytes   900 Mbits/sec                  
[  5]   7.00-8.00   sec   104 MBytes   873 Mbits/sec                  
[  5]   8.00-9.00   sec   107 MBytes   901 Mbits/sec                  
[  5]   9.00-10.00  sec   106 MBytes   885 Mbits/sec                  
[  5]  10.00-10.01  sec   478 KBytes   801 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  1.03 GBytes   881 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.153, port 44612
[  5] local 192.168.1.112 port 5201 connected to 192.168.1.153 port 44614
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   105 MBytes   879 Mbits/sec                  
[  5]   1.00-2.00   sec   108 MBytes   904 Mbits/sec                  
[  5]   2.00-3.00   sec   107 MBytes   901 Mbits/sec                  
[  5]   3.00-4.00   sec   108 MBytes   902 Mbits/sec                  
[  5]   4.00-5.00   sec   108 MBytes   906 Mbits/sec                  
[  5]   5.00-6.00   sec   107 MBytes   900 Mbits/sec                  
[  5]   6.00-7.00   sec   108 MBytes   906 Mbits/sec                  
[  5]   7.00-8.00   sec   108 MBytes   904 Mbits/sec                  
[  5]   8.00-9.00   sec   108 MBytes   902 Mbits/sec                  
[  5]   9.00-10.00  sec   108 MBytes   905 Mbits/sec                  
[  5]  10.00-10.01  sec   612 KBytes   902 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  1.05 GBytes   901 Mbits/sec                  receiver
-----------------------------------------------------------

The kernel is build with unalign access optimization at here https://github.com/mcd500/linux-jh7100/tree/starlight-sdimproved-misaligh-ok-old

It require more investigation. It looks likt it is related to exception handling.

pdp7 · 2021-06-03T01:25:28Z

Known good and bad cards: https://forum.beagleboard.org/t/known-good-sd-cards/29958
@MichaelZhuxx @tekkamanninja please take a look

pdp7 · 2021-06-03T01:28:25Z

@kraj have you tried changing the frequency in the DTS for SD card down lower that they default? StarFive is interested if that might help. Also, do you have a product link for the card? StarFive would like to get a card that is exhibiting this problem

kraj · 2021-06-03T01:47:45Z

I think we have a solution for this riscv/meta-riscv#294
I am currently running this locally under a reboot cycle test http://bec-systems.com/site/942/running-a-reboot-cycle-test-shell-script-with-systemd

its been going well, it has been successfully rebooting for past 1 hr ( 111 reboots )

root@beaglev-starlight-jh7100:~# cat /cycle-count 
111
root@beaglev-starlight-jh7100:~#

lbmeng · 2021-06-03T02:00:12Z

@kraj That's good news! Thanks for testing.

pdp7 · 2021-06-03T02:52:00Z

I think we have a solution for this riscv/meta-riscv#294
I am currently running this locally under a reboot cycle test http://bec-systems.com/site/942/running-a-reboot-cycle-test-shell-script-with-systemd

its been going well, it has been successfully rebooting for past 1 hr ( 111 reboots )
root@beaglev-starlight-jh7100:~# cat /cycle-count 
111
root@beaglev-starlight-jh7100:~# 

great feedback

@tekkamanninja @MichaelZhuxx please take a look

kraj · 2021-06-03T04:39:40Z

after 4hrs of reboot cycle test and 252 reboots, I think we are good with this issue when we switch to mainline OpenSBI + starfive-tech/u-boot#17

Having said that SD card read/write still might be a problem when system is under heavy load but that problem will be different than the reboot issue.

lbmeng · 2021-06-03T04:53:06Z

Thank you @kraj for the testing. I have just merged starfive-tech/u-boot#17 to the StarFive U-Boot repo.

pdp7 · 2021-06-11T01:36:42Z

It seems this is now resolved. For reference, there is a fw_paylouad.bin.out that people can use in:
https://forum.beagleboard.org/t/new-firmware-boot-hang-is-fixed/29986/3

pdp7 · 2021-06-11T05:17:27Z

@kraj it sounds like the issue is solved but please reopen if SD card problem arises again

[ Upstream commit 99d4850 ] Found by leak sanitizer: ``` ==1632594==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369 #2 0x556701d70589 in perf_env__cpuid util/env.c:465 #3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14 #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83 #5 0x556701d8f78b in evsel__config util/evsel.c:1366 #6 0x556701ef5872 in evlist__config util/record.c:108 #7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112 #8 0x556701cacd07 in run_test tests/builtin-test.c:236 #9 0x556701cacfac in test_and_print tests/builtin-test.c:265 #10 0x556701cadddb in __cmd_test tests/builtin-test.c:402 #11 0x556701caf2aa in cmd_test tests/builtin-test.c:559 #12 0x556701d3b557 in run_builtin tools/perf/perf.c:323 #13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377 #14 0x556701d3be90 in run_argv tools/perf/perf.c:421 #15 0x556701d3c3f8 in main tools/perf/perf.c:537 #16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). ``` Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20230613235416.1650755-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

…attrs() Running kunit test for 6.5-rc1 hits one bug: ok 10 damon_test_update_monitoring_result general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 110 Comm: kunit_try_catch Tainted: G N 6.5.0-rc2 #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:damon_set_attrs+0xb9/0x120 Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d 8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00 49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89 RSP: 0000:ffffc900005bfd40 EFLAGS: 00010246 RAX: ffffffff81159fc0 RBX: ffffc900005bfeb8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc900005bfd70 RBP: ffffc90000013c10 R08: ffffc900005bfdc0 R09: ffffffff81ff10ed R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78 R13: ffff88810eb78180 R14: ffffffff818297c0 R15: ffffc90000013c28 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000002a1c001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> damon_test_set_attrs+0x63/0x1f0 kunit_generic_run_threadfn_adapter+0x17/0x30 kthread+0xfd/0x130 The problem seems to be related with the damon_ctx was used without being initialized. Fix it by adding the initialization. Link: https://lkml.kernel.org/r/20230718052811.1065173-1-feng.tang@intel.com Fixes: aa13779 ("mm/damon/core-test: add a test for damon_set_attrs()") Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

The cited commit holds encap tbl lock unconditionally when setting up dests. But it may cause the following deadlock: PID: 1063722 TASK: ffffa062ca5d0000 CPU: 13 COMMAND: "handler8" #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91 #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528 #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core] #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core] #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core] #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core] #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core] #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core] #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core] #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core] #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8 #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower] #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower] #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0 [1031218.028143] wait_for_completion+0x24/0x30 [1031218.028589] mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core] [1031218.029256] mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core] [1031218.029885] process_one_work+0x24e/0x510 Actually no need to hold encap tbl lock if there is no encap action. Fix it by checking if encap action exists or not before holding encap tbl lock. Fixes: 37c3b9f ("net/mlx5e: Prevent encap offload when neigh update is running") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

For IP tunnel encapsulation in ECMP (Equal-Cost Multipath) mode, as the flow is duplicated to the peer eswitch, the related neighbour information on the peer uplink representor is created as well. In the cited commit, eswitch devcom unpair is moved to uplink unload API, specifically the profile->cleanup_tx. If there is a encap rule offloaded in ECMP mode, when one eswitch does unpair (because of unloading the driver, for instance), and the peer rule from the peer eswitch is going to be deleted, the use-after-free error is triggered while accessing neigh info, as it is already cleaned up in uplink's profile->disable, which is before its profile->cleanup_tx. To fix this issue, move the neigh cleanup to profile's cleanup_tx callback, and after mlx5e_cleanup_uplink_rep_tx is called. The neigh init is moved to init_tx for symmeter. [ 2453.376299] BUG: KASAN: slab-use-after-free in mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core] [ 2453.379125] Read of size 4 at addr ffff888127af9008 by task modprobe/2496 [ 2453.381542] CPU: 7 PID: 2496 Comm: modprobe Tainted: G B 6.4.0-rc7+ #15 [ 2453.383386] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 2453.384335] Call Trace: [ 2453.384625] <TASK> [ 2453.384891] dump_stack_lvl+0x33/0x50 [ 2453.385285] print_report+0xc2/0x610 [ 2453.385667] ? __virt_addr_valid+0xb1/0x130 [ 2453.386091] ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core] [ 2453.386757] kasan_report+0xae/0xe0 [ 2453.387123] ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core] [ 2453.387798] mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core] [ 2453.388465] mlx5e_rep_encap_entry_detach+0xa6/0xe0 [mlx5_core] [ 2453.389111] mlx5e_encap_dealloc+0xa7/0x100 [mlx5_core] [ 2453.389706] mlx5e_tc_tun_encap_dests_unset+0x61/0xb0 [mlx5_core] [ 2453.390361] mlx5_free_flow_attr_actions+0x11e/0x340 [mlx5_core] [ 2453.391015] ? complete_all+0x43/0xd0 [ 2453.391398] ? free_flow_post_acts+0x38/0x120 [mlx5_core] [ 2453.392004] mlx5e_tc_del_fdb_flow+0x4ae/0x690 [mlx5_core] [ 2453.392618] mlx5e_tc_del_fdb_peers_flow+0x308/0x370 [mlx5_core] [ 2453.393276] mlx5e_tc_clean_fdb_peer_flows+0xf5/0x140 [mlx5_core] [ 2453.393925] mlx5_esw_offloads_unpair+0x86/0x540 [mlx5_core] [ 2453.394546] ? mlx5_esw_offloads_set_ns_peer.isra.0+0x180/0x180 [mlx5_core] [ 2453.395268] ? down_write+0xaa/0x100 [ 2453.395652] mlx5_esw_offloads_devcom_event+0x203/0x530 [mlx5_core] [ 2453.396317] mlx5_devcom_send_event+0xbb/0x190 [mlx5_core] [ 2453.396917] mlx5_esw_offloads_devcom_cleanup+0xb0/0xd0 [mlx5_core] [ 2453.397582] mlx5e_tc_esw_cleanup+0x42/0x120 [mlx5_core] [ 2453.398182] mlx5e_rep_tc_cleanup+0x15/0x30 [mlx5_core] [ 2453.398768] mlx5e_cleanup_rep_tx+0x6c/0x80 [mlx5_core] [ 2453.399367] mlx5e_detach_netdev+0xee/0x120 [mlx5_core] [ 2453.399957] mlx5e_netdev_change_profile+0x84/0x170 [mlx5_core] [ 2453.400598] mlx5e_vport_rep_unload+0xe0/0xf0 [mlx5_core] [ 2453.403781] mlx5_eswitch_unregister_vport_reps+0x15e/0x190 [mlx5_core] [ 2453.404479] ? mlx5_eswitch_register_vport_reps+0x200/0x200 [mlx5_core] [ 2453.405170] ? up_write+0x39/0x60 [ 2453.405529] ? kernfs_remove_by_name_ns+0xb7/0xe0 [ 2453.405985] auxiliary_bus_remove+0x2e/0x40 [ 2453.406405] device_release_driver_internal+0x243/0x2d0 [ 2453.406900] ? kobject_put+0x42/0x2d0 [ 2453.407284] bus_remove_device+0x128/0x1d0 [ 2453.407687] device_del+0x240/0x550 [ 2453.408053] ? waiting_for_supplier_show+0xe0/0xe0 [ 2453.408511] ? kobject_put+0xfa/0x2d0 [ 2453.408889] ? __kmem_cache_free+0x14d/0x280 [ 2453.409310] mlx5_rescan_drivers_locked.part.0+0xcd/0x2b0 [mlx5_core] [ 2453.409973] mlx5_unregister_device+0x40/0x50 [mlx5_core] [ 2453.410561] mlx5_uninit_one+0x3d/0x110 [mlx5_core] [ 2453.411111] remove_one+0x89/0x130 [mlx5_core] [ 2453.411628] pci_device_remove+0x59/0xf0 [ 2453.412026] device_release_driver_internal+0x243/0x2d0 [ 2453.412511] ? parse_option_str+0x14/0x90 [ 2453.412915] driver_detach+0x7b/0xf0 [ 2453.413289] bus_remove_driver+0xb5/0x160 [ 2453.413685] pci_unregister_driver+0x3f/0xf0 [ 2453.414104] mlx5_cleanup+0xc/0x20 [mlx5_core] Fixes: 2be5bd4 ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

[ Upstream commit 8a96c02 ] When ring_buffer_swap_cpu was called during resize process, the cpu buffer was swapped in the middle, resulting in incorrect state. Continuing to run in the wrong state will result in oops. This issue can be easily reproduced using the following two scripts: /tmp # cat test1.sh //#! /bin/sh for i in `seq 0 100000` do echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb sleep 0.5 echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb sleep 0.5 done /tmp # cat test2.sh //#! /bin/sh for i in `seq 0 100000` do echo irqsoff > /sys/kernel/debug/tracing/current_tracer sleep 1 echo nop > /sys/kernel/debug/tracing/current_tracer sleep 1 done /tmp # ./test1.sh & /tmp # ./test2.sh & A typical oops log is as follows, sometimes with other different oops logs. [ 231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8 [ 231.713375] Modules linked in: [ 231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.5.0-rc1-00276-g20edcec23f92 starfive-tech#15 [ 231.716750] Hardware name: linux,dummy-virt (DT) [ 231.718152] Workqueue: events update_pages_handler [ 231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 231.721171] pc : rb_update_pages+0x378/0x3f8 [ 231.722212] lr : rb_update_pages+0x25c/0x3f8 [ 231.723248] sp : ffff800082b9bd50 [ 231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000 [ 231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0 [ 231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a [ 231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000 [ 231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510 [ 231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002 [ 231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558 [ 231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001 [ 231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000 [ 231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208 [ 231.744196] Call trace: [ 231.744892] rb_update_pages+0x378/0x3f8 [ 231.745893] update_pages_handler+0x1c/0x38 [ 231.746893] process_one_work+0x1f0/0x468 [ 231.747852] worker_thread+0x54/0x410 [ 231.748737] kthread+0x124/0x138 [ 231.749549] ret_from_fork+0x10/0x20 [ 231.750434] ---[ end trace 0000000000000000 ]--- [ 233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 233.721696] Mem abort info: [ 233.721935] ESR = 0x0000000096000004 [ 233.722283] EC = 0x25: DABT (current EL), IL = 32 bits [ 233.722596] SET = 0, FnV = 0 [ 233.722805] EA = 0, S1PTW = 0 [ 233.723026] FSC = 0x04: level 0 translation fault [ 233.723458] Data abort info: [ 233.723734] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 233.724176] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 233.724589] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000 [ 233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 [ 233.726231] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 233.726720] Modules linked in: [ 233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.5.0-rc1-00276-g20edcec23f92 starfive-tech#15 [ 233.727777] Hardware name: linux,dummy-virt (DT) [ 233.728225] Workqueue: events update_pages_handler [ 233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 233.729054] pc : rb_update_pages+0x1a8/0x3f8 [ 233.729334] lr : rb_update_pages+0x154/0x3f8 [ 233.729592] sp : ffff800082b9bd50 [ 233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000 [ 233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418 [ 233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003 [ 233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58 [ 233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001 [ 233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000 [ 233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c [ 233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0 [ 233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000 [ 233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000 [ 233.734418] Call trace: [ 233.734593] rb_update_pages+0x1a8/0x3f8 [ 233.734853] update_pages_handler+0x1c/0x38 [ 233.735148] process_one_work+0x1f0/0x468 [ 233.735525] worker_thread+0x54/0x410 [ 233.735852] kthread+0x124/0x138 [ 233.736064] ret_from_fork+0x10/0x20 [ 233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060) [ 233.736959] ---[ end trace 0000000000000000 ]--- After analysis, the seq of the error is as follows [1-5]: int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size, int cpu_id) { for_each_buffer_cpu(buffer, cpu) { cpu_buffer = buffer->buffers[cpu]; //1. get cpu_buffer, aka cpu_buffer(A) ... ... schedule_work_on(cpu, &cpu_buffer->update_pages_work); //2. 'update_pages_work' is queue on 'cpu', cpu_buffer(A) is passed to // update_pages_handler, do the update process, set 'update_done' in // complete(&cpu_buffer->update_done) and to wakeup resize process. //----> //3. Just at this moment, ring_buffer_swap_cpu is triggered, //cpu_buffer(A) be swaped to cpu_buffer(B), the max_buffer. //ring_buffer_swap_cpu is called as the 'Call trace' below. Call trace: dump_backtrace+0x0/0x2f8 show_stack+0x18/0x28 dump_stack+0x12c/0x188 ring_buffer_swap_cpu+0x2f8/0x328 update_max_tr_single+0x180/0x210 check_critical_timing+0x2b4/0x2c8 tracer_hardirqs_on+0x1c0/0x200 trace_hardirqs_on+0xec/0x378 el0_svc_common+0x64/0x260 do_el0_svc+0x90/0xf8 el0_svc+0x20/0x30 el0_sync_handler+0xb0/0xb8 el0_sync+0x180/0x1c0 //<---- /* wait for all the updates to complete */ for_each_buffer_cpu(buffer, cpu) { cpu_buffer = buffer->buffers[cpu]; //4. get cpu_buffer, cpu_buffer(B) is used in the following process, //the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong. //for example, cpu_buffer(A)->update_done will leave be set 1, and will //not 'wait_for_completion' at the next resize round. if (!cpu_buffer->nr_pages_to_update) continue; if (cpu_online(cpu)) wait_for_completion(&cpu_buffer->update_done); cpu_buffer->nr_pages_to_update = 0; } ... } //5. the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong, //Continuing to run in the wrong state, then oops occurs. Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cn Signed-off-by: Chen Lin <chen.lin5@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit c83ad36 upstream. Currently, for double invoke call_rcu(), will dump rcu_head objects memory info, if the objects is not allocated from the slab allocator, the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock need to be held, since the call_rcu() can be invoked in interrupt context, therefore, there is a possibility of spinlock deadlock scenarios. And in Preempt-RT kernel, the rcutorture test also trigger the following lockdep warning: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 RCU nest depth: 1, expected: 1 3 locks held by swapper/0/1: #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70 irq event stamp: 565512 hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940 hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370 softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170 softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0 Preemption disabled at: [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ starfive-tech#15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0xb0 dump_stack+0x14/0x20 __might_resched+0x1aa/0x280 ? __pfx_rcu_torture_err_cb+0x10/0x10 rt_spin_lock+0x53/0x130 ? find_vmap_area+0x1f/0x70 find_vmap_area+0x1f/0x70 vmalloc_dump_obj+0x20/0x60 mem_dump_obj+0x22/0x90 __call_rcu_common+0x5bf/0x940 ? debug_smp_processor_id+0x1b/0x30 call_rcu_hurry+0x14/0x20 rcu_torture_init+0x1f82/0x2370 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_init+0x10/0x10 do_one_initcall+0x6c/0x300 ? debug_smp_processor_id+0x1b/0x30 kernel_init_freeable+0x2b9/0x540 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x1f/0x150 ret_from_fork+0x40/0x50 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> The previous patch fixes this by using the deadlock-safe best-effort version of find_vm_area. However, in case of failure print the fact that the pointer was a vmalloc pointer so that we print at least something. Link: https://lkml.kernel.org/r/20230904180806.1002832-2-joel@joelfernandes.org Fixes: 98f1808 ("mm: Make mem_dump_obj() handle vmalloc() memory") Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0b0747d upstream. The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 starfive-tech#10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 starfive-tech#11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 starfive-tech#12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 starfive-tech#13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 starfive-tech#14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 starfive-tech#15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 starfive-tech#16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 starfive-tech#17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 starfive-tech#18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 starfive-tech#19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 starfive-tech#20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 starfive-tech#21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 starfive-tech#22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 starfive-tech#23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 starfive-tech#24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 starfive-tech#25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 starfive-tech#26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 starfive-tech#27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 starfive-tech#10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 starfive-tech#11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 starfive-tech#12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 starfive-tech#13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 starfive-tech#14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 starfive-tech#15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 starfive-tech#16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 starfive-tech#17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…roy() commit 46a9ea6 upstream. After the commit in Fixes:, if a module that created a slab cache does not release all of its allocated objects before destroying the cache (at rmmod time), we might end up releasing the kmem_cache object without removing it from the slab_caches list thus corrupting the list as kmem_cache_destroy() ignores the return value from shutdown_cache(), which in turn never removes the kmem_cache object from slabs_list in case __kmem_cache_shutdown() fails to release all of the cache's slabs. This is easily observable on a kernel built with CONFIG_DEBUG_LIST=y as after that ill release the system will immediately trip on list_add, or list_del, assertions similar to the one shown below as soon as another kmem_cache gets created, or destroyed: [ 1041.213632] list_del corruption. next->prev should be ffff89f596fb5768, but was 52f1e5016aeee75d. (next=ffff89f595a1b268) [ 1041.219165] ------------[ cut here ]------------ [ 1041.221517] kernel BUG at lib/list_debug.c:62! [ 1041.223452] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 1041.225408] CPU: 2 PID: 1852 Comm: rmmod Kdump: loaded Tainted: G B W OE 6.5.0 starfive-tech#15 [ 1041.228244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023 [ 1041.231212] RIP: 0010:__list_del_entry_valid+0xae/0xb0 Another quick way to trigger this issue, in a kernel with CONFIG_SLUB=y, is to set slub_debug to poison the released objects and then just run cat /proc/slabinfo after removing the module that leaks slab objects, in which case the kernel will panic: [ 50.954843] general protection fault, probably for non-canonical address 0xa56b6b6b6b6b6b8b: 0000 [#1] PREEMPT SMP PTI [ 50.961545] CPU: 2 PID: 1495 Comm: cat Kdump: loaded Tainted: G B W OE 6.5.0 starfive-tech#15 [ 50.966808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023 [ 50.972663] RIP: 0010:get_slabinfo+0x42/0xf0 This patch fixes this issue by properly checking shutdown_cache()'s return value before taking the kmem_cache_release() branch. Fixes: 0495e33 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock") Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit a154f5f ] The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee starfive-tech#3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 starfive-tech#4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 starfive-tech#5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c starfive-tech#6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] starfive-tech#7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] starfive-tech#8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f starfive-tech#9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 starfive-tech#10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] starfive-tech#11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc starfive-tech#12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] starfive-tech#13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] starfive-tech#14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] starfive-tech#15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] starfive-tech#16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 starfive-tech#17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] starfive-tech#18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] starfive-tech#19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 starfive-tech#20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

…roy() After the commit in Fixes:, if a module that created a slab cache does not release all of its allocated objects before destroying the cache (at rmmod time), we might end up releasing the kmem_cache object without removing it from the slab_caches list thus corrupting the list as kmem_cache_destroy() ignores the return value from shutdown_cache(), which in turn never removes the kmem_cache object from slabs_list in case __kmem_cache_shutdown() fails to release all of the cache's slabs. This is easily observable on a kernel built with CONFIG_DEBUG_LIST=y as after that ill release the system will immediately trip on list_add, or list_del, assertions similar to the one shown below as soon as another kmem_cache gets created, or destroyed: [ 1041.213632] list_del corruption. next->prev should be ffff89f596fb5768, but was 52f1e5016aeee75d. (next=ffff89f595a1b268) [ 1041.219165] ------------[ cut here ]------------ [ 1041.221517] kernel BUG at lib/list_debug.c:62! [ 1041.223452] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 1041.225408] CPU: 2 PID: 1852 Comm: rmmod Kdump: loaded Tainted: G B W OE 6.5.0 #15 [ 1041.228244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023 [ 1041.231212] RIP: 0010:__list_del_entry_valid+0xae/0xb0 Another quick way to trigger this issue, in a kernel with CONFIG_SLUB=y, is to set slub_debug to poison the released objects and then just run cat /proc/slabinfo after removing the module that leaks slab objects, in which case the kernel will panic: [ 50.954843] general protection fault, probably for non-canonical address 0xa56b6b6b6b6b6b8b: 0000 [#1] PREEMPT SMP PTI [ 50.961545] CPU: 2 PID: 1495 Comm: cat Kdump: loaded Tainted: G B W OE 6.5.0 #15 [ 50.966808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023 [ 50.972663] RIP: 0010:get_slabinfo+0x42/0xf0 This patch fixes this issue by properly checking shutdown_cache()'s return value before taking the kmem_cache_release() branch. Fixes: 0495e33 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock") Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Fix an error detected by memory sanitizer: ``` ==4033==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55fb0fbedfc7 in read_alias_info tools/perf/util/pmu.c:457:6 #1 0x55fb0fbea339 in check_info_data tools/perf/util/pmu.c:1434:2 #2 0x55fb0fbea339 in perf_pmu__check_alias tools/perf/util/pmu.c:1504:9 #3 0x55fb0fbdca85 in parse_events_add_pmu tools/perf/util/parse-events.c:1429:32 #4 0x55fb0f965230 in parse_events_parse tools/perf/util/parse-events.y:299:6 #5 0x55fb0fbdf6b2 in parse_events__scanner tools/perf/util/parse-events.c:1822:8 #6 0x55fb0fbdf8c1 in __parse_events tools/perf/util/parse-events.c:2094:8 #7 0x55fb0fa8ffa9 in parse_events tools/perf/util/parse-events.h:41:9 #8 0x55fb0fa8ffa9 in test_event tools/perf/tests/parse-events.c:2393:8 #9 0x55fb0fa8f458 in test__pmu_events tools/perf/tests/parse-events.c:2551:15 #10 0x55fb0fa6d93f in run_test tools/perf/tests/builtin-test.c:242:9 #11 0x55fb0fa6d93f in test_and_print tools/perf/tests/builtin-test.c:271:8 #12 0x55fb0fa6d082 in __cmd_test tools/perf/tests/builtin-test.c:442:5 #13 0x55fb0fa6d082 in cmd_test tools/perf/tests/builtin-test.c:564:9 #14 0x55fb0f942720 in run_builtin tools/perf/perf.c:322:11 #15 0x55fb0f942486 in handle_internal_command tools/perf/perf.c:375:8 #16 0x55fb0f941dab in run_argv tools/perf/perf.c:419:2 #17 0x55fb0f941dab in main tools/perf/perf.c:535:3 ``` Fixes: 7b723db ("perf pmu: Be lazy about loading event info files from sysfs") Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230914022425.1489035-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>

The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f starfive-tech#3 [fffffe0000549eb8] do_nmi at ffffffff81079582 starfive-tech#4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- starfive-tech#5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b starfive-tech#6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 starfive-tech#7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 starfive-tech#8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] starfive-tech#9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] starfive-tech#10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] starfive-tech#11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] starfive-tech#12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb starfive-tech#13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 starfive-tech#14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 starfive-tech#15 [ffff88aa841efb88] device_del at ffffffff82179d23 starfive-tech#16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] starfive-tech#17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] starfive-tech#18 [ffff88aa841efde8] process_one_work at ffffffff811c589a starfive-tech#19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff starfive-tech#20 [ffff88aa841eff10] kthread at ffffffff811d87a0 starfive-tech#21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

…s_del_by_dev() I got the below warning trace: WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0 Call Trace: rtnl_dellink rtnetlink_rcv_msg netlink_rcv_skb netlink_unicast netlink_sendmsg __sock_sendmsg ____sys_sendmsg ___sys_sendmsg __sys_sendmsg do_syscall_64 entry_SYSCALL_64_after_hwframe It can be repoduced via: ip netns add ns1 ip netns exec ns1 ip link add bond0 type bond mode 0 ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 ip netns exec ns1 ip link set bond_slave_1 master bond0 [1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off [2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0 [3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0 [4] ip netns exec ns1 ip link set bond_slave_1 nomaster [5] ip netns exec ns1 ip link del veth2 ip netns del ns1 This is all caused by command [1] turning off the rx-vlan-filter function of bond0. The reason is the same as commit 01f4fd2 ("bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands [2] [3] add the same vid to slave and master respectively, causing command [4] to empty slave->vlan_info. The following command [5] triggers this problem. To fix this problem, we should add VLAN_FILTER feature checks in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect addition or deletion of vlan_vid information. Fixes: 348a144 ("vlan: introduce functions to do mass addition/deletion of vids by another device") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

…s_del_by_dev() [ Upstream commit 01a564b ] I got the below warning trace: WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ starfive-tech#15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0 Call Trace: rtnl_dellink rtnetlink_rcv_msg netlink_rcv_skb netlink_unicast netlink_sendmsg __sock_sendmsg ____sys_sendmsg ___sys_sendmsg __sys_sendmsg do_syscall_64 entry_SYSCALL_64_after_hwframe It can be repoduced via: ip netns add ns1 ip netns exec ns1 ip link add bond0 type bond mode 0 ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 ip netns exec ns1 ip link set bond_slave_1 master bond0 [1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off [2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0 [3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0 [4] ip netns exec ns1 ip link set bond_slave_1 nomaster [5] ip netns exec ns1 ip link del veth2 ip netns del ns1 This is all caused by command [1] turning off the rx-vlan-filter function of bond0. The reason is the same as commit 01f4fd2 ("bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands [2] [3] add the same vid to slave and master respectively, causing command [4] to empty slave->vlan_info. The following command [5] triggers this problem. To fix this problem, we should add VLAN_FILTER feature checks in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect addition or deletion of vlan_vid information. Fixes: 348a144 ("vlan: introduce functions to do mass addition/deletion of vids by another device") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 697dd49 ] Add a SPDIF audio-graph-card to ROCK Pi 4 device tree. It's not enabled by default since all dma channels are used by the (already) enabled i2s0/1/2 and the pin is muxed with GPIO4_C5 which might be in use already. If enabled SPDIF_TX will be available at pin starfive-tech#15. Signed-off-by: Alex Bee <knaerzche@gmail.com> Link: https://lore.kernel.org/r/20210618181256.27992-6-knaerzche@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e starfive-tech#3 [fffffe00003fced0] do_nmi at ffffffff8922660d starfive-tech#4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 starfive-tech#5 [ffffa655314979e8] io_serial_in at ffffffff89792594 starfive-tech#6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 starfive-tech#7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 starfive-tech#8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 starfive-tech#9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 starfive-tech#10 [ffffa65531497ac8] console_unlock at ffffffff89316124 starfive-tech#11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 starfive-tech#12 [ffffa65531497b68] printk at ffffffff89318306 starfive-tech#13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 starfive-tech#14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] starfive-tech#15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] starfive-tech#16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] starfive-tech#17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] starfive-tech#18 [ffffa65531497f10] kthread at ffffffff892d2e72 starfive-tech#19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) starfive-tech#3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 starfive-tech#4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 starfive-tech#5 0x55c94045c776 in ui_browser__show ui/browser.c:288 starfive-tech#6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 starfive-tech#7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 starfive-tech#8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 starfive-tech#9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 starfive-tech#10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 starfive-tech#11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 starfive-tech#12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 starfive-tech#13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 starfive-tech#14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 starfive-tech#15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 starfive-tech#16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 starfive-tech#17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 starfive-tech#18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 starfive-tech#19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 starfive-tech#20 0x55c9400e53ad in main tools/perf/perf.c:561 starfive-tech#21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 starfive-tech#22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 starfive-tech#23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 starfive-tech#1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) starfive-tech#2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) starfive-tech#3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 starfive-tech#4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 starfive-tech#5 0x55c94045c776 in ui_browser__show ui/browser.c:288 starfive-tech#6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 starfive-tech#7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 starfive-tech#8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 starfive-tech#9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 starfive-tech#10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 starfive-tech#11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 starfive-tech#12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 starfive-tech#13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 starfive-tech#14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 starfive-tech#15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 starfive-tech#16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 starfive-tech#17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 starfive-tech#18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 starfive-tech#19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 starfive-tech#20 0x55c9400e53ad in main tools/perf/perf.c:561 starfive-tech#21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 starfive-tech#22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 starfive-tech#23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 9d274c1 upstream. We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [starfive-tech#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 starfive-tech#6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) starfive-tech#1 btrfs_drop_extents (fs/btrfs/file.c:411:4) starfive-tech#2 log_one_extent (fs/btrfs/tree-log.c:4732:9) starfive-tech#3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) starfive-tech#4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) starfive-tech#5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) starfive-tech#6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) starfive-tech#7 btrfs_sync_file (fs/btrfs/file.c:1933:8) starfive-tech#8 vfs_fsync_range (fs/sync.c:188:9) starfive-tech#9 vfs_fsync (fs/sync.c:202:9) starfive-tech#10 do_fsync (fs/sync.c:212:9) starfive-tech#11 __do_sys_fdatasync (fs/sync.c:225:9) starfive-tech#12 __se_sys_fdatasync (fs/sync.c:223:1) starfive-tech#13 __x64_sys_fdatasync (fs/sync.c:223:1) starfive-tech#14 do_syscall_x64 (arch/x86/entry/common.c:52:14) starfive-tech#15 do_syscall_64 (arch/x86/entry/common.c:83:7) starfive-tech#16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 starfive-tech#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] starfive-tech#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] starfive-tech#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] starfive-tech#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] starfive-tech#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] starfive-tech#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] starfive-tech#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] starfive-tech#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] starfive-tech#11 dio_complete at ffffffff8c2b9fa7 starfive-tech#12 do_blockdev_direct_IO at ffffffff8c2bc09f starfive-tech#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] starfive-tech#14 generic_file_direct_write at ffffffff8c1dcf14 starfive-tech#15 __generic_file_write_iter at ffffffff8c1dd07b starfive-tech#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] starfive-tech#17 aio_write at ffffffff8c2cc72e starfive-tech#18 kmem_cache_alloc at ffffffff8c248dde starfive-tech#19 do_io_submit at ffffffff8c2ccada starfive-tech#20 do_syscall_64 at ffffffff8c004984 starfive-tech#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9d274c1 upstream. We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 starfive-tech#6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) starfive-tech#3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) starfive-tech#4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) starfive-tech#5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) starfive-tech#6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) starfive-tech#7 btrfs_sync_file (fs/btrfs/file.c:1933:8) starfive-tech#8 vfs_fsync_range (fs/sync.c:188:9) starfive-tech#9 vfs_fsync (fs/sync.c:202:9) starfive-tech#10 do_fsync (fs/sync.c:212:9) starfive-tech#11 __do_sys_fdatasync (fs/sync.c:225:9) starfive-tech#12 __se_sys_fdatasync (fs/sync.c:223:1) starfive-tech#13 __x64_sys_fdatasync (fs/sync.c:223:1) starfive-tech#14 do_syscall_x64 (arch/x86/entry/common.c:52:14) starfive-tech#15 do_syscall_64 (arch/x86/entry/common.c:83:7) starfive-tech#16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 starfive-tech#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] starfive-tech#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] starfive-tech#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] starfive-tech#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] starfive-tech#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] starfive-tech#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] starfive-tech#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] starfive-tech#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] starfive-tech#11 dio_complete at ffffffff8c2b9fa7 starfive-tech#12 do_blockdev_direct_IO at ffffffff8c2bc09f starfive-tech#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] starfive-tech#14 generic_file_direct_write at ffffffff8c1dcf14 starfive-tech#15 __generic_file_write_iter at ffffffff8c1dd07b starfive-tech#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] starfive-tech#17 aio_write at ffffffff8c2cc72e starfive-tech#18 kmem_cache_alloc at ffffffff8c248dde starfive-tech#19 do_io_submit at ffffffff8c2ccada starfive-tech#20 do_syscall_64 at ffffffff8c004984 starfive-tech#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pdp7 assigned tekkamanninja May 18, 2021

pdp7 added the bug Something isn't working label May 18, 2021

mcd500 mentioned this issue May 31, 2021

USB: sometime fails to detect builtin USB hub starfive-tech/Fedora_on_StarFive#40

Closed

mcd500 mentioned this issue May 31, 2021

dts: jh7100: Fixing boot hang caused by booting from e24 instead of u74 esmil/linux#15

Closed

This was referenced Jun 2, 2021

Revert the OpenSBI and u-boot switch and ignore textrels in fish shell riscv/meta-riscv#293

Closed

Update StarFive U-Boot to work with upstream OpenSBI generic platform starfive-tech/u-boot#17

Merged

This was referenced Jun 3, 2021

System freeze after running disk heavy compile operations with 5.13-rc1 #20

Closed

workqueue lockup in rcu_sched #18

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SD card boot gets stuck often resulting in board not booting. #15

SD card boot gets stuck often resulting in board not booting. #15

kraj commented May 12, 2021

pdp7 commented May 12, 2021

kraj commented May 12, 2021

pdp7 commented May 20, 2021

kraj commented May 25, 2021

MichaelZhuxx commented May 26, 2021 •

edited

Loading

esmil commented May 26, 2021 •

edited

Loading

kraj commented May 26, 2021

pdp7 commented May 26, 2021

pdp7 commented May 29, 2021

mcd500 commented May 31, 2021

mcd500 commented May 31, 2021 •

edited

Loading

mcd500 commented May 31, 2021

pdp7 commented Jun 3, 2021

pdp7 commented Jun 3, 2021

kraj commented Jun 3, 2021

lbmeng commented Jun 3, 2021

pdp7 commented Jun 3, 2021

kraj commented Jun 3, 2021

lbmeng commented Jun 3, 2021

pdp7 commented Jun 11, 2021

pdp7 commented Jun 11, 2021

SD card boot gets stuck often resulting in board not booting. #15

SD card boot gets stuck often resulting in board not booting. #15

Comments

kraj commented May 12, 2021

pdp7 commented May 12, 2021

kraj commented May 12, 2021

pdp7 commented May 20, 2021

kraj commented May 25, 2021

MichaelZhuxx commented May 26, 2021 • edited Loading

esmil commented May 26, 2021 • edited Loading

kraj commented May 26, 2021

pdp7 commented May 26, 2021

pdp7 commented May 29, 2021

mcd500 commented May 31, 2021

mcd500 commented May 31, 2021 • edited Loading

mcd500 commented May 31, 2021

pdp7 commented Jun 3, 2021

pdp7 commented Jun 3, 2021

kraj commented Jun 3, 2021

lbmeng commented Jun 3, 2021

pdp7 commented Jun 3, 2021

kraj commented Jun 3, 2021

lbmeng commented Jun 3, 2021

pdp7 commented Jun 11, 2021

pdp7 commented Jun 11, 2021

MichaelZhuxx commented May 26, 2021 •

edited

Loading

esmil commented May 26, 2021 •

edited

Loading

mcd500 commented May 31, 2021 •

edited

Loading