Skip to content

USB storage device causing occasional block/genhd.c panic #213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
zerxy opened this issue Feb 5, 2013 · 36 comments
Closed

USB storage device causing occasional block/genhd.c panic #213

zerxy opened this issue Feb 5, 2013 · 36 comments
Assignees

Comments

@zerxy
Copy link

zerxy commented Feb 5, 2013

Error occurs during boot on my 256 MB Raspberry Pi. dmesg output includes:

WARNING: at block/genhd.c:1582 disk_clear_events+0x148/0x164()

Operating system is fully up-to-date using rpi-update. I'm using two USB storage devices which are connected to my Pi via a powered hub. Using LVM2 to join them together. I have patched LVM2 as described here: https://lists.fedorahosted.org/pipermail/lvm2-commits/2012-November/000391.html

EDIT: Previously thought issue was influenced by gpu_mem setting in config.txt. This does not appear to be the case.

@popcornmix
Copy link
Collaborator

I've just tried gpu_mem=8, 16 on a freshly imaged, and rpi-updated image and all booted happily.
Do you have any other options on config.txt?
Can you put gpu_mem back to 64 and check it still boots.

@zerxy
Copy link
Author

zerxy commented Feb 5, 2013

Couldn't reproduce this error with gpu_mem=8 or gpu_mem=16 but succeeded with gpu_mem=32. Full dmesg output:

[    0.000000] Booting Linux on physical CPU 0
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.6.11+ (dc4@dc4-arm-01) (gcc version 4.7.2 20120731 (prerelease) (crosstool-NG linaro-1.13.1+bzr2458 - Linaro GCC 2012.08) ) #368 PREEMPT Sun Feb 3 18:35:57 GMT 2013
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[    0.000000] Machine: BCM2708
[    0.000000] cma: CMA: reserved 16 MiB at 0d000000
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] On node 0 totalpages: 57344
[    0.000000] free_area_init_node: node 0, pgdat c053b834, node_mem_map c05e5000
[    0.000000]   Normal zone: 448 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 56896 pages, LIFO batch:15
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 56896
[    0.000000] Kernel command line: dma.dmachans=0x7f35 bcm2708_fb.fbwidth=1920 bcm2708_fb.fbheight=1200 bcm2708.boardrev=0x1000002 bcm2708.serial=0x47d1d2b8 smsc95xx.macaddr=B8:27:EB:D1:D2:B8 sdhci-bcm2708.emmc_clock_freq=100000000 vc_mem.mem_base=0xec00000 vc_mem.mem_size=0x10000000  smsc95xx.turbo_mode=N sdhci-bcm2708.enable_llm=0 dwc_otg.lpm_enable=0 dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=cfq rootwait ro
[    0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Memory: 224MB = 224MB total
[    0.000000] Memory: 204880k/204880k available, 24496k reserved, 0K highmem
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xce800000 - 0xff000000   ( 776 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xce000000   ( 224 MB)
[    0.000000]     modules : 0xbf000000 - 0xc0000000   (  16 MB)
[    0.000000]       .text : 0xc0008000 - 0xc04e5460   (4982 kB)
[    0.000000]       .init : 0xc04e6000 - 0xc0506f24   ( 132 kB)
[    0.000000]       .data : 0xc0508000 - 0xc053c060   ( 209 kB)
[    0.000000]        .bss : 0xc053c084 - 0xc05e4738   ( 674 kB)
[    0.000000] NR_IRQS:330
[    0.000000] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 4294967ms
[    0.000000] Console: colour dummy device 80x30
[    0.000000] console [tty1] enabled
[    0.001042] Calibrating delay loop... 464.48 BogoMIPS (lpj=2322432)
[    0.060061] pid_max: default: 32768 minimum: 301
[    0.060406] Mount-cache hash table entries: 512
[    0.061164] Initializing cgroup subsys cpuacct
[    0.061220] Initializing cgroup subsys devices
[    0.061253] Initializing cgroup subsys freezer
[    0.061284] Initializing cgroup subsys blkio
[    0.061386] CPU: Testing write buffer coherency: ok
[    0.061719] hw perfevents: enabled with v6 PMU driver, 3 counters available
[    0.061871] Setting up static identity map for 0x39d198 - 0x39d1f4
[    0.063375] devtmpfs: initialized
[    0.074433] NET: Registered protocol family 16
[    0.080913] DMA: preallocated 4096 KiB pool for atomic coherent allocations
[    0.082139] bcm2708.uart_clock = 0
[    0.083478] hw-breakpoint: found 6 breakpoint and 1 watchpoint registers.
[    0.083533] hw-breakpoint: maximum watchpoint size is 4 bytes.
[    0.083571] mailbox: Broadcom VideoCore Mailbox driver
[    0.083665] bcm2708_vcio: mailbox at f200b880
[    0.083766] bcm_power: Broadcom power driver
[    0.083805] bcm_power_open() -> 0
[    0.083830] bcm_power_request(0, 8)
[    0.584520] bcm_mailbox_read -> 00000080, 0
[    0.584560] bcm_power_request -> 0
[    0.584587] Serial: AMBA PL011 UART driver
[    0.584728] dev:f1: ttyAMA0 at MMIO 0x20201000 (irq = 83) is a PL011 rev3
[    0.917727] console [ttyAMA0] enabled
[    0.941350] bio: create slab <bio-0> at 0
[    0.946222] SCSI subsystem initialized
[    0.950319] usbcore: registered new interface driver usbfs
[    0.955900] usbcore: registered new interface driver hub
[    0.961498] usbcore: registered new device driver usb
[    0.967980] Switching to clocksource stc
[    0.972229] FS-Cache: Loaded
[    0.975366] CacheFiles: Loaded
[    0.990178] NET: Registered protocol family 2
[    0.995461] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[    1.002809] TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
[    1.009393] TCP: Hash tables configured (established 8192 bind 8192)
[    1.015823] TCP: reno registered
[    1.019076] UDP hash table entries: 256 (order: 0, 4096 bytes)
[    1.024973] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[    1.031520] NET: Registered protocol family 1
[    1.036439] RPC: Registered named UNIX socket transport module.
[    1.042486] RPC: Registered udp transport module.
[    1.047209] RPC: Registered tcp transport module.
[    1.051921] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    1.059195] bcm2708_dma: DMA manager at f2007000
[    1.063980] bcm2708_gpio: bcm2708_gpio_probe c0515d98
[    1.069445] vc-mem: phys_addr:0x00000000 mem_base=0x0ec00000 mem_size:0x10000000(256 MiB)
[    1.078601] audit: initializing netlink socket (disabled)
[    1.084202] type=2000 audit(0.930:1): initialized
[    1.207033] VFS: Disk quotas dquot_6.5.2
[    1.211068] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    1.218124] FS-Cache: Netfs 'nfs' registered for caching
[    1.223886] NFS: Registering the id_resolver key type
[    1.229079] Key type id_resolver registered
[    1.233377] Key type id_legacy registered
[    1.237746] msgmni has been set to 432
[    1.243302] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    1.251016] io scheduler noop registered
[    1.255065] io scheduler deadline registered
[    1.259393] io scheduler cfq registered (default)
[    1.307508] Console: switching to colour frame buffer device 240x75
[    1.339978] kgdb: Registered I/O driver kgdboc.
[    1.345363] vc-cma: Videocore CMA driver
[    1.349408] vc-cma: vc_cma_base      = 0x00000000
[    1.354276] vc-cma: vc_cma_size      = 0x00000000 (0 MiB)
[    1.359806] vc-cma: vc_cma_initial   = 0x00000000 (0 MiB)
[    1.374691] brd: module loaded
[    1.383060] loop: module loaded
[    1.386679] vchiq: vchiq_init_state: slot_zero = 0xcd000000, is_master = 0
[    1.394573] Loading iSCSI transport class v2.0-870.
[    1.400488] usbcore: registered new interface driver smsc95xx
[    1.406664] dwc_otg: version 3.00a 10-AUG-2012 (platform bus)
[    1.617789] Core Release: 2.80a
[    1.621050] Setting default values for core params
[    1.626072] Finished setting default values for core params
[    1.836896] Using Buffer DMA mode
[    1.840321] Periodic Transfer Interrupt Enhancement - disabled
[    1.846317] Multiprocessor Interrupt Enhancement - disabled
[    1.852025] OTG VER PARAM: 0, OTG VER FLAG: 0
[    1.856519] Dedicated Tx FIFOs mode
[    1.861016] dwc_otg: Microframe scheduler enabled
[    1.861397] dwc_otg bcm2708_usb: DWC OTG Controller
[    1.866609] dwc_otg bcm2708_usb: new USB bus registered, assigned bus number 1
[    1.874134] dwc_otg bcm2708_usb: irq 32, io mem 0x00000000
[    1.879818] Init: Port Power? op_state=1
[    1.883876] Init: Power Port (0)
[    1.887307] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    1.894304] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.901696] usb usb1: Product: DWC OTG Controller
[    1.906547] usb usb1: Manufacturer: Linux 3.6.11+ dwc_otg_hcd
[    1.912467] usb usb1: SerialNumber: bcm2708_usb
[    1.917911] hub 1-0:1.0: USB hub found
[    1.921813] hub 1-0:1.0: 1 port detected
[    1.926247] dwc_otg: FIQ enabled
[    1.926267] dwc_otg: NAK holdoff enabled
[    1.926288] Module dwc_common_port init
[    1.926523] Initializing USB Mass Storage driver...
[    1.931745] usbcore: registered new interface driver usb-storage
[    1.938004] USB Mass Storage support registered.
[    1.942942] usbcore: registered new interface driver libusual
[    1.949134] mousedev: PS/2 mouse device common for all mice
[    1.955653] bcm2835-cpufreq: min=700000 max=700000 cur=700000
[    1.961475] bcm2835-cpufreq: switching to governor powersavebcm2835-cpufreq: switching to governor powersave
[    1.971637] cpuidle: using governor ladder
[    1.976068] cpuidle: using governor menu
[    1.987041] sdhci: Secure Digital Host Controller Interface driver
[    2.000150] sdhci: Copyright(c) Pierre Ossman
[    2.011477] sdhci: Disable low-latency mode
[    2.062349] mmc0: SDHCI controller on BCM2708_Arasan [platform] using platform's DMA
[    2.077354] mmc0: BCM2708 SDHC host at 0x20300000 DMA 2 IRQ 77
[    2.092375] sdhci-pltfm: SDHCI platform and OF driver helper
[    2.110759] usbcore: registered new interface driver usbhid
[    2.123484] Indeed it is in host mode hprt0 = 00021501
[    2.140474] usbhid: USB HID core driver
[    2.172621] TCP: cubic registered
[    2.192214] Initializing XFRM netlink socket
[    2.222144] NET: Registered protocol family 17
[    2.244245] mmc0: could read SD Status register (SSR) at the 3th attempt
[    2.262341] Key type dns_resolver registered
[    2.282715] VFP support v0.3: implementor 41 architecture 1 part 20 variant b rev 5
[    2.297678] mmc0: new SDHC card at address e624
[    2.312166] mmcblk0: mmc0:e624 SD08G 7.40 GiB
[    2.324520] registered taskstats version 1
[    2.340012]  mmcblk0: p1 p2
[    2.370625] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[    2.386042] VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
[    2.403926] devtmpfs: mounted
[    2.414547] Freeing init memory: 128K
[    2.425367] usb 1-1: new high-speed USB device number 2 using dwc_otg
[    2.439770] Indeed it is in host mode hprt0 = 00001101
[    2.652483] usb 1-1: New USB device found, idVendor=0424, idProduct=9512
[    2.666962] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.682154] hub 1-1:1.0: USB hub found
[    2.693276] hub 1-1:1.0: 3 ports detected
[    2.982349] usb 1-1.1: new high-speed USB device number 3 using dwc_otg
[    3.112719] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[    3.130066] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    3.149153] smsc95xx v1.0.4
[    3.216941] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-bcm2708_usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:d1:d2:b8
[    3.332406] usb 1-1.2: new low-speed USB device number 4 using dwc_otg
[    3.444395] usb 1-1.2: New USB device found, idVendor=060b, idProduct=0595
[    3.472900] usb 1-1.2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    3.506237] usb 1-1.2: Product: USB Keyboard
[    3.538207] input: USB Keyboard as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2:1.0/input/input0
[    3.565853] hid-generic 0003:060B:0595.0001: input,hidraw0: USB HID v1.10 Keyboard [USB Keyboard] on usb-bcm2708_usb-1.2/input0
[    3.605168] input: USB Keyboard as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2:1.1/input/input1
[    3.623455] hid-generic 0003:060B:0595.0002: input,hiddev0,hidraw1: USB HID v1.10 Device [USB Keyboard] on usb-bcm2708_usb-1.2/input1
[    3.722440] usb 1-1.3: new high-speed USB device number 5 using dwc_otg
[    3.842686] usb 1-1.3: New USB device found, idVendor=0409, idProduct=005a
[    3.857871] usb 1-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    3.876193] hub 1-1.3:1.0: USB hub found
[    3.888768] hub 1-1.3:1.0: 4 ports detected
[    4.182353] usb 1-1.3.1: new high-speed USB device number 6 using dwc_otg
[    4.320184] usb 1-1.3.1: New USB device found, idVendor=0bda, idProduct=0119
[    4.342704] usb 1-1.3.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    4.360799] usb 1-1.3.1: Product: USB2.0-CRW
[    4.373474] usb 1-1.3.1: Manufacturer: Generic
[    4.386053] usb 1-1.3.1: SerialNumber: 20090815198100000
[    4.402410] scsi0 : usb-storage 1-1.3.1:1.0
[    4.512401] usb 1-1.3.2: new high-speed USB device number 7 using dwc_otg
[    4.644524] usb 1-1.3.2: New USB device found, idVendor=13fe, idProduct=3800
[    4.662027] usb 1-1.3.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    4.679292] usb 1-1.3.2: Product: Patriot Memory
[    4.692607] usb 1-1.3.2: Manufacturer:
[    4.710230] usb 1-1.3.2: SerialNumber: 0701257A10B25A31
[    4.727330] udevd[145]: starting version 175
[    4.760087] scsi1 : usb-storage 1-1.3.2:1.0
[    5.414676] scsi 0:0:0:0: Direct-Access     Generic- SD/MMC           1.00 PQ: 0 ANSI: 0 CCS
[    5.783746] scsi 1:0:0:0: Direct-Access              Patriot Memory   PMAP PQ: 0 ANSI: 0 CCS
[    5.824779] sd 1:0:0:0: [sdb] 62570496 512-byte logical blocks: (32.0 GB/29.8 GiB)
[    5.863038] sd 1:0:0:0: [sdb] Write Protect is off
[    5.892258] sd 1:0:0:0: [sdb] Mode Sense: 23 00 00 00
[    5.893042] sd 1:0:0:0: [sdb] No Caching mode page present
[    5.922255] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[    5.956667] sd 1:0:0:0: [sdb] No Caching mode page present
[    5.988197] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[    6.015686]  sdb: sdb1
[    6.042427] sd 1:0:0:0: [sdb] No Caching mode page present
[    6.071557] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[    6.102240] sd 1:0:0:0: [sdb] Attached SCSI removable disk
[    6.154594] sd 0:0:0:0: [sda] 62333952 512-byte logical blocks: (31.9 GB/29.7 GiB)
[    6.191948] sd 0:0:0:0: [sda] Write Protect is off
[    6.222209] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[    6.223235] sd 0:0:0:0: [sda] No Caching mode page present
[    6.250779] sd 0:0:0:0: [sda] Assuming drive cache: write through
[    6.286949] sd 0:0:0:0: [sda] No Caching mode page present
[    6.313645] sd 0:0:0:0: [sda] Assuming drive cache: write through
[    6.348988]  sda: sda1
[    6.376076] sd 0:0:0:0: [sda] No Caching mode page present
[    6.407014] sd 0:0:0:0: [sda] Assuming drive cache: write through
[    6.432186] sd 0:0:0:0: [sda] Attached SCSI removable disk
[    6.507982] Registered led device: led0
[    8.906027] device-mapper: ioctl: 4.23.0-ioctl (2012-07-25) initialised: dm-devel@redhat.com
[    9.146250] ------------[ cut here ]------------
[    9.159509] WARNING: at block/genhd.c:1582 disk_clear_events+0x148/0x164()
[    9.174813] Modules linked in: dm_mod evdev leds_gpio led_class
[    9.189316] [<c0013a7c>] (unwind_backtrace+0x0/0xf0) from [<c001e2c8>] (warn_slowpath_common+0x4c/0x64)
[    9.207304] [<c001e2c8>] (warn_slowpath_common+0x4c/0x64) from [<c001e2fc>] (warn_slowpath_null+0x1c/0x24)
[    9.225602] [<c001e2fc>] (warn_slowpath_null+0x1c/0x24) from [<c01de260>] (disk_clear_events+0x148/0x164)
[    9.243824] [<c01de260>] (disk_clear_events+0x148/0x164) from [<c00f130c>] (check_disk_change+0x1c/0x54)
[    9.262046] [<c00f130c>] (check_disk_change+0x1c/0x54) from [<c0261ec0>] (sd_open+0x68/0x138)
[    9.279321] [<c0261ec0>] (sd_open+0x68/0x138) from [<c00f2118>] (__blkdev_get+0x2b0/0x3f0)
[    9.296388] [<c00f2118>] (__blkdev_get+0x2b0/0x3f0) from [<c00f23e0>] (blkdev_get+0x188/0x31c)
[    9.313828] [<c00f23e0>] (blkdev_get+0x188/0x31c) from [<c00bfc80>] (do_dentry_open.isra.16+0x1d4/0x254)
[    9.332160] [<c00bfc80>] (do_dentry_open.isra.16+0x1d4/0x254) from [<c00bfdd0>] (finish_open+0x20/0x38)
[    9.350440] [<c00bfdd0>] (finish_open+0x20/0x38) from [<c00cf07c>] (do_last.isra.35+0x4e8/0xba8)
[    9.368136] [<c00cf07c>] (do_last.isra.35+0x4e8/0xba8) from [<c00cf7e4>] (path_openat+0xa8/0x480)
[    9.385935] [<c00cf7e4>] (path_openat+0xa8/0x480) from [<c00cfe6c>] (do_filp_open+0x2c/0x80)
[    9.403254] [<c00cfe6c>] (do_filp_open+0x2c/0x80) from [<c00c09ec>] (do_sys_open+0xe8/0x184)
[    9.420462] [<c00c09ec>] (do_sys_open+0xe8/0x184) from [<c000da60>] (ret_fast_syscall+0x0/0x30)
[    9.437888] ---[ end trace 46c5c93ac717e025 ]---
[   12.214742] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[   12.610455] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[   13.279127] bcm2708 watchdog, heartbeat=10 sec (nowayout=0)
[   19.268393] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[   23.674924] smsc95xx 1-1.1:1.0: eth0: link up, 100Mbps, full-duplex, lpa 0x49E1
[   26.558188] Adding 102396k swap on /var/swap.  Priority:-1 extents:1 across:102396k SS
[   30.563963] watchdog stopped

@zerxy
Copy link
Author

zerxy commented Feb 5, 2013

config.txt contents:

# uncomment if you get no picture on HDMI for a default "safe" mode
#hdmi_safe=1

# uncomment this if your display has a black border of unused pixels visible
# and your display can output without overscan
#disable_overscan=1

# uncomment the following to adjust overscan. Use positive numbers if console
# goes off screen, and negative if there is too much border
overscan_left=0
overscan_right=0
overscan_top=0
overscan_bottom=0

# uncomment to force a console size. By default it will be display's size minus
# overscan.
#framebuffer_width=1920
#framebuffer_height=1200
#framebuffer_depth=32

# uncomment if hdmi display is not detected and composite is being output
#hdmi_force_hotplug=1

# uncomment to force a specific HDMI mode (this will force VGA)
#hdmi_group=1
#hdmi_mode=1

# uncomment to force a HDMI mode rather than DVI. This can make audio work in
# DMT (computer monitor) modes
#hdmi_drive=2

# uncomment to increase signal to HDMI, if you have interference, blanking, or
# no display
#config_hdmi_boost=4

# uncomment for composite PAL
#sdtv_mode=2

#uncomment to overclock the arm. 700 MHz is the default.
#arm_freq=980

# for more options see http://elinux.org/RPi_config.txt
#core_freq=250
#h264_freq=250
#isp_freq=250
#v3d_freq=250
#sdram_freq=480
#over_voltage=8
#over_voltage_sdram=7
force_turbo=1
initial_turbo=60
#init_emmc_clock=100000000
current_limit_override=0x5A000020
gpu_mem=32

@zerxy
Copy link
Author

zerxy commented Feb 5, 2013

Please note that the WARNING: at block/genhd.c:1582 disk_clear_events+0x148/0x164() error isn't produced at every boot, just now and then.

@popcornmix
Copy link
Collaborator

I'm not convinced it is related to gpu_mem=32.
Do you have a USB disk attached? Any different without it?
Can you try multiple boots with gpu_mem=32 and gpu_mem=64 and report how many succeed.

@zerxy
Copy link
Author

zerxy commented Feb 5, 2013

I'm also not convinced it is related to gpu_mem setting. Two 32 GB USB storage devices attached. Using LVM2 to join them into a single volume.

Just booted five times with gpu_mem=32 and only once did the error occur.

@popcornmix
Copy link
Collaborator

Might be worth checking voltage between TP1 and TP2 or using a powered hub to connect the USB disks.

@zerxy
Copy link
Author

zerxy commented Feb 6, 2013

I am using a powered hub to connect the USB storage.

@zerxy
Copy link
Author

zerxy commented Feb 8, 2013

Updated to Feb 7 kernel 3.6.11+ #371 via rpi-update. Nothing has changed. Error still occurs occasionally, only ever during boot.

@zerxy
Copy link
Author

zerxy commented Feb 8, 2013

Please note that this error has only recently appeared for me. Only thing changed recently hardware-wise is the addition of another USB storage device which is connected along with previous USB storage device to my Pi via a powered hub. Using LVM2 to join them together. I have patched LVM2 as described here: https://lists.fedorahosted.org/pipermail/lvm2-commits/2012-November/000391.html

@popcornmix
Copy link
Collaborator

I don't think this is related to gpu_mem. Can you close this and open a USB issue about the occasional block/genhd.c panic?

@zerxy
Copy link
Author

zerxy commented Feb 8, 2013

Renamed and edited this issue. Hopefully this is now acceptable to you.

@ghost ghost assigned ghollingworth Feb 9, 2013
@zerxy
Copy link
Author

zerxy commented Feb 12, 2013

Updated to Feb 12 kernel 3.6.11+ #375 via rpi-update. Nothing has changed. Error still occurs occasionally, only ever during boot.

@zerxy
Copy link
Author

zerxy commented Feb 16, 2013

Error still present in Feb 16 kernel 3.6.11+ #377.

@zerxy
Copy link
Author

zerxy commented Mar 2, 2013

Error still present in Mar 1 kernel 3.6.11+ #385.

@ghollingworth
Copy link

What is the WARN in disk_clear_events? What does it say?

On Saturday, 2 March 2013, zerxy wrote:

Error still present in Mar 1 kernel 3.6.11+ #385.


Reply to this email directly or view it on GitHubhttps://github.com//issues/213#issuecomment-14323893
.

@zerxy
Copy link
Author

zerxy commented Mar 2, 2013

dmesg output:

[    9.146250] ------------[ cut here ]------------
[    9.159509] WARNING: at block/genhd.c:1582 disk_clear_events+0x148/0x164()
[    9.174813] Modules linked in: dm_mod evdev leds_gpio led_class
[    9.189316] [<c0013a7c>] (unwind_backtrace+0x0/0xf0) from [<c001e2c8>] (warn_slowpath_common+0x4c/0x64)
[    9.207304] [<c001e2c8>] (warn_slowpath_common+0x4c/0x64) from [<c001e2fc>] (warn_slowpath_null+0x1c/0x24)
[    9.225602] [<c001e2fc>] (warn_slowpath_null+0x1c/0x24) from [<c01de260>] (disk_clear_events+0x148/0x164)
[    9.243824] [<c01de260>] (disk_clear_events+0x148/0x164) from [<c00f130c>] (check_disk_change+0x1c/0x54)
[    9.262046] [<c00f130c>] (check_disk_change+0x1c/0x54) from [<c0261ec0>] (sd_open+0x68/0x138)
[    9.279321] [<c0261ec0>] (sd_open+0x68/0x138) from [<c00f2118>] (__blkdev_get+0x2b0/0x3f0)
[    9.296388] [<c00f2118>] (__blkdev_get+0x2b0/0x3f0) from [<c00f23e0>] (blkdev_get+0x188/0x31c)
[    9.313828] [<c00f23e0>] (blkdev_get+0x188/0x31c) from [<c00bfc80>] (do_dentry_open.isra.16+0x1d4/0x254)
[    9.332160] [<c00bfc80>] (do_dentry_open.isra.16+0x1d4/0x254) from [<c00bfdd0>] (finish_open+0x20/0x38)
[    9.350440] [<c00bfdd0>] (finish_open+0x20/0x38) from [<c00cf07c>] (do_last.isra.35+0x4e8/0xba8)
[    9.368136] [<c00cf07c>] (do_last.isra.35+0x4e8/0xba8) from [<c00cf7e4>] (path_openat+0xa8/0x480)
[    9.385935] [<c00cf7e4>] (path_openat+0xa8/0x480) from [<c00cfe6c>] (do_filp_open+0x2c/0x80)
[    9.403254] [<c00cfe6c>] (do_filp_open+0x2c/0x80) from [<c00c09ec>] (do_sys_open+0xe8/0x184)
[    9.420462] [<c00c09ec>] (do_sys_open+0xe8/0x184) from [<c000da60>] (ret_fast_syscall+0x0/0x30)
[    9.437888] ---[ end trace 46c5c93ac717e025 ]---

@ghollingworth
Copy link

Yes that's already in the bug, I was asking what the code is in genhd.c
that caused the warn to be triggered

On Saturday, 2 March 2013, zerxy wrote:

dmesg output:

[ 9.146250] ------------[ cut here ]------------
[ 9.159509] WARNING: at block/genhd.c:1582 disk_clear_events+0x148/0x164()
[ 9.174813] Modules linked in: dm_mod evdev leds_gpio led_class
[ 9.189316] from
[ 9.207304] from
[ 9.225602] from
[ 9.243824] from
[ 9.262046] from
[ 9.279321] from
[ 9.296388] from
[ 9.313828] from
[ 9.332160] from
[ 9.350440] from
[ 9.368136] from
[ 9.385935] from
[ 9.403254] from
[ 9.420462] from
[ 9.437888] ---[ end trace 46c5c93ac717e025 ]---


Reply to this email directly or view it on GitHubhttps://github.com//issues/213#issuecomment-14324610
.

@zerxy
Copy link
Author

zerxy commented Mar 2, 2013

I don't understand why you need my assistance in determining the problem with genhd.c

@ghollingworth
Copy link

Because I'm only one person and have a thousand issues to deal with, it may
help you solve the problem.

Otherwise I'll probably get around to it in a year or so

Gordon

Director of software, Raspberry Pi

On Saturday, 2 March 2013, zerxy wrote:

I don't understand why you need my assistance in determining the problem
with genhd.c


Reply to this email directly or view it on GitHubhttps://github.com//issues/213#issuecomment-14324773
.

@zerxy
Copy link
Author

zerxy commented Mar 2, 2013

Don't take your frustration out on me. If your employers actually knew what's good for them you would already have plenty of assistance from paid co-workers.

@zerxy
Copy link
Author

zerxy commented Mar 2, 2013

What exactly do you want me to do to help you? I will try my best to help.

@licaon-kter
Copy link

@zerxy: Take it down a notch, would you? Remember that the "Raspberry Pi Foundation" in a charity first, they are all volunteers ( or most of them anyway ). Yes, they might be Broadcom employed but that's the day job, this is the after hours part. ;)

@zerxy
Copy link
Author

zerxy commented Mar 2, 2013

The charity thing is used as an excuse. Lots of Raspberry Pi's sold = lots of money made = no excuse for poor support.

@ghollingworth
Copy link

If more than about ten people have the same problem

On Saturday, 2 March 2013, zerxy wrote:

The charity thing is used as an excuse. Lots of Raspberry Pi's sold = lots
of money made = no excuse for poor support.


Reply to this email directly or view it on GitHubhttps://github.com//issues/213#issuecomment-14325121
.

@ghollingworth
Copy link

Sorry, bad iPhone!

If more than about ten people have the same problem then it's worth
investing some of my time into it otherwise I will try to enable you to
help yourself. (Which I was trying to do)

We do spend significant money on software, but there are a lot of minor
bugs that may be finger trouble, Linux drivers problems or just hardware
problems. The point of communicating through this is to try to help you
diagnose the problem and put it into the right box first

Gordon

On Saturday, 2 March 2013, Gordon Hollingworth wrote:

If more than about ten people have the same problem

On Saturday, 2 March 2013, zerxy wrote:

The charity thing is used as an excuse. Lots of Raspberry Pi's sold =
lots of money made = no excuse for poor support.


Reply to this email directly or view it on GitHubhttps://github.com//issues/213#issuecomment-14325121
.

@zerxy
Copy link
Author

zerxy commented Mar 2, 2013

Looking around it appears that the problem I have, which has only been cosmetic for me so far, seems to not be Raspberry Pi specific anyway.

@zerxy
Copy link
Author

zerxy commented Mar 4, 2013

Lines 1580 to 1585 from block/genhd.c

 /* then, fetch and clear pending events */
        spin_lock_irq(&ev->lock);
        WARN_ON_ONCE(ev->clearing & mask);      /* cleared by workfn */
        pending = ev->pending & mask;
        ev->pending &= ~mask;
        spin_unlock_irq(&ev->lock);

@zerxy
Copy link
Author

zerxy commented Mar 4, 2013

@popcornmix
Copy link
Collaborator

It does look like that is likely the fix. Unfortunately the patch doesn't apply cleanly.
Might be worth testing the 3.8 kernel which has this in.
#225

@zerxy
Copy link
Author

zerxy commented Mar 5, 2013

Tried moving WARN_ON_ONCE(ev->clearing & mask); /* cleared by workfn */ line to the bottom, like so:

 /* then, fetch and clear pending events */
        spin_lock_irq(&ev->lock);
        pending = ev->pending & mask;
        ev->pending &= ~mask;
        spin_unlock_irq(&ev->lock);
        WARN_ON_ONCE(ev->clearing & mask);      /* cleared by workfn */

genhd.c compiles without error and the kernel produced has booted without the WARNING: at block/genhd.c:1582 disk_clear_events+0x148/0x164() error or any other error related to genhd.c in the few test boots I've performed.

@zerxy
Copy link
Author

zerxy commented Mar 28, 2013

Tests have shown that the above change actually does not eliminate this error but instead makes it much less likely to occur. It is now quite rare.

@psergiu
Copy link

psergiu commented Apr 7, 2013

Same issue - errors appear only when:

  • two USB storage devices are present at boot (USB Flash + USB HDD).
  • last field of /etc/fstab for filesystems on those devices is not 0 (thus forcing fsck at boot time)

Changing last field of /etc/fstab back to 0 (preventing fsck to be run) causes the error to no longer appear.

Using the Raspibian with "stable" 3.6.11+ rev 371

@ghollingworth
Copy link

Can you test again with the latest build?

@P33M
Copy link
Contributor

P33M commented May 5, 2014

Two 3TB hard disks with /etc/fstab fsck column set nonzero works for me.

@P33M P33M closed this as completed May 5, 2014
@zerxy
Copy link
Author

zerxy commented May 5, 2014

This stopped being an issue for me with kernel 3.8 and later.

popcornmix pushed a commit that referenced this issue Jun 7, 2014
[ Upstream commit 4f4178f ]

Fixes this warning introduced by commit 5b8f15f
("net: cdc_mbim: handle IPv6 Neigbor Solicitations"):

===============================
[ INFO: suspicious RCU usage. ]
3.15.0-rc3 #213 Tainted: G        W  O
-------------------------------
net/8021q/vlan_core.c:69 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
no locks held by ksoftirqd/0/3.

stack backtrace:
CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G        W  O  3.15.0-rc3 #213
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
 0000000000000001 ffff880232533bf0 ffffffff813a5ee6 0000000000000006
 ffff880232530090 ffff880232533c20 ffffffff81076b94 0000000000000081
 0000000000000000 ffff8802085ac000 ffff88007fc8ea00 ffff880232533c50
Call Trace:
 [<ffffffff813a5ee6>] dump_stack+0x4e/0x68
 [<ffffffff81076b94>] lockdep_rcu_suspicious+0xfa/0x103
 [<ffffffff813978a6>] __vlan_find_dev_deep+0x54/0x94
 [<ffffffffa04a1938>] cdc_mbim_rx_fixup+0x379/0x66a [cdc_mbim]
 [<ffffffff813ab76f>] ? _raw_spin_unlock_irqrestore+0x3a/0x49
 [<ffffffff81079671>] ? trace_hardirqs_on_caller+0x192/0x1a1
 [<ffffffffa059bd10>] usbnet_bh+0x59/0x287 [usbnet]
 [<ffffffff8104067d>] tasklet_action+0xbb/0xcd
 [<ffffffff81040057>] __do_softirq+0x14c/0x30d
 [<ffffffff81040237>] run_ksoftirqd+0x1f/0x50
 [<ffffffff8105f13e>] smpboot_thread_fn+0x172/0x18e
 [<ffffffff8105efcc>] ? SyS_setgroups+0xdf/0xdf
 [<ffffffff810594b0>] kthread+0xb5/0xbd
 [<ffffffff813a84b1>] ? __wait_for_common+0x13b/0x170
 [<ffffffff810593fb>] ? __kthread_parkme+0x5c/0x5c
 [<ffffffff813b147c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810593fb>] ? __kthread_parkme+0x5c/0x5c

Fixes: 5b8f15f ("net: cdc_mbim: handle IPv6 Neigbor Solicitations")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jun 8, 2014
Fixes this warning introduced by commit 5b8f15f
("net: cdc_mbim: handle IPv6 Neigbor Solicitations"):

===============================
[ INFO: suspicious RCU usage. ]
3.15.0-rc3 #213 Tainted: G        W  O
-------------------------------
net/8021q/vlan_core.c:69 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
no locks held by ksoftirqd/0/3.

stack backtrace:
CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G        W  O  3.15.0-rc3 #213
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
 0000000000000001 ffff880232533bf0 ffffffff813a5ee6 0000000000000006
 ffff880232530090 ffff880232533c20 ffffffff81076b94 0000000000000081
 0000000000000000 ffff8802085ac000 ffff88007fc8ea00 ffff880232533c50
Call Trace:
 [<ffffffff813a5ee6>] dump_stack+0x4e/0x68
 [<ffffffff81076b94>] lockdep_rcu_suspicious+0xfa/0x103
 [<ffffffff813978a6>] __vlan_find_dev_deep+0x54/0x94
 [<ffffffffa04a1938>] cdc_mbim_rx_fixup+0x379/0x66a [cdc_mbim]
 [<ffffffff813ab76f>] ? _raw_spin_unlock_irqrestore+0x3a/0x49
 [<ffffffff81079671>] ? trace_hardirqs_on_caller+0x192/0x1a1
 [<ffffffffa059bd10>] usbnet_bh+0x59/0x287 [usbnet]
 [<ffffffff8104067d>] tasklet_action+0xbb/0xcd
 [<ffffffff81040057>] __do_softirq+0x14c/0x30d
 [<ffffffff81040237>] run_ksoftirqd+0x1f/0x50
 [<ffffffff8105f13e>] smpboot_thread_fn+0x172/0x18e
 [<ffffffff8105efcc>] ? SyS_setgroups+0xdf/0xdf
 [<ffffffff810594b0>] kthread+0xb5/0xbd
 [<ffffffff813a84b1>] ? __wait_for_common+0x13b/0x170
 [<ffffffff810593fb>] ? __kthread_parkme+0x5c/0x5c
 [<ffffffff813b147c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810593fb>] ? __kthread_parkme+0x5c/0x5c

Fixes: 5b8f15f ("net: cdc_mbim: handle IPv6 Neigbor Solicitations")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Mar 12, 2017
commit ecdd095 upstream.

Inside set_status, transfer need to setup again, so
we have to drain IO before the transition, otherwise
oops may be triggered like the following:

	divide error: 0000 [raspberrypi#1] SMP KASAN
	CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ raspberrypi#213
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
	01/01/2011
	task: ffff88006ba1e840 task.stack: ffff880067338000
	RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110
	RSP: 0018:ffff88006733f108 EFLAGS: 00010246
	RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059
	RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08
	RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000
	R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000
	R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000
	FS:  0000000000000000(0000) GS:ffff88006d000000(0000)
	knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0
	Call Trace:
	 lo_do_transfer drivers/block/loop.c:251 [inline]
	 lo_read_transfer drivers/block/loop.c:392 [inline]
	 do_req_filebacked drivers/block/loop.c:541 [inline]
	 loop_handle_cmd drivers/block/loop.c:1677 [inline]
	 loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689
	 kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630
	 kthread+0x326/0x3f0 kernel/kthread.c:227
	 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
	Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08
	84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7>
	7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83
	RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP:
	ffff88006733f108
	---[ end trace 0166f7bd3b0c0933 ]---

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Mar 12, 2017
commit ecdd095 upstream.

Inside set_status, transfer need to setup again, so
we have to drain IO before the transition, otherwise
oops may be triggered like the following:

	divide error: 0000 [#1] SMP KASAN
	CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ #213
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
	01/01/2011
	task: ffff88006ba1e840 task.stack: ffff880067338000
	RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110
	RSP: 0018:ffff88006733f108 EFLAGS: 00010246
	RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059
	RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08
	RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000
	R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000
	R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000
	FS:  0000000000000000(0000) GS:ffff88006d000000(0000)
	knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0
	Call Trace:
	 lo_do_transfer drivers/block/loop.c:251 [inline]
	 lo_read_transfer drivers/block/loop.c:392 [inline]
	 do_req_filebacked drivers/block/loop.c:541 [inline]
	 loop_handle_cmd drivers/block/loop.c:1677 [inline]
	 loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689
	 kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630
	 kthread+0x326/0x3f0 kernel/kthread.c:227
	 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
	Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08
	84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7>
	7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83
	RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP:
	ffff88006733f108
	---[ end trace 0166f7bd3b0c0933 ]---

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Mar 13, 2017
commit ecdd095 upstream.

Inside set_status, transfer need to setup again, so
we have to drain IO before the transition, otherwise
oops may be triggered like the following:

	divide error: 0000 [#1] SMP KASAN
	CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ #213
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
	01/01/2011
	task: ffff88006ba1e840 task.stack: ffff880067338000
	RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110
	RSP: 0018:ffff88006733f108 EFLAGS: 00010246
	RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059
	RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08
	RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000
	R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000
	R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000
	FS:  0000000000000000(0000) GS:ffff88006d000000(0000)
	knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0
	Call Trace:
	 lo_do_transfer drivers/block/loop.c:251 [inline]
	 lo_read_transfer drivers/block/loop.c:392 [inline]
	 do_req_filebacked drivers/block/loop.c:541 [inline]
	 loop_handle_cmd drivers/block/loop.c:1677 [inline]
	 loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689
	 kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630
	 kthread+0x326/0x3f0 kernel/kthread.c:227
	 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
	Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08
	84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7>
	7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83
	RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP:
	ffff88006733f108
	---[ end trace 0166f7bd3b0c0933 ]---

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Apr 10, 2017
Zero sized buffer objects tend to make various bits of the GEM
infrastructure complain:

 WARNING: CPU: 1 PID: 2323 at drivers/gpu/drm/drm_mm.c:389 drm_mm_insert_node_generic+0x258/0x2f0
 Modules linked in:

 CPU: 1 PID: 2323 Comm: drm-api-test Tainted: G        W 4.9.0-rc4-00906-g693af44 #213
 Hardware name: Qualcomm Technologies, Inc. DB820c (DT)
 task: ffff8000d7353400 task.stack: ffff8000d7720000
 PC is at drm_mm_insert_node_generic+0x258/0x2f0
 LR is at drm_vma_offset_add+0x4c/0x70

Zero sized buffers serve no appreciable value to the user so disallow
them at create time.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
dlech pushed a commit to ev3dev/rpi-kernel that referenced this issue Apr 14, 2017
commit ecdd095 upstream.

Inside set_status, transfer need to setup again, so
we have to drain IO before the transition, otherwise
oops may be triggered like the following:

	divide error: 0000 [#1] SMP KASAN
	CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ raspberrypi#213
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
	01/01/2011
	task: ffff88006ba1e840 task.stack: ffff880067338000
	RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110
	RSP: 0018:ffff88006733f108 EFLAGS: 00010246
	RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059
	RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08
	RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000
	R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000
	R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000
	FS:  0000000000000000(0000) GS:ffff88006d000000(0000)
	knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0
	Call Trace:
	 lo_do_transfer drivers/block/loop.c:251 [inline]
	 lo_read_transfer drivers/block/loop.c:392 [inline]
	 do_req_filebacked drivers/block/loop.c:541 [inline]
	 loop_handle_cmd drivers/block/loop.c:1677 [inline]
	 loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689
	 kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630
	 kthread+0x326/0x3f0 kernel/kthread.c:227
	 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
	Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08
	84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7>
	7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83
	RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP:
	ffff88006733f108
	---[ end trace 0166f7bd3b0c0933 ]---

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anholt pushed a commit to anholt/linux that referenced this issue May 4, 2017
Zero sized buffer objects tend to make various bits of the GEM
infrastructure complain:

 WARNING: CPU: 1 PID: 2323 at drivers/gpu/drm/drm_mm.c:389 drm_mm_insert_node_generic+0x258/0x2f0
 Modules linked in:

 CPU: 1 PID: 2323 Comm: drm-api-test Tainted: G        W 4.9.0-rc4-00906-g693af44 raspberrypi#213
 Hardware name: Qualcomm Technologies, Inc. DB820c (DT)
 task: ffff8000d7353400 task.stack: ffff8000d7720000
 PC is at drm_mm_insert_node_generic+0x258/0x2f0
 LR is at drm_vma_offset_add+0x4c/0x70

Zero sized buffers serve no appreciable value to the user so disallow
them at create time.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
popcornmix pushed a commit that referenced this issue Sep 1, 2017
rxrpc_service_prealloc_one() doesn't set the socket pointer on any new call
it preallocates, but does add it to the rxrpc net namespace call list.
This, however, causes rxrpc_put_call() to oops when the call is discarded
when the socket is closed.  rxrpc_put_call() needs the socket to be able to
reach the namespace so that it can use a lock held therein.

Fix this by setting a call's socket pointer immediately before discarding
it.

This can be triggered by unloading the kafs module, resulting in an oops
like the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: rxrpc_put_call+0x1e2/0x32d
PGD 0
P4D 0
Oops: 0000 [#1] SMP
Modules linked in: kafs(E-)
CPU: 3 PID: 3037 Comm: rmmod Tainted: G            E   4.12.0-fscache+ #213
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff8803fc92e2c0 task.stack: ffff8803fef74000
RIP: 0010:rxrpc_put_call+0x1e2/0x32d
RSP: 0018:ffff8803fef77e08 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8803fab99ac0 RCX: 000000000000000f
RDX: ffffffff81c50a40 RSI: 000000000000000c RDI: ffff8803fc92ea88
RBP: ffff8803fef77e30 R08: ffff8803fc87b941 R09: ffffffff82946d20
R10: ffff8803fef77d10 R11: 00000000000076fc R12: 0000000000000005
R13: ffff8803fab99c20 R14: 0000000000000001 R15: ffffffff816c6aee
FS:  00007f915a059700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000003fef39000 CR4: 00000000001406e0
Call Trace:
 rxrpc_discard_prealloc+0x325/0x341
 rxrpc_listen+0xf9/0x146
 kernel_listen+0xb/0xd
 afs_close_socket+0x3e/0x173 [kafs]
 afs_exit+0x1f/0x57 [kafs]
 SyS_delete_module+0x10f/0x19a
 do_syscall_64+0x8a/0x149
 entry_SYSCALL64_slow_path+0x25/0x25

Fixes: 2baec2c ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nathanchance pushed a commit to nathanchance/pi-kernel that referenced this issue Apr 19, 2018
commit 082f230 upstream.

Local random address needs to be updated before creating connection if
RPA from LE Direct Advertising Report was resolved in host. Otherwise
remote device might ignore connection request due to address mismatch.

This was affecting following qualification test cases:
GAP/CONN/SCEP/BV-03-C, GAP/CONN/GCEP/BV-05-C, GAP/CONN/DCEP/BV-05-C

Before patch:
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6          #11350 [hci0] 84680.231216
        Address: 56:BC:E8:24:11:68 (Resolvable)
          Identity type: Random (0x01)
          Identity: F2:F1:06:3D:9C:42 (Static)
> HCI Event: Command Complete (0x0e) plen 4                        #11351 [hci0] 84680.246022
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7         #11352 [hci0] 84680.246417
        Type: Passive (0x00)
        Interval: 60.000 msec (0x0060)
        Window: 30.000 msec (0x0030)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
> HCI Event: Command Complete (0x0e) plen 4                        #11353 [hci0] 84680.248854
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2             #11354 [hci0] 84680.249466
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                        #11355 [hci0] 84680.253222
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 18                          #11356 [hci0] 84680.458387
      LE Direct Advertising Report (0x0b)
        Num reports: 1
        Event type: Connectable directed - ADV_DIRECT_IND (0x01)
        Address type: Random (0x01)
        Address: 53:38:DA:46:8C:45 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Direct address type: Random (0x01)
        Direct address: 7C:D6:76:8C:DF:82 (Resolvable)
          Identity type: Random (0x01)
          Identity: F2:F1:06:3D:9C:42 (Static)
        RSSI: -74 dBm (0xb6)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2             #11357 [hci0] 84680.458737
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                        #11358 [hci0] 84680.469982
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25          #11359 [hci0] 84680.470444
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: 53:38:DA:46:8C:45 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Own address type: Random (0x01)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                          #11360 [hci0] 84680.474971
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0    #11361 [hci0] 84682.545385
> HCI Event: Command Complete (0x0e) plen 4                        #11362 [hci0] 84682.551014
      LE Create Connection Cancel (0x08|0x000e) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19                          #11363 [hci0] 84682.551074
      LE Connection Complete (0x01)
        Status: Unknown Connection Identifier (0x02)
        Handle: 0
        Role: Master (0x00)
        Peer address type: Public (0x00)
        Peer address: 00:00:00:00:00:00 (OUI 00-00-00)
        Connection interval: 0.00 msec (0x0000)
        Connection latency: 0 (0x0000)
        Supervision timeout: 0 msec (0x0000)
        Master clock accuracy: 0x00

After patch:
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7    raspberrypi#210 [hci0] 667.152459
        Type: Passive (0x00)
        Interval: 60.000 msec (0x0060)
        Window: 30.000 msec (0x0030)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
> HCI Event: Command Complete (0x0e) plen 4                   raspberrypi#211 [hci0] 667.153613
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2        raspberrypi#212 [hci0] 667.153704
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                   raspberrypi#213 [hci0] 667.154584
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 18                     raspberrypi#214 [hci0] 667.182619
      LE Direct Advertising Report (0x0b)
        Num reports: 1
        Event type: Connectable directed - ADV_DIRECT_IND (0x01)
        Address type: Random (0x01)
        Address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Direct address type: Random (0x01)
        Direct address: 7C:C1:57:A5:B7:A8 (Resolvable)
          Identity type: Random (0x01)
          Identity: F4:28:73:5D:38:B0 (Static)
        RSSI: -70 dBm (0xba)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2       raspberrypi#215 [hci0] 667.182704
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                  raspberrypi#216 [hci0] 667.183599
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6    raspberrypi#217 [hci0] 667.183645
        Address: 7C:C1:57:A5:B7:A8 (Resolvable)
          Identity type: Random (0x01)
          Identity: F4:28:73:5D:38:B0 (Static)
> HCI Event: Command Complete (0x0e) plen 4                  raspberrypi#218 [hci0] 667.184590
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25    raspberrypi#219 [hci0] 667.184613
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Own address type: Random (0x01)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                    raspberrypi#220 [hci0] 667.186558
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19                    raspberrypi#221 [hci0] 667.485824
      LE Connection Complete (0x01)
        Status: Success (0x00)
        Handle: 0
        Role: Master (0x00)
        Peer address type: Random (0x01)
        Peer address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x07
@ MGMT Event: Device Connected (0x000b) plen 13          {0x0002} [hci0] 667.485996
        LE Address: 11:22:33:44:55:66 (OUI 11-22-33)
        Flags: 0x00000000
        Data length: 0

Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Apr 26, 2018
commit 082f230 upstream.

Local random address needs to be updated before creating connection if
RPA from LE Direct Advertising Report was resolved in host. Otherwise
remote device might ignore connection request due to address mismatch.

This was affecting following qualification test cases:
GAP/CONN/SCEP/BV-03-C, GAP/CONN/GCEP/BV-05-C, GAP/CONN/DCEP/BV-05-C

Before patch:
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6          #11350 [hci0] 84680.231216
        Address: 56:BC:E8:24:11:68 (Resolvable)
          Identity type: Random (0x01)
          Identity: F2:F1:06:3D:9C:42 (Static)
> HCI Event: Command Complete (0x0e) plen 4                        #11351 [hci0] 84680.246022
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7         #11352 [hci0] 84680.246417
        Type: Passive (0x00)
        Interval: 60.000 msec (0x0060)
        Window: 30.000 msec (0x0030)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
> HCI Event: Command Complete (0x0e) plen 4                        #11353 [hci0] 84680.248854
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2             #11354 [hci0] 84680.249466
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                        #11355 [hci0] 84680.253222
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 18                          #11356 [hci0] 84680.458387
      LE Direct Advertising Report (0x0b)
        Num reports: 1
        Event type: Connectable directed - ADV_DIRECT_IND (0x01)
        Address type: Random (0x01)
        Address: 53:38:DA:46:8C:45 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Direct address type: Random (0x01)
        Direct address: 7C:D6:76:8C:DF:82 (Resolvable)
          Identity type: Random (0x01)
          Identity: F2:F1:06:3D:9C:42 (Static)
        RSSI: -74 dBm (0xb6)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2             #11357 [hci0] 84680.458737
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                        #11358 [hci0] 84680.469982
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25          #11359 [hci0] 84680.470444
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: 53:38:DA:46:8C:45 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Own address type: Random (0x01)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                          #11360 [hci0] 84680.474971
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0    #11361 [hci0] 84682.545385
> HCI Event: Command Complete (0x0e) plen 4                        #11362 [hci0] 84682.551014
      LE Create Connection Cancel (0x08|0x000e) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19                          #11363 [hci0] 84682.551074
      LE Connection Complete (0x01)
        Status: Unknown Connection Identifier (0x02)
        Handle: 0
        Role: Master (0x00)
        Peer address type: Public (0x00)
        Peer address: 00:00:00:00:00:00 (OUI 00-00-00)
        Connection interval: 0.00 msec (0x0000)
        Connection latency: 0 (0x0000)
        Supervision timeout: 0 msec (0x0000)
        Master clock accuracy: 0x00

After patch:
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7    #210 [hci0] 667.152459
        Type: Passive (0x00)
        Interval: 60.000 msec (0x0060)
        Window: 30.000 msec (0x0030)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
> HCI Event: Command Complete (0x0e) plen 4                   #211 [hci0] 667.153613
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2        #212 [hci0] 667.153704
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                   #213 [hci0] 667.154584
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 18                     #214 [hci0] 667.182619
      LE Direct Advertising Report (0x0b)
        Num reports: 1
        Event type: Connectable directed - ADV_DIRECT_IND (0x01)
        Address type: Random (0x01)
        Address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Direct address type: Random (0x01)
        Direct address: 7C:C1:57:A5:B7:A8 (Resolvable)
          Identity type: Random (0x01)
          Identity: F4:28:73:5D:38:B0 (Static)
        RSSI: -70 dBm (0xba)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2       #215 [hci0] 667.182704
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                  #216 [hci0] 667.183599
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6    #217 [hci0] 667.183645
        Address: 7C:C1:57:A5:B7:A8 (Resolvable)
          Identity type: Random (0x01)
          Identity: F4:28:73:5D:38:B0 (Static)
> HCI Event: Command Complete (0x0e) plen 4                  #218 [hci0] 667.184590
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25    #219 [hci0] 667.184613
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Own address type: Random (0x01)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                    #220 [hci0] 667.186558
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19                    #221 [hci0] 667.485824
      LE Connection Complete (0x01)
        Status: Success (0x00)
        Handle: 0
        Role: Master (0x00)
        Peer address type: Random (0x01)
        Peer address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x07
@ MGMT Event: Device Connected (0x000b) plen 13          {0x0002} [hci0] 667.485996
        LE Address: 11:22:33:44:55:66 (OUI 11-22-33)
        Flags: 0x00000000
        Data length: 0

Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue May 5, 2018
commit 082f230 upstream.

Local random address needs to be updated before creating connection if
RPA from LE Direct Advertising Report was resolved in host. Otherwise
remote device might ignore connection request due to address mismatch.

This was affecting following qualification test cases:
GAP/CONN/SCEP/BV-03-C, GAP/CONN/GCEP/BV-05-C, GAP/CONN/DCEP/BV-05-C

Before patch:
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6          #11350 [hci0] 84680.231216
        Address: 56:BC:E8:24:11:68 (Resolvable)
          Identity type: Random (0x01)
          Identity: F2:F1:06:3D:9C:42 (Static)
> HCI Event: Command Complete (0x0e) plen 4                        #11351 [hci0] 84680.246022
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7         #11352 [hci0] 84680.246417
        Type: Passive (0x00)
        Interval: 60.000 msec (0x0060)
        Window: 30.000 msec (0x0030)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
> HCI Event: Command Complete (0x0e) plen 4                        #11353 [hci0] 84680.248854
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2             #11354 [hci0] 84680.249466
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                        #11355 [hci0] 84680.253222
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 18                          #11356 [hci0] 84680.458387
      LE Direct Advertising Report (0x0b)
        Num reports: 1
        Event type: Connectable directed - ADV_DIRECT_IND (0x01)
        Address type: Random (0x01)
        Address: 53:38:DA:46:8C:45 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Direct address type: Random (0x01)
        Direct address: 7C:D6:76:8C:DF:82 (Resolvable)
          Identity type: Random (0x01)
          Identity: F2:F1:06:3D:9C:42 (Static)
        RSSI: -74 dBm (0xb6)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2             #11357 [hci0] 84680.458737
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                        #11358 [hci0] 84680.469982
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25          #11359 [hci0] 84680.470444
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: 53:38:DA:46:8C:45 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Own address type: Random (0x01)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                          #11360 [hci0] 84680.474971
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0    #11361 [hci0] 84682.545385
> HCI Event: Command Complete (0x0e) plen 4                        #11362 [hci0] 84682.551014
      LE Create Connection Cancel (0x08|0x000e) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19                          #11363 [hci0] 84682.551074
      LE Connection Complete (0x01)
        Status: Unknown Connection Identifier (0x02)
        Handle: 0
        Role: Master (0x00)
        Peer address type: Public (0x00)
        Peer address: 00:00:00:00:00:00 (OUI 00-00-00)
        Connection interval: 0.00 msec (0x0000)
        Connection latency: 0 (0x0000)
        Supervision timeout: 0 msec (0x0000)
        Master clock accuracy: 0x00

After patch:
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7    #210 [hci0] 667.152459
        Type: Passive (0x00)
        Interval: 60.000 msec (0x0060)
        Window: 30.000 msec (0x0030)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
> HCI Event: Command Complete (0x0e) plen 4                   #211 [hci0] 667.153613
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2        #212 [hci0] 667.153704
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                   #213 [hci0] 667.154584
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 18                     #214 [hci0] 667.182619
      LE Direct Advertising Report (0x0b)
        Num reports: 1
        Event type: Connectable directed - ADV_DIRECT_IND (0x01)
        Address type: Random (0x01)
        Address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Direct address type: Random (0x01)
        Direct address: 7C:C1:57:A5:B7:A8 (Resolvable)
          Identity type: Random (0x01)
          Identity: F4:28:73:5D:38:B0 (Static)
        RSSI: -70 dBm (0xba)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2       #215 [hci0] 667.182704
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                  #216 [hci0] 667.183599
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6    #217 [hci0] 667.183645
        Address: 7C:C1:57:A5:B7:A8 (Resolvable)
          Identity type: Random (0x01)
          Identity: F4:28:73:5D:38:B0 (Static)
> HCI Event: Command Complete (0x0e) plen 4                  #218 [hci0] 667.184590
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25    #219 [hci0] 667.184613
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Own address type: Random (0x01)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                    #220 [hci0] 667.186558
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19                    #221 [hci0] 667.485824
      LE Connection Complete (0x01)
        Status: Success (0x00)
        Handle: 0
        Role: Master (0x00)
        Peer address type: Random (0x01)
        Peer address: 50:52:D9:A6:48:A0 (Resolvable)
          Identity type: Public (0x00)
          Identity: 11:22:33:44:55:66 (OUI 11-22-33)
        Connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x07
@ MGMT Event: Device Connected (0x000b) plen 13          {0x0002} [hci0] 667.485996
        LE Address: 11:22:33:44:55:66 (OUI 11-22-33)
        Flags: 0x00000000
        Data length: 0

Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Nov 21, 2022
We got a syzkaller problem because of aarch64 alignment fault
if KFENCE enabled. When the size from user bpf program is an odd
number, like 399, 407, etc, it will cause the struct skb_shared_info's
unaligned access. As seen below:

  BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032

  Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
   bpf_prog_d3839dd9068ceb51+0x80/0x330
   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381

  kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512

  allocated by task 15074 on cpu 0 at 1342.585390s:
   kmalloc include/linux/slab.h:568 [inline]
   kzalloc include/linux/slab.h:675 [inline]
   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381

To fix the problem, we adjust @SiZe so that (@SiZe + @hearoom) is a
multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info
is aligned to a cache line.

Fixes: 1cf1cae ("bpf: introduce BPF_PROG_TEST_RUN command")
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20221102081620.1465154-1-zhongbaisong@huawei.com
herrnst pushed a commit to herrnst/linux-raspberrypi that referenced this issue Nov 26, 2022
[ Upstream commit d3fd203 ]

We got a syzkaller problem because of aarch64 alignment fault
if KFENCE enabled. When the size from user bpf program is an odd
number, like 399, 407, etc, it will cause the struct skb_shared_info's
unaligned access. As seen below:

  BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032

  Use-after-free read at 0xffff6254fffac077 (in kfence-raspberrypi#213):
   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
   bpf_prog_d3839dd9068ceb51+0x80/0x330
   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381

  kfence-raspberrypi#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512

  allocated by task 15074 on cpu 0 at 1342.585390s:
   kmalloc include/linux/slab.h:568 [inline]
   kzalloc include/linux/slab.h:675 [inline]
   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381

To fix the problem, we adjust @SiZe so that (@SiZe + @hearoom) is a
multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info
is aligned to a cache line.

Fixes: 1cf1cae ("bpf: introduce BPF_PROG_TEST_RUN command")
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20221102081620.1465154-1-zhongbaisong@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Nov 29, 2022
[ Upstream commit d3fd203 ]

We got a syzkaller problem because of aarch64 alignment fault
if KFENCE enabled. When the size from user bpf program is an odd
number, like 399, 407, etc, it will cause the struct skb_shared_info's
unaligned access. As seen below:

  BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032

  Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
   bpf_prog_d3839dd9068ceb51+0x80/0x330
   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381

  kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512

  allocated by task 15074 on cpu 0 at 1342.585390s:
   kmalloc include/linux/slab.h:568 [inline]
   kzalloc include/linux/slab.h:675 [inline]
   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381

To fix the problem, we adjust @SiZe so that (@SiZe + @hearoom) is a
multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info
is aligned to a cache line.

Fixes: 1cf1cae ("bpf: introduce BPF_PROG_TEST_RUN command")
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20221102081620.1465154-1-zhongbaisong@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Nov 21, 2023
[ Upstream commit 3531b27 ]

Fix a missed "goto out" to unlock on error to cleanup this splat:

    WARNING: lock held when returning to user space!
    6.6.0-rc3-lizhijian+ #213 Not tainted
    ------------------------------------------------
    cxl/673 is leaving the kernel with locks still held!
    1 lock held by cxl/673:
     #0: ffffffffa013b9d0 (cxl_region_rwsem){++++}-{3:3}, at: commit_store+0x7d/0x3e0 [cxl_core]

In terms of user visible impact of this bug for backports:

cxl_region_invalidate_memregion() on x86 invokes wbinvd which is a
problematic instruction for virtualized environments. So, on virtualized
x86, cxl_region_invalidate_memregion() returns an error. This failure
case got missed because CXL memory-expander device passthrough is not a
production use case, and emulation of CXL devices is typically limited
to kernel development builds with CONFIG_CXL_REGION_INVALIDATION_TEST=y,
that makes cxl_region_invalidate_memregion() succeed.

In other words, the expected exposure of this bug is limited to CXL
subsystem development environments using QEMU that neglected
CONFIG_CXL_REGION_INVALIDATION_TEST=y.

Fixes: d1257d0 ("cxl/region: Move cache invalidation before region teardown, and before setup")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231025085450.2514906-1-lizhijian@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Nov 21, 2023
[ Upstream commit 3531b27 ]

Fix a missed "goto out" to unlock on error to cleanup this splat:

    WARNING: lock held when returning to user space!
    6.6.0-rc3-lizhijian+ #213 Not tainted
    ------------------------------------------------
    cxl/673 is leaving the kernel with locks still held!
    1 lock held by cxl/673:
     #0: ffffffffa013b9d0 (cxl_region_rwsem){++++}-{3:3}, at: commit_store+0x7d/0x3e0 [cxl_core]

In terms of user visible impact of this bug for backports:

cxl_region_invalidate_memregion() on x86 invokes wbinvd which is a
problematic instruction for virtualized environments. So, on virtualized
x86, cxl_region_invalidate_memregion() returns an error. This failure
case got missed because CXL memory-expander device passthrough is not a
production use case, and emulation of CXL devices is typically limited
to kernel development builds with CONFIG_CXL_REGION_INVALIDATION_TEST=y,
that makes cxl_region_invalidate_memregion() succeed.

In other words, the expected exposure of this bug is limited to CXL
subsystem development environments using QEMU that neglected
CONFIG_CXL_REGION_INVALIDATION_TEST=y.

Fixes: d1257d0 ("cxl/region: Move cache invalidation before region teardown, and before setup")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231025085450.2514906-1-lizhijian@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Apr 23, 2024
When the mirred action is used on a classful egress qdisc and a packet is
mirrored or redirected to self we hit a qdisc lock deadlock.
See trace below.

[..... other info removed for brevity....]
[   82.890906]
[   82.890906] ============================================
[   82.890906] WARNING: possible recursive locking detected
[   82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G        W
[   82.890906] --------------------------------------------
[   82.890906] ping/418 is trying to acquire lock:
[   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
__dev_queue_xmit+0x1778/0x3550
[   82.890906]
[   82.890906] but task is already holding lock:
[   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
__dev_queue_xmit+0x1778/0x3550
[   82.890906]
[   82.890906] other info that might help us debug this:
[   82.890906]  Possible unsafe locking scenario:
[   82.890906]
[   82.890906]        CPU0
[   82.890906]        ----
[   82.890906]   lock(&sch->q.lock);
[   82.890906]   lock(&sch->q.lock);
[   82.890906]
[   82.890906]  *** DEADLOCK ***
[   82.890906]
[..... other info removed for brevity....]

Example setup (eth0->eth0) to recreate
tc qdisc add dev eth0 root handle 1: htb default 30
tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth0

Another example(eth0->eth1->eth0) to recreate
tc qdisc add dev eth0 root handle 1: htb default 30
tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth1

tc qdisc add dev eth1 root handle 1: htb default 30
tc filter add dev eth1 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth0

We fix this by adding an owner field (CPU id) to struct Qdisc set after
root qdisc is entered. When the softirq enters it a second time, if the
qdisc owner is the same CPU, the packet is dropped to break the loop.

Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
Closes: https://lore.kernel.org/netdev/20240314111713.5979-1-renmingshuai@huawei.com/
Fixes: 3bcb846 ("net: get rid of spin_trylock() in net_tx_action()")
Fixes: e578d9c ("net: sched: use counter to break reclassify loops")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240415210728.36949-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
popcornmix pushed a commit that referenced this issue Apr 29, 2024
[ Upstream commit 0f022d3 ]

When the mirred action is used on a classful egress qdisc and a packet is
mirrored or redirected to self we hit a qdisc lock deadlock.
See trace below.

[..... other info removed for brevity....]
[   82.890906]
[   82.890906] ============================================
[   82.890906] WARNING: possible recursive locking detected
[   82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G        W
[   82.890906] --------------------------------------------
[   82.890906] ping/418 is trying to acquire lock:
[   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
__dev_queue_xmit+0x1778/0x3550
[   82.890906]
[   82.890906] but task is already holding lock:
[   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
__dev_queue_xmit+0x1778/0x3550
[   82.890906]
[   82.890906] other info that might help us debug this:
[   82.890906]  Possible unsafe locking scenario:
[   82.890906]
[   82.890906]        CPU0
[   82.890906]        ----
[   82.890906]   lock(&sch->q.lock);
[   82.890906]   lock(&sch->q.lock);
[   82.890906]
[   82.890906]  *** DEADLOCK ***
[   82.890906]
[..... other info removed for brevity....]

Example setup (eth0->eth0) to recreate
tc qdisc add dev eth0 root handle 1: htb default 30
tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth0

Another example(eth0->eth1->eth0) to recreate
tc qdisc add dev eth0 root handle 1: htb default 30
tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth1

tc qdisc add dev eth1 root handle 1: htb default 30
tc filter add dev eth1 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth0

We fix this by adding an owner field (CPU id) to struct Qdisc set after
root qdisc is entered. When the softirq enters it a second time, if the
qdisc owner is the same CPU, the packet is dropped to break the loop.

Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
Closes: https://lore.kernel.org/netdev/20240314111713.5979-1-renmingshuai@huawei.com/
Fixes: 3bcb846 ("net: get rid of spin_trylock() in net_tx_action()")
Fixes: e578d9c ("net: sched: use counter to break reclassify loops")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240415210728.36949-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants