Crash in i2c_bcm2835 in 4.10 due to NULL pointer dereference #89

nullr0ute · 2017-02-15T09:32:31Z

After some time of uptime on the Raspberry Pi 3 we're seeing a crash with 4.10 in the i2c_bcm2835 driver. The RPI2 is stable. It happens even on a minimal install with no GUI running.

The crash I captured on the serial console is:

[18589.291098] i2c-bcm2835 3f805000.i2c: i2c transfer timed out
[18600.555282] i2c-bcm2835 3f805000.i2c: i2c transfer timed out
[18611.819194] i2c-bcm2835 3f805000.i2c: i2c transfer timed out
[18623.083248] i2c-bcm2835 3f805000.i2c: i2c transfer timed out
[18624.107255] i2c-bcm2835 3f805000.i2c: i2c transfer timed out
[18635.371300] i2c-bcm2835 3f805000.i2c: i2c transfer timed out
[26274.450689] Unable to handle kernel NULL pointer dereference at virtual address 00000002
[26274.461681] pgd = c0204000
[26274.465918] [00000002] *pgd=00000000
[26274.471137] Internal error: Oops: 5 [#1] SMP ARM
[26274.477542] Modules linked in: i2c_dev ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_broute ebtable_nat ip6table_raw ip6table_mangle ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_raw iptable_mangle iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables ip_set nfnetlink bridge stp llc libcrc32c crc32_arm_ce smsc95xx usbnet mii vc4 drm_kms_helper drm fb_sys_fops syscopyarea dwc2 sysfillrect sysimgblt udc_core bcm2835_wdt bcm2835_rng bcm2835_dma leds_gpio mmc_block sdhci_iproc sdhci_pltfm sdhci mmc_core pwm_bcm2835 i2c_bcm2835 [last unloaded: ip6_tables]
[26274.562827] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-0.rc8.git0.1.fc26.armv7hl #1
[26274.578954] Hardware name: Generic DT based system
[26274.587990] task: c0dc7a80 task.stack: c0dbe000
[26274.596870] PC is at bcm2835_i2c_isr+0x34/0x168 [i2c_bcm2835]
[26274.607158] LR is at __handle_irq_event_percpu+0xbc/0x208
[26274.617195] pc : []    lr : []    psr: 000f0193
[26274.617195] sp : c0dbfea8  ip : 00000015  fp : ffffe000
[26274.638400] r10: c0dbfeec  r9 : ef084200  r8 : 00000000
[26274.648584] r7 : 00000014  r6 : 00000053  r5 : ef084200  r4 : ef3ee018
[26274.660256] r3 : 00000052  r2 : 00000000  r1 : ef3ee018  r0 : 00000053
[26274.672004] Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[26274.684649] Control: 10c5383d  Table: 2d9d406a  DAC: 00000051
[26274.695916] Process swapper/0 (pid: 0, stack limit = 0xc0dbe220)
[26274.707595] Stack: (0xc0dbfea8 to 0xc0dc0000)
[26274.717695] fea0:                   ee518fc0 ef084200 00000053 c02b0064 ef084200 c0dbfeec
[26274.737688] fec0: ef084200 ef084200 ef084200 c0dbb298 00000014 ef02d880 c0dbe000 00000000
[26274.757801] fee0: 00000000 c02b01d0 ef084200 00000000 ef084200 ef084200 ef084260 c02b0258
[26274.778567] ff00: ef084200 ef084210 c0dbb298 c02b3bc0 c0dc4300 00000000 c0dbb298 c02af330
[26274.799527] ff20: c0dc4300 c055fe7c 00000000 c02af330 00000000 c02af914 c02108d8 600f0013
[26274.820453] ff40: ffffffff c0dbff84 c0dc2ddc c091c9cc 00000001 00000000 00000000 c0224360
[26274.841356] ff60: 00000000 00000000 ffffe000 c0dc2d74 c0dc2ddc 410fd034 00000000 00000000
[26274.862230] ff80: 00000000 c0dbffa0 c02108e8 c02108d8 600f0013 ffffffff 00000051 00000000
[26274.883081] ffa0: 00000000 c0299cc4 000000bb 00000001 c0dc2d40 ffffffff c0d35e30 c0299ff4
[26274.904270] ffc0: c0ed4cc0 c0c9ac38 ffffffff ffffffff 00000000 c0c9a690 00000000 c0d35e30
[26274.925889] ffe0: c0ed5694 c0dc2d5c c0d35e2c c0dc97cc 0020406a 0020807c 00000000 00000000
[26274.947902] [] (bcm2835_i2c_isr [i2c_bcm2835]) from [] (__handle_irq_event_percpu+0xbc/0x208)
[26274.972652] [] (__handle_irq_event_percpu) from [] (handle_irq_event_percpu+0x20/0x5c)
[26274.997275] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x4c/0x70)
[26275.021345] [] (handle_irq_event) from [] (handle_level_irq+0xf0/0x124)
[26275.045175] [] (handle_level_irq) from [] (generic_handle_irq+0x20/0x30)
[26275.069247] [] (generic_handle_irq) from [] (bcm2836_chained_handle_irq+0x38/0x40)
[26275.094256] [] (bcm2836_chained_handle_irq) from [] (generic_handle_irq+0x20/0x30)
[26275.119292] [] (generic_handle_irq) from [] (__handle_domain_irq+0xa8/0xbc)
[26275.143715] [] (__handle_domain_irq) from [] (__irq_svc+0x6c/0x90)
[26275.159678] [] (__irq_svc) from [] (arch_cpu_idle+0x24/0x40)
[26275.175051] [] (arch_cpu_idle) from [] (do_idle+0xfc/0x1d8)
[26275.190263] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
[26275.205692] [] (cpu_startup_entry) from [] (start_kernel+0x364/0x3dc)
[26275.229119] Code: 1a000011 e3130002 0a00001c e5912290 (e1d220b2)
[26275.250980] ---[ end trace be30787d8deafdda ]---
[26275.263100] Kernel panic - not syncing: Fatal exception in interrupt
[26275.276975] CPU1: stopping
[26275.286950] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D         4.10.0-0.rc8.git0.1.fc26.armv7hl #1
[26275.310857] Hardware name: Generic DT based system
[26275.322801] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[26275.337806] [] (show_stack) from [] (dump_stack+0x80/0xa0)
[26275.352203] [] (dump_stack) from [] (handle_IPI+0x158/0x2a4)
[26275.366654] [] (handle_IPI) from [] (__irq_svc+0x6c/0x90)
[26275.380719] [] (__irq_svc) from [] (arch_cpu_idle+0x24/0x40)
[26275.394992] [] (arch_cpu_idle) from [] (do_idle+0xfc/0x1d8)
[26275.409108] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
[26275.423439] [] (cpu_startup_entry) from [<00209a2c>] (0x209a2c)
[26275.436869] CPU2: stopping
[26275.445968] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D         4.10.0-0.rc8.git0.1.fc26.armv7hl #1
[26275.468107] Hardware name: Generic DT based system
[26275.479153] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[26275.493255] [] (show_stack) from [] (dump_stack+0x80/0xa0)
[26275.506750] [] (dump_stack) from [] (handle_IPI+0x158/0x2a4)
[26275.520298] [] (handle_IPI) from [] (__irq_svc+0x6c/0x90)
[26275.533469] [] (__irq_svc) from [] (arch_cpu_idle+0x24/0x40)
[26275.546848] [] (arch_cpu_idle) from [] (do_idle+0xfc/0x1d8)
[26275.560125] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
[26275.573677] [] (cpu_startup_entry) from [<00209a2c>] (0x209a2c)
[26275.586404] CPU3: stopping
[26275.594848] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D         4.10.0-0.rc8.git0.1.fc26.armv7hl #1
[26275.615761] Hardware name: Generic DT based system
[26275.626320] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[26275.640021] [] (show_stack) from [] (dump_stack+0x80/0xa0)
[26275.653192] [] (dump_stack) from [] (handle_IPI+0x158/0x2a4)
[26275.666501] [] (handle_IPI) from [] (__irq_svc+0x6c/0x90)
[26275.679472] [] (__irq_svc) from [] (arch_cpu_idle+0x24/0x40)
[26275.692761] [] (arch_cpu_idle) from [] (do_idle+0xfc/0x1d8)
[26275.705926] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
[26275.719374] [] (cpu_startup_entry) from [<00209a2c>] (0x209a2c)
[26275.732064] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

nullr0ute · 2017-02-15T12:58:23Z

If I boot with cpuidle.off=1 to see if it's cpu idle coming into play I get a very similar crash after about 30 mins of uptime too:

[ 1819.171351] Unable to handle kernel NULL pointer dereference at virtual address 00000002
[ 1819.179873] pgd = c0204000
[ 1819.182713] [00000002] *pgd=00000000
[ 1819.186475] Internal error: Oops: 5 [#1] SMP ARM
[ 1819.191322] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6tab
le_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32_arm_ce iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip
6table_filter ip6_tables smsc95xx usbnet mii vc4 drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt dwc2 udc_core bcm2835_wdt bcm2835_rng bcm2835_dma leds_gpio i2c_dev mmc_block sdhci_iproc sdhci_p
ltfm sdhci mmc_core pwm_bcm2835 i2c_bcm2835
[ 1819.253338] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-0.rc8.git0.1.fc26.armv7hl #1
[ 1819.261928] Hardware name: Generic DT based system
[ 1819.266958] task: c0dc7a80 task.stack: c0dbe000
[ 1819.271725] PC is at bcm2835_i2c_isr+0x34/0x168 [i2c_bcm2835]
[ 1819.283681] LR is at __handle_irq_event_percpu+0xbc/0x208
[ 1819.295326] pc : []    lr : []    psr: 000f0193
[ 1819.295326] sp : c0dbfe18  ip : 00000015  fp : ffffe000
[ 1819.319345] r10: c0dbfe5c  r9 : ef084200  r8 : 00000000
[ 1819.330647] r7 : 00000014  r6 : 00000053  r5 : ef084200  r4 : ee932818
[ 1819.343199] r3 : 00000052  r2 : 00000000  r1 : ee932818  r0 : 00000053
[ 1819.355637] Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[ 1819.368883] Control: 10c5383d  Table: 2d0b806a  DAC: 00000051
[ 1819.380618] Process swapper/0 (pid: 0, stack limit = 0xc0dbe220)
[ 1819.392596] Stack: (0xc0dbfe18 to 0xc0dc0000)
[ 1819.402860] fe00:                                                       eeb58900 ef084200
[ 1819.422574] fe20: 00000053 c02b0064 ef084200 c0dbfe5c ef084200 ef084200 ef084200 c0dbb298
[ 1819.442330] fe40: 00000014 ef02d880 c0dbe000 c09178dc c0dbff4c c02b01d0 ef084200 00000000
[ 1819.462737] fe60: ef084200 ef084200 ef084260 c02b0258 ef084200 ef084210 c0dbb298 c02b3bc0
[ 1819.483322] fe80: c0dc4300 00000000 c0dbb298 c02af330 c0dc4300 c055fe7c 00000000 c02af330
[ 1819.503893] fea0: 00000000 c02af914 c027b4f8 600f0013 ffffffff c0dbfef4 ef7010c0 c091c9cc
[ 1819.524449] fec0: ef303a00 00000002 c0dbe000 0000ed88 ef303a00 ee93cc40 c0dbd0c0 c0dc7a80
[ 1819.544987] fee0: ef7010c0 2e944000 c09178dc c0dbff4c 00000000 c0dbff10 c09178dc c027b4f8
[ 1819.565526] ff00: 600f0013 ffffffff 00000051 00000000 00000001 c0275790 ffffffff 00000000
[ 1819.586413] ff20: 00000002 00000002 eeb8a018 c0dc7a80 ef303a00 ee93cc40 00000000 ef7010c0
[ 1819.607730] ff40: ef7010c0 c09261e0 c0dbffa4 c09178dc 000001a7 c0dc2ddc 410fd034 00000000
[ 1819.629411] ff60: 00000000 c02d3e74 c0917aac 00000000 8f0ef8f4 000001a7 ef6fdb00 8b14da52
[ 1819.651639] ff80: 000001a7 ffffe000 00000001 c0dc2d40 ffffffff c0d35e30 410fd034 00000000
[ 1819.674348] ffa0: c0dbffb4 c0917aac 000000bb 00000001 c0dbffbc c0917df0 00000000 c0299ff4
[ 1819.697300] ffc0: c0ed4cc0 c0c9ac38 ffffffff ffffffff 00000000 c0c9a690 00000000 c0d35e30
[ 1819.720615] ffe0: c0ed5694 c0dc2d5c c0d35e2c c0dc97cc 0020406a 0020807c 00000000 00000000
[ 1819.744346] [] (bcm2835_i2c_isr [i2c_bcm2835]) from [] (__handle_irq_event_percpu+0xbc/0x208)
[ 1819.770266] [] (__handle_irq_event_percpu) from [] (handle_irq_event_percpu+0x20/0x5c)
[ 1819.795564] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x4c/0x70)
[ 1819.820056] [] (handle_irq_event) from [] (handle_level_irq+0xf0/0x124)
[ 1819.844022] [] (handle_level_irq) from [] (generic_handle_irq+0x20/0x30)
[ 1819.868100] [] (generic_handle_irq) from [] (bcm2836_chained_handle_irq+0x38/0x40)
[ 1819.893115] [] (bcm2836_chained_handle_irq) from [] (generic_handle_irq+0x20/0x30)
[ 1819.918149] [] (generic_handle_irq) from [] (__handle_domain_irq+0xa8/0xbc)
[ 1819.942571] [] (__handle_domain_irq) from [] (__irq_svc+0x6c/0x90)
[ 1819.958545] [] (__irq_svc) from [] (finish_task_switch+0x104/0x21c)
[ 1819.974554] [] (finish_task_switch) from [] (__schedule+0x614/0x74c)
[ 1819.998167] [] (__schedule) from [] (schedule+0x98/0xbc)
[ 1820.013104] [] (schedule) from [] (schedule_preempt_disabled+0x14/0x18)
[ 1820.036858] [] (schedule_preempt_disabled) from [] (cpu_startup_entry+0x20/0x24)
[ 1820.061448] [] (cpu_startup_entry) from [] (start_kernel+0x364/0x3dc)
[ 1820.085057] Code: 1a000011 e3130002 0a00001c e5912290 (e1d220b2)
[ 1820.107028] ---[ end trace b95fea60a12456d3 ]---
[ 1820.119235] Kernel panic - not syncing: Fatal exception in interrupt
[ 1820.133194] CPU3: stopping
[ 1820.143265] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D         4.10.0-0.rc8.git0.1.fc26.armv7hl #1
[ 1820.167332] Hardware name: Generic DT based system
[ 1820.179347] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[ 1820.194425] [] (show_stack) from [] (dump_stack+0x80/0xa0)
[ 1820.208896] [] (dump_stack) from [] (handle_IPI+0x158/0x2a4)
[ 1820.223416] [] (handle_IPI) from [] (__irq_svc+0x6c/0x90)
[ 1820.237567] [] (__irq_svc) from [] (arch_cpu_idle+0x24/0x40)
[ 1820.251917] [] (arch_cpu_idle) from [] (do_idle+0xfc/0x1d8)
[ 1820.266102] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
[ 1820.280493] [] (cpu_startup_entry) from [<00209a2c>] (0x209a2c)
[ 1820.293997] CPU2: stopping
[ 1820.303163] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D         4.10.0-0.rc8.git0.1.fc26.armv7hl #1
[ 1820.325433] Hardware name: Generic DT based system
[ 1820.336563] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[ 1820.350811] [] (show_stack) from [] (dump_stack+0x80/0xa0)
[ 1820.364489] [] (dump_stack) from [] (handle_IPI+0x158/0x2a4)
[ 1820.378227] [] (handle_IPI) from [] (__irq_svc+0x6c/0x90)
[ 1820.391621] [] (__irq_svc) from [] (arch_cpu_idle+0x24/0x40)
[ 1820.405268] [] (arch_cpu_idle) from [] (do_idle+0xfc/0x1d8)
[ 1820.418749] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
[ 1820.432444] [] (cpu_startup_entry) from [<00209a2c>] (0x209a2c)
[ 1820.445238] CPU1: stopping
[ 1820.453696] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D         4.10.0-0.rc8.git0.1.fc26.armv7hl #1
[ 1820.474640] Hardware name: Generic DT based system
[ 1820.485210] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[ 1820.498929] [] (show_stack) from [] (dump_stack+0x80/0xa0)
[ 1820.512112] [] (dump_stack) from [] (handle_IPI+0x158/0x2a4)
[ 1820.525440] [] (handle_IPI) from [] (__irq_svc+0x6c/0x90)
[ 1820.538431] [] (__irq_svc) from [] (arch_cpu_idle+0x24/0x40)
[ 1820.551731] [] (arch_cpu_idle) from [] (do_idle+0xfc/0x1d8)
[ 1820.564903] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
[ 1820.578345] [] (cpu_startup_entry) from [<00209a2c>] (0x209a2c)
[ 1820.591031] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

lategoodbye · 2017-02-15T13:52:10Z

Hi Peter,
recently i posted an untested (i don't have a RPi 3) fix for this on linux-rpi-kernel. Unfortunately the message got blocked because of a suspicious header. Btw this issue is directly related to the "i2c-bcm2835 3f805000.i2c: i2c transfer timed out" messages.

Could you please try this patch?

nullr0ute · 2017-02-15T14:13:34Z

Building a test kernel with it now

nullr0ute · 2017-02-15T15:36:38Z

So the "i2c-bcm2835 3f805000.i2c: i2c transfer timed out" errors seem to be gone but I get this crash on boot (this is a minimal install so no X/Wayland GUI). The system is still usable. The only difference is the patch.

[   52.462794] ------------[ cut here ]------------
[   52.471797] ------------[ cut here ]------------
[   52.471829] WARNING: CPU: 1 PID: 841 at lib/dma-debug.c:612 debug_dma_assert_idle+0x148/0x1e0
[   52.471838] vc4-drm soc:gpu: DMA-API: cpu touching an active dma mapped cacheline [cln=0x00c421c0]
[   52.471842] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_raw ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32_arm_ce iptable_security ebtable_filter ebtables ip6table_filter ip6_tables smsc95xx usbnet mii vc4 drm_kms_helper drm fb_sys_fops syscopyarea dwc2 sysfillrect sysimgblt udc_core leds_gpio bcm2835_dma bcm2835_rng bcm2835_wdt mmc_block sdhci_iproc sdhci_pltfm sdhci mmc_core pwm_bcm2835 i2c_bcm2835 i2c_dev
[   52.472058] CPU: 1 PID: 841 Comm: systemd-udevd Not tainted 4.10.0-0.rc8.git0.1.rpi1.fc25.armv7hl #1
[   52.472062] Hardware name: Generic DT based system
[   52.472092] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[   52.472109] [] (show_stack) from [] (dump_stack+0x9c/0xd0)
[   52.472129] [] (dump_stack) from [] (__warn+0xe4/0x104)
[   52.472148] [] (__warn) from [] (warn_slowpath_fmt+0x3c/0x4c)
[   52.472167] [] (warn_slowpath_fmt) from [] (debug_dma_assert_idle+0x148/0x1e0)
[   52.472186] [] (debug_dma_assert_idle) from [] (wp_page_copy+0xfc/0x644)
[   52.472201] [] (wp_page_copy) from [] (do_wp_page+0x3a4/0x5ac)
[   52.472214] [] (do_wp_page) from [] (handle_mm_fault+0xd78/0xe3c)
[   52.472232] [] (handle_mm_fault) from [] (do_page_fault+0x240/0x3b4)
[   52.472250] [] (do_page_fault) from [] (do_DataAbort+0x40/0xc0)
[   52.472265] [] (do_DataAbort) from [] (__dabt_usr+0x40/0x60)
[   52.472271] Exception stack(0xec53ffb0 to 0xec53fff8)
[   52.472282] ffa0:                                     8142a9d8 81423580 81423660 00000000
[   52.472296] ffc0: 8142a9d8 bed507f4 bed507f0 b6f4ac88 b6f4ac88 8142c2b8 8142bf50 8142c2b8
[   52.472307] ffe0: 00000000 bed507e0 7f5ab3fc 7f5e05a4 a00f0010 ffffffff
[   52.472401] ---[ end trace e2930fa58afe35f3 ]---
[   52.472405] Mapped at:
[   52.472482]  vc4_bo_create+0x44/0x108 [vc4]
[   52.472546]  vc4_overflow_mem_work+0x20/0x170 [vc4]
[   52.472555]  process_one_work+0x324/0x84c
[   52.472563]  worker_thread+0x294/0x3f4
[   52.472572]  kthread+0x130/0x14c
[   52.896654] WARNING: CPU: 3 PID: 465 at lib/dma-debug.c:612 debug_dma_assert_idle+0x148/0x1e0
[   52.917240] vc4-drm soc:gpu: DMA-API: cpu touching an active dma mapped cacheline [cln=0x00c421c0]
[   52.938505] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_raw ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32_arm_ce iptable_security ebtable_filter ebtables ip6table_filter ip6_tables smsc95xx usbnet mii vc4 drm_kms_helper drm fb_sys_fops syscopyarea dwc2 sysfillrect sysimgblt udc_core leds_gpio bcm2835_dma bcm2835_rng bcm2835_wdt mmc_block sdhci_iproc sdhci_pltfm sdhci mmc_core pwm_bcm2835 i2c_bcm2835 i2c_dev
[   53.050163] CPU: 3 PID: 465 Comm: systemd-udevd Tainted: G        W       4.10.0-0.rc8.git0.1.rpi1.fc25.armv7hl #1
[   53.073586] Hardware name: Generic DT based system
[   53.084817] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[   53.099196] [] (show_stack) from [] (dump_stack+0x9c/0xd0)
[   53.113038] [] (dump_stack) from [] (__warn+0xe4/0x104)
[   53.126566] [] (__warn) from [] (warn_slowpath_fmt+0x3c/0x4c)
[   53.140765] [] (warn_slowpath_fmt) from [] (debug_dma_assert_idle+0x148/0x1e0)
[   53.162452] [] (debug_dma_assert_idle) from [] (wp_page_copy+0xfc/0x644)
[   53.184062] [] (wp_page_copy) from [] (do_wp_page+0x3a4/0x5ac)
[   53.198517] [] (do_wp_page) from [] (handle_mm_fault+0xd78/0xe3c)
[   53.213312] [] (handle_mm_fault) from [] (do_page_fault+0x240/0x3b4)
[   53.235632] [] (do_page_fault) from [] (do_DataAbort+0x40/0xc0)
[   53.250651] [] (do_DataAbort) from [] (__dabt_usr+0x40/0x60)
[   53.265603] Exception stack(0xc2291fb0 to 0xc2291ff8)
[   53.277933] 1fa0:                                     8142a518 00000009 00000024 00000000
[   53.300414] 1fc0: b6df17a4 00000020 00020338 7f5abb04 00000000 8142a518 813d5cf4 8142c2b8
[   53.322682] 1fe0: bed507bc bed507f0 00000003 b6d0f184 20070010 ffffffff
[   53.336720] ---[ end trace e2930fa58afe35f4 ]---
[   53.348371] Mapped at:
[   53.357587]  vc4_bo_create+0x44/0x108 [vc4]
[   53.368574]  vc4_overflow_mem_work+0x20/0x170 [vc4]
[   53.380095]  process_one_work+0x324/0x84c
[   53.390631]  worker_thread+0x294/0x3f4
[   53.400620]  kthread+0x130/0x14c

nullr0ute · 2017-02-15T15:50:52Z

And after a while of running I've got this one (which I expected given the warning in the patch) but posting it for reference:

[ 2168.372043] ------------[ cut here ]------------
[ 2168.382726] WARNING: CPU: 0 PID: 317 at drivers/i2c/busses/i2c-bcm2835.c:199 bcm2835_i2c_isr+0x80/0x190 [i2c_bcm2835]
[ 2168.405576] Got interrupt, but curr_msg is NULL
[ 2168.416141] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_raw ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32_arm_ce iptable_security ebtable_filter ebtables ip6table_filter ip6_tables smsc95xx usbnet mii vc4 drm_kms_helper drm fb_sys_fops syscopyarea dwc2 sysfillrect sysimgblt udc_core leds_gpio bcm2835_dma bcm2835_rng bcm2835_wdt mmc_block sdhci_iproc sdhci_pltfm sdhci mmc_core pwm_bcm2835 i2c_bcm2835 i2c_dev
[ 2168.526466] CPU: 0 PID: 317 Comm: kworker/0:2 Tainted: G        W       4.10.0-0.rc8.git0.1.rpi1.fc25.armv7hl #1
[ 2168.549331] Hardware name: Generic DT based system
[ 2168.560469] Workqueue: events output_poll_execute [drm_kms_helper]
[ 2168.572952] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[ 2168.587086] [] (show_stack) from [] (dump_stack+0x9c/0xd0)
[ 2168.600643] [] (dump_stack) from [] (__warn+0xe4/0x104)
[ 2168.613905] [] (__warn) from [] (warn_slowpath_fmt+0x3c/0x4c)
[ 2168.627668] [] (warn_slowpath_fmt) from [] (bcm2835_i2c_isr+0x80/0x190 [i2c_bcm2835])
[ 2168.649538] [] (bcm2835_i2c_isr [i2c_bcm2835]) from [] (__handle_irq_event_percpu+0x118/0x4a8)
[ 2168.672485] [] (__handle_irq_event_percpu) from [] (handle_irq_event_percpu+0x20/0x5c)
[ 2168.695293] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x40/0x64)
[ 2168.717857] [] (handle_irq_event) from [] (handle_level_irq+0xe4/0x118)
[ 2168.740506] [] (handle_level_irq) from [] (generic_handle_irq+0x20/0x30)
[ 2168.763426] [] (generic_handle_irq) from [] (bcm2836_chained_handle_irq+0x3c/0x44)
[ 2168.787525] [] (bcm2836_chained_handle_irq) from [] (generic_handle_irq+0x20/0x30)
[ 2168.811673] [] (generic_handle_irq) from [] (__handle_domain_irq+0xa8/0xbc)
[ 2168.835207] [] (__handle_domain_irq) from [] (__irq_svc+0x70/0x98)
[ 2168.850739] [] (__irq_svc) from [] (_raw_spin_unlock_irqrestore+0x60/0x64)
[ 2168.874053] [] (_raw_spin_unlock_irqrestore) from [] (__debug_object_init+0x2ac/0x374)
[ 2168.898587] [] (__debug_object_init) from [] (init_timer_on_stack_key+0x24/0x5c)
[ 2168.922919] [] (init_timer_on_stack_key) from [] (schedule_timeout+0x84/0x4e0)
[ 2168.947281] [] (schedule_timeout) from [] (wait_for_common+0x10c/0x1a4)
[ 2168.971395] [] (wait_for_common) from [] (bcm2835_i2c_xfer+0x104/0x190 [i2c_bcm2835])
[ 2168.996971] [] (bcm2835_i2c_xfer [i2c_bcm2835]) from [] (__i2c_transfer+0x3dc/0x9cc)
[ 2169.022448] [] (__i2c_transfer) from [] (i2c_transfer+0x90/0xd8)
[ 2169.038630] [] (i2c_transfer) from [] (drm_do_probe_ddc_edid+0xa8/0xf8 [drm])
[ 2169.063870] [] (drm_do_probe_ddc_edid [drm]) from [] (drm_probe_ddc+0x1c/0x2c [drm])
[ 2169.089550] [] (drm_probe_ddc [drm]) from [] (vc4_hdmi_connector_detect+0x4c/0x74 [vc4])
[ 2169.115441] [] (vc4_hdmi_connector_detect [vc4]) from [] (output_poll_execute+0xfc/0x1a4 [drm_kms_helper])
[ 2169.142904] [] (output_poll_execute [drm_kms_helper]) from [] (process_one_work+0x324/0x84c)
[ 2169.168964] [] (process_one_work) from [] (worker_thread+0x294/0x3f4)
[ 2169.192909] [] (worker_thread) from [] (kthread+0x130/0x14c)
[ 2169.208354] [] (kthread) from [] (ret_from_fork+0x14/0x24)
[ 2169.223530] ---[ end trace e2930fa58afe35f5 ]---

lategoodbye · 2017-02-15T15:57:32Z

This patch doesn't fix the timeout issue. I've heard they aren't reliable to reproduce.

Could you please attach your kernel config?
Is this issue reproducible with a stock kernel?

lategoodbye · 2017-02-15T16:28:14Z

@notro @msperl
What's your opinion about my workaround patch above?

nullr0ute · 2017-02-15T17:00:41Z

@lategoodbye define a "stock kernel". This is the Fedora kernel which is pretty close to upstream vanilla 4.10 linus kernel. We carry a couple of patches for the RPi but nothing around i2c and only a single patch for vc4 (to fix a OOPs which anolt has already sent for upstream).

The kernel config is currently https://pbrobinson.fedorapeople.org/config-4.10.0-0.rc8.git0.1.fc26.armv7hl

This config works fine with <= 4.9 and has only appeared in 4.10. I did pull back some of the i2c patches to 4.9 to try and fix the timeout issue but we saw the same lock ups on the RPi3 so I dropped them as it was the quickest way to move it back to stability.

In terms of reproducible, it occurs fairly regularly on a Fedora 25 minimal (or GUI based) install with the 4.10 kernel. In the case of a minimal install it's basically just sitting there doing nothing. With that it'll basically crash within 2 hours, generally a lot less and often within a few minutes.

With the i2c patches I back ported to 4.9 (https://pbrobinson.fedorapeople.org/bcm283x-fixes-i2c.patch) we had dozens of reports of issues so I would say at least for Fedora it's very reproducible.

msperl · 2017-02-15T17:03:33Z

What is the procedure under which this happens - i.e. what is the driver that is producing this i2c message?
Can the issue be reproduced with an upstream kernel?

What strikes me as strange is that curr_msg is never set to NULL in the code so besides when initializing the I2C driver
Looks to me as if it is some memory corruption - a use after free?
Or the driver is never used and hence bcm2835_i2c_xfer is never called and still the interrupt is triggered.

Does the 4.8 version of the driver work on 4.10 without any issues when you replace the 4.10 version?

nullr0ute · 2017-02-15T17:13:18Z

Well the difference between 4.9 and 4.10 are the following patches. If you revert them I fully expect it to operate fine. When I applied these patches to 4.9 we did see the issue on 4.9:

9446f62 i2c: bcm2835: Add support for dynamic clock
e13e19e i2c: bcm2835: Support i2c-dev ioctl I2C_TIMEOUT
ee05fea i2c: bcm2835: Add support for Repeated Start Condition
8d2cc5c i2c: bcm2835: Can't support I2C_M_IGNORE_NAK
23c9540 i2c: bcm2835: Use dev_dbg logging on transfer errors
d4030d7 i2c: bcm2835: Protect against unexpected TXW/RXR interrupts
e247454 i2c: bcm2835: Fix hang for writing messages larger than 16 bytes

lategoodbye · 2017-02-15T17:16:49Z

@msperl The issue is only reproducible with RPi 3. I've the suspicion that the i2c driver is never used and the interrupt is triggered by the faulty pin configuration in DT in possible combination with the firmware. Please compare this and that.

msperl · 2017-02-15T17:17:41Z

Looking at the traces some more - it seems to be related to hdmi scanning:
[ 2169.115441] [] (vc4_hdmi_connector_detect [vc4]) from [] (output_poll_execute+0xfc/0x1a4 [drm_kms_helper])

So I wonder if HDMI i2c gets "probed" and later "removed" in this sequence somehow?
Add a dev_warn() or similar to log that this happens...

Maybe we need to clean BCM2835_I2C_S and Fifos during initialization.

nullr0ute · 2017-02-15T17:25:41Z

I believe that i2c is used on the Pi3 for EDID detection via an expander and that might explain the issue happening on the RPi3 and not the RPi2. From the commit message of 9d44abb

"On Pi0/1/2, we use an external GPIO line for hotplug detection, since
the HDMI_HOTPLUG register isn't connected to anything. However, with
the Pi3 the HPD GPIO line has moved off to a GPIO expander that will
be tricky to get to (the firmware is constantly polling the expander
using i2c0, so we'll need to coordinate with it)."

So it might be using i2c0 but @anholt might be able to comment further with more details

anholt · 2017-02-15T19:23:38Z

A long time ago Gerd was submitting the pi3 patch series with the fxl6408 gpio expander driver present, which would fight over i2c0 (iirc) with the firmware. I think that patch isn't in your tree, though, given that you've got that series of 10s apart i2c timeout log messages (which we still need to delete) that look like our fallback pi3 monitor detection.

So, a curr_msg = NULL means that we haven't even started a transfer on the i2c that's catching the interrupt, since we set it early in bcm2835_i2c_xfer, and don't ever NULL it back out. Here's my guess for what's going on: The firmware is doing activity on i2c0 talking to fxl6408 (undervoltage detection). Generally, the firmware's irq chain is much shorter than Linux's, so it tends to win the race to process the interrupt for i2c0. Occasionally, though, something else is going on and the firmware loses, at which point Linux gets into its irq handler and looks for the current message on i2c0.

@popcornmix has told me that the firmware will hit i2c0 on all boards at various times: "e.g. following under-voltage, over-temperature, a turbo request from arm, or a hard GPU job (e.g. MVC video decode triggers turbo mode)". Some of those don't apply to upstream Linux today (VCHI not enabled means no video decode, and we don't have turbo requests because the cpufreq driver never landed), but I think this also forgot instances like "the 7" display is attached and the firmware is hitting i2c0 for touchscreen".

I think we need to keep i2c0 status="disabled" in the bcm2835-rpi.dtsi, so that we don't ever try to play on their bus. It would be nice if we could get some confirmation that i2c0 is really the bus that's crashing -- do you think you could get some printk in here for figuring out which one it is?

nullr0ute · 2017-02-15T19:40:42Z

If you've got a patch to add the printk to the places you want I'm happy to do a build and provide any information it outputs.

anholt · 2017-02-15T19:59:47Z

I think this should do it. Without the curr_msg check, it gave me a bunch of nicely formatted DONE messages.

0001-i2c-bcm2835-Debug-test-for-curr_msg.patch.txt

nullr0ute · 2017-02-16T00:24:10Z

@anholt with it running with that patch for 90 mins I've had the following two debugs. I'll leave it running over night (or longer if it's useful). Do they provide confirmation as to whether it's i2c0 or not?

[ 4752.959711] i2c-bcm2835 3f205000.i2c: Processing DONE interrupt with no msg
[ 4855.360126] i2c-bcm2835 3f205000.i2c: Processing DONE interrupt with no msg

msperl · 2017-02-16T06:19:35Z

3f205000.i2c is i2c0.

Maybe we should disable the Irq while i2c is not used to avoid such race issues?
(Enable_irq on Line 170 and disable_irq on line 178 and 297)
Problem is that the irq is shared between all 3 i2c busses, so it may disable irq for all of them

I guess the only safe way is disabling i2c0 on the rpi3 in the dt.

lategoodbye · 2017-02-16T07:19:56Z

I think we should fix the pin configuration for i2c and pwm on RPi 3 before we play with those irqs.

nullr0ute · 2017-02-16T09:50:41Z

Is there a patch proposed for the i2c and pwm issues on the RPi3?

lategoodbye · 2017-02-16T09:57:40Z

I'll prepare a patch. Please give me some time.

notro · 2017-02-16T14:47:21Z

So much for me thinking that I had hardened the isr against problems...
I didn't catch that i2c0 was enabled in mainline, and I was the one who restructured the downstream dts base files for 4.9 and did study those files in depth...

Well well, yes, disabling i2c0 in mainline is a first step.
Maybe something like this at the top of the isr

/* Some comment about the firmware and i2c0 */
if (WARN_ON_ONCE(!i2c_dev->curr_msg))
        return IRQ_NONE;

The downstream 2708 driver handles this situation: https://github.com/raspberrypi/linux/blob/rpi-4.9.y/drivers/i2c/busses/i2c-bcm2708.c#L192

We should probably set i2c_dev->curr_msg = NULL after the transfer as well to increase the chance of getting an error message if someone uses i2c0 together with the firmware.

I think we need a comment somehwere that explains exactly in which situations the firmware uses i2c0.

And I guess this means that pre 4.10 the driver did occasionally acknowledge the i2c0 interrupts which the firmware was waiting for, silently timing out it's transfers.

Thanks for fixing it up @lategoodbye.

lategoodbye · 2017-02-16T18:16:06Z

@notro
Thanks for your suggestion, so my patch wasn't quite correct and in case curr_msg is NULL we should return with IRQ_NONE immediately? But i like Eric's idea of using a dev_warn instead of WARN.

nullr0ute · 2017-02-16T18:29:49Z

I prefer Eric's too, the log warning is much less scary for the average user (which means less people filing bug reports). I've pushed Eric's patch into the Fedora 4.10 kernel for a work around to the hard crash while we work out the best course of action in terms of DT, irq or what ever the final outcome is.

notro · 2017-02-16T18:53:14Z

I think we should bail out at once if we get an unexpected interrupt, since that means the firmware is using the device. And if I'm not mistaken, the interrupt will fire again since we don't acknowledge it and the firmware will probably catch it first the next time.
But that means that we will get an interrupt storm if the firmware isn't responsible for the interrupt, since no one will acknowledge it. I don't know what the firmware does with unexpected interrupts.
So what's most important, not break the firmware's use of i2c0 or to avoid an unlikely (famous last words) interrupt storm?
Maybe your solution is best after all.

dev_warn is ok by me, maybe dev_warn_ratelimited if we don't ack the interrupt.
Or rather dev_warn_ratelimited if we don't touch anything, and dev_err if we ack the interrupt since we most likely steal the firmware's interrupt by doing that.

anholt · 2017-02-16T20:12:22Z

We shouldn't try to harden our isr against spurious IRQs, we should just disable the controller in the DT. We don't get to drive it, and if we ever do anything with it from Linux, only suffering will come of it.

lategoodbye · 2017-02-16T21:32:16Z

Okay, i send a first patch series to fix the NULL pointer dereference and the invalid pin configuration (without disabling i2c0). If the i2c timeouts still occurs i would add an additional patch to disable i2c0. Since the issue would occur with 4.9 DT blobs and a 4.10 kernel, we also need to fix i2c-bcm2835.

When aacraid init fails with "AAC0: adapter self-test failed.", shutdown leads to UBSAN warning and then oops: [154316.118423] ================================================================================ [154316.118508] UBSAN: Undefined behaviour in drivers/scsi/scsi_lib.c:2328:27 [154316.118566] member access within null pointer of type 'struct Scsi_Host' [154316.118631] CPU: 2 PID: 14530 Comm: reboot Tainted: G W 4.15.0-dirty anholt#89 [154316.118701] Hardware name: Hewlett Packard HP NetServer/HP System Board, BIOS 4.06.46 PW 06/25/2003 [154316.118774] Call Trace: [154316.118848] dump_stack+0x48/0x65 [154316.118916] ubsan_epilogue+0xe/0x40 [154316.118976] __ubsan_handle_type_mismatch+0xfb/0x180 [154316.119043] scsi_block_requests+0x20/0x30 [154316.119135] aac_shutdown+0x18/0x40 [aacraid] [154316.119196] pci_device_shutdown+0x33/0x50 [154316.119269] device_shutdown+0x18a/0x390 [...] [154316.123435] BUG: unable to handle kernel NULL pointer dereference at 000000f4 [154316.123515] IP: scsi_block_requests+0xa/0x30 This is because aac_shutdown() does struct Scsi_Host *shost = pci_get_drvdata(dev); scsi_block_requests(shost); and that assumes shost has been assigned with pci_set_drvdata(). However, pci_set_drvdata(pdev, shost) is done in aac_probe_one() far after bailing out with error from calling the init function ((*aac_drivers[index].init)(aac)), and when the init function fails, no error is returned from aac_probe_one() so PCI layer assumes there is driver attached, and tries to shut it down later. Fix it by returning error from aac_probe_one() when card-specific init function fails. This fixes reboot on my HP NetRAID-4M with dead battery. Signed-off-by: Meelis Roos <mroos@linux.ee> Reviewed-by: Dave Carroll <david.carroll@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

syzkaller was able to trigger the following panic for AF_XDP: BUG: KASAN: null-ptr-deref in atomic64_sub include/asm-generic/atomic-instrumented.h:144 [inline] BUG: KASAN: null-ptr-deref in atomic_long_sub include/asm-generic/atomic-long.h:199 [inline] BUG: KASAN: null-ptr-deref in xdp_umem_unaccount_pages.isra.4+0x3d/0x80 net/xdp/xdp_umem.c:135 Write of size 8 at addr 0000000000000060 by task syz-executor246/4527 CPU: 1 PID: 4527 Comm: syz-executor246 Not tainted 4.17.0+ #89 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 kasan_report_error mm/kasan/report.c:352 [inline] kasan_report.cold.7+0x6d/0x2fe mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278 atomic64_sub include/asm-generic/atomic-instrumented.h:144 [inline] atomic_long_sub include/asm-generic/atomic-long.h:199 [inline] xdp_umem_unaccount_pages.isra.4+0x3d/0x80 net/xdp/xdp_umem.c:135 xdp_umem_reg net/xdp/xdp_umem.c:334 [inline] xdp_umem_create+0xd6c/0x10f0 net/xdp/xdp_umem.c:349 xsk_setsockopt+0x443/0x550 net/xdp/xsk.c:531 __sys_setsockopt+0x1bd/0x390 net/socket.c:1935 __do_sys_setsockopt net/socket.c:1946 [inline] __se_sys_setsockopt net/socket.c:1943 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1943 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe In xdp_umem_reg() the call to xdp_umem_account_pages() passed with CAP_IPC_LOCK where we didn't need to end up charging rlimit on memlock for the current user and therefore umem->user continues to be NULL. Later on through fault injection syzkaller triggered a failure in either umem->pgs or umem->pages allocation such that we bail out and undo accounting in xdp_umem_unaccount_pages() where we eventually hit the panic since it tries to deref the umem->user. The code is pretty close to mm_account_pinned_pages() and mm_unaccount_pinned_pages() pair and potentially could reuse it even in a later cleanup, and it appears that the initial commit c0c77d8 ("xsk: add user memory registration support sockopt") got this right while later follow-up introduced the bug via a49049e ("xsk: simplified umem setup"). Fixes: a49049e ("xsk: simplified umem setup") Reported-by: syzbot+979217770b09ebf5c407@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

…from a pageblock syzbot reported the following error from a tree with a head commit of baf76f0 ("slip: make slhc_free() silently accept an error pointer") BUG: unable to handle kernel paging request at ffffea0003348000 #PF error: [normal kernel read fault] PGD 12c3f9067 P4D 12c3f9067 PUD 12c3f8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 28916 Comm: syz-executor.2 Not tainted 5.1.0-rc6+ #89 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:314 [inline] RIP: 0010:PageCompound include/linux/page-flags.h:186 [inline] RIP: 0010:isolate_freepages_block+0x1c0/0xd40 mm/compaction.c:579 Code: 01 d8 ff 4d 85 ed 0f 84 ef 07 00 00 e8 29 00 d8 ff 4c 89 e0 83 85 38 ff ff ff 01 48 c1 e8 03 42 80 3c 38 00 0f 85 31 0a 00 00 <4d> 8b 2c 24 31 ff 49 c1 ed 10 41 83 e5 01 44 89 ee e8 3a 01 d8 ff RSP: 0018:ffff88802b31eab8 EFLAGS: 00010246 RAX: 1ffffd4000669000 RBX: 00000000000cd200 RCX: ffffc9000a235000 RDX: 000000000001ca5e RSI: ffffffff81988cc7 RDI: 0000000000000001 RBP: ffff88802b31ebd8 R08: ffff88805af700c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0003348000 R13: 0000000000000000 R14: ffff88802b31f030 R15: dffffc0000000000 FS: 00007f61648dc700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0003348000 CR3: 0000000037c64000 CR4: 00000000001426e0 Call Trace: fast_isolate_around mm/compaction.c:1243 [inline] fast_isolate_freepages mm/compaction.c:1418 [inline] isolate_freepages mm/compaction.c:1438 [inline] compaction_alloc+0x1aee/0x22e0 mm/compaction.c:1550 There is no reproducer and it is difficult to hit -- 1 crash every few days. The issue is very similar to the fix in commit 6b0868c ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints"). When isolating free pages around a target pageblock, the boundary handling is off by one and can stray into the next pageblock. Triggering the syzbot error requires that the end of pageblock is section or zone aligned, and that the next section is unpopulated. A more subtle consequence of the bug is that pageblocks were being improperly used as migration targets which potentially hurts fragmentation avoidance in the long-term one page at a time. A debugging patch revealed that it's definitely possible to stray outside of a pageblock which is not intended. While syzbot cannot be used to verify this patch, it was confirmed that the debugging warning no longer triggers with this patch applied. It has also been confirmed that the THP allocation stress tests are not degraded by this patch. Link: http://lkml.kernel.org/r/20190510182124.GI18914@techsingularity.net Fixes: e332f74 ("mm, compaction: be selective about what pageblocks to clear skip hints") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: syzbot+d84c80f9fe26a0f7a734@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Qian Cai <cai@lca.pw> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> # v5.1+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lategoodbye mentioned this issue Nov 15, 2017

DT: Disable i2c0 lategoodbye/rpi-zero#22

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crash in i2c_bcm2835 in 4.10 due to NULL pointer dereference #89

Crash in i2c_bcm2835 in 4.10 due to NULL pointer dereference #89

nullr0ute commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

lategoodbye commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

lategoodbye commented Feb 15, 2017

lategoodbye commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

msperl commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

lategoodbye commented Feb 15, 2017 •

edited

Loading

msperl commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

anholt commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

anholt commented Feb 15, 2017

nullr0ute commented Feb 16, 2017

msperl commented Feb 16, 2017

lategoodbye commented Feb 16, 2017

nullr0ute commented Feb 16, 2017

lategoodbye commented Feb 16, 2017

notro commented Feb 16, 2017

lategoodbye commented Feb 16, 2017 •

edited

Loading

nullr0ute commented Feb 16, 2017

notro commented Feb 16, 2017

anholt commented Feb 16, 2017

lategoodbye commented Feb 16, 2017

Crash in i2c_bcm2835 in 4.10 due to NULL pointer dereference #89

Crash in i2c_bcm2835 in 4.10 due to NULL pointer dereference #89

Comments

nullr0ute commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

lategoodbye commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

lategoodbye commented Feb 15, 2017

lategoodbye commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

msperl commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

lategoodbye commented Feb 15, 2017 • edited Loading

msperl commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

anholt commented Feb 15, 2017

nullr0ute commented Feb 15, 2017

anholt commented Feb 15, 2017

nullr0ute commented Feb 16, 2017

msperl commented Feb 16, 2017

lategoodbye commented Feb 16, 2017

nullr0ute commented Feb 16, 2017

lategoodbye commented Feb 16, 2017

notro commented Feb 16, 2017

lategoodbye commented Feb 16, 2017 • edited Loading

nullr0ute commented Feb 16, 2017

notro commented Feb 16, 2017

anholt commented Feb 16, 2017

lategoodbye commented Feb 16, 2017

lategoodbye commented Feb 15, 2017 •

edited

Loading

lategoodbye commented Feb 16, 2017 •

edited

Loading