Firmware sometimes fails to initialize DSI display after reboot #127

fluffysheap · 2018-01-03T11:04:18Z

Hello,

This is a copy of an issue I originally filed with raspberrypi/firmware, which was closed there on the grounds that it belongs here. The KMS driver can put the DSI display into a state where the firmware does not reset it on reboot, after which the display stops working until it is power cycled. It seems like a firmware issue to me, but popcornmix does not agree.

Sometimes, on a warm reboot, the firmware doesn't re-initialize the DSI panel. It happens only (as far as I have seen) when the KMS driver is in use - never with the closed driver, and I have never seen it in my (limited) use of the FKMS driver.

When this occurs, the panel does not work until the panel is power-cycled and the Pi is rebooted. The backlight is on, and sometimes there are patterns on the display, but there is no colored square at reboot, and I have found no way to recover the display from software. Once the panel is in this state, warm rebooting with the closed-source video driver will not fix it. Power-cycling the Pi (but not the panel) does not fix it either. It occurs for me on both Pi 2 and Pi 3; I have not tested other models.

While this is an intermittent problem, it is by no means rare. It happens about half the time for me with the official 4.9 kernel. It can't be reproduced with 4.14 because the panel is not enabled in the KMS driver in the official beta 4.14 kernel. However, in a 4.15-rc5 kernel that I built myself with the driver enabled, I can reproduce the problem. It does seem to run in streaks: I can have several good reboots in a row before a failure. Even so, putting a reboot command in an every-other-minute cron job, for example, will reproduce the problem quickly.

Things I have noticed:

- Most of the time, rebooting triggers the big rainbow square as soon as the kernel executes the reboot (within 1/4 second or so). Sometimes, however, it goes through the white patterns for a longer time. When this happens, it always fails on the current reboot.
- I built my 4.15 kernel with console rotation support and enabled lcd_rotate=2 in config.txt. Most of the time the console doesn't rotate, and I have put no effort into making it work. But once in a while it comes up rotated (this causes the X display to also be rotated). When this happens, it always fails on the next reboot.
- It seems to happen less often when I remove the backlight and ft5406 drivers before rebooting, but this could be my imagination.
- It seems to happen less often with a pure upstream kernel than with the kernel from the raspberrypi tree. This is what led me to try disabling the backlight and ft5406 drivers. However, I think I have seen it a few times even with the upstream kernel (the upstream kernel doesn't work well for me, so I don't test it as much).
- When the display fails, I always get the "Unknown Atmel firmware revision: 0x0" error in dmesg, and it appears on every subsequent reboot that uses the KMS driver.

As a workaround, is there anything that I can do from the ARM side, or even using the panel I2C pins with jumper wires, to "hard reset" the panel? I poked around in vcgencmd and vcmailbox but didn't find anything useful.
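Since the "Unknown Atmel firmware revision: 0x0" message shows up on every failed boot, it could serve as an automated failure check when counting good and bad reboots. The following is only a minimal sketch under that assumption; the function names `panel_failed` and `read_dmesg` are made up for illustration and are not part of any driver or tool.

```python
import subprocess

# Error string reported in dmesg when the panel fails to come up.
ATMEL_ERROR = "Unknown Atmel firmware revision: 0x0"


def panel_failed(dmesg_text: str) -> bool:
    """Return True if the kernel log text contains the Atmel error."""
    return ATMEL_ERROR in dmesg_text


def read_dmesg() -> str:
    """Return the current kernel log (hypothetical helper; may need root)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example use after boot:
#     print("panel FAILED" if panel_failed(read_dmesg()) else "panel ok")
```

Run once after each boot (for instance from the same cron setup that triggers the reboots), this would give a rough failure count without having to watch the screen.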

The KMS driver is in a good state now. I believe it is ready for, if not exactly prime time, at least wider experimental deployment. This is the most serious problem I have experienced in my testing, and the only one for which I haven't found a workaround. Fixing it would make the driver usable full-time for many people. Thanks!

lategoodbye · 2018-01-03T17:28:29Z

@fluffysheap Could you please provide more information:

- Please estimate how often this issue occurs (downstream and upstream case).
- Please provide a dmesg of an error case.
- Which kernel config did you use (downstream and upstream case)?
- Which I2C pins did you use on Raspberry Pi 2?

stschake · 2018-01-03T17:47:04Z

It might also be helpful to dump the dsi1 registers before a reboot where the issue occurs. There is a debugfs file for that (dsi1_regs, usually in /sys/kernel/debug/dri/0/).
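Because it is not known in advance which reboot will fail, one option is to save a copy of the register dump before every reboot and keep the one that precedes a failure. A rough sketch, assuming the debugfs path mentioned above is correct on the system in question and that the script runs as root; the `save_regs` name and the destination directory are hypothetical:

```python
import time
from pathlib import Path

# Location suggested above; the DRM minor number (0) may differ per system.
DSI1_REGS = Path("/sys/kernel/debug/dri/0/dsi1_regs")
# Hypothetical destination directory for the saved dumps.
DUMP_DIR = Path("/var/log/dsi1")


def save_regs(src: Path = DSI1_REGS, dest_dir: Path = DUMP_DIR) -> Path:
    """Copy the register dump to a timestamped file and return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = dest_dir / f"dsi1_regs-{stamp}.txt"
    # debugfs files are plain text, so a read/write copy is sufficient.
    dest.write_text(src.read_text())
    return dest
```

Calling `save_regs()` immediately before issuing the reboot command would leave one dump per reboot, so the dump from just before a failed boot can be compared against one from before a good boot.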

