Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

display turns off after sporadic 'flip_done timeout error' #128

Closed
nkichukov opened this issue Feb 11, 2018 · 8 comments
Closed

display turns off after sporadic 'flip_done timeout error' #128

nkichukov opened this issue Feb 11, 2018 · 8 comments

Comments

@nkichukov
Copy link

Hi folks,

The reported issue is sporadic and not reproducible on demand. As per the log below, it has happened 3 times in 21 days. All 3 times, the display went blank and got powered off until the Xorg session was restarted.

Details follow below:
kernel: 4.15.0-rc7-v7+ + patch *1.) which fixed memory leak issue addressed at #123
compiler: gcc-6.4
raspberry pi running in 32bit mode
CMA=256MB
OS: Gentoo Linux

Error printed in the kernel log:
[1042617.315980] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [CRTC:53:crtc-2] flip_done timed out
[1042842.595966] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [PLANE:52:plane-20] flip_done timed out
[1042959.331986] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [CRTC:53:crtc-2] flip_done timed out
[1043354.595985] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [PLANE:52:plane-20] flip_done timed out
[1761553.894440] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [CRTC:53:crtc-2] flip_done timed out
[1762164.198463] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [PLANE:52:plane-20] flip_done timed out

Workaround:
terminate X11 session and restart it.

Thank you,
-Nikolay

*1.) https://lists.freedesktop.org/archives/dri-devel/2018-January/161625.html

@lategoodbye
Copy link

anholt pushed a commit that referenced this issue Mar 1, 2018
[ Upstream commit 4adfa79 ]

When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.

Here's syzbot's trace:
 WARNING: bad unlock balance detected!
 4.15.0-rc3+ #128 Not tainted
 syzkaller971460/3195 is trying to release lock (mrt_lock) at:
 [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by syzkaller971460/3195:
  #0:  (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
 fs/seq_file.c:165

 stack backtrace:
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
  __lock_release kernel/locking/lockdep.c:3775 [inline]
  lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
  __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
  _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
  ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
  traverse+0x3bc/0xa00 fs/seq_file.c:135
  seq_read+0x96a/0x13d0 fs/seq_file.c:189
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 BUG: sleeping function called from invalid context at lib/usercopy.c:25
 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
 INFO: lockdep is turned off.
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
  __might_sleep+0x95/0x190 kernel/sched/core.c:6013
  __might_fault+0xab/0x1d0 mm/memory.c:4525
  _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
  copy_to_user include/linux/uaccess.h:155 [inline]
  seq_read+0xcb4/0x13d0 fs/seq_file.c:279
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
 lib/usercopy.c:26

Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@nkichukov
Copy link
Author

Thanks for the pointers Stefan,
I have attempted rpi-4.16.0-rc7 kernel which does include the clk-patches from https://patchwork.kernel.org/patch/10207161/raw/ and drm/atomic fixes from https://patchwork.kernel.org/patch/10169179/raw/ however, I am still getting the same sporadic results:

[ 5433.053889] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:68:crtc-2] flip_done timed out [ 5551.325851] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:67:plane-20] flip_done timed out
Any other pointers/patches?
-Nik

@nkichukov
Copy link
Author

Reproduced with drm.debug=0x3f. A lot of repeated messages, not sure if this will help. Leaving the kern.log.gz file available for download(10MB), uncompressed 142MB.

http://www.oldum.net/~nikolay/kern.log.gz

First occurrence:
Apr 3 23:34:23 grpi kernel: [ 906.205803] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [CRTC:68:crtc-2] flip_done timed out
Second:
Apr 3 23:38:18 grpi kernel: [ 1141.213846] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [PLANE:67:plane-20] flip_done timed out

Cheers,
-N

@nkichukov
Copy link
Author

Upgraded to rpi-4.16.1 which seems to have received multiple vc4 patches and I am unable to reproduce the flip_done timeout for the last 3 days.

@nkichukov
Copy link
Author

day 6, issue reoccurred:

[516937.439594] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:68:crtc-2] flip_done timed out
[517547.743601] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:67:plane-20] flip_done timed out

@nkichukov
Copy link
Author

This continues to occur, so sporadic as it was...
Changes attempted:
Running 4.16.10 kernel now, noticed loads of vc4 commits made it there, to no avail for this bug unfortunately.
Running latest version of libdrm, from https://anongit.freedesktop.org/git/mesa/drm.git, no change.
Version of mesa: mesa-18.1.0, tried with experimental from git, same results.
Xorg server: xorg-server-1.19.5-r2
compiler upgraded to: gcc-7.3.0-r3 and all of the above recompiled and with -O2, same results.

[74697.182124] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:68:crtc-2] flip_done timed out [74814.686141] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:67:plane-20] flip_done timed out [124903.134330] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:68:crtc-2] flip_done timed out [125513.438329] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:67:plane-20] flip_done timed out

Any pointers? Where to go next?
Cheers,
-N

@nkichukov
Copy link
Author

This is no longer happening, so I am closing the case. It seems like it was not due to the driver/kernel side, but userland, as the few changes I've made about 20 days ago were to upgrade xorg-server and xorg-drivers to version 1.20.0 and also disable some of the enabled but unused --configure switches for mesa when compiling it. This got the system back to a stable state and the flip_done timeout is no longer present, at least not in the last 20 days...

Cheers,
-N

@lategoodbye
Copy link

lategoodbye commented Jan 25, 2019

This "flip_done timed out" still occurs e.g. next-20190125.

Unfortunately i don't know the used display on the kernelci machine (Raspberry Pi 3 B, arm64/defconfig):
https://kernelci.org/boot/id/5c4ab5b359b514f4f39ad006/

07:07:28.667279  [   26.147366] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:139:crtc-2] flip_done timed out
07:07:28.668117  [   26.147369] Bluetooth: hci0: Failed to set baudrate
07:07:28.702757  [   26.147527] Console: switching to colour frame buffer device 90x30
07:07:28.705750  [   26.259404] Bluetooth: hci0: BCM: chip id 94
07:07:28.706766  [   26.261556] Bluetooth: hci0: BCM: features 0x2e
07:07:28.707637  [   26.284543] Bluetooth: hci0: BCM43430A1
07:07:28.708475  [   26.284556] Bluetooth: hci0: BCM43430A1 (001.002.009) build 0000
07:07:28.709290  [   26.284658] bluetooth hci0: Direct firmware load for brcm/BCM43430A1.hcd failed with error -2
07:07:28.710129  [   26.284668] Bluetooth: hci0: BCM: Patch brcm/BCM43430A1.hcd not found
07:07:28.749202  [   36.387352] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:139:crtc-2] flip_done timed out
07:07:28.751899  [   45.569912] NET: Registered protocol family 10
07:07:28.752647  [   45.571503] Segment Routing with IPv6
07:07:28.753346  [   46.627353] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:46:Composite-1] flip_done timed out
07:07:28.754030  [   56.867345] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:135:plane-20] flip_done timed out
07:07:28.756490  [   57.030633] vc4-drm soc:gpu: fb0: DRM emulated frame buffer device

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants