vc4: refcount API complaint due to misuse in MADV #129

Closed
nullr0ute opened this issue Feb 12, 2018 · 9 comments

Comments

@nullr0ute

On a Raspberry Pi 3 running Fedora 27 on ARMv7 (32-bit), I've seen this use-after-free. I'm not sure if it's reproducible, but I will keep an eye out as I test 4.15 more widely.

[  224.202345] alloc_contig_range: 4 callbacks suppressed
[  224.202354] alloc_contig_range: [2c200, 2d200) PFNs busy
[  224.216771] alloc_contig_range: [2c200, 2d300) PFNs busy
[  224.226177] alloc_contig_range: [2c400, 2d400) PFNs busy
[  224.238606] alloc_contig_range: [2c400, 2d500) PFNs busy
[  224.254055] alloc_contig_range: [2c400, 2d600) PFNs busy
[  224.266467] alloc_contig_range: [2c400, 2d700) PFNs busy
[  224.275460] alloc_contig_range: [2c800, 2d800) PFNs busy
[  224.284391] alloc_contig_range: [2c800, 2d900) PFNs busy
[  224.293236] alloc_contig_range: [2c800, 2da00) PFNs busy
[  224.302083] alloc_contig_range: [2c800, 2db00) PFNs busy
[  227.950421] ------------[ cut here ]------------
[  227.955220] WARNING: CPU: 0 PID: 1317 at lib/refcount.c:281 refcount_dec_not_one+0x8c/0xb8
[  227.963696] refcount_t: underflow; use-after-free.
[  227.968631] Modules linked in: vfat fat rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc rc_cec vc4 snd_soc_core ac97_bus snd_pcm_dmaengine snd_seq snd_seq_device snd_pcm snd_timer snd soundcore cec rc_core drm_kms_helper joydev drm hci_uart brcmfmac btbcm btqca brcmutil btintel bluetooth fb_sys_fops syscopyarea cfg80211 sysfillrect sysimgblt ecdh_generic rfkill bcm2835_thermal
[  228.040690]  bcm2835_wdt bcm2835_rng leds_gpio hid_logitech_hidpp hid_logitech_dj smsc95xx usbnet mii mmc_block dwc2 crc32_arm_ce sdhci_iproc sdhci_pltfm udc_core sdhci bcm2835_dma pwm_bcm2835 i2c_bcm2835 bcm2835 phy_generic
[  228.061023] CPU: 0 PID: 1317 Comm: gnome-shell Not tainted 4.15.2-300.fc27.armv7hl #1
[  228.068966] Hardware name: BCM2835
[  228.072448] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[  228.080315] [] (show_stack) from [] (dump_stack+0x80/0xa0)
[  228.087651] [] (dump_stack) from [] (__warn+0xdc/0xf8)
[  228.094635] [] (__warn) from [] (warn_slowpath_fmt+0x3c/0x4c)
[  228.102240] [] (warn_slowpath_fmt) from [] (refcount_dec_not_one+0x8c/0xb8)
[  228.111163] [] (refcount_dec_not_one) from [] (vc4_bo_dec_usecnt+0x1c/0x78 [vc4])
[  228.120784] [] (vc4_bo_dec_usecnt [vc4]) from [] (drm_atomic_helper_cleanup_planes+0x60/0x68 [drm_kms_helper])
[  228.132902] [] (drm_atomic_helper_cleanup_planes [drm_kms_helper]) from [] (vc4_atomic_complete_commit+0x84/0xc8 [vc4])
[  228.145747] [] (vc4_atomic_complete_commit [vc4]) from [] (vc4_atomic_commit+0x118/0x124 [vc4])
[  228.156528] [] (vc4_atomic_commit [vc4]) from [] (drm_atomic_helper_disable_plane+0xbc/0xc0 [drm_kms_helper])
[  228.168815] [] (drm_atomic_helper_disable_plane [drm_kms_helper]) from [] (__setplane_internal+0x48/0x1e0 [drm])
[  228.181486] [] (__setplane_internal [drm]) from [] (drm_mode_cursor_universal+0x158/0x1bc [drm])
[  228.192717] [] (drm_mode_cursor_universal [drm]) from [] (drm_mode_cursor_common+0xd8/0x1d0 [drm])
[  228.204119] [] (drm_mode_cursor_common [drm]) from [] (drm_ioctl+0x2b8/0x348 [drm])
[  228.213934] [] (drm_ioctl [drm]) from [] (vfs_ioctl+0x28/0x3c)
[  228.221631] [] (vfs_ioctl) from [] (do_vfs_ioctl+0x8c/0x850)
[  228.229145] [] (do_vfs_ioctl) from [] (SyS_ioctl+0x58/0x74)
[  228.236576] [] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x54)
[  228.244318] ---[ end trace 2f6e6444c7159640 ]---
[  290.725119] [drm] Resetting GPU.
[  308.441650] alloc_contig_range: 6 callbacks suppressed
[  308.441661] alloc_contig_range: [2c200, 2d200) PFNs busy
[  308.462519] alloc_contig_range: [2df00, 2ef00) PFNs busy
[  308.469304] alloc_contig_range: [2e000, 2f000) PFNs busy
[  308.476298] alloc_contig_range: [2e000, 2f100) PFNs busy
[  308.483442] alloc_contig_range: [2e000, 2f200) PFNs busy
[  308.490400] alloc_contig_range: [2e000, 2f300) PFNs busy
[  308.499431] alloc_contig_range: [2e400, 2f400) PFNs busy
[  308.506774] alloc_contig_range: [2e400, 2f500) PFNs busy
[  308.514067] alloc_contig_range: [2e600, 2f600) PFNs busy
[  308.521645] alloc_contig_range: [2e700, 2f700) PFNs busy
[  353.633899] alloc_contig_range: 5 callbacks suppressed
[  353.633910] alloc_contig_range: [20c54, 20c55) PFNs busy
[  355.596194] alloc_contig_range: [20ce6, 20ce7) PFNs busy
[  377.041318] alloc_contig_range: [2e600, 2edbc) PFNs busy
[  377.047977] alloc_contig_range: [2e600, 2eebc) PFNs busy
[  377.065662] alloc_contig_range: [2e800, 2efbc) PFNs busy
[  377.072379] alloc_contig_range: [2e800, 2f0bc) PFNs busy
[  377.079176] alloc_contig_range: [2e800, 2f1bc) PFNs busy
[  377.085692] alloc_contig_range: [2e800, 2f2bc) PFNs busy
[  377.091665] alloc_contig_range: [2ec00, 2f3bc) PFNs busy
[  397.019076] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  397.026153] [drm]                         kernel:   8100kb BOs (1)
[  397.026162] [drm]                            V3D: 196524kb BOs (440)
[  397.026167] [drm]                     V3D shader:    356kb BOs (89)
[  397.026172] [drm]                           dumb:    272kb BOs (17)
[  397.026177] [drm]                            RCL:      8kb BOs (1)
[  397.026182] [drm]                            BCL:     16kb BOs (1)
[  397.026201] vc4_v3d 3fc00000.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.

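On the CMA note, it has 256MB of CMA allocated (the kernel command line passes both cma=192MB and cma=256MB, and the boot log reports 262144K cma-reserved, i.e. 256MB):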

[    0.000000] Linux version 4.15.2-300.1.fc27.armv7hl (mockbuild@buildvm-armv7-05.arm.fedoraproject.org) (gcc version 7.3.1 20180130 (Red Hat 7.3.1-2) (GCC)) #1 SMP Sun Feb 11 15:12:45 UTC 2018

[    0.000000] Kernel command line: ro root=UUID=3293611e-970f-46ae-9b1d-e29eae96e079  cma=192MB cma=256MB LANG=en_GB.UTF-8
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Memory: 715924K/1021952K available (7875K kernel code, 1325K rwdata, 3764K rodata, 2048K init, 520K bss, 43884K reserved, 262144K cma-reserved, 235520K highmem)
[    0.000000] Virtual kernel memory layout:
                   vector  : 0xffff0000 - 0xffff1000   (   4 kB)
                   fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
                   vmalloc : 0xf0800000 - 0xff800000   ( 240 MB)
                   lowmem  : 0xc0000000 - 0xf0000000   ( 768 MB)
                   pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
                   modules : 0xbf000000 - 0xbfe00000   (  14 MB)
                     .text : 0x(ptrval) - 0x(ptrval)   (8868 kB)
                     .init : 0x(ptrval) - 0x(ptrval)   (2048 kB)
                     .data : 0x(ptrval) - 0x(ptrval)   (1326 kB)
                      .bss : 0x(ptrval) - 0x(ptrval)   ( 521 kB)
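The numbers in these logs can be cross-checked with a little arithmetic. The C sketch below does only that: it assumes 4 KiB pages (the usual ARMv7 configuration) and copies the per-type BO totals from the `vc4_bo_create` error at 397s. `pfn_range_kib()` and `total_bo_kib()` are hypothetical helpers, and the sketch ignores other CMA users and fragmentation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, the usual ARMv7 configuration. */
#define PAGE_KIB 4UL

/* Size in KiB of a half-open PFN range [start, end), as printed by
 * alloc_contig_range() (hypothetical helper). */
static unsigned long pfn_range_kib(unsigned long start, unsigned long end)
{
    assert(end >= start);
    return (end - start) * PAGE_KIB;
}

/* Per-type BO totals in kB, copied from the vc4_bo_create error at 397s. */
static const unsigned long bo_kib[] = {
    8100,   /* kernel */
    196524, /* V3D */
    356,    /* V3D shader */
    272,    /* dumb */
    8,      /* RCL */
    16,     /* BCL */
};

/* Sum of all BO totals listed in the log (hypothetical helper). */
static unsigned long total_bo_kib(void)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < sizeof(bo_kib) / sizeof(bo_kib[0]); i++)
        sum += bo_kib[i];
    return sum;
}

/* cma-reserved from the boot log, in KiB (262144K = 256 MiB). */
static const unsigned long cma_reserved_kib = 262144;
```

With these inputs, `total_bo_kib()` comes to 205276 kB (about 200 MB) of the 262144K CMA pool, leaving 56868 kB (about 55 MB), and the first busy range `[2c200, 2d200)` spans 4096 pages, i.e. 16384 KiB (16 MB). The repeated `PFNs busy` messages hint that the remaining space may not have been usable as one contiguous block, but this arithmetic alone cannot show that.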
@nullr0ute
Author

Running the GNOME desktop under Wayland.

@lategoodbye

lategoodbye commented Feb 12, 2018

Please report this by mail to Eric Anholt, Boris Brezillon, and dri-devel.

@anholt
Owner

anholt commented Mar 8, 2018

There was a discussion about this, and the conclusion was that we need to switch back to atomic_t. It seems we lost track of the bug.
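The WARNING at lib/refcount.c comes from `refcount_dec_not_one()`, which the backtrace shows being called from `vc4_bo_dec_usecnt()`. The userspace model below is a simplified sketch, assuming the checked-decrement logic in lib/refcount.c of that era; the `model_*` names are hypothetical and `fprintf()` stands in for `WARN_ONCE()`. It illustrates the trade-off behind switching back to `atomic_t`: `refcount_t` turns a decrement from 0 into a warning and leaves the counter at 0, while a plain atomic counter silently goes negative.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of refcount_t; not the kernel's code. */
typedef struct {
    atomic_uint refs;
} model_refcount_t;

/* Counts warnings; the kernel uses WARN_ONCE(), this model warns every time. */
static int underflow_warnings;

/* Model of refcount_dec_not_one(): decrement unless the count is 1.
 * Returns false when the count is 1 (caller takes the slow path). */
static bool model_dec_not_one(model_refcount_t *r)
{
    unsigned int val = atomic_load(&r->refs);
    unsigned int next;

    do {
        if (val == UINT_MAX)      /* saturated: stay pinned */
            return true;
        if (val == 1)
            return false;
        next = val - 1;
        if (next > val) {         /* 0 - 1 wrapped around: underflow */
            underflow_warnings++;
            fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
            return true;          /* counter stays at 0 */
        }
    } while (!atomic_compare_exchange_weak(&r->refs, &val, next));
    return true;
}

/* Plain atomic_t-style decrement: no checks, 0 silently becomes -1. */
static void model_atomic_dec(atomic_int *v)
{
    atomic_fetch_sub(v, 1);
}
```

A genuinely unbalanced decrement is still a bug with either type; the checked version only makes it visible instead of letting the counter drift.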

@anholt anholt changed the title "use after free on 4.15.2" to "vc4: refcount API complaint due to misuse in MADV" Mar 8, 2018
@bbrezillon

Hm, I'm not sure this is the same issue. A false positive has been fixed in vc4_bo_inc_usecnt() https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/drivers/gpu/drm/vc4?h=v4.15.7&id=5bfd40139d55790cbc8e56ad1ce4f974f1fa186d, but this one may be a real use-after-free.

I'll take a closer look.
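Why a "false positive" is possible at all: `usecnt` is not a lifetime refcount, so a BO can sit at a use count of 0 and gain its first user later. The checked `refcount_inc()` in lib/refcount.c is assumed here to warn on an increment from 0 and leave the count at 0. The sketch below is a hypothetical single-threaded userspace model, not the code of the linked commit: `inc_usecnt_naive()` trips the check for every first user, while `inc_usecnt_quiet()` performs the 0 -> 1 transition with an explicit store. The kernel would run the slow path under `bo->madv_lock`; that lock is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a BO whose use count may legitimately be 0
 * while the BO itself stays alive. */
struct model_bo {
    atomic_uint usecnt;
    int inc_on_zero_warnings;   /* stands in for refcount_inc()'s WARN */
};

/* Like refcount_inc_not_zero(): increment only if already non-zero. */
static bool inc_not_zero(atomic_uint *v)
{
    unsigned int val = atomic_load(v);

    do {
        if (val == 0)
            return false;
    } while (!atomic_compare_exchange_weak(v, &val, val + 1));
    return true;
}

/* Assumed checked refcount_inc(): an increment from 0 warns and
 * leaves the count at 0. */
static void checked_inc(struct model_bo *bo)
{
    if (!inc_not_zero(&bo->usecnt))
        bo->inc_on_zero_warnings++;
}

/* Naive slow path: the first user always trips the 0 -> 1 check. */
static void inc_usecnt_naive(struct model_bo *bo)
{
    if (inc_not_zero(&bo->usecnt))
        return;
    /* The kernel would hold bo->madv_lock here; omitted in this model. */
    checked_inc(bo);
}

/* Quiet slow path: perform the 0 -> 1 transition with an explicit store. */
static void inc_usecnt_quiet(struct model_bo *bo)
{
    if (inc_not_zero(&bo->usecnt))
        return;
    /* Under the (omitted) lock: re-check, then record the first user. */
    if (!inc_not_zero(&bo->usecnt))
        atomic_store(&bo->usecnt, 1);
}
```

In this model the naive path never gets the count past 0, so every new user warns and the first matching decrement would then underflow; the quiet path counts users normally.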

stschake pushed a commit to stschake/linux that referenced this issue Mar 16, 2018
Previously, if a tunnel was closed, we called inet_shutdown to mark
the socket as unconnected such that userspace would get errors and
then close the socket. This could race with userspace closing the
socket. Instead, leave userspace to close the socket in its own time
(our tunnel will be detached anyway).

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: __lock_acquire+0x263/0x1630
PGD 0 P4D 0
Oops: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 2 PID: 42 Comm: kworker/u8:2 Not tainted 4.15.0-rc7+ #129
Workqueue: l2tp l2tp_tunnel_del_work
RIP: 0010:__lock_acquire+0x263/0x1630
RSP: 0018:ffff88001a37fc70 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000088 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88001a37fd18 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000076fd R12: 00000000000000a0
R13: ffff88001a3722c0 R14: 0000000000000001 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88001ad00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 000000001730b000 CR4: 00000000000006e0
Call Trace:
 ? __lock_acquire+0xc77/0x1630
 ? console_trylock+0x11/0xa0
 lock_acquire+0x117/0x230
 ? lock_sock_nested+0x3a/0xa0
 _raw_spin_lock_bh+0x3a/0x50
 ? lock_sock_nested+0x3a/0xa0
 lock_sock_nested+0x3a/0xa0
 inet_shutdown+0x33/0xf0
 l2tp_tunnel_del_work+0x60/0xef
 process_one_work+0x1ea/0x5f0
 ? process_one_work+0x162/0x5f0
 worker_thread+0x48/0x3e0
 ? trace_hardirqs_on+0xd/0x10
 kthread+0x108/0x140
 ? process_one_work+0x5f0/0x5f0
 ? kthread_stop+0x2a0/0x2a0
 ret_from_fork+0x24/0x30
Code: 00 41 81 ff ff 1f 00 00 0f 87 7a 13 00 00 45 85 f6 49 8b 85
68 08 00 00 0f 84 ae 03 00 00 c7 44 24 18 00 00 00 00 e9 f0 00 00 00 <49> 81 3c
24 80 93 3f 83 b8 00 00 00 00 44 0f 44 c0 83 fe 01 0f
RIP: __lock_acquire+0x263/0x1630 RSP: ffff88001a37fc70
CR2: 00000000000000a0

Fixes: 309795f ("l2tp: Add netlink control API for L2TP")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
@bbrezillon

@nullr0ute, did you find an easy way to reproduce the problem?

I had a look at the code this morning and couldn't find a case where we could hit this problem.
Every time you attach a BO to a plane, ->usecnt is incremented, and every time you detach it from the plane, it is decremented. So, assuming the ->prepare_fb()/->cleanup_fb() calls are balanced, we shouldn't see this kind of issue.

I'll keep digging, but that'd be easier to debug if you have a way to reproduce the bug.

@bbrezillon

@anholt, it looks like the async plane update path doesn't call drm_atomic_helper_{prepare,cleanup}_planes(), which might explain why we get an inconsistent ->usecnt.
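A minimal model of the imbalance suspected in the comment above, assuming the async (cursor) update swaps the BO on the plane without the prepare/cleanup pair. All names are hypothetical and only the counter bookkeeping is represented: a regular commit increments the new BO in `prepare_fb()` and decrements the old one in `cleanup_fb()`. If an async update slips a BO onto the plane in between, a later commit that disables the plane decrements a BO that was never prepared, which resembles the path in the backtrace (`drm_atomic_helper_disable_plane` -> `drm_atomic_helper_cleanup_planes` -> `vc4_bo_dec_usecnt`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: only the use-count bookkeeping is represented. */
struct model_bo {
    int usecnt;
};

struct model_plane {
    struct model_bo *bo;   /* BO currently scanned out, or NULL */
};

static int underflows;     /* stands in for the refcount_t WARN */

/* ->prepare_fb(): the new BO gains a user. */
static void prepare_fb(struct model_bo *bo)
{
    if (bo)
        bo->usecnt++;
}

/* ->cleanup_fb(): the old BO loses a user. */
static void cleanup_fb(struct model_bo *bo)
{
    if (!bo)
        return;
    if (bo->usecnt == 0) {  /* decrement without a matching increment */
        underflows++;
        return;
    }
    bo->usecnt--;
}

/* Regular commit: prepare the new BO, swap it in, clean up the old one. */
static void atomic_commit(struct model_plane *p, struct model_bo *new_bo)
{
    struct model_bo *old;

    assert(p != NULL);
    old = p->bo;
    prepare_fb(new_bo);
    p->bo = new_bo;
    cleanup_fb(old);
}

/* Async update (assumed behaviour): swap the BO with no prepare/cleanup. */
static void async_update(struct model_plane *p, struct model_bo *new_bo)
{
    assert(p != NULL);
    p->bo = new_bo;
}
```

Committing BO `a`, asynchronously switching to `b`, and then disabling the plane leaves `a` stuck at one user and records one underflow on `b`. Whether vc4's real async path behaves this way is the hypothesis in the thread, not something the model establishes.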

@nullr0ute
Author

I haven't, and I've not seen it regularly on 4.16, although I have been traveling, so my testing with the GUI has been minimal. I should be doing more RSN, but focused on 4.16/4.17.

@lategoodbye

@nullr0ute Is this still reproducible with Fedora 29?

@nullr0ute
Author

I think we can close it off; I don't remember seeing it, and we can always reopen it.
