Understand OABUFFER_UDW workaround #20

rib · 2016-03-02T09:03:14Z

On SKL we have a workaround for correctly programming the OABUFFER base address which wasn't expected, so need to investigate this further.

Remove spinlock and use the "rtc->ops_lock" from RTC subsystem instead. spin_lock_irqsave() is not needed here because we do not have hard IRQs. This patch fixes the following issue: root@GE004097290448 b850v3:~# hwclock --systohc root@GE004097290448 b850v3:~# hwclock --systohc root@GE004097290448 b850v3:~# hwclock --systohc root@GE004097290448 b850v3:~# hwclock --systohc root@GE004097290448 b850v3:~# hwclock --systohc [ 82.108175] BUG: spinlock wrong CPU on CPU#0, hwclock/855 [ 82.113660] lock: 0xedb4899c, .magic: dead4ead, .owner: hwclock/855, .owner_cpu: 1 [ 82.121329] CPU: 0 PID: 855 Comm: hwclock Not tainted 4.8.0-00042-g09d5410-dirty rib#20 [ 82.129078] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 82.135609] Backtrace: [ 82.138090] [<8010d378>] (dump_backtrace) from [<8010d5c0>] (show_stack+0x20/0x24) [ 82.145664] r7:ec936000 r6:600a0013 r5:00000000 r4:81031680 [ 82.151402] [<8010d5a0>] (show_stack) from [<80401518>] (dump_stack+0xb4/0xe8) [ 82.158636] [<80401464>] (dump_stack) from [<8017b8b0>] (spin_dump+0x84/0xcc) [ 82.165775] r10:00000000 r9:ec936000 r8:81056090 r7:600a0013 r6:edb4899c r5:edb4899c [ 82.173691] r4:e5033e00 r3:00000000 [ 82.177308] [<8017b82c>] (spin_dump) from [<8017bcb0>] (do_raw_spin_unlock+0x108/0x130) [ 82.185314] r5:edb4899c r4:edb4899c [ 82.188938] [<8017bba8>] (do_raw_spin_unlock) from [<8094b93c>] (_raw_spin_unlock_irqrestore+0x34/0x54) [ 82.198333] r5:edb4899c r4:600a0013 [ 82.201953] [<8094b908>] (_raw_spin_unlock_irqrestore) from [<8065b090>] (rx8010_set_time+0x14c/0x188) [ 82.211261] r5:00000020 r4:edb48990 [ 82.214882] [<8065af44>] (rx8010_set_time) from [<80653fe4>] (rtc_set_time+0x70/0x104) [ 82.222801] r7:00000051 r6:edb39da0 r5:edb39c00 r4:ec937e8c [ 82.228535] [<80653f74>] (rtc_set_time) from [<80655774>] (rtc_dev_ioctl+0x3c4/0x674) [ 82.236368] r7:00000051 r6:7ecf1b74 r5:00000000 r4:edb39c00 [ 82.242106] [<806553b0>] (rtc_dev_ioctl) from [<80284034>] (do_vfs_ioctl+0xa4/0xa6c) [ 82.249851] r8:00000003 r7:80284a40 r6:ed1e9c80 r5:edb44e60 r4:7ecf1b74 [ 82.256642] [<80283f90>] (do_vfs_ioctl) from [<80284a40>] (SyS_ioctl+0x44/0x6c) [ 82.263953] r10:00000000 r9:ec936000 r8:7ecf1b74 r7:4024700a r6:ed1e9c80 r5:00000003 [ 82.271869] r4:ed1e9c80 [ 82.274432] [<802849fc>] (SyS_ioctl) from [<80108520>] (ret_fast_syscall+0x0/0x1c) [ 82.282005] r9:ec936000 r8:801086c4 r7:00000036 r6:00000000 r5:00000003 r4:0008e1bc root@GE004097290448 b850v3:~# Message from syslogd@GE004097290448 at Dec 3 11:17:08 ... kernel:[ 82.108175] BUG: spinlock wrong CPU on CPU#0, hwclock/855 Message from syslogd@GE004097290448 at Dec 3 11:17:08 ... kernel:[ 82.113660] lock: 0xedb4899c, .magic: dead4ead, .owner: hwclock/855, .owner_cpu: 1 hwclock --systohc root@GE004097290448 b850v3:~# Signed-off-by: Fabien Lahoudere <fabien.lahoudere@collabora.co.uk> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>

As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 torvalds#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Commit 57e5568 ("sata_via: Implement hotplug for VT6421") adds hotplug IRQ handler for VT6421 but enables hotplug on all chips. This is a bug because it causes "irq xx: nobody cared" error on VT6420 when hot-(un)plugging a drive: [ 381.839948] irq 20: nobody cared (try booting with the "irqpoll" option) [ 381.840014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc5+ torvalds#148 [ 381.840066] Hardware name: P4VM800/P4VM800, BIOS P1.60 05/29/2006 [ 381.840117] Call Trace: [ 381.840167] <IRQ> [ 381.840225] ? dump_stack+0x44/0x58 [ 381.840278] ? __report_bad_irq+0x14/0x97 [ 381.840327] ? handle_edge_irq+0xa5/0xa5 [ 381.840376] ? note_interrupt+0x155/0x1cf [ 381.840426] ? handle_edge_irq+0xa5/0xa5 [ 381.840474] ? handle_irq_event_percpu+0x32/0x38 [ 381.840524] ? handle_irq_event+0x1f/0x38 [ 381.840573] ? handle_fasteoi_irq+0x69/0xb8 [ 381.840625] ? handle_irq+0x4f/0x5d [ 381.840672] </IRQ> [ 381.840726] ? do_IRQ+0x2e/0x8b [ 381.840782] ? common_interrupt+0x2c/0x34 [ 381.840836] ? mwait_idle+0x60/0x82 [ 381.840892] ? arch_cpu_idle+0x6/0x7 [ 381.840949] ? do_idle+0x96/0x18e [ 381.841002] ? cpu_startup_entry+0x16/0x1a [ 381.841057] ? start_kernel+0x319/0x31c [ 381.841111] ? startup_32_smp+0x166/0x168 [ 381.841165] handlers: [ 381.841219] [<c12a7263>] ata_bmdma_interrupt [ 381.841274] Disabling IRQ rib#20 Seems that VT6420 can do hotplug too (there's no documentation) but the comments say that SCR register access (required for detecting hotplug events) can cause problems on these chips. For now, just keep hotplug disabled on anything other than VT6421. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: Tejun Heo <tj@kernel.org>

commit 4dfce57 upstream. There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr" [exception RIP: xfs_trans_log_inode+0x10] rib#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs] rib#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs] rib#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs] rib#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs] rib#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs] rib#14 [ffff8800a57bbe00] evict at ffffffff811e1b67 rib#15 [ffff8800a57bbe28] iput at ffffffff811e23a5 rib#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8 rib#17 [ffff8800a57bbe88] dput at ffffffff811dd06c rib#18 [ffff8800a57bbea8] __fput at ffffffff811c823b rib#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e rib#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27 rib#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c rib#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead. This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun. Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents. Thanks to dchinner for looking at this with me and spotting the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: rib#8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . rib#9 [] tcp_rcv_established at ffffffff81580b64 rib#10 [] tcp_v4_do_rcv at ffffffff8158b54a rib#11 [] tcp_v4_rcv at ffffffff8158cd02 rib#12 [] ip_local_deliver_finish at ffffffff815668f4 rib#13 [] ip_local_deliver at ffffffff81566bd9 rib#14 [] ip_rcv_finish at ffffffff8156656d rib#15 [] ip_rcv at ffffffff81566f06 rib#16 [] __netif_receive_skb_core at ffffffff8152b3a2 rib#17 [] __netif_receive_skb at ffffffff8152b608 rib#18 [] netif_receive_skb at ffffffff8152b690 rib#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] rib#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] rib#21 [] net_rx_action at ffffffff8152bac2 rib#22 [] __do_softirq at ffffffff81084b4f rib#23 [] call_softirq at ffffffff8164845c rib#24 [] do_softirq at ffffffff81016fc5 rib#25 [] irq_exit at ffffffff81084ee5 torvalds#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cifs_relock_file() can perform a down_write() on the inode's lock_sem even though it was already performed in cifs_strict_readv(). Lockdep complains about this. AFAICS, there is no problem here, and lockdep just needs to be told that this nesting is OK. ============================================= [ INFO: possible recursive locking detected ] 4.11.0+ rib#20 Not tainted --------------------------------------------- cat/701 is trying to acquire lock: (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00 but task is already holding lock: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&cifsi->lock_sem); lock(&cifsi->lock_sem); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by cat/701: #0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 stack backtrace: CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ rib#20 Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x17dd/0x2260 ? trace_hardirqs_on_thunk+0x1a/0x1c ? preempt_schedule_irq+0x6b/0x80 lock_acquire+0xcc/0x260 ? lock_acquire+0xcc/0x260 ? cifs_reopen_file+0x7a7/0xc00 down_read+0x2d/0x70 ? cifs_reopen_file+0x7a7/0xc00 cifs_reopen_file+0x7a7/0xc00 ? printk+0x43/0x4b cifs_readpage_worker+0x327/0x8a0 cifs_readpage+0x8c/0x2a0 generic_file_read_iter+0x692/0xd00 cifs_strict_readv+0x29f/0x310 generic_file_splice_read+0x11c/0x1c0 do_splice_to+0xa5/0xc0 splice_direct_to_actor+0xfa/0x350 ? generic_pipe_buf_nosteal+0x10/0x10 do_splice_direct+0xb5/0xe0 do_sendfile+0x278/0x3a0 SyS_sendfile64+0xc4/0xe0 entry_SYSCALL_64_fastpath+0x1f/0xbe Signed-off-by: Rabin Vincent <rabinv@axis.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>

[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: rib#8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . rib#9 [] tcp_rcv_established at ffffffff81580b64 rib#10 [] tcp_v4_do_rcv at ffffffff8158b54a rib#11 [] tcp_v4_rcv at ffffffff8158cd02 rib#12 [] ip_local_deliver_finish at ffffffff815668f4 rib#13 [] ip_local_deliver at ffffffff81566bd9 rib#14 [] ip_rcv_finish at ffffffff8156656d rib#15 [] ip_rcv at ffffffff81566f06 rib#16 [] __netif_receive_skb_core at ffffffff8152b3a2 rib#17 [] __netif_receive_skb at ffffffff8152b608 rib#18 [] netif_receive_skb at ffffffff8152b690 rib#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] rib#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] rib#21 [] net_rx_action at ffffffff8152bac2 rib#22 [] __do_softirq at ffffffff81084b4f rib#23 [] call_softirq at ffffffff8164845c rib#24 [] do_softirq at ffffffff81016fc5 rib#25 [] irq_exit at ffffffff81084ee5 torvalds#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This prevents a deadlock that somehow results from the suspend() -> forbid() -> resume() callchain. [ 125.266960] [drm] Initialized nouveau 1.3.1 20120801 for 0000:02:00.0 on minor 1 [ 370.120872] INFO: task kworker/4:1:77 blocked for more than 120 seconds. [ 370.120920] Tainted: G O 4.12.0-rc3 rib#20 [ 370.120947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 370.120982] kworker/4:1 D13808 77 2 0x00000000 [ 370.120998] Workqueue: pm pm_runtime_work [ 370.121004] Call Trace: [ 370.121018] __schedule+0x2bf/0xb40 [ 370.121025] ? mark_held_locks+0x5f/0x90 [ 370.121038] schedule+0x3d/0x90 [ 370.121044] rpm_resume+0x107/0x870 [ 370.121052] ? finish_wait+0x90/0x90 [ 370.121065] ? pci_pm_runtime_resume+0xa0/0xa0 [ 370.121070] pm_runtime_forbid+0x4c/0x60 [ 370.121129] nouveau_pmops_runtime_suspend+0xaf/0xc0 [nouveau] [ 370.121139] pci_pm_runtime_suspend+0x5f/0x170 [ 370.121147] ? pci_pm_runtime_resume+0xa0/0xa0 [ 370.121152] __rpm_callback+0xb9/0x1e0 [ 370.121159] ? pci_pm_runtime_resume+0xa0/0xa0 [ 370.121166] rpm_callback+0x24/0x80 [ 370.121171] ? pci_pm_runtime_resume+0xa0/0xa0 [ 370.121176] rpm_suspend+0x138/0x6e0 [ 370.121192] pm_runtime_work+0x7b/0xc0 [ 370.121199] process_one_work+0x253/0x6a0 [ 370.121216] worker_thread+0x4d/0x3b0 [ 370.121229] kthread+0x133/0x150 [ 370.121234] ? process_one_work+0x6a0/0x6a0 [ 370.121238] ? kthread_create_on_node+0x70/0x70 [ 370.121246] ret_from_fork+0x2a/0x40 [ 370.121283] Showing all locks held in the system: [ 370.121291] 2 locks held by kworker/4:1/77: [ 370.121298] #0: ("pm"){.+.+.+}, at: [<ffffffffac0d3530>] process_one_work+0x1d0/0x6a0 [ 370.121315] rib#1: ((&dev->power.work)){+.+.+.}, at: [<ffffffffac0d3530>] process_one_work+0x1d0/0x6a0 [ 370.121330] 1 lock held by khungtaskd/81: [ 370.121333] #0: (tasklist_lock){.+.+..}, at: [<ffffffffac10fc8d>] debug_show_all_locks+0x3d/0x1a0 [ 370.121355] 1 lock held by dmesg/1639: [ 370.121358] #0: (&user->lock){+.+.+.}, at: [<ffffffffac124b6d>] devkmsg_read+0x4d/0x360 [ 370.121377] ============================================= Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

According to the kerneldoc[0], should do fbdev setup before calling drm_kms_helper_poll_init(), otherwise, Kms hotplug event may race into fbdev helper initial, and fb_helper->dev may be NULL pointer, that would cause the bug: [ 0.735411] [00000200] *pgd=00000000f6ffe003, *pud=00000000f6ffe003, *pmd=0000000000000000 [ 0.736156] Internal error: Oops: 96000005 [rib#1] PREEMPT SMP [ 0.736648] Modules linked in: [ 0.736930] CPU: 2 PID: 20 Comm: kworker/2:0 Not tainted 4.4.41 rib#20 [ 0.737480] Hardware name: Rockchip RK3399 Board rev2 (BOX) (DT) [ 0.738020] Workqueue: events cdn_dp_pd_event_work [ 0.738447] task: ffffffc0f21f3100 ti: ffffffc0f2218000 task.ti: ffffffc0f2218000 [ 0.739109] PC is at mutex_lock+0x14/0x44 [ 0.739469] LR is at drm_fb_helper_hotplug_event+0x30/0x114 [ 0.756253] [<ffffff8008a344f4>] mutex_lock+0x14/0x44 [ 0.756260] [<ffffff8008445708>] drm_fb_helper_hotplug_event+0x30/0x114 [ 0.756271] [<ffffff8008473c84>] rockchip_drm_output_poll_changed+0x18/0x20 [ 0.756280] [<ffffff8008439fcc>] drm_kms_helper_hotplug_event+0x28/0x34 [ 0.756286] [<ffffff800846c444>] cdn_dp_pd_event_work+0x394/0x3c4 [ 0.756295] [<ffffff80080b2b38>] process_one_work+0x218/0x3e0 [ 0.756302] [<ffffff80080b3538>] worker_thread+0x2e8/0x404 [ 0.756308] [<ffffff80080b7e70>] kthread+0xe8/0xf0 [ 0.756316] [<ffffff8008082690>] ret_from_fork+0x10/0x40 [0]: https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-kms-helpers.html Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501575103-20136-1-git-send-email-mark.yao@rock-chips.com

This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0 regression, that was introduced by commit 01d4d67 Author: Nicholas Bellinger <nab@linux-iscsi.org> Date: Wed Dec 7 12:55:54 2016 -0800 which originally had the proper list_del_init() usage, but was dropped during list review as it was thought unnecessary by HCH. However, list_del_init() usage is required during the special generate_node_acls = 1 + cache_dynamic_acls = 0 case when transport_free_session() does a list_del(&se_nacl->acl_list), followed by target_complete_nacl() doing the same thing. This was manifesting as a general protection fault as reported by Justin: kernel: general protection fault: 0000 [rib#1] SMP kernel: Modules linked in: kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ rib#20 kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014 kernel: task: ffff88026939e800 task.stack: ffffc90007884000 kernel: RIP: 0010:target_put_nacl+0x49/0xb0 kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246 kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000 kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028 kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000 kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020 kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540 kernel: FS: 0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: transport_free_session+0x67/0x140 kernel: transport_deregister_session+0x7a/0xc0 kernel: iscsit_close_session+0x92/0x210 kernel: iscsit_close_connection+0x5f9/0x840 kernel: iscsit_take_action_for_connection_exit+0xfe/0x110 kernel: iscsi_target_tx_thread+0x140/0x1e0 kernel: ? wait_woken+0x90/0x90 kernel: kthread+0x124/0x160 kernel: ? iscsit_thread_get_cpumask+0x90/0x90 kernel: ? kthread_create_on_node+0x40/0x40 kernel: ret_from_fork+0x22/0x30 kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c 89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89 ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20 kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70 kernel: ---[ end trace f12821adbfd46fed ]--- To address this, go ahead and use proper list_del_list() for all cases of se_nacl->acl_list deletion. Reported-by: Justin Maggard <jmaggard01@gmail.com> Tested-by: Justin Maggard <jmaggard01@gmail.com> Cc: Justin Maggard <jmaggard01@gmail.com> Cc: stable@vger.kernel.org # 4.1+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

syzkaller reported a refcount_t warning [1] Issue here is that noop_qdisc refcnt was never really considered as a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set : if (qdisc->flags & TCQ_F_BUILTIN || !refcount_dec_and_test(&qdisc->refcnt))) return; Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not really needed, but harmless until refcount_t came. To fix this problem, we simply need to not increment noop_qdisc.refcnt, since we never decrement it. [1] refcount_t: increment on 0; use-after-free. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ rib#20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:180 __warn+0x1c4/0x1d9 kernel/panic.c:541 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846 RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152 RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282 RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000 RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8 RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0 R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000 attach_default_qdiscs net/sched/sch_generic.c:792 [inline] dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833 __dev_open+0x227/0x330 net/core/dev.c:1380 __dev_change_flags+0x695/0x990 net/core/dev.c:6726 dev_change_flags+0x88/0x140 net/core/dev.c:6792 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554 sock_do_ioctl+0x94/0xb0 net/socket.c:968 sock_ioctl+0x2c2/0x440 net/socket.c:1058 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 Fixes: 7b93640 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Reshetova, Elena <elena.reshetova@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>

…-text symbols" This reverts commit 83e840c ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols"). Chandan reported that on newer kernels, trying to enable function_graph tracer on ppc64 (BE) locks up the system with the following trace: Unable to handle kernel paging request for data at address 0x600000002fa30010 Faulting instruction address: 0xc0000000001f1300 Thread overran stack, or stack corrupted Oops: Kernel access of bad area, sig: 11 [rib#1] BE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries Modules linked in: CPU: 1 PID: 6586 Comm: bash Not tainted 4.14.0-rc3-00162-g6e51f1f-dirty rib#20 task: c000000625c07200 task.stack: c000000625c07310 NIP: c0000000001f1300 LR: c000000000121cac CTR: c000000000061af8 REGS: c000000625c088c0 TRAP: 0380 Not tainted (4.14.0-rc3-00162-g6e51f1f-dirty) MSR: 8000000000001032 <SF,ME,IR,DR,RI> CR: 28002848 XER: 00000000 CFAR: c0000000001f1320 SOFTE: 0 ... NIP [c0000000001f1300] .__is_insn_slot_addr+0x30/0x90 LR [c000000000121cac] .kernel_text_address+0x18c/0x1c0 Call Trace: [c000000625c08b40] [c0000000001bd040] .is_module_text_address+0x20/0x40 (unreliable) [c000000625c08bc0] [c000000000121cac] .kernel_text_address+0x18c/0x1c0 [c000000625c08c50] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c08cf0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c08d60] [c000000000121b40] .kernel_text_address+0x20/0x1c0 [c000000625c08df0] [c000000000061960] .prepare_ftrace_return+0x50/0x130 ... [c000000625c0ab30] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c0abd0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c0ac40] [c000000000121b40] .kernel_text_address+0x20/0x1c0 [c000000625c0acd0] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c0ad70] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c0ade0] [c000000000121b40] .kernel_text_address+0x20/0x1c0 This is because ftrace is using ppc_function_entry() for obtaining the address of return_to_handler() in prepare_ftrace_return(). The call to kernel_text_address() itself gets traced and we end up in a recursive loop. Fixes: 83e840c ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols") Cc: stable@vger.kernel.org # v4.13+ Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

When a GSO skb of truesize O is segmented into 2 new skbs of truesize N1 and N2, we want to transfer socket ownership to the new fresh skbs. In order to avoid expensive atomic operations on a cache line subject to cache bouncing, we replace the sequence : refcount_add(N1, &sk->sk_wmem_alloc); refcount_add(N2, &sk->sk_wmem_alloc); // repeated by number of segments refcount_sub(O, &sk->sk_wmem_alloc); by a single refcount_add(sum_of(N) - O, &sk->sk_wmem_alloc); Problem is : In some pathological cases, sum(N) - O might be a negative number, and syzkaller bot was apparently able to trigger this trace [1] atomic_t was ok with this construct, but we need to take care of the negative delta with refcount_t [1] refcount_t: saturated; leaking memory. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 8404 at lib/refcount.c:77 refcount_add_not_zero+0x198/0x200 lib/refcount.c:77 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 8404 Comm: syz-executor2 Not tainted 4.14.0-rc5-mm1+ rib#20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1c4/0x1e0 kernel/panic.c:546 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177 do_trap_no_signal arch/x86/kernel/traps.c:211 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:260 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 RIP: 0010:refcount_add_not_zero+0x198/0x200 lib/refcount.c:77 RSP: 0018:ffff8801c606e3a0 EFLAGS: 00010282 RAX: 0000000000000026 RBX: 0000000000001401 RCX: 0000000000000000 RDX: 0000000000000026 RSI: ffffc900036fc000 RDI: ffffed0038c0dc68 RBP: ffff8801c606e430 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8801d97f5eba R11: 0000000000000000 R12: ffff8801d5acf73c R13: 1ffff10038c0dc75 R14: 00000000ffffffff R15: 00000000fffff72f refcount_add+0x1b/0x60 lib/refcount.c:101 tcp_gso_segment+0x10d0/0x16b0 net/ipv4/tcp_offload.c:155 tcp4_gso_segment+0xd4/0x310 net/ipv4/tcp_offload.c:51 inet_gso_segment+0x60c/0x11c0 net/ipv4/af_inet.c:1271 skb_mac_gso_segment+0x33f/0x660 net/core/dev.c:2749 __skb_gso_segment+0x35f/0x7f0 net/core/dev.c:2821 skb_gso_segment include/linux/netdevice.h:3971 [inline] validate_xmit_skb+0x4ba/0xb20 net/core/dev.c:3074 __dev_queue_xmit+0xe49/0x2070 net/core/dev.c:3497 dev_queue_xmit+0x17/0x20 net/core/dev.c:3538 neigh_hh_output include/net/neighbour.h:471 [inline] neigh_output include/net/neighbour.h:479 [inline] ip_finish_output2+0xece/0x1460 net/ipv4/ip_output.c:229 ip_finish_output+0x85e/0xd10 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:238 [inline] ip_output+0x1cc/0x860 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:459 [inline] ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124 ip_queue_xmit+0x8c6/0x18e0 net/ipv4/ip_output.c:504 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1137 tcp_write_xmit+0x663/0x4de0 net/ipv4/tcp_output.c:2341 __tcp_push_pending_frames+0xa0/0x250 net/ipv4/tcp_output.c:2513 tcp_push_pending_frames include/net/tcp.h:1722 [inline] tcp_data_snd_check net/ipv4/tcp_input.c:5050 [inline] tcp_rcv_established+0x8c7/0x18a0 net/ipv4/tcp_input.c:5497 tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1460 sk_backlog_rcv include/net/sock.h:909 [inline] __release_sock+0x124/0x360 net/core/sock.c:2264 release_sock+0xa4/0x2a0 net/core/sock.c:2776 tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763 sock_sendmsg_nosec net/socket.c:632 [inline] sock_sendmsg+0xca/0x110 net/socket.c:642 ___sys_sendmsg+0x31c/0x890 net/socket.c:2048 __sys_sendmmsg+0x1e6/0x5f0 net/socket.c:2138 Fixes: 14afee4 ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Currently on driver bringup with KASAN enabled, meson triggers an OOB memory access as shown below: [ 117.904528] ================================================================== [ 117.904560] BUG: KASAN: global-out-of-bounds in meson_viu_set_osd_lut+0x7a0/0x890 [ 117.904588] Read of size 4 at addr ffff20000a63ce24 by task systemd-udevd/498 [ 117.904601] [ 118.083372] CPU: 4 PID: 498 Comm: systemd-udevd Not tainted 4.20.0-rc3Lyude-Test+ rib#20 [ 118.091143] Hardware name: amlogic khadas-vim2/khadas-vim2, BIOS 2018.07-rc2-armbian 09/11/2018 [ 118.099768] Call trace: [ 118.102181] dump_backtrace+0x0/0x3e8 [ 118.105796] show_stack+0x14/0x20 [ 118.109083] dump_stack+0x130/0x1c4 [ 118.112539] print_address_description+0x60/0x25c [ 118.117214] kasan_report+0x1b4/0x368 [ 118.120851] __asan_report_load4_noabort+0x18/0x20 [ 118.125566] meson_viu_set_osd_lut+0x7a0/0x890 [ 118.129953] meson_viu_init+0x10c/0x290 [ 118.133741] meson_drv_bind_master+0x474/0x748 [ 118.138141] meson_drv_bind+0x10/0x18 [ 118.141760] try_to_bring_up_master+0x3d8/0x768 [ 118.146249] component_add+0x214/0x570 [ 118.149978] meson_dw_hdmi_probe+0x18/0x20 [meson_dw_hdmi] [ 118.155404] platform_drv_probe+0x98/0x138 [ 118.159455] really_probe+0x2a0/0xa70 [ 118.163070] driver_probe_device+0x1b4/0x2d8 [ 118.167299] __driver_attach+0x200/0x280 [ 118.171189] bus_for_each_dev+0x10c/0x1a8 [ 118.175144] driver_attach+0x38/0x50 [ 118.178681] bus_add_driver+0x330/0x608 [ 118.182471] driver_register+0x140/0x388 [ 118.186361] __platform_driver_register+0xc8/0x108 [ 118.191117] meson_dw_hdmi_platform_driver_init+0x1c/0x1000 [meson_dw_hdmi] [ 118.198022] do_one_initcall+0x12c/0x3bc [ 118.201883] do_init_module+0x1fc/0x638 [ 118.205673] load_module+0x4b4c/0x6808 [ 118.209387] __se_sys_init_module+0x2e8/0x3c0 [ 118.213699] __arm64_sys_init_module+0x68/0x98 [ 118.218100] el0_svc_common+0x104/0x210 [ 118.221893] el0_svc_handler+0x48/0xb8 [ 118.225594] el0_svc+0x8/0xc [ 118.228429] [ 118.229887] The buggy address belongs to the variable: [ 118.235007] eotf_33_linear_mapping+0x84/0xc0 [ 118.239301] [ 118.240752] Memory state around the buggy address: [ 118.245522] ffff20000a63cd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 118.252695] ffff20000a63cd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 118.259850] >ffff20000a63ce00: 00 00 00 00 04 fa fa fa fa fa fa fa 00 00 00 00 [ 118.267000] ^ [ 118.271222] ffff20000a63ce80: 00 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 [ 118.278393] ffff20000a63cf00: 00 00 00 00 00 00 00 00 00 00 00 00 04 fa fa fa [ 118.285542] ================================================================== [ 118.292699] Disabling lock debugging due to kernel taint It seems that when looping through the OSD EOTF LUT maps, we use the same max iterator for OETF: 20. This is wrong though, since 20*2 is 40, which means that we'll stop out of bounds on the EOTF maps. But, this whole thing is already confusing enough to read through as-is, so let's just replace all of the hardcoded sizes with OSD_(OETF/EOTF)_LUT_SIZE / 2. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: bbbe775 ("drm: Add support for Amlogic Meson Graphic Controller") Cc: Neil Armstrong <narmstrong@baylibre.com> Cc: Maxime Ripard <maxime.ripard@bootlin.com> Cc: Carlo Caione <carlo@caione.org> Cc: Kevin Hilman <khilman@baylibre.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-amlogic@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: <stable@vger.kernel.org> # v4.10+ Acked-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181125012117.31915-1-lyude@redhat.com Signed-off-by: Sean Paul <seanpaul@chromium.org>

…r-free issue The evlist should be destroyed before the perf session. Detected with gcc's ASan: ================================================================= ==27350==ERROR: AddressSanitizer: heap-use-after-free on address 0x62b000002e38 at pc 0x5611da276999 bp 0x7ffce8f1d1a0 sp 0x7ffce8f1d190 WRITE of size 8 at 0x62b000002e38 thread T0 #0 0x5611da276998 in __list_del /home/work/linux/tools/include/linux/list.h:89 rib#1 0x5611da276d4a in __list_del_entry /home/work/linux/tools/include/linux/list.h:102 rib#2 0x5611da276e77 in list_del_init /home/work/linux/tools/include/linux/list.h:145 rib#3 0x5611da2781cd in thread__put util/thread.c:130 rib#4 0x5611da2cc0a8 in __thread__zput util/thread.h:68 rib#5 0x5611da2d2dcb in hist_entry__delete util/hist.c:1148 rib#6 0x5611da2cdf91 in hists__delete_entry util/hist.c:337 rib#7 0x5611da2ce19e in hists__delete_entries util/hist.c:365 rib#8 0x5611da2db2ab in hists__delete_all_entries util/hist.c:2639 rib#9 0x5611da2db325 in hists_evsel__exit util/hist.c:2651 rib#10 0x5611da1c5352 in perf_evsel__exit util/evsel.c:1304 rib#11 0x5611da1c5390 in perf_evsel__delete util/evsel.c:1309 rib#12 0x5611da1b35f0 in perf_evlist__purge util/evlist.c:124 rib#13 0x5611da1b38e2 in perf_evlist__delete util/evlist.c:148 rib#14 0x5611da069781 in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1645 rib#15 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#16 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#17 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#18 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#19 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) rib#20 0x5611d9ff35c9 in _start (/home/work/linux/tools/perf/perf+0x3e95c9) 0x62b000002e38 is located 11320 bytes inside of 27448-byte region [0x62b000000200,0x62b000006d38) freed by thread T0 here: #0 0x7fdccb04ab70 in free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedb70) rib#1 0x5611da260df4 in perf_session__delete util/session.c:201 rib#2 0x5611da063de5 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1300 rib#3 0x5611da06973c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642 rib#4 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#5 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#6 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#7 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#8 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) previously allocated by thread T0 here: #0 0x7fdccb04b138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) rib#1 0x5611da26010c in zalloc util/util.h:23 rib#2 0x5611da260824 in perf_session__new util/session.c:118 rib#3 0x5611da0633a6 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1192 rib#4 0x5611da06973c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642 rib#5 0x5611da17d038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 rib#6 0x5611da17d577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 rib#7 0x5611da17d97b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 rib#8 0x5611da17e0e9 in main /home/changbin/work/linux/tools/perf/perf.c:520 rib#9 0x7fdcc970f09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) SUMMARY: AddressSanitizer: heap-use-after-free /home/work/linux/tools/include/linux/list.h:89 in __list_del Shadow bytes around the buggy address: 0x0c567fff8570: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff8580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff8590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff85a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff85b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd =>0x0c567fff85c0: fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd 0x0c567fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff85f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c567fff8610: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==27350==ABORTING Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190316080556.3075-8-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Current snapshot implementation swaps two ring_buffers even though their sizes are different from each other, that can cause an inconsistency between the contents of buffer_size_kb file and the current buffer size. For example: # cat buffer_size_kb 7 (expanded: 1408) # echo 1 > events/enable # grep bytes per_cpu/cpu0/stats bytes: 1441020 # echo 1 > snapshot // current:1408, spare:1408 # echo 123 > buffer_size_kb // current:123, spare:1408 # echo 1 > snapshot // current:1408, spare:123 # grep bytes per_cpu/cpu0/stats bytes: 1443700 # cat buffer_size_kb 123 // != current:1408 And also, a similar per-cpu case hits the following WARNING: Reproducer: # echo 1 > per_cpu/cpu0/snapshot # echo 123 > buffer_size_kb # echo 1 > per_cpu/cpu0/snapshot WARNING: WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380 Modules linked in: CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 rib#20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380 Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48 RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093 RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8 RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005 RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000 R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060 FS: 00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0 Call Trace: ? trace_array_printk_buf+0x140/0x140 ? __mutex_lock_slowpath+0x10/0x10 tracing_snapshot_write+0x4c8/0x7f0 ? trace_printk_init_buffers+0x60/0x60 ? selinux_file_permission+0x3b/0x540 ? tracer_preempt_off+0x38/0x506 ? trace_printk_init_buffers+0x60/0x60 __vfs_write+0x81/0x100 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 ? __ia32_sys_read+0xb0/0xb0 ? do_syscall_64+0x1f/0x390 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe This patch adds resize_buffer_duplicate_size() to check if there is a difference between current/spare buffer sizes and resize a spare buffer if necessary. Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com Cc: stable@vger.kernel.org Fixes: ad909e2 ("tracing: Add internal tracing_snapshot() functions") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Understand OABUFFER_UDW workaround #20

Understand OABUFFER_UDW workaround #20

rib commented Mar 2, 2016

Understand OABUFFER_UDW workaround #20

Understand OABUFFER_UDW workaround #20

Comments

rib commented Mar 2, 2016