[syzkaller] INFO: task hung in lock_sock_nested #88

cpaasch · 2020-09-10T15:26:01Z

      Not tainted 5.9.0-rc3 #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.3  state:D stack:    0 pid:13279 ppid:  1314 flags:0x00000004
Call Trace:
 context_switch kernel/sched/core.c:3778 [inline]
 __schedule+0x616/0x1800 kernel/sched/core.c:4527
 schedule+0xcd/0x2a0 kernel/sched/core.c:4602
 __lock_sock+0xfd/0x190 net/core/sock.c:2504
 lock_sock_nested+0x10f/0x140 net/core/sock.c:3036
 lock_sock include/net/sock.h:1583 [inline]
 mptcp_setsockopt+0x50/0x690 net/mptcp/protocol.c:2249
 __sys_setsockopt+0x154/0x390 net/socket.c:2132
 __do_sys_setsockopt net/socket.c:2143 [inline]
 __se_sys_setsockopt net/socket.c:2140 [inline]
 __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f13081ab469
Code: Bad RIP value.
RSP: 002b:00007f1308838dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 000000000068c0e0 RCX: 00007f13081ab469
RDX: 0000000000000019 RSI: 0000000000000006 RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000004 R09: 0000000000000000
R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041ec0a R14: 00007f13088395c0 R15: 0000000000000003
INFO: task syz-executor.3:13283 blocked for more than 143 seconds.
      Not tainted 5.9.0-rc3 #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.3  state:D stack:    0 pid:13283 ppid:  1314 flags:0x00000004
Call Trace:
 context_switch kernel/sched/core.c:3778 [inline]
 __schedule+0x616/0x1800 kernel/sched/core.c:4527
 schedule+0xcd/0x2a0 kernel/sched/core.c:4602
 __lock_sock+0xfd/0x190 net/core/sock.c:2504
 lock_sock_nested+0x10f/0x140 net/core/sock.c:3036
 lock_sock include/net/sock.h:1583 [inline]
 mptcp_sendmsg+0xf7/0x18c0 net/mptcp/protocol.c:1183
 inet_sendmsg+0x115/0x140 net/ipv4/af_inet.c:820
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 ____sys_sendmsg+0x754/0x8e0 net/socket.c:2353
 ___sys_sendmsg+0xff/0x170 net/socket.c:2407
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f13081ab469
Code: Bad RIP value.
RSP: 002b:00007f1308817dd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000068c180 RCX: 00007f13081ab469
RDX: 0000000000000000 RSI: 0000000020001480 RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041e048 R14: 00007f13088185c0 R15: 0000000000000003

HEAD is:

fdc664ab314a ("mptcp: call tcp_cleanup_rbuf on subflows") (23 hours ago)
35d6b46 ("DO-NOT-MERGE: mptcp: enabled by default") (tag: export/20200909T050747, mptcp_net-next/export) (34 hours ago)
6e7290b ("DO-NOT-MERGE: mptcp: use kmalloc on kasan build") (34 hours ago)
8fd4ed4 ("tcp: propagate MPTCP skb extensions on xmit splits") (34 hours ago)
691048c ("mptcp: use _fast lock version in __mptcp_move_skbs") (34 hours ago)
ef14371 ("mptcp: adjust mptcp receive buffer limit if subflow has larger one") (34 hours ago)
74acdfd ("mptcp: simult flow self-tests") (34 hours ago)
a1975d0 ("mptcp: allow picking different xmit subflows") (34 hours ago)
6855e98 ("mptcp: allow creating non-backup subflows") (34 hours ago)
2c61627 ("mptcp: move address attribute into mptcp_addr_info") (34 hours ago)
b347db3 ("mptcp: add OoO related mibs") (34 hours ago)
91baf67 ("mptcp: cleanup mptcp_subflow_discard_data()") (34 hours ago)
685dab9 ("mptcp: move ooo skbs into msk out of order queue.") (34 hours ago)
c97925c ("mptcp: introduce and use mptcp_try_coalesce()") (34 hours ago)
904204e ("mptcp: basic sndbuf autotuning") (34 hours ago)
5d28362 ("mptcp: trigger msk processing even for OoO data") (34 hours ago)
227d369 ("mptcp: set data_ready status bit in subflow_check_data_avail()") (34 hours ago)
6f2c3af ("mptcp: rethink 'is writable' conditional") (34 hours ago)
ff4e207 ("mptcp: add accept_subflow re-check") (34 hours ago)
1cde79a ("selftests: mptcp: add ADD_ADDR mibs check function") (34 hours ago)
70671b7 ("mptcp: add ADD_ADDR related mibs") (34 hours ago)
6d37e97 ("mptcp: send out ADD_ADDR with echo flag") (34 hours ago)
a20a96f ("mptcp: add the incoming RM_ADDR support") (34 hours ago)
7103b2d ("mptcp: add the outgoing RM_ADDR support") (34 hours ago)
328bd7f ("mptcp: rename addr_signal and the related functions") (34 hours ago)
e8d556d ("selftests/mptcp: Better delay & reordering configuration") (34 hours ago)
38119e7 ("bpf:selftests: add bpf_mptcp_sock() verifier tests") (34 hours ago)
43b5943 ("bpf:selftests: add MPTCP test base") (34 hours ago)
c3de50c ("bpf: add 'bpf_mptcp_sock' structure and helper") (34 hours ago)
31e2af1 ("mptcp: attach subflow socket to parent cgroup") (34 hours ago)
36440f0 ("bpf: expose is_mptcp flag to bpf_tcp_sock") (34 hours ago)
f5499c6 ("nfc: pn533/usb.c: fix spelling of "functions"") (mptcp_net-next/net-next) (2 days ago)

No reproducer yet; I will schedule a syzkaller run with lockdep enabled.


cpaasch · 2020-09-10T21:29:15Z

More info with lockdep:

Process accounting resumed
INFO: task syz-executor.6:10245 blocked for more than 143 seconds.
      Not tainted 5.9.0-rc3 #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.6  state:D stack:    0 pid:10245 ppid:  1298 flags:0x00000004
Call Trace:
 context_switch kernel/sched/core.c:3778 [inline]
 __schedule+0x6f7/0x1d30 kernel/sched/core.c:4527
 schedule+0xcd/0x2a0 kernel/sched/core.c:4602
 __lock_sock+0x13d/0x260 net/core/sock.c:2504
 lock_sock_nested+0xf6/0x120 net/core/sock.c:3036
 lock_sock include/net/sock.h:1583 [inline]
 mptcp_recvmsg+0xe8/0x1710 net/mptcp/protocol.c:1513
 inet_recvmsg+0x4ee/0x660 net/ipv4/af_inet.c:851
 sock_recvmsg_nosec net/socket.c:885 [inline]
 sock_recvmsg_nosec net/socket.c:882 [inline]
 sock_recvmsg net/socket.c:903 [inline]
 ____sys_recvmsg+0x4ed/0x5c0 net/socket.c:2576
 ___sys_recvmsg+0xe4/0x150 net/socket.c:2618
 do_recvmmsg+0x24c/0x730 net/socket.c:2718
 __sys_recvmmsg+0x23e/0x250 net/socket.c:2797
 __do_sys_recvmmsg net/socket.c:2820 [inline]
 __se_sys_recvmmsg net/socket.c:2813 [inline]
 __x64_sys_recvmmsg+0xde/0x130 net/socket.c:2813
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fbdf9e27469
Code: Bad RIP value.
RSP: 002b:00007fbdfa4d5dd8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 000000000068c040 RCX: 00007fbdf9e27469
RDX: 0000000000000002 RSI: 0000000020004ac0 RDI: 0000000000000006
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041c976 R14: 00007fbdfa4d65c0 R15: 0000000000000003

Showing all locks held in the system:
1 lock held by khungtaskd/258:
 #0: ffffffff838d5520 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x264 kernel/locking/lockdep.c:5829
1 lock held by in:imklog/1056:
2 locks held by agetty/1093:
 #0: ffff888111870098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
 #1: ffffc900002d32e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x220/0x1b20 drivers/tty/n_tty.c:2156
1 lock held by syz-executor.0/12446:
 #0: ffff888113a0e068 (&pipe->mutex/1){+.+.}-{3:3}, at: pipe_lock_nested fs/pipe.c:66 [inline]
 #0: ffff888113a0e068 (&pipe->mutex/1){+.+.}-{3:3}, at: pipe_lock+0x63/0x80 fs/pipe.c:74
2 locks held by agetty/9303:
 #0: ffff88810fcdc098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
 #1: ffffc9000010b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x220/0x1b20 drivers/tty/n_tty.c:2156
2 locks held by syz-executor.6/10236:

pabeni · 2020-09-11T09:34:03Z

I think (and hope) this is a duplicate of issue #83: due to the latter issue, a writer is stuck in sendmsg() and all other locking operations block forever.
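To make that failure mode concrete, here is a minimal Python threading model. It assumes, as the comment implies, that the stuck writer keeps holding the socket lock while it waits; `sock_lock`, `stuck_sendmsg` and `other_lock_user` are hypothetical names, not MPTCP code, and the writer's wait is bounded only so the demo terminates.

```python
import threading

# Hypothetical model: sock_lock stands in for the MPTCP socket lock.
sock_lock = threading.Lock()
wakeup = threading.Event()  # the wakeup the stuck writer waits for; never set in the demo


def stuck_sendmsg(holding: threading.Event) -> None:
    """A writer that takes the lock, then waits for a wakeup that never comes."""
    with sock_lock:
        holding.set()
        wakeup.wait(timeout=5.0)  # bounded only so that this demo terminates


def other_lock_user(timeout: float) -> bool:
    """Any other lock_sock() caller (setsockopt, recvmsg, ...): did it get the lock?"""
    acquired = sock_lock.acquire(timeout=timeout)
    if acquired:
        sock_lock.release()
    return acquired


holding = threading.Event()
writer = threading.Thread(target=stuck_sendmsg, args=(holding,), daemon=True)
writer.start()
holding.wait()  # make sure the writer owns the lock first
print("other caller got the lock:", other_lock_user(timeout=0.2))  # False while the writer is stuck
```

In the kernel there is no timeout on lock_sock(), so the other callers sit in state D until the hung-task detector reports them, which matches the setsockopt/sendmsg/recvmsg traces above.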

matttbe · 2020-09-11T09:44:13Z

(From Paolo: maybe linked to #83. This comment is just to mark it as such in GitHub :) )

pabeni · 2020-09-17T20:56:59Z

@cpaasch: I hope this one is resolved by:

e09eaf2: "squashed" in "mptcp: allow picking different xmit subflows"
895b3d8..341c313: result

Have you reproduced it this week? If not, I think we can close it.

cpaasch · 2020-09-21T14:29:13Z

Yes, it is not happening anymore! Closing...

The perf_buffer test fails on systems with offline CPUs:

  # test_progs -t perf_buffer
  serial_test_perf_buffer:PASS:nr_cpus 0 nsec
  serial_test_perf_buffer:PASS:nr_on_cpus 0 nsec
  serial_test_perf_buffer:PASS:skel_load 0 nsec
  serial_test_perf_buffer:PASS:attach_kprobe 0 nsec
  serial_test_perf_buffer:PASS:perf_buf__new 0 nsec
  serial_test_perf_buffer:PASS:epoll_fd 0 nsec
  skipping offline CPU #4
  serial_test_perf_buffer:PASS:perf_buffer__poll 0 nsec
  serial_test_perf_buffer:PASS:seen_cpu_cnt 0 nsec
  serial_test_perf_buffer:PASS:buf_cnt 0 nsec
  ...
  serial_test_perf_buffer:PASS:fd_check 0 nsec
  serial_test_perf_buffer:PASS:drain_buf 0 nsec
  serial_test_perf_buffer:PASS:consume_buf 0 nsec
  serial_test_perf_buffer:FAIL:cpu_seen cpu 5 not seen
  #88 perf_buffer:FAIL
  Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

If the offline CPU is in the middle of the possible set, the possible and online CPU buffers no longer line up. The perf buffer test calls perf_buffer__consume_buffer for all 'possible' CPUs, but the library holds buffers only for 'online' CPUs, and perf_buffer__consume_buffer looks them up by index.

Add an extra (online) index to keep track of the online buffers; the original (possible) index is still needed to trigger the trace on the proper CPU.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211021114132.8196-3-jolsa@kernel.org
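The index mismatch can be modeled in a few lines of Python. This is a sketch under the commit's description only: `pair_cpus_with_buffers` is a hypothetical helper, not libbpf API, and the buffer list is assumed to hold one entry per online CPU in ascending CPU order, addressed by position.

```python
def pair_cpus_with_buffers(possible_cpus, online_cpus, buffers, use_online_index):
    """Return the (cpu, buffer) pairs a test loop over 'possible' CPUs would see.

    `buffers` has one entry per *online* CPU and is looked up by position.
    """
    pairs = []
    j = 0  # running index into the online-only buffer array
    for cpu in possible_cpus:
        if cpu not in online_cpus:
            print(f"skipping offline CPU #{cpu}")
            continue
        idx = j if use_online_index else cpu  # the bug: using the possible index
        if idx < len(buffers):
            pairs.append((cpu, buffers[idx]))
        j += 1
    return pairs


possible = list(range(8))
online = [c for c in possible if c != 4]      # CPU #4 is offline
buffers = [f"buf[cpu{c}]" for c in online]     # library side: online CPUs only

buggy = pair_cpus_with_buffers(possible, online, buffers, use_online_index=False)
fixed = pair_cpus_with_buffers(possible, online, buffers, use_online_index=True)
print(buggy[4:])  # CPU 5 is paired with CPU 6's buffer
print(fixed[4:])
```

The fixed variant keeps both indices, as the commit describes: `j` selects the buffer, while `cpu` is still the right value for triggering the trace on that CPU.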

I got the following issue:

[ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
[ 594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[ 594.364987] Modules linked in:
[ 594.365405] irq event stamp: 604180238
[ 594.365906] hardirqs last enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
[ 594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
[ 594.368420] softirqs last enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
[ 594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
[ 594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G L 5.15.0-next-20211112+ #88
[ 594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 594.373604] Workqueue: events_unbound io_ring_exit_work
[ 594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
[ 594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
[ 594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
[ 594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
[ 594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
[ 594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
[ 594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
[ 594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
[ 594.382787] FS: 0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
[ 594.383851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
[ 594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 594.387403] Call Trace:
[ 594.387738] <TASK>
[ 594.388042] find_and_remove_object+0x118/0x160
[ 594.389321] delete_object_full+0xc/0x20
[ 594.389852] kfree+0x193/0x470
[ 594.390275] __io_remove_buffers.part.0+0xed/0x147
[ 594.390931] io_ring_ctx_free+0x342/0x6a2
[ 594.392159] io_ring_exit_work+0x41e/0x486
[ 594.396419] process_one_work+0x906/0x15a0
[ 594.399185] worker_thread+0x8b/0xd80
[ 594.400259] kthread+0x3bf/0x4a0
[ 594.401847] ret_from_fork+0x22/0x30
[ 594.402343] </TASK>

Message from syslogd@localhost at Nov 13 09:09:54 ...
 kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[ 596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680

We can reproduce this issue with the following syzkaller log:

r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)

The cause of the above issue is that 'buf->list' has 2,100,000 nodes; walking and freeing them keeps the CPU occupied, which leads to the soft lockup. To solve this issue, add a scheduling point inside the do-while loop in '__io_remove_buffers'.

After adding the scheduling point, we ran the regression test again and got the following data:

[ 240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
[ 305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
[ 333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
...

Fixes: 8bab4c09f24e ("io_uring: allow conditional reschedule for intensive iterators")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
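The fix pattern (a scheduling point inside a long teardown loop) can be sketched as a userspace analogue in Python. `BufNode`, `build_list` and `remove_buffers` are hypothetical stand-ins, not io_uring code; here the loop hands the CPU back every `yield_every` nodes via `os.sched_yield()`, whereas a kernel scheduling point such as cond_resched() is cheap enough to sit on every iteration because it only reschedules when one is pending.

```python
import os


class BufNode:
    """Hypothetical stand-in for one buffer entry on buf->list."""
    __slots__ = ("next",)

    def __init__(self, nxt=None):
        self.next = nxt


def build_list(n):
    """Build a singly linked list of n nodes (iteratively, no recursion)."""
    head = None
    for _ in range(n):
        head = BufNode(head)
    return head


# os.sched_yield() is Unix-only; fall back to a no-op elsewhere.
_default_yield = getattr(os, "sched_yield", lambda: None)


def remove_buffers(head, yield_every=1024, yield_cpu=_default_yield):
    """Unlink every node, offering the CPU to others every `yield_every` nodes."""
    removed = 0
    node = head
    while node is not None:
        nxt = node.next
        node.next = None  # "free" the node
        removed += 1
        if removed % yield_every == 0:
            yield_cpu()  # the scheduling point inside the loop
        node = nxt
    return removed


calls = []
removed = remove_buffers(build_list(5000), yield_every=1000, yield_cpu=lambda: calls.append(None))
print(removed, len(calls))  # 5000 5
```

Injecting `yield_cpu` keeps the sketch testable; with 2,100,000 nodes, as in the report, the same loop would yield a couple of thousand times instead of never.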

Branch data available to BPF programs can be very useful to get stack traces out of userspace applications.

Commit fff7b64 ("bpf: Add bpf_read_branch_records() helper") added BPF support to capture branch records on x86. Enable this feature for other architectures as well by removing the x86-specific checks.

If an architecture doesn't support branch records, bpf_read_branch_records() still has the appropriate checks, and it will return -EINVAL in that scenario. However, according to the UAPI helper documentation in include/uapi/linux/bpf.h, unsupported architectures should return -ENOENT in that case. Hence, update the corresponding check to return -ENOENT instead.

Selftest 'perf_branches' result on a power9 machine that has branch stack support:

- Before this patch:

  [command]# ./test_progs -t perf_branches
  #88/1 perf_branches/perf_branches_hw:FAIL
  #88/2 perf_branches/perf_branches_no_hw:OK
  #88 perf_branches:FAIL
  Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED

- After this patch:

  [command]# ./test_progs -t perf_branches
  #88/1 perf_branches/perf_branches_hw:OK
  #88/2 perf_branches/perf_branches_no_hw:OK
  #88 perf_branches:OK
  Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

Selftest 'perf_branches' result on a power9 machine that doesn't have branch stack reporting:

- After this patch:

  [command]# ./test_progs -t perf_branches
  #88/1 perf_branches/perf_branches_hw:SKIP
  #88/2 perf_branches/perf_branches_no_hw:OK
  #88 perf_branches:OK
  Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED

Fixes: fff7b64 ("bpf: Add bpf_read_branch_records() helper")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211206073315.77432-1-kjain@linux.ibm.com
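The error contract described above can be illustrated with a hypothetical userspace model in Python. `read_branch_records` here is not the kernel helper; it assumes `None` means "no branch stack on this platform", assumes 24 bytes per branch entry (three 64-bit words), and only mirrors the distinction the commit draws: missing support yields -ENOENT, while a malformed buffer is a caller error (-EINVAL).

```python
import errno

ENTRY_SIZE = 24  # assumed bytes per branch entry (three 64-bit words)


def read_branch_records(br_stack, buf_size):
    """Hypothetical model: return bytes copied, or a negative errno.

    br_stack is None when the platform provides no branch stack; otherwise
    it is a list of entries.
    """
    if br_stack is None:
        # No branch records available: -ENOENT per the UAPI doc (was -EINVAL).
        return -errno.ENOENT
    if buf_size <= 0 or buf_size % ENTRY_SIZE != 0:
        # Caller error: the buffer must hold a whole number of entries.
        return -errno.EINVAL
    return min(len(br_stack) * ENTRY_SIZE, buf_size)


print(read_branch_records(None, 48) == -errno.ENOENT)  # True
print(read_branch_records([1, 2, 3], 48))               # 48: truncated to the buffer
```

Keeping the two errno values distinct is what lets a caller tell "this machine has no branch stack" apart from "I passed a bad buffer".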

The BPF STX/LDX instructions use offsets relative to the FP to address stack space. Since BPF_FP is located at the top of the frame, the offset is usually a negative number. However, the arm64 str/ldr (unsigned immediate offset) instructions require a non-negative offset. Therefore, this patch tries to convert the offsets.

The method is to first find the negative offset furthest from the FP, then add it to the FP to calculate a bottom position, called FPB, and then adjust the offsets in the other STX/LDX instructions to be relative to FPB. FPB is saved in the arm64 callee-saved register x27, which was not used yet.

Before adjusting the offsets, the patch checks every instruction to ensure that the FP does not change at run time. If the FP may change, no offset is adjusted.

For example, for the following bpftrace command:

  bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'

Without this patch, jited code (fragment):

     0: bti c
     4: stp x29, x30, [sp, #-16]!
     8: mov x29, sp
     c: stp x19, x20, [sp, #-16]!
    10: stp x21, x22, [sp, #-16]!
    14: stp x25, x26, [sp, #-16]!
    18: mov x25, sp
    1c: mov x26, #0x0                  // #0
    20: bti j
    24: sub sp, sp, #0x90
    28: add x19, x0, #0x0
    2c: mov x0, #0x0                   // #0
    30: mov x10, #0xffffffffffffff78   // #-136
    34: str x0, [x25, x10]
    38: mov x10, #0xffffffffffffff80   // #-128
    3c: str x0, [x25, x10]
    40: mov x10, #0xffffffffffffff88   // #-120
    44: str x0, [x25, x10]
    48: mov x10, #0xffffffffffffff90   // #-112
    4c: str x0, [x25, x10]
    50: mov x10, #0xffffffffffffff98   // #-104
    54: str x0, [x25, x10]
    58: mov x10, #0xffffffffffffffa0   // #-96
    5c: str x0, [x25, x10]
    60: mov x10, #0xffffffffffffffa8   // #-88
    64: str x0, [x25, x10]
    68: mov x10, #0xffffffffffffffb0   // #-80
    6c: str x0, [x25, x10]
    70: mov x10, #0xffffffffffffffb8   // #-72
    74: str x0, [x25, x10]
    78: mov x10, #0xffffffffffffffc0   // #-64
    7c: str x0, [x25, x10]
    80: mov x10, #0xffffffffffffffc8   // #-56
    84: str x0, [x25, x10]
    88: mov x10, #0xffffffffffffffd0   // #-48
    8c: str x0, [x25, x10]
    90: mov x10, #0xffffffffffffffd8   // #-40
    94: str x0, [x25, x10]
    98: mov x10, #0xffffffffffffffe0   // #-32
    9c: str x0, [x25, x10]
    a0: mov x10, #0xffffffffffffffe8   // #-24
    a4: str x0, [x25, x10]
    a8: mov x10, #0xfffffffffffffff0   // #-16
    ac: str x0, [x25, x10]
    b0: mov x10, #0xfffffffffffffff8   // #-8
    b4: str x0, [x25, x10]
    b8: mov x10, #0x8                  // #8
    bc: ldr x2, [x19, x10]
    [...]

With this patch, jited code (fragment):

     0: bti c
     4: stp x29, x30, [sp, #-16]!
     8: mov x29, sp
     c: stp x19, x20, [sp, #-16]!
    10: stp x21, x22, [sp, #-16]!
    14: stp x25, x26, [sp, #-16]!
    18: stp x27, x28, [sp, #-16]!
    1c: mov x25, sp
    20: sub x27, x25, #0x88
    24: mov x26, #0x0                  // #0
    28: bti j
    2c: sub sp, sp, #0x90
    30: add x19, x0, #0x0
    34: mov x0, #0x0                   // #0
    38: str x0, [x27]
    3c: str x0, [x27, #8]
    40: str x0, [x27, #16]
    44: str x0, [x27, #24]
    48: str x0, [x27, #32]
    4c: str x0, [x27, #40]
    50: str x0, [x27, #48]
    54: str x0, [x27, #56]
    58: str x0, [x27, #64]
    5c: str x0, [x27, #72]
    60: str x0, [x27, #80]
    64: str x0, [x27, #88]
    68: str x0, [x27, #96]
    6c: str x0, [x27, #104]
    70: str x0, [x27, #112]
    74: str x0, [x27, #120]
    78: str x0, [x27, #128]
    7c: ldr x2, [x19, #8]
    [...]

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-4-xukuohai@huawei.com
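The offset conversion can be checked against the "With this patch" fragment with a short Python sketch. `rebase_offsets` and `fits_ldst64_uimm` are hypothetical helpers, not JIT code; the sketch covers only the arithmetic (find the most negative FP offset, set FPB = FP + that offset, re-express each offset relative to FPB) and skips the patch's check that the FP never changes at run time. The range check encodes the 64-bit LDR/STR unsigned-offset form: a 12-bit immediate scaled by 8, so multiples of 8 from 0 to 32760.

```python
def rebase_offsets(fp_offsets):
    """Rebase FP-relative offsets onto FPB = FP + fpb_delta.

    fpb_delta is the furthest negative offset from FP (or 0 if none is
    negative), so every rebased offset is non-negative.
    """
    fpb_delta = min(min(fp_offsets), 0)
    return fpb_delta, [off - fpb_delta for off in fp_offsets]


def fits_ldst64_uimm(off):
    """64-bit LDR/STR unsigned-offset form: a multiple of 8 in [0, 32760]."""
    return off >= 0 and off % 8 == 0 and off // 8 <= 4095


offsets = list(range(-136, 0, 8))  # the 17 stores in the example: -136, -128, ..., -8
delta, rebased = rebase_offsets(offsets)
print(f"sub x27, x25, #{-delta:#x}")  # sub x27, x25, #0x88
for off in rebased:
    print(f"str x0, [x27, #{off}]" if off else "str x0, [x27]")
```

The printed sequence reproduces the `sub x27, x25, #0x88` setup and the `[x27]` through `[x27, #128]` stores, which is why the per-store `mov x10, #...` instructions disappear.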

powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which, from my (arguably very short) checking, is not commonly done for other archs. This is fine, except when PF_IO_WORKERs have been created and the task does something that causes a coredump to be generated. Then we get this crash:

  Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
  BUG: Kernel NULL pointer dereference on read at 0x00000160
  Faulting instruction address: 0xc0000000000c3a60
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
  Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
  CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
  NIP: c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
  REGS: c0000000041833b0 TRAP: 0300 Not tainted (6.3.0-rc2+)
  MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 88082828 XER: 200400f8
  ...
  NIP memcpy_power7+0x200/0x7d0
  LR ppr_get+0x64/0xb0
  Call Trace:
    ppr_get+0x40/0xb0 (unreliable)
    __regset_get+0x180/0x1f0
    regset_get_alloc+0x64/0x90
    elf_core_dump+0xb98/0x1b60
    do_coredump+0x1c34/0x24a0
    get_signal+0x71c/0x1410
    do_notify_resume+0x140/0x6f0
    interrupt_exit_user_prepare_main+0x29c/0x320
    interrupt_exit_user_prepare+0x6c/0xa0
    interrupt_return_srr_user+0x8/0x138

This happens because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL pt_regs.

Check for a valid pt_regs in both ppr_get/ppr_set, and return an error if it is not set. The actual error value doesn't seem to be important here, so just pick -EINVAL.

Fixes: fa43981 ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
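The guard this commit adds can be modeled in Python. `PtRegs`, `Task`, `ppr_get` and `ppr_set` below are hypothetical stand-ins, not the powerpc regset code, and the task names are illustrative; `regs=None` models a PF_IO_WORKER/PF_KTHREAD whose pt_regs pointer is NULL. Instead of dereferencing the missing registers (the NULL read in the oops), both accessors refuse with -EINVAL.

```python
import errno
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PtRegs:
    ppr: int = 0


@dataclass
class Task:
    name: str
    regs: Optional[PtRegs]  # None models a worker thread with a NULL pt_regs


def ppr_get(task: Task) -> Tuple[int, Optional[int]]:
    """Return (0, value) on success, or (-EINVAL, None) when regs are missing."""
    if task.regs is None:
        return -errno.EINVAL, None
    return 0, task.regs.ppr


def ppr_set(task: Task, value: int) -> int:
    """Return 0 on success, or -EINVAL when regs are missing."""
    if task.regs is None:
        return -errno.EINVAL
    task.regs.ppr = value
    return 0


user = Task("ppc-crash", PtRegs(ppr=1))
worker = Task("io-worker", None)  # hypothetical name for a PF_IO_WORKER
print(ppr_get(user), ppr_get(worker))
```

As the commit notes, the specific error value matters less than returning an error at all, so a coredump that walks every thread's regsets skips the worker instead of crashing.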

cpaasch closed this as completed Sep 21, 2020

matttbe added bug syzkaller labels Oct 28, 2020
