Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[syzkaller] INFO: task hung in lock_sock_nested #88

Closed
cpaasch opened this issue Sep 10, 2020 · 5 comments
Closed

[syzkaller] INFO: task hung in lock_sock_nested #88

cpaasch opened this issue Sep 10, 2020 · 5 comments

Comments

@cpaasch
Copy link
Member

cpaasch commented Sep 10, 2020

      Not tainted 5.9.0-rc3 #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.3  state:D stack:    0 pid:13279 ppid:  1314 flags:0x00000004
Call Trace:
 context_switch kernel/sched/core.c:3778 [inline]
 __schedule+0x616/0x1800 kernel/sched/core.c:4527
 schedule+0xcd/0x2a0 kernel/sched/core.c:4602
 __lock_sock+0xfd/0x190 net/core/sock.c:2504
 lock_sock_nested+0x10f/0x140 net/core/sock.c:3036
 lock_sock include/net/sock.h:1583 [inline]
 mptcp_setsockopt+0x50/0x690 net/mptcp/protocol.c:2249
 __sys_setsockopt+0x154/0x390 net/socket.c:2132
 __do_sys_setsockopt net/socket.c:2143 [inline]
 __se_sys_setsockopt net/socket.c:2140 [inline]
 __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f13081ab469
Code: Bad RIP value.
RSP: 002b:00007f1308838dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 000000000068c0e0 RCX: 00007f13081ab469
RDX: 0000000000000019 RSI: 0000000000000006 RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000004 R09: 0000000000000000
R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041ec0a R14: 00007f13088395c0 R15: 0000000000000003
INFO: task syz-executor.3:13283 blocked for more than 143 seconds.
      Not tainted 5.9.0-rc3 #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.3  state:D stack:    0 pid:13283 ppid:  1314 flags:0x00000004
Call Trace:
 context_switch kernel/sched/core.c:3778 [inline]
 __schedule+0x616/0x1800 kernel/sched/core.c:4527
 schedule+0xcd/0x2a0 kernel/sched/core.c:4602
 __lock_sock+0xfd/0x190 net/core/sock.c:2504
 lock_sock_nested+0x10f/0x140 net/core/sock.c:3036
 lock_sock include/net/sock.h:1583 [inline]
 mptcp_sendmsg+0xf7/0x18c0 net/mptcp/protocol.c:1183
 inet_sendmsg+0x115/0x140 net/ipv4/af_inet.c:820
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg net/socket.c:671 [inline]
 ____sys_sendmsg+0x754/0x8e0 net/socket.c:2353
 ___sys_sendmsg+0xff/0x170 net/socket.c:2407
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f13081ab469
Code: Bad RIP value.
RSP: 002b:00007f1308817dd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000068c180 RCX: 00007f13081ab469
RDX: 0000000000000000 RSI: 0000000020001480 RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041e048 R14: 00007f13088185c0 R15: 0000000000000003

HEAD is:

fdc664ab314a ("mptcp: call tcp_cleanup_rbuf on subflows") (23 hours ago)
35d6b46 ("DO-NOT-MERGE: mptcp: enabled by default") (tag: export/20200909T050747, mptcp_net-next/export) (34 hours ago)
6e7290b ("DO-NOT-MERGE: mptcp: use kmalloc on kasan build") (34 hours ago)
8fd4ed4 ("tcp: propagate MPTCP skb extensions on xmit splits") (34 hours ago)
691048c ("mptcp: use _fast lock version in __mptcp_move_skbs") (34 hours ago)
ef14371 ("mptcp: adjust mptcp receive buffer limit if subflow has larger one") (34 hours ago)
74acdfd ("mptcp: simult flow self-tests") (34 hours ago)
a1975d0 ("mptcp: allow picking different xmit subflows") (34 hours ago)
6855e98 ("mptcp: allow creating non-backup subflows") (34 hours ago)
2c61627 ("mptcp: move address attribute into mptcp_addr_info") (34 hours ago)
b347db3 ("mptcp: add OoO related mibs") (34 hours ago)
91baf67 ("mptcp: cleanup mptcp_subflow_discard_data()") (34 hours ago)
685dab9 ("mptcp: move ooo skbs into msk out of order queue.") (34 hours ago)
c97925c ("mptcp: introduce and use mptcp_try_coalesce()") (34 hours ago)
904204e ("mptcp: basic sndbuf autotuning") (34 hours ago)
5d28362 ("mptcp: trigger msk processing even for OoO data") (34 hours ago)
227d369 ("mptcp: set data_ready status bit in subflow_check_data_avail()") (34 hours ago)
6f2c3af ("mptcp: rethink 'is writable' conditional") (34 hours ago)
ff4e207 ("mptcp: add accept_subflow re-check") (34 hours ago)
1cde79a ("selftests: mptcp: add ADD_ADDR mibs check function") (34 hours ago)
70671b7 ("mptcp: add ADD_ADDR related mibs") (34 hours ago)
6d37e97 ("mptcp: send out ADD_ADDR with echo flag") (34 hours ago)
a20a96f ("mptcp: add the incoming RM_ADDR support") (34 hours ago)
7103b2d ("mptcp: add the outgoing RM_ADDR support") (34 hours ago)
328bd7f ("mptcp: rename addr_signal and the related functions") (34 hours ago)
e8d556d ("selftests/mptcp: Better delay & reordering configuration") (34 hours ago)
38119e7 ("bpf:selftests: add bpf_mptcp_sock() verifier tests") (34 hours ago)
43b5943 ("bpf:selftests: add MPTCP test base") (34 hours ago)
c3de50c ("bpf: add 'bpf_mptcp_sock' structure and helper") (34 hours ago)
31e2af1 ("mptcp: attach subflow socket to parent cgroup") (34 hours ago)
36440f0 ("bpf: expose is_mptcp flag to bpf_tcp_sock") (34 hours ago)
f5499c6 ("nfc: pn533/usb.c: fix spelling of "functions"") (mptcp_net-next/net-next) (2 days ago)

No reproducer, I will schedule a syzkaller run with lockdep enabled.

@cpaasch
Copy link
Member Author

cpaasch commented Sep 10, 2020

More info with lockdep:

Process accounting resumed
INFO: task syz-executor.6:10245 blocked for more than 143 seconds.
      Not tainted 5.9.0-rc3 #23
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.6  state:D stack:    0 pid:10245 ppid:  1298 flags:0x00000004
Call Trace:
 context_switch kernel/sched/core.c:3778 [inline]
 __schedule+0x6f7/0x1d30 kernel/sched/core.c:4527
 schedule+0xcd/0x2a0 kernel/sched/core.c:4602
 __lock_sock+0x13d/0x260 net/core/sock.c:2504
 lock_sock_nested+0xf6/0x120 net/core/sock.c:3036
 lock_sock include/net/sock.h:1583 [inline]
 mptcp_recvmsg+0xe8/0x1710 net/mptcp/protocol.c:1513
 inet_recvmsg+0x4ee/0x660 net/ipv4/af_inet.c:851
 sock_recvmsg_nosec net/socket.c:885 [inline]
 sock_recvmsg_nosec net/socket.c:882 [inline]
 sock_recvmsg net/socket.c:903 [inline]
 ____sys_recvmsg+0x4ed/0x5c0 net/socket.c:2576
 ___sys_recvmsg+0xe4/0x150 net/socket.c:2618
 do_recvmmsg+0x24c/0x730 net/socket.c:2718
 __sys_recvmmsg+0x23e/0x250 net/socket.c:2797
 __do_sys_recvmmsg net/socket.c:2820 [inline]
 __se_sys_recvmmsg net/socket.c:2813 [inline]
 __x64_sys_recvmmsg+0xde/0x130 net/socket.c:2813
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fbdf9e27469
Code: Bad RIP value.
RSP: 002b:00007fbdfa4d5dd8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 000000000068c040 RCX: 00007fbdf9e27469
RDX: 0000000000000002 RSI: 0000000020004ac0 RDI: 0000000000000006
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041c976 R14: 00007fbdfa4d65c0 R15: 0000000000000003

Showing all locks held in the system:
1 lock held by khungtaskd/258:
 #0: ffffffff838d5520 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x264 kernel/locking/lockdep.c:5829
1 lock held by in:imklog/1056:
2 locks held by agetty/1093:
 #0: ffff888111870098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
 #1: ffffc900002d32e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x220/0x1b20 drivers/tty/n_tty.c:2156
1 lock held by syz-executor.0/12446:
 #0: ffff888113a0e068 (&pipe->mutex/1){+.+.}-{3:3}, at: pipe_lock_nested fs/pipe.c:66 [inline]
 #0: ffff888113a0e068 (&pipe->mutex/1){+.+.}-{3:3}, at: pipe_lock+0x63/0x80 fs/pipe.c:74
2 locks held by agetty/9303:
 #0: ffff88810fcdc098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
 #1: ffffc9000010b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x220/0x1b20 drivers/tty/n_tty.c:2156
2 locks held by syz-executor.6/10236:

@pabeni
Copy link

pabeni commented Sep 11, 2020

I think/hope this a duplicate of issue#83: due the latter issue a writer is stuck in sendmsg() and all others locking operations blocks forever

@matttbe
Copy link
Member

matttbe commented Sep 11, 2020

(From Paolo: maybe linked to #83 → a comment just to mark it like that in Github :) )

@pabeni
Copy link

pabeni commented Sep 17, 2020

@cpaasch : I hope this one is resolved by:

  • e09eaf2: "squashed" in "mptcp: allow picking different xmit subflows"
  • 895b3d8..341c313: result
    have you ever reproduced it in this week? Otherwise I think we can close it

@cpaasch
Copy link
Member Author

cpaasch commented Sep 21, 2020

Yes, no more happening! Closing...

@cpaasch cpaasch closed this as completed Sep 21, 2020
matttbe pushed a commit that referenced this issue Nov 2, 2021
The perf_buffer fails on system with offline cpus:

  # test_progs -t perf_buffer
  serial_test_perf_buffer:PASS:nr_cpus 0 nsec
  serial_test_perf_buffer:PASS:nr_on_cpus 0 nsec
  serial_test_perf_buffer:PASS:skel_load 0 nsec
  serial_test_perf_buffer:PASS:attach_kprobe 0 nsec
  serial_test_perf_buffer:PASS:perf_buf__new 0 nsec
  serial_test_perf_buffer:PASS:epoll_fd 0 nsec
  skipping offline CPU #4
  serial_test_perf_buffer:PASS:perf_buffer__poll 0 nsec
  serial_test_perf_buffer:PASS:seen_cpu_cnt 0 nsec
  serial_test_perf_buffer:PASS:buf_cnt 0 nsec
  ...
  serial_test_perf_buffer:PASS:fd_check 0 nsec
  serial_test_perf_buffer:PASS:drain_buf 0 nsec
  serial_test_perf_buffer:PASS:consume_buf 0 nsec
  serial_test_perf_buffer:FAIL:cpu_seen cpu 5 not seen
  #88 perf_buffer:FAIL
  Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

If the offline cpu is from the middle of the possible set,
we get mismatch with possible and online cpu buffers.

The perf buffer test calls perf_buffer__consume_buffer for
all 'possible' cpus, but the library holds only 'online'
cpu buffers and perf_buffer__consume_buffer returns them
based on index.

Adding extra (online) index to keep track of online buffers,
we need the original (possible) index to trigger trace on
proper cpu.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211021114132.8196-3-jolsa@kernel.org
jenkins-tessares pushed a commit that referenced this issue Dec 3, 2021
I got issue as follows:
[ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
[  594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[  594.364987] Modules linked in:
[  594.365405] irq event stamp: 604180238
[  594.365906] hardirqs last  enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
[  594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
[  594.368420] softirqs last  enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
[  594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
[  594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G            L    5.15.0-next-20211112+ #88
[  594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  594.373604] Workqueue: events_unbound io_ring_exit_work
[  594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
[  594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
[  594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
[  594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
[  594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
[  594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
[  594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
[  594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
[  594.382787] FS:  0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
[  594.383851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
[  594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  594.387403] Call Trace:
[  594.387738]  <TASK>
[  594.388042]  find_and_remove_object+0x118/0x160
[  594.389321]  delete_object_full+0xc/0x20
[  594.389852]  kfree+0x193/0x470
[  594.390275]  __io_remove_buffers.part.0+0xed/0x147
[  594.390931]  io_ring_ctx_free+0x342/0x6a2
[  594.392159]  io_ring_exit_work+0x41e/0x486
[  594.396419]  process_one_work+0x906/0x15a0
[  594.399185]  worker_thread+0x8b/0xd80
[  594.400259]  kthread+0x3bf/0x4a0
[  594.401847]  ret_from_fork+0x22/0x30
[  594.402343]  </TASK>

Message from syslogd@localhost at Nov 13 09:09:54 ...
kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[  596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680

We can reproduce this issue by follow syzkaller log:
r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)

The reason above issue  is 'buf->list' has 2,100,000 nodes, occupied cpu lead
to soft lockup.
To solve this issue, we need add schedule point when do while loop in
'__io_remove_buffers'.
After add  schedule point we do regression, get follow data.
[  240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[  268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[  275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[  296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
[  305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[  325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
[  333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
...

Fixes:8bab4c09f24e("io_uring: allow conditional reschedule for intensive iterators")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
matttbe pushed a commit that referenced this issue Dec 11, 2021
Branch data available to BPF programs can be very useful to get stack traces
out of userspace application.

Commit fff7b64 ("bpf: Add bpf_read_branch_records() helper") added BPF
support to capture branch records in x86. Enable this feature also for other
architectures as well by removing checks specific to x86.

If an architecture doesn't support branch records, bpf_read_branch_records()
still has appropriate checks and it will return an -EINVAL in that scenario.
Based on UAPI helper doc in include/uapi/linux/bpf.h, unsupported architectures
should return -ENOENT in such case. Hence, update the appropriate check to
return -ENOENT instead.

Selftest 'perf_branches' result on power9 machine which has the branch stacks
support:

 - Before this patch:

  [command]# ./test_progs -t perf_branches
   #88/1 perf_branches/perf_branches_hw:FAIL
   #88/2 perf_branches/perf_branches_no_hw:OK
   #88 perf_branches:FAIL
  Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED

 - After this patch:

  [command]# ./test_progs -t perf_branches
   #88/1 perf_branches/perf_branches_hw:OK
   #88/2 perf_branches/perf_branches_no_hw:OK
   #88 perf_branches:OK
  Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

Selftest 'perf_branches' result on power9 machine which doesn't have branch
stack report:

 - After this patch:

  [command]# ./test_progs -t perf_branches
   #88/1 perf_branches/perf_branches_hw:SKIP
   #88/2 perf_branches/perf_branches_no_hw:OK
   #88 perf_branches:OK
  Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED

Fixes: fff7b64 ("bpf: Add bpf_read_branch_records() helper")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211206073315.77432-1-kjain@linux.ibm.com
jenkins-tessares pushed a commit that referenced this issue Apr 9, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number.  Therefore, this patch tries to
convert the offsets.

The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.

FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.

Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.

For example, for the following bpftrace command:

  bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'

Without this patch, jited code(fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   mov     x25, sp
  1c:   mov     x26, #0x0                       // #0
  20:   bti     j
  24:   sub     sp, sp, #0x90
  28:   add     x19, x0, #0x0
  2c:   mov     x0, #0x0                        // #0
  30:   mov     x10, #0xffffffffffffff78        // #-136
  34:   str     x0, [x25, x10]
  38:   mov     x10, #0xffffffffffffff80        // #-128
  3c:   str     x0, [x25, x10]
  40:   mov     x10, #0xffffffffffffff88        // #-120
  44:   str     x0, [x25, x10]
  48:   mov     x10, #0xffffffffffffff90        // #-112
  4c:   str     x0, [x25, x10]
  50:   mov     x10, #0xffffffffffffff98        // #-104
  54:   str     x0, [x25, x10]
  58:   mov     x10, #0xffffffffffffffa0        // #-96
  5c:   str     x0, [x25, x10]
  60:   mov     x10, #0xffffffffffffffa8        // #-88
  64:   str     x0, [x25, x10]
  68:   mov     x10, #0xffffffffffffffb0        // #-80
  6c:   str     x0, [x25, x10]
  70:   mov     x10, #0xffffffffffffffb8        // #-72
  74:   str     x0, [x25, x10]
  78:   mov     x10, #0xffffffffffffffc0        // #-64
  7c:   str     x0, [x25, x10]
  80:   mov     x10, #0xffffffffffffffc8        // #-56
  84:   str     x0, [x25, x10]
  88:   mov     x10, #0xffffffffffffffd0        // #-48
  8c:   str     x0, [x25, x10]
  90:   mov     x10, #0xffffffffffffffd8        // #-40
  94:   str     x0, [x25, x10]
  98:   mov     x10, #0xffffffffffffffe0        // #-32
  9c:   str     x0, [x25, x10]
  a0:   mov     x10, #0xffffffffffffffe8        // #-24
  a4:   str     x0, [x25, x10]
  a8:   mov     x10, #0xfffffffffffffff0        // #-16
  ac:   str     x0, [x25, x10]
  b0:   mov     x10, #0xfffffffffffffff8        // #-8
  b4:   str     x0, [x25, x10]
  b8:   mov     x10, #0x8                       // #8
  bc:   ldr     x2, [x19, x10]
  [...]

With this patch, jited code(fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   stp     x27, x28, [sp, #-16]!
  1c:   mov     x25, sp
  20:   sub     x27, x25, #0x88
  24:   mov     x26, #0x0                       // #0
  28:   bti     j
  2c:   sub     sp, sp, #0x90
  30:   add     x19, x0, #0x0
  34:   mov     x0, #0x0                        // #0
  38:   str     x0, [x27]
  3c:   str     x0, [x27, #8]
  40:   str     x0, [x27, #16]
  44:   str     x0, [x27, #24]
  48:   str     x0, [x27, #32]
  4c:   str     x0, [x27, #40]
  50:   str     x0, [x27, #48]
  54:   str     x0, [x27, #56]
  58:   str     x0, [x27, #64]
  5c:   str     x0, [x27, #72]
  60:   str     x0, [x27, #80]
  64:   str     x0, [x27, #88]
  68:   str     x0, [x27, #96]
  6c:   str     x0, [x27, #104]
  70:   str     x0, [x27, #112]
  74:   str     x0, [x27, #120]
  78:   str     x0, [x27, #128]
  7c:   ldr     x2, [x19, #8]
  [...]

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-4-xukuohai@huawei.com
matttbe pushed a commit that referenced this issue Apr 7, 2023
powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which
from my (arguably very short) checking is not commonly done for other
archs. This is fine, except when PF_IO_WORKER's have been created and
the task does something that causes a coredump to be generated. Then we
get this crash:

  Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
  BUG: Kernel NULL pointer dereference on read at 0x00000160
  Faulting instruction address: 0xc0000000000c3a60
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
  Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
  CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
  NIP:  c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
  REGS: c0000000041833b0 TRAP: 0300   Not tainted  (6.3.0-rc2+)
  MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88082828  XER: 200400f8
  ...
  NIP memcpy_power7+0x200/0x7d0
  LR  ppr_get+0x64/0xb0
  Call Trace:
    ppr_get+0x40/0xb0 (unreliable)
    __regset_get+0x180/0x1f0
    regset_get_alloc+0x64/0x90
    elf_core_dump+0xb98/0x1b60
    do_coredump+0x1c34/0x24a0
    get_signal+0x71c/0x1410
    do_notify_resume+0x140/0x6f0
    interrupt_exit_user_prepare_main+0x29c/0x320
    interrupt_exit_user_prepare+0x6c/0xa0
    interrupt_return_srr_user+0x8/0x138

Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL
pt_regs.

Check for a valid pt_regs in both ppc_get/ppr_set, and return an error
if not set. The actual error value doesn't seem to be important here, so
just pick -EINVAL.

Fixes: fa43981 ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants