Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate orphan sections #61

Closed
ojeda opened this issue Dec 16, 2020 · 10 comments
Closed

Investigate orphan sections #61

ojeda opened this issue Dec 16, 2020 · 10 comments
Labels
• kbuild Related to building the kernel, `make`, `Kbuild`, `Kconfig` options...

Comments

@ojeda
Copy link
Member

ojeda commented Dec 16, 2020

Since synchronizing with v5.10 (#60) we are getting:

ld: warning: orphan section `.text.__rust_probestack' from `drivers/char/rust_example/rust_example.o' being placed in section `.text.__rust_probestack'
ld: warning: orphan section `.eh_frame' from `drivers/char/rust_example/rust_example.o' being placed in section `.eh_frame'

See e.g. https://github.com/Rust-for-Linux/linux/runs/1561327749

@ojeda ojeda added • kbuild Related to building the kernel, `make`, `Kbuild`, `Kconfig` options... - optional labels Dec 16, 2020
@ojeda
Copy link
Member Author

ojeda commented Dec 17, 2020

A guess is that some kernel structures changed in v5.10 and now, when using the bindings, we have crossed the stack size threshold in some function, thus calling the probe.

@adamrk
Copy link

adamrk commented Jan 17, 2021

I think this might be related to some actual errors I'm seeing:

ERROR: modpost: "__rust_probestack" [drivers/char/rust_example_4.ko] undefined!

(see the failed runs here: https://github.com/adamrk/linux/pull/1/checks?check_run_id=1711541791).
The symbol is defined in rust/complier_builtins.o and vmlinux:

$ objdump --syms  rust/vmlinux | rg probestack
ffffffff81801393 l    d  .text.__rust_probestack	0000000000000000 .text.__rust_probestack
ffffffff81801393 g     F .text.__rust_probestack	0000000000000039 .hidden __rust_probestack

Could this issue be the .hidden attribute? That seems to be explicitly set in the definition.

I'm happy to dig into this some more, but I'm kind of stuck at this point so any suggestions would be helpful.

@adamrk
Copy link

adamrk commented Jan 20, 2021

Looks like the modpost error isn't directly related. Opened #73 instead.

@adamrk
Copy link

adamrk commented Jan 20, 2021

Maybe the issue here is just that .text.__rust_probestack is a non-standard section? Making this change to vmlinux.lds.h takes care of the warning and seems to work fine, but it doesn't seem correct to add logic there just for this special case:

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..42d46683f188 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -590,7 +590,7 @@
 #define TEXT_TEXT                                                      \
                ALIGN_FUNCTION();                                       \
                *(.text.hot .text.hot.*)                                \
-               *(TEXT_MAIN .text.fixup)                                \
+               *(TEXT_MAIN .text.fixup .text.__rust_probestack)        \
                *(.text.unlikely .text.unlikely.*)                      \
                *(.text.unknown .text.unknown.*)                        \
                NOINSTR_TEXT                                            \

Looks like __rust_probstack is intentionally put in a separate section (source), but I'm not sure why. Maybe if we end up copying needed parts compiler_builtins into the tree (one of the options in #69) we can change it to store __rust_probestack in the regular .text section?

Or is there a way to add a separate linker script file that specifically handles this?

@bjorn3
Copy link
Member

bjorn3 commented Jan 20, 2021

The separate section was added as part of moving from a naked function to global_asm! in rust-lang/compiler-builtins#328 to add CFI information to help unwinding inside a debugger. The separate section matches how rustc compiles with -Zfunction-sections=yes by default.

Does Linux have it's own probestack function? You may be able to alias __rust_probestack to it and completely avoid linking to compiler-builtins. Instead using whatever Linux uses for providing gcc/LLVM compiler intrinsics. Alternatively you could wait for rust-lang/rust#80838 to land and switch to inline asm stack probes.

@ojeda
Copy link
Member Author

ojeda commented Jan 20, 2021

AFAICS both GCC and Clang generate the probing inline by default, so we should likely wait for rust-lang/rust#80838.

Having said that, I don't see stack clash protection used nor proposed for the kernel yet, in fact I see a related patch that required disabling it explicitly -- @nickdesaulniers @kees?

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Jan 20, 2021

Having said that, I don't see stack clash protection used nor proposed for the kernel yet, in fact I see a related patch that required disabling it explicitly

Maybe a userspace centric feature that can be disabled?

ld: warning: orphan section .text.__rust_probestack' from drivers/char/rust_example/rust_example.o' being placed in section `.text.__rust_probestack'
ERROR: modpost: "__rust_probestack" [drivers/char/rust_example_4.ko] undefined!

So looks like a missing function (symbol) AND missplaced section? Adding that section makes sense if __rust_probestack is required; not sure what that would look like in the kernel.

ld: warning: orphan section .eh_frame' from drivers/char/rust_example/rust_example.o' being placed in section `.eh_frame'

In C compilers, that's controlled by -fno-asynchronous-unwind-info which we generally use in the kernel, except for one unwinder that 32b ARM uses (by default). There should be a corresponding flag for rustc (or, if not, go make it!).

EDIT: sorry for closing/reopened. "Fat fingers"

@bjorn3
Copy link
Member

bjorn3 commented Jan 20, 2021

In C compilers, that's controlled by -fno-asynchronous-unwind-info which we generally use in the kernel, except for one unwinder that 32b ARM uses (by default). There should be a corresponding flag for rustc (or, if not, go make it!).

This unwind info is not generated by the compiler. It is part of the assembly that implements __rust_probestack. The compiler doesn't have any control over this.

So looks like a missing function (symbol) AND missplaced section? Adding that section makes sense if __rust_probestack is required; not sure what that would look like in the kernel.

Same applies here.

Maybe a userspace centric feature that can be disabled?

I believe the target spec has an option to disable stackprobes.

@adamrk
Copy link

adamrk commented Feb 12, 2021

I think this is resolved with the latest rustc version changes.

@ojeda
Copy link
Member Author

ojeda commented Feb 12, 2021

Yeah, when compiled with a high enough LLVM (11.0.1), we should now be getting inlined stack probes now since #85. The CI compiles with 11.1.0 at the moment.

However, the warnings went away before, since #80 was merged, since there was no more compiler_builtins.

@ojeda ojeda closed this as completed Feb 12, 2021
ojeda pushed a commit that referenced this issue Jul 29, 2021
The histogram logic was allowing events with char * pointers to be used as
normal strings. But it was easy to crash the kernel with:

 # echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger

And open some files, and boom!

 BUG: unable to handle page fault for address: 00007f2ced0c3280
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP
 CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:strlen+0x0/0x20
 Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b
a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74
10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3

 RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246
 RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0
 RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280
 RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98
 R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074
 R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280
 FS:  00007f2ced0f8580(0000) GS:ffff93825a800000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0
 Call Trace:
  event_hist_trigger+0x463/0x5f0
  ? find_held_lock+0x32/0x90
  ? sched_clock_cpu+0xe/0xd0
  ? lock_release+0x155/0x440
  ? kernel_init_free_pages+0x6d/0x90
  ? preempt_count_sub+0x9b/0xd0
  ? kernel_init_free_pages+0x6d/0x90
  ? get_page_from_freelist+0x12c4/0x1680
  ? __rb_reserve_next+0xe5/0x460
  ? ring_buffer_lock_reserve+0x12a/0x3f0
  event_triggers_call+0x52/0xe0
  ftrace_syscall_enter+0x264/0x2c0
  syscall_trace_enter.constprop.0+0x1ee/0x210
  do_syscall_64+0x1c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Where it triggered a fault on strlen(key) where key was the filename.

The reason is that filename is a char * to user space, and the histogram
code just blindly dereferenced it, with obvious bad results.

I originally tried to use strncpy_from_user/kernel_nofault() but found
that there's other places that its dereferenced and not worth the effort.

Just do not allow "char *" to act like strings.

Link: https://lkml.kernel.org/r/20210715000206.025df9d2@rorschach.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Tom Zanussi <zanussi@kernel.org>
Fixes: 79e577c ("tracing: Support string type key properly")
Fixes: 5967bd5 ("tracing: Let filter_assign_type() detect FILTER_PTR_STRING")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
ojeda pushed a commit that referenced this issue Jan 16, 2023
Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP
and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event.

Command to trigger the warning:
  # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5

   Performance counter stats for 'sleep 5':

                   0      thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/

         5.002117947 seconds time elapsed

         0.000131000 seconds user
         0.001063000 seconds sys

Below is snippet of the warning in dmesg:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec
  preempt_count: 2, expected: 0
  4 locks held by perf-exec/2869:
   #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90
   #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0
   #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510
   #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510
  irq event stamp: 4806
  hardirqs last  enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0
  hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510
  softirqs last  enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  Call Trace:
    dump_stack_lvl+0x98/0xe0 (unreliable)
    __might_resched+0x2f8/0x310
    __mutex_lock+0x6c/0x13f0
    thread_imc_event_add+0xf4/0x1b0
    event_sched_in+0xe0/0x210
    merge_sched_in+0x1f0/0x600
    visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0
    ctx_flexible_sched_in+0xcc/0x140
    ctx_sched_in+0x20c/0x2a0
    ctx_resched+0x104/0x1c0
    perf_event_exec+0x340/0x510
    begin_new_exec+0x730/0xef0
    load_elf_binary+0x3f8/0x1e10
  ...
  do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0
  WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0
  CPU: 36 PID: 2869 Comm: sleep Tainted: G        W          6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  NIP:  c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670
  REGS: c00000004d2134e0 TRAP: 0700   Tainted: G        W           (6.2.0-rc2-00011-g1247637727f2)
  MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48002824  XER: 00000000
  CFAR: c00000000013fb64 IRQMASK: 1

The above warning triggered because the current imc-pmu code uses mutex
lock in interrupt disabled sections. The function mutex_lock()
internally calls __might_resched(), which will check if IRQs are
disabled and in case IRQs are disabled, it will trigger the warning.

Fix the issue by changing the mutex lock to spinlock.

Fixes: 8f95faa ("powerpc/powernv: Detect and create IMC device")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Fix comments, trim oops in change log, add reported-by tags]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
• kbuild Related to building the kernel, `make`, `Kbuild`, `Kconfig` options...
Development

No branches or pull requests

4 participants