
[Bug]: Clone behavior is different when scoped #1170

Closed
michalbiesek opened this issue Oct 26, 2022 · 6 comments · Fixed by #1392

@michalbiesek (Contributor)

Steps To Reproduce

With the following code:

// gcc -g test/manual/clonetest.c -o clonetest
// Different behavior in case of:
// sudo ./clonetest
// sudo ldscope -- ./clonetest

#define _GNU_SOURCE
#include <linux/limits.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

static char child_stack[5000];

int child_fn(void* arg) {
  for (int i = 0 ; i < 5; ++i) {
      printf(" child_fn begin pid = %d tid = %d iteration = %d\n", getpid(), gettid(), i);
      sleep(1);
  }
  printf(" child_fn after iteration pid = %d tid = %d\n", getpid(), gettid());
  return 0;
}

int main() {
  int status;
  printf("main started pid = %d tid = %d\n", getpid(), gettid());
  pid_t retPid = clone(child_fn, child_stack+5000, CLONE_VFORK, NULL);
  printf("main after clone parent pid = %d tid = %d retPid = %d errno = %d\n", getpid(), gettid(), retPid, errno);

  waitpid(retPid, &status, 0);
  printf("main after after wait pid = %d tid = %d\n", getpid(), gettid());
  return 0;
}

We can observe a difference between the unscoped run:

sudo ./clonetest 
main started pid = 2326563 tid = 2326563
 child_fn begin pid = 2326564 tid = 2326564 iteration = 0
 child_fn begin pid = 2326564 tid = 2326564 iteration = 1
 child_fn begin pid = 2326564 tid = 2326564 iteration = 2
 child_fn begin pid = 2326564 tid = 2326564 iteration = 3
 child_fn begin pid = 2326564 tid = 2326564 iteration = 4
 child_fn after iteration pid = 2326564 tid = 2326564
main after clone parent pid = 2326563 tid = 2326563 retPid = 2326564 errno = 0
main after after wait pid = 2326563 tid = 2326563

and the scoped run:

sudo ldscope -- ./clonetest 
main started pid = 2326822 tid = 2326822
 child_fn begin pid = 2326823 tid = 2326823 iteration = 0
main after clone parent pid = 2326822 tid = 2326822 retPid = 2326823 errno = 0
main after after wait pid = 2326822 tid = 2326822

Environment

No response

Requested priority

No response

Relevant log output

No response

@michalbiesek michalbiesek added the bug Something isn't working label Oct 26, 2022
@ricksalsa ricksalsa added this to the 1.3.1 milestone Mar 10, 2023
@ricksalsa ricksalsa moved this to Todo in AppScope 1.3.2 Mar 10, 2023
@jrcheli (Contributor) commented Mar 20, 2023

Running dmesg after the above test, we saw a segfault like this:

[2929580.138031] clonetest[31577]: segfault at 5596cbd387a0 ip 00007fbff0003cbc sp 00005596cbd38768 error 6 in libscope.so[7fbfefeb2000+874000]

When we dug into this (using objdump to see what was at that offset in libscope.so), it pointed to a place where we call a pcre2 function. When libscope.so sees writes of console data, we call pcre2, which implements a regex filter. This filter lets users define a whitelist of strings they want us to capture, i.e. SCOPE_EVENT_CONSOLE_VALUE.

We make this call to pcre2 on the client's stack (here, the child process's stack). From previous experience (in Go, which uses very small stacks) we know that pcre2 can use a lot of stack space.
The segfault above is our pcre2 code for SCOPE_EVENT_CONSOLE_VALUE overflowing that stack. This can be confirmed by increasing the child's stack size in the reproducer from 5000 to 50000 bytes.

Another way to confirm that the console write was the cause: we did not see the problem (did not blow the stack) if we commented out this code in initHook(), which is what allows libscope.so to see the console writes in the first place:

            if (g_fn.__write_libc) {
                rc = funchook_prepare(funchook, (void**)&g_fn.__write_libc, __write_libc);
            }

To prevent this, we need to stop using the client's stack when pcre2 is involved. We already do this in our Go support, so we need to apply a similar pattern here too (even when the app we're in was not written in Go).
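
For illustration only, here is a minimal sketch of that pattern, not AppScope's actual implementation: it runs pcre2_match() on a private stack, using ucontext as a stand-in for whatever switching mechanism libscope.so uses. The wrapper signature mirrors pcre2_match(); the 32k size, the static argument struct, and the single shared context pair are simplifying assumptions (a real version would need to be reentrant and thread-safe).

#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define WRAPPER_STACK_BYTES (32 * 1024)  /* illustrative; big enough that pcre2 can't overflow it */

/* makecontext() can only forward int arguments portably, so the real
 * arguments and the result travel through this struct. A production
 * version would keep this (and the contexts) per call or per thread. */
static struct {
    const pcre2_code *code;
    PCRE2_SPTR subject;
    PCRE2_SIZE length;
    PCRE2_SIZE startoffset;
    uint32_t options;
    pcre2_match_data *match_data;
    pcre2_match_context *mcontext;
    int rc;
} g_args;

static ucontext_t g_caller, g_worker;

static void
match_on_private_stack(void)
{
    /* This frame, and everything pcre2 puts on the stack below it,
     * lives on our own stack rather than the client's. */
    g_args.rc = pcre2_match(g_args.code, g_args.subject, g_args.length,
                            g_args.startoffset, g_args.options,
                            g_args.match_data, g_args.mcontext);
}

int
pcre2_match_wrapper(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
                    PCRE2_SIZE startoffset, uint32_t options,
                    pcre2_match_data *match_data, pcre2_match_context *mcontext)
{
    void *stack = malloc(WRAPPER_STACK_BYTES);
    if (!stack) return PCRE2_ERROR_NOMEMORY;

    g_args.code        = code;
    g_args.subject     = subject;
    g_args.length      = length;
    g_args.startoffset = startoffset;
    g_args.options     = options;
    g_args.match_data  = match_data;
    g_args.mcontext    = mcontext;

    getcontext(&g_worker);
    g_worker.uc_stack.ss_sp   = stack;
    g_worker.uc_stack.ss_size = WRAPPER_STACK_BYTES;
    g_worker.uc_link          = &g_caller;        /* resume the caller when the worker returns */
    makecontext(&g_worker, match_on_private_stack, 0);

    swapcontext(&g_caller, &g_worker);            /* run the match on the private stack */

    free(stack);
    return g_args.rc;
}

The point of the sketch is only that the deep, stack-hungry part of the call happens on memory we control, regardless of how small the client's stack is.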

@jrcheli (Contributor) commented Mar 20, 2023

We already have two wrappers that do the required stack switching:

  1. pcre2_match_wrapper() in src/com.c
  2. regexec_wrapper() in src/com.c

For completeness, we need to make sure these _wrapper() functions are used everywhere. Nothing in contrib depends on the pcre2 library, so nothing in the contrib directory needs the wrappers; that only leaves the source code we've written for libscope.so. These searches show that we already use the wrapper functions everywhere:

  1. find src/ -type f | xargs grep pcre2_match | grep -v pcre2_match_ shows that the only place we call pcre2_match() is from pcre2_match_wrapper() in src/com.c
  2. find src/ -type f | xargs grep regexec | grep -v regexec_ shows that the only place we call regexec() is from regexec_wrapper() in src/com.c

The only reason we have this problem is that clauses at the top of our wrapper functions effectively limited the stack switch to Go applications. The fix is simply to remove those clauses so that non-Go applications also run on our own stack, just as Go applications already do.

@jrcheli (Contributor) commented Mar 24, 2023

Before the last commit, the Splunk integration test showed very poor performance: the overhead of scoped relative to unscoped went from <10% to more than 100%. This was traced back to our memory allocator, which the stack-switching code was using. Every malloc of a 32k stack resulted in an mmap, and every free of a 32k stack resulted in a munmap. Both are syscalls, and the resulting context switches were killing our performance. The last commit adds a pool of reusable stacks to avoid the mmap/munmap syscalls. With this, performance measured by the Splunk integration test was restored to the level before we added stack switching.
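
As a rough illustration of the stack-pool idea (the names and pool size here are made up; this is not the actual allocator change): keep a small, mutex-protected free list of 32k stacks so that most get/put operations reuse an existing mapping instead of hitting mmap/munmap.

#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>

#define POOL_STACK_BYTES (32 * 1024)   /* matches the 32k stacks mentioned above */
#define POOL_MAX_FREE    16            /* illustrative cap on how many stacks we cache */

static void *g_free_stacks[POOL_MAX_FREE];
static int g_free_count;
static pthread_mutex_t g_pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Get a stack: reuse a cached one when possible, otherwise mmap a new one.
 * Reuse is what avoids a pair of syscalls (and context switches) per call. */
void *
stack_pool_get(void)
{
    void *stack = NULL;

    pthread_mutex_lock(&g_pool_lock);
    if (g_free_count > 0) stack = g_free_stacks[--g_free_count];
    pthread_mutex_unlock(&g_pool_lock);

    if (!stack) {
        stack = mmap(NULL, POOL_STACK_BYTES, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (stack == MAP_FAILED) return NULL;
    }
    return stack;
}

/* Return a stack: cache it for reuse, and only munmap when the pool is full. */
void
stack_pool_put(void *stack)
{
    pthread_mutex_lock(&g_pool_lock);
    if (g_free_count < POOL_MAX_FREE) {
        g_free_stacks[g_free_count++] = stack;
        stack = NULL;
    }
    pthread_mutex_unlock(&g_pool_lock);

    if (stack) munmap(stack, POOL_STACK_BYTES);
}

A wrapper would then call stack_pool_get()/stack_pool_put() around the switched call instead of allocating and freeing a fresh stack every time.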

Another thing of note: the pcre2 code was being called many (>700) times per HTTP request/response. I've noted this in #968 for further investigation.

@jrcheli (Contributor) commented Apr 4, 2023

Wow. This issue is one that keeps on giving. The sub-issues:

  1. Blowing the stack of non-Go apps, solved by stack switching as was already done for Go (above).
  2. Performance of our memory allocator (mmapping and munmapping every stack), solved by a "stack pool" (above).
  3. Now one more: signals on ARM while in C code (described here).

On aarch64 machines, Go processes crash intermittently with a "bad g"; this has only been seen on aarch64. It was uncovered in the Go integration tests (go_20) in the "signalHandlerStatic" and "signalHandlerStaticStripped" test cases. By adding a goroutine that constantly writes to the console, these tests now fail within a small number of iterations (on this branch). To reproduce quickly, I'd recommend commenting out the other test cases and running the go_20 signalHandler tests in a loop until the "bad g" failure appears.

We may not have a perfect understanding, but we believe that when a signal arrives while we are on the switched stack (for pcre2), Go tries to retrieve the "g" from the stack. It won't find "g" there while we're on our own stack executing pcre2 code, so Go crashes. Without the new goroutine that writes to the console, the "bad g" happens very infrequently, but importantly the "bad g" in the signalHandlerStatic tests has also been observed on branches that do not have the stack pool implementation from this issue. To the best of our knowledge, we could see this any time a signal handler interrupts us while we're running our C code.

If this description is correct, a possible solution is to mimic how Go knows whether C code is currently running:
https://github.com/golang/go/blob/master/src/runtime/cgocall.go
https://github.com/golang/go/blob/master/src/runtime/asm_arm64.s

Not required for the problem described here, but worth noting: we don't need to switch stacks when running pcre2 on our own reporting thread. It may make sense to skip the stack switch when we're on our own thread.
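
One possible shape for that (purely a sketch with hypothetical names, not something implemented here): mark the reporting thread with a thread-local flag and bypass the switch for it.

#include <stdbool.h>
#include <stddef.h>

/* Placeholder paths; in this sketch they only mark which route was taken. */
static int run_filter_directly(const char *buf, size_t len)         { (void)buf; (void)len; return 0; }
static int run_filter_on_private_stack(const char *buf, size_t len) { (void)buf; (void)len; return 0; }

/* Thread-local flag, set once by the reporting thread when it starts. */
static __thread bool g_on_report_thread = false;

void
report_thread_mark(void)
{
    g_on_report_thread = true;
}

/* The reporting thread has a normal pthread-sized stack, so it can run the
 * regex filter in place; any other thread switches to a private stack first. */
int
filter_console_data(const char *buf, size_t len)
{
    return g_on_report_thread ? run_filter_directly(buf, len)
                              : run_filter_on_private_stack(buf, len);
}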

@jrcheli jrcheli modified the milestones: 1.3.2, 1.3.3 Apr 19, 2023
@jrcheli jrcheli removed this from the 1.3.3 milestone May 2, 2023
@jrcheli (Contributor) commented May 5, 2023

I've created #1469 to address sub-issue 3. This allows us to merge in the solutions for the original issue (sub-issues 1 and 2).

@jrcheli jrcheli moved this from 🔖 To Do to 🏗 In progress in AppScope 1.4.0 Release (Summer 23) May 5, 2023
@jrcheli (Contributor) commented May 5, 2023

I've created a branch, bug/1469-go-signals-arm, that contains the goroutine that writes to the console. See #1469 for more info.

Closing this issue because the fix has been merged into release/1.4.

@jrcheli jrcheli closed this as completed May 5, 2023
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in AppScope 1.4.0 Release (Summer 23) May 5, 2023
michalbiesek pushed a commit to michalbiesek/appscope that referenced this issue May 12, 2023