
[Bug]: Clone behavior is different when scoped #1170

Closed
michalbiesek opened this issue Oct 26, 2022 · 6 comments · Fixed by #1392

@michalbiesek (Contributor)

Steps To Reproduce

With the following code:

// gcc -g test/manual/clonetest.c -o clonetest
// Different behavior in case of:
// sudo ./clonetest
// sudo ldscope -- ./clonetest

#define _GNU_SOURCE
#include <linux/limits.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

static char child_stack[5000];

int child_fn(void* arg) {
  for (int i = 0 ; i < 5; ++i) {
      printf(" child_fn begin pid = %d tid = %d iteration = %d\n", getpid(), gettid(), i);
      sleep(1);
  }
  printf(" child_fn after iteration pid = %d tid = %d\n", getpid(), gettid());
  return 0;
}

int main() {
  int status;
  printf("main started pid = %d tid = %d\n", getpid(), gettid());
  pid_t retPid = clone(child_fn, child_stack+5000, CLONE_VFORK, NULL);
  printf("main after clone parent pid = %d tid = %d retPid = %d errno = %d\n", getpid(), gettid(), retPid, errno);

  waitpid(retPid, &status, 0);
  printf("main after after wait pid = %d tid = %d\n", getpid(), gettid());
  return 0;
}

We can observe a difference between the unscoped run:

sudo ./clonetest 
main started pid = 2326563 tid = 2326563
 child_fn begin pid = 2326564 tid = 2326564 iteration = 0
 child_fn begin pid = 2326564 tid = 2326564 iteration = 1
 child_fn begin pid = 2326564 tid = 2326564 iteration = 2
 child_fn begin pid = 2326564 tid = 2326564 iteration = 3
 child_fn begin pid = 2326564 tid = 2326564 iteration = 4
 child_fn after iteration pid = 2326564 tid = 2326564
main after clone parent pid = 2326563 tid = 2326563 retPid = 2326564 errno = 0
main after after wait pid = 2326563 tid = 2326563

and the scoped run:

sudo ldscope -- ./clonetest 
main started pid = 2326822 tid = 2326822
 child_fn begin pid = 2326823 tid = 2326823 iteration = 0
main after clone parent pid = 2326822 tid = 2326822 retPid = 2326823 errno = 0
main after after wait pid = 2326822 tid = 2326822

Environment

No response

Requested priority

No response

Relevant log output

No response

@michalbiesek michalbiesek added the bug Something isn't working label Oct 26, 2022
@ricksalsa ricksalsa added this to the 1.3.1 milestone Mar 10, 2023
@ricksalsa ricksalsa moved this to Todo in AppScope 1.3.2 Mar 10, 2023
@jrcheli (Contributor) commented Mar 20, 2023

Running dmesg after the above test, we saw a segfault like this:

[2929580.138031] clonetest[31577]: segfault at 5596cbd387a0 ip 00007fbff0003cbc sp 00005596cbd38768 error 6 in libscope.so[7fbfefeb2000+874000]

When we dug into this (using objdump to see what was at that offset in libscope.so), it pointed to a place where we call a pcre2 function. When libscope.so sees writes of console data, we call pcre2, which implements a regex filter. This filter lets users define a whitelist of strings they want us to capture, i.e. SCOPE_EVENT_CONSOLE_VALUE.

We make this call to pcre2 on the client's stack (here, the child process's stack). From previous experience (in Go, which uses very small stacks) we know that pcre2 can use a lot of stack space.
The segfault above is our pcre2 code for SCOPE_EVENT_CONSOLE_VALUE overflowing that stack. This can be confirmed by increasing the child's stack size in the reproducer from 5000 to 50000 bytes.

Another way to confirm that the console write was the cause: we did not see the problem (did not blow the stack) if we commented out this code in initHook(), which is what allows libscope.so to see the console writes in the first place:

            if (g_fn.__write_libc) {
                rc = funchook_prepare(funchook, (void**)&g_fn.__write_libc, __write_libc);
            }

To prevent this, we need to stop using the client's stack when pcre2 is involved. We already do this in our Go support, so we need to apply a similar pattern here too (even when the app we're in was not written in Go).
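
For illustration only, here is a minimal sketch of that pattern, not AppScope's actual implementation: it runs pcre2_match() on a private stack, using ucontext as a stand-in for whatever switching mechanism libscope.so uses. The wrapper signature mirrors pcre2_match(); the 32k size, the static argument struct, and the single shared context pair are simplifying assumptions (a real version would need to be reentrant and thread-safe).

#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define WRAPPER_STACK_BYTES (32 * 1024)  /* illustrative; big enough that pcre2 can't overflow it */

/* makecontext() can only forward int arguments portably, so the real
 * arguments and the result travel through this struct. A production
 * version would keep this (and the contexts) per call or per thread. */
static struct {
    const pcre2_code *code;
    PCRE2_SPTR subject;
    PCRE2_SIZE length;
    PCRE2_SIZE startoffset;
    uint32_t options;
    pcre2_match_data *match_data;
    pcre2_match_context *mcontext;
    int rc;
} g_args;

static ucontext_t g_caller, g_worker;

static void
match_on_private_stack(void)
{
    /* This frame, and everything pcre2 puts on the stack below it,
     * lives on our own stack rather than the client's. */
    g_args.rc = pcre2_match(g_args.code, g_args.subject, g_args.length,
                            g_args.startoffset, g_args.options,
                            g_args.match_data, g_args.mcontext);
}

int
pcre2_match_wrapper(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
                    PCRE2_SIZE startoffset, uint32_t options,
                    pcre2_match_data *match_data, pcre2_match_context *mcontext)
{
    void *stack = malloc(WRAPPER_STACK_BYTES);
    if (!stack) return PCRE2_ERROR_NOMEMORY;

    g_args.code        = code;
    g_args.subject     = subject;
    g_args.length      = length;
    g_args.startoffset = startoffset;
    g_args.options     = options;
    g_args.match_data  = match_data;
    g_args.mcontext    = mcontext;

    getcontext(&g_worker);
    g_worker.uc_stack.ss_sp   = stack;
    g_worker.uc_stack.ss_size = WRAPPER_STACK_BYTES;
    g_worker.uc_link          = &g_caller;        /* resume the caller when the worker returns */
    makecontext(&g_worker, match_on_private_stack, 0);

    swapcontext(&g_caller, &g_worker);            /* run the match on the private stack */

    free(stack);
    return g_args.rc;
}

The point of the sketch is only that the deep, stack-hungry part of the call happens on memory we control, regardless of how small the client's stack is.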

@jrcheli (Contributor) commented Mar 20, 2023

We already have two wrappers that do the required stack switching:

  1. pcre2_match_wrapper() in src/com.c
  2. regexec_wrapper() in src/com.c

For completeness, we need to make sure these _wrapper() functions are used everywhere. Nothing in contrib depends on the pcre2 library, so nothing in the contrib directory needs the wrappers; that only leaves the source code we've written for libscope.so. These searches show that we already use the wrapper functions everywhere:

  1. find src/ -type f | xargs grep pcre2_match | grep -v pcre2_match_ shows that the only place we call pcre2_match() is from pcre2_match_wrapper() in src/com.c
  2. find src/ -type f | xargs grep regexec | grep -v regexec_ shows that the only place we call regexec() is from regexec_wrapper() in src/com.c

The only reason we have this problem is that clauses at the top of our wrapper functions effectively limited the stack switch to Go applications. The fix is simply to remove those clauses so that non-Go applications also run on our own stack, just as Go applications already do.

@jrcheli (Contributor) commented Mar 24, 2023

Before the last commit, the Splunk integration test showed very poor performance: the overhead of scoped relative to unscoped went from <10% to more than 100%. This was traced back to our memory allocator, which the stack-switching code was using. Every malloc of a 32k stack resulted in an mmap, and every free of a 32k stack resulted in a munmap. Both are syscalls, and the resulting context switches were killing our performance. The last commit adds a pool of reusable stacks to avoid the mmap/munmap syscalls. With this, performance measured by the Splunk integration test was restored to the level before we added stack switching.
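
As a rough illustration of the stack-pool idea (the names and pool size here are made up; this is not the actual allocator change): keep a small, mutex-protected free list of 32k stacks so that most get/put operations reuse an existing mapping instead of hitting mmap/munmap.

#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>

#define POOL_STACK_BYTES (32 * 1024)   /* matches the 32k stacks mentioned above */
#define POOL_MAX_FREE    16            /* illustrative cap on how many stacks we cache */

static void *g_free_stacks[POOL_MAX_FREE];
static int g_free_count;
static pthread_mutex_t g_pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Get a stack: reuse a cached one when possible, otherwise mmap a new one.
 * Reuse is what avoids a pair of syscalls (and context switches) per call. */
void *
stack_pool_get(void)
{
    void *stack = NULL;

    pthread_mutex_lock(&g_pool_lock);
    if (g_free_count > 0) stack = g_free_stacks[--g_free_count];
    pthread_mutex_unlock(&g_pool_lock);

    if (!stack) {
        stack = mmap(NULL, POOL_STACK_BYTES, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (stack == MAP_FAILED) return NULL;
    }
    return stack;
}

/* Return a stack: cache it for reuse, and only munmap when the pool is full. */
void
stack_pool_put(void *stack)
{
    pthread_mutex_lock(&g_pool_lock);
    if (g_free_count < POOL_MAX_FREE) {
        g_free_stacks[g_free_count++] = stack;
        stack = NULL;
    }
    pthread_mutex_unlock(&g_pool_lock);

    if (stack) munmap(stack, POOL_STACK_BYTES);
}

A wrapper would then call stack_pool_get()/stack_pool_put() around the switched call instead of allocating and freeing a fresh stack every time.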

Another thing of note: the pcre2 code was being called many (>700) times per HTTP request/response. I've noted this in #968 for further investigation.

@jrcheli (Contributor) commented Apr 4, 2023

Wow. This issue is one that keeps on giving. The sub-issues:

  1. Blowing the stack of non-Go apps, solved by stack switching as was already done for Go (above).
  2. Performance of our memory allocator (mmapping and munmapping every stack), solved by a "stack pool" (above).
  3. Now one more: signals on ARM while in C code (described here).

On aarch64 machines, Go processes crash intermittently with a "bad g"; this has only been seen on aarch64. It was uncovered in the Go integration tests (go_20) in the "signalHandlerStatic" and "signalHandlerStaticStripped" test cases. By adding a goroutine that constantly writes to the console, these tests now fail within a small number of iterations (on this branch). To reproduce quickly, I'd recommend commenting out the other test cases and running the go_20 signalHandler tests in a loop until the "bad g" failure appears.

We may not have a perfect understanding, but we believe that when a signal arrives while we are on the switched stack (for pcre2), Go tries to retrieve the "g" from the stack. It won't find "g" there while we're on our own stack executing pcre2 code, so Go crashes. Without the new goroutine that writes to the console, the "bad g" happens very infrequently, but importantly the "bad g" in the signalHandlerStatic tests has also been observed on branches that do not have the stack pool implementation from this issue. To the best of our knowledge, we could see this any time a signal handler interrupts us while we're running our C code.

If this description is correct, a possible solution is to mimic how Go knows whether C code is currently running:
https://github.com/golang/go/blob/master/src/runtime/cgocall.go
https://github.com/golang/go/blob/master/src/runtime/asm_arm64.s

Not required for the problem described here, but worth noting: we don't need to switch stacks when running pcre2 on our own reporting thread. It may make sense to skip the stack switch when we're on our own thread.
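
One possible shape for that (purely a sketch with hypothetical names, not something implemented here): mark the reporting thread with a thread-local flag and bypass the switch for it.

#include <stdbool.h>
#include <stddef.h>

/* Placeholder paths; in this sketch they only mark which route was taken. */
static int run_filter_directly(const char *buf, size_t len)         { (void)buf; (void)len; return 0; }
static int run_filter_on_private_stack(const char *buf, size_t len) { (void)buf; (void)len; return 0; }

/* Thread-local flag, set once by the reporting thread when it starts. */
static __thread bool g_on_report_thread = false;

void
report_thread_mark(void)
{
    g_on_report_thread = true;
}

/* The reporting thread has a normal pthread-sized stack, so it can run the
 * regex filter in place; any other thread switches to a private stack first. */
int
filter_console_data(const char *buf, size_t len)
{
    return g_on_report_thread ? run_filter_directly(buf, len)
                              : run_filter_on_private_stack(buf, len);
}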

@jrcheli jrcheli modified the milestones: 1.3.2, 1.3.3 Apr 19, 2023
@jrcheli jrcheli removed this from the 1.3.3 milestone May 2, 2023
@jrcheli (Contributor) commented May 5, 2023

I've created #1469 to address sub-issue 3. This allows us to merge in the solutions for the original issue (sub-issues 1 and 2).

@jrcheli jrcheli moved this from 🔖 To Do to 🏗 In progress in AppScope 1.4.0 Release (Summer 23) May 5, 2023
@jrcheli (Contributor) commented May 5, 2023

I've created a branch, bug/1469-go-signals-arm, that contains the goroutine that writes to the console. See #1469 for more info.

Closing this issue because the fix has been merged into release/1.4.

@jrcheli jrcheli closed this as completed May 5, 2023
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in AppScope 1.4.0 Release (Summer 23) May 5, 2023
michalbiesek pushed a commit to michalbiesek/appscope that referenced this issue May 12, 2023