Crash Analysis capability #1309

Closed
9 tasks done
seanvaleo opened this issue Feb 2, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

Collaborator

seanvaleo commented Feb 2, 2023

Add the capability for AppScope to produce useful information following an application crash, whether or not the application was scoped.

What scenarios are we intending to support?

  • Daemon running in container ; Process crashes in same container [container pid provided]
  • Daemon running on host ; Process crashes on host [host pid provided]
  • Daemon running on host ; Process crashes in below container [host pid provided]
  • Daemon running in container ; Process crashes on host or in another container (requires --privileged where daemon is run)

How will the data be accessed?

  • From the file system (via the daemon)
  • From a network destination (via the daemon)

Main Components

check == merged

Collaborator Author

seanvaleo commented Feb 14, 2023

Todo

  • when attached to redis-server, we are unable to stop it without it exiting (resolved by replacing SIGSTOP with sleep 5)
  • empty lines in json output from daemon when writing to a tcp destination
  • conditional inclusion of signal info in "snapshot" file (when from daemon-received signal)
  • don't create a snapshot from SIGILL created by libscope/openssl
  • demo crash analysis (see below repro instructions)
  • update scope daemon integration test
  • top goes into background (resolved by replacing SIGSTOP with sleep 5)
  • execute three scenarios (described above)
  • we have new go modules after 1315. Take a look at snyk output w.r.t. these.
  • validate Go panic output after signal handling (panic does not raise a signal)
  • validate integ tests run on Ubuntu 18
  • update our license
  • test container pid in ubuntu 18, 20 & 22 (John blocked on daemon running in container below)
  • process name not formatted correctly in snapshot file
  • daemon running in container ; Process crashes in same container : doesn't generate a snapshot file (integ test)
    • error determining namespace
  • daemon running on host ; Process crashes in below container : error generating crash files
  • check for admin privileges when running scope daemon. return if not
  • resolve bug - hostname getting copied to host /etc directory
  • environment variable list contains lots of empty items in redis-server snapshot
  • can't get core file from a container when running the daemon on the host
  • we need to not exit when there's an error getting files
  • two signals received when we are attached to a process and send it one signal
  • what do we do about a crashing go app?
  • delete hostname file after copy from container
  • test with node and crash
  • add a suffix to snapshot files so that we support multiple signals
  • fix snapshotsigsegv unit test
  • test crashing java apps

Collaborator Author

seanvaleo commented Feb 15, 2023

Demo/Repro Instructions

Daemon running in container ; Process crashes in same container

On host:

docker run -it --rm --cap-add=SYS_ADMIN --privileged -v /home/ubuntu/jrc/appscope2:/opt/appscope -v /sys/kernel/debug:/sys/kernel/debug:ro ubuntu:20.04

In container:

./bin/linux/x86_64/scope daemon
top
./bin/linux/x86_64/scope attach --backtrace --coredump top
kill -s SIGSEGV `pidof top`

Expected results:
In the container, the following should exist in /tmp/appscope/<pidof top>/ :

  • core
  • info
  • cfg
  • backtrace
  • snapshot
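To confirm the artifacts landed, a small helper along these lines could be used (the `/tmp/appscope/<pid>/` layout and file names are taken from the expected results above; the function and script names are hypothetical, not part of AppScope):

```python
# Hypothetical helper to verify the crash artifacts listed above exist.
# The path layout /tmp/appscope/<pid>/ and the file names come from the
# expected results; nothing here is AppScope's own code.
import os

EXPECTED_FILES = ["core", "info", "cfg", "backtrace", "snapshot"]

def missing_artifacts(pid, base="/tmp/appscope"):
    """Return the expected crash files absent from base/<pid>/."""
    artifact_dir = os.path.join(base, str(pid))
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(artifact_dir, name))]

if __name__ == "__main__":
    import sys
    pid = sys.argv[1] if len(sys.argv) > 1 else "0"
    missing = missing_artifacts(pid)
    print("all artifacts present" if not missing
          else "missing: %s" % ", ".join(missing))
```

For example, after the kill above, something like `python3 check_artifacts.py $(pidof top)` (hypothetical script name) should report all artifacts present.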

Daemon running on host ; Process crashes on host

On host:

sudo ./bin/linux/x86_64/scope daemon
top
sudo ./bin/linux/x86_64/scope attach --backtrace --coredump top
sudo kill -s SIGSEGV `pidof top`

Expected results:
On the host, the following should exist in /tmp/appscope/<pidof top>/ :

  • core
  • info
  • cfg
  • backtrace
  • snapshot

Daemon running on host ; Process crashes in below container

On host:

sudo ./bin/linux/x86_64/scope daemon
docker run --rm -it redis
sudo ./bin/linux/x86_64/scope attach --backtrace --coredump redis-server
sudo kill -s SIGSEGV `pidof redis-server`

Expected results:
On the host, the following should exist in /tmp/appscope/<pidof redis-server>/ :

  • core
  • info
  • cfg
  • backtrace
  • snapshot

@criblio criblio deleted a comment from jrcheli Feb 16, 2023
Collaborator Author

seanvaleo commented Feb 17, 2023

Top

We observe two signals being received (and thus two snapshots being created) for top after sending it a SIGILL or similar signal.

We think our behavior is correct in terms of signal handling (in the library).

Top behaves as follows:

  • Register a valid signal handler with sigaction to catch SIGILL/etc.
  • When a signal is received:
    • It prints the signal to the console.
    • It then registers a second (null) signal handler (supposedly to allow things like coredump to work).
    • It then raises the original signal again, for it to be caught by the second signal handler.
    • (It does not print the repeated signal to the console.)
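The handler sequence described above can be sketched in Python (a hypothetical illustration, not top's or AppScope's actual C code; SIGUSR1 stands in for SIGILL so the script survives, and a do-nothing Python handler stands in for top's null handler):

```python
# Sketch of top's double-handler pattern: catch the signal, install a
# second (do-nothing) handler, then re-raise the same signal at ourselves.
# Hypothetical illustration only; uses SIGUSR1 so the process is not killed.
import os
import signal

events = []

def second_handler(signum, frame):
    # The repeated signal lands here and is not reported, as top does.
    events.append("second handler caught signal %d" % signum)

def first_handler(signum, frame):
    events.append("first handler caught signal %d" % signum)
    # Swap in the second handler, then raise the original signal again
    # so that it is caught by the second handler.
    signal.signal(signum, second_handler)
    os.kill(os.getpid(), signum)

signal.signal(signal.SIGUSR1, first_handler)
os.kill(os.getpid(), signal.SIGUSR1)

# Give the interpreter a few bytecodes to run any pending handler.
for _ in range(100000):
    if len(events) == 2:
        break

print(events)
```

Run once, both handlers fire for a single externally sent signal, which mirrors why we see two signals (and two snapshots) for top.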

Since not all apps behave like top, we think it's best to act on both signals (rather than, for example, shutting off our signal handler after the first one).

Worth noting: We send a SIGABRT when the second signal is received with the null handler (which is why top shows that it also received SIGABRT).

Reference: the function sig_abexit() in the top source code.
