-
Notifications
You must be signed in to change notification settings - Fork 146
RFC: better debugging #526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
In the current state, a FORM crash looks like: or |
|
I cleaned this up a bit more, eg, it doesn't print any functions below I don't see any performance regressions due to keeping debug symbols and frame pointers in the release build. My opinion would be to always print the trace by default, but if anyone strongly opposes this I would add a setup parameter to control it. |
|
It doesn't look nice when I wonder if we really need to replace |
|
For TERMINATE I was vaguely following the "convention" that macros have uppercase names, but in the form sources it is true that plenty of macros do not have this. For me it would be fine to rename TERMINATE as you suggest. As for |
|
I think Another question would be: do we want FORM to depend on glibc? |
|
FORM already links |
|
I'm not referring to |
|
I see, sorry. Then as I understand the "proper" way to do this is for the configure script to check for backtrace, and then we include this code only on platforms where it exists? |
|
Yes, I think so. Or, optionally, we can consider a new configure script switch like |
|
About |
|
These should implement the changes we discussed. The last can be reverted if TERMINATE is preferred after all. |
|
I thought the last commit broke the static build but actually it went wrong before. But github gave it a green tick regardless...? Anyway it looks like https://sourceware.org/bugzilla/show_bug.cgi?id=15648 If I link with |
|
The multiple |
|
This is true, but the build will still be broken for users who want to do their own static build (for whatever reason) on older systems. Edit: though of course in this (rare?) situation one can just |
|
I cleaned these patches up, but we still need to decide what to do about the CI fail on 20.04. If we bump the version here things will be fine. Then in case someone reports that they can't do a static build on 20.04 in the future, we'd have to advise to configure with I'd like some input on having the backtrace "enabled" by default: in principle "simple form script errors" look more scary. It could be hidden behind |
|
Maybe |
|
By this do you mean that if we are doing a static link, disable backtrace if libpthread contains |
|
No. If I understand correctly, we can create a test program and check if the program can be linked with pthreads by |
|
A simple pthread program suffices, if you build as |
|
At least on Ubuntu 20.04, if I simply omit |
|
So using |
|
Yes, or |
|
Would it be useful if FORM prints some stack information, at least function names, even if else {
/* eu-addr2line not found */
char **strings;
strings = backtrace_symbols(stack, stacksize);
MesPrint("Backtrace:");
for ( int i = 0; i < stacksize && !stop; i++ ) {
char *p = strings[i];
while ( *p && *p != '(' ) p++;
MesPrint("%#%2d: %s\n", i, p);
/* Maybe check stopping conditions? */
}
MesPrint("Please install eu-addr2line for readable stack information.");
free(strings);
}This requires |
|
Yes, this is nice (I seem to remember having trouble getting this output before, maybe I messed up the rdynamic flag or something.) Here we can stop printing at main, but not at start_thread for tform, though we could stop at RunThread or RunSortBot instead, we don't actually need to know |
|
I don't observe any performance impact due to the new build options (at least for a test with mincer). |
|
Maybe diff --git a/sources/startup.c b/sources/startup.c
index 425a4d7..6865bda 100644
--- a/sources/startup.c
+++ b/sources/startup.c
@@ -45,6 +45,10 @@
#endif
#ifdef ENABLE_BACKTRACE
#include <execinfo.h>
+#ifdef LINUX
+ #include <stdint.h>
+ #include <inttypes.h>
+#endif
#endif
/*
@@ -1820,8 +1824,64 @@ VOID TerminateImpl(int errorcode, const char* file, int line, const char* functi
pclose(fp);
}
}
+#ifdef LINUX
+ else if ( !system("command -v addr2line > /dev/null 2>&1") ) {
+ /* Get the executable path. */
+ char exe_path[PATH_MAX];
+ {
+ ssize_t len = readlink("/proc/self/exe", exe_path, sizeof(exe_path) - 1);
+ if ( len != -1 ) {
+ exe_path[len] = '\0';
+ }
+ else {
+ goto backtrace_fallback;
+ }
+ }
+ /* Assume PIE binary and get the base address. */
+ uintptr_t base_address = 0;
+ {
+ char line[256];
+ FILE *maps = fopen("/proc/self/maps", "r");
+ if ( !maps ) {
+ goto backtrace_fallback;
+ }
+ /* See the format used by nommu_region_show() in fs/proc/nommu.c of the Linux source. */
+ if ( fgets(line, sizeof(line), maps) ) {
+ sscanf(line, "%" SCNxPTR "-", &base_address);
+ }
+ else {
+ fclose(maps);
+ goto backtrace_fallback;
+ }
+ fclose(maps);
+ }
+ MesPrint("Backtrace:");
+ for ( int i = 0; i < stacksize && !stop; i++ ) {
+ FILE *fp;
+ char cmd[PATH_MAX + 512];
+ uintptr_t addr = (uintptr_t)stack[i] - base_address;
+ MesPrint("%#%2d: %", i);
+ snprintf(cmd, sizeof(cmd), "addr2line -e \"%s\" -i -p -s -f -C 0x%" PRIxPTR, exe_path, addr);
+ fp = popen(cmd, "r");
+ while ( fgets(cmd, sizeof(cmd), fp) != NULL ) {
+ MesPrint("%s", cmd);
+ /* Don't show functions lower than "main" */
+ if ( strstr(cmd, "main") || strstr(cmd, "RunThread") || strstr(cmd, "RunSortBot") ) {
+ stop = 1;
+ }
+ }
+ pclose(fp);
+ }
+ }
+#endif
else {
/* eu-addr2line not found */
+#ifdef LINUX
+backtrace_fallback:
+#endif
char **strings;
strings = backtrace_symbols(stack, stacksize);
MesPrint("Backtrace:");
@@ -1835,7 +1895,11 @@ VOID TerminateImpl(int errorcode, const char* file, int line, const char* functi
}
/* Maybe check stopping conditions? */
}
+#ifdef LINUX
+ MesPrint("Please install addr2line or eu-addr2line for readable stack information.");
+#else
MesPrint("Please install eu-addr2line for readable stack information.");
+#endif
free(strings);
}
#else |
|
This patch works nicely for me. I just added the same skip of "main" etc as in the eu-addr2line code. I've just noticed that none of this works with parform, however, if there is a straightforward fix for that. |
|
For this, maybe we should use the value passed to Terminate to determine whether to print a trace or not. Then for simple script syntax errors, the backtrace spam can be avoided, and it only prints for more "serious problems"? While I like the idea for better bug reports generally, it is not so nice to get the back trace for, eg, missing semicolon or undefined symbol. This means a review of every Terminate call, and also in some cases it might be a bit tricky since functions return an error condition and Terminate is called further up the stack rather than terminating immediately. Also an option |
|
The last commit implements what we discussed on the video call: add an "on backtrace;" option which is off by default for form, on by default for vorm. To clean up for merge probably I should squash all commits here apart from the first, which can stay separate. |
b243618 to
a0e59b4
Compare
|
I think I want to do a bit more benchmarking of this before merging. I did not find any issues before, but it is perhaps concerning that the form and tform binaries go from ~2.5M to ~13M, due to not stripping the symbols. |
Slightly more useful information is given in case of a crash outside of a debugger.
If enabled, use eu-addr2line or addr2line to print a stack trace on crash, for easier debugging of long-running job crashes. This builds FORM with -g -fno-omit-frame-pointer, which leads to a performance penalty of about 1%. For this reason, the feature is disabled by default.
This is on by default for both form and vorm.
|
After some more benchmarking, I would say that the build with debug information is ~1% slower. For this reason, now the feature is disabled by default. If compiled in ( In terms of bug reporting it is not ideal to have it disabled by default, but probably users do not want an opt-out 1% performance loss for something that is not a bug fix. |
|
For information: I've cherry-picked the first commit (Terminate macro) into 4.3.2 but not the actual backtracing commits. |
Here are a few ideas which may help with debugging in case of crashes, particularly for longer running code and crashes which other people can't reproduce. The first uses
eu-addr2lineto print a stack trace inTerminate. The second replacesTerminateeverywhere with a macro which inserts file, line and function information into the error message, so one can see where exactlyTerminatewas called. The third retains debugging symbols in theformbuild. This "probably" doesn't have a measurable performance impact (needs to be tested), but it means you get a useful trace fromformcrashes, and not justvorm.Any thoughts?