Energy APIs segfault with strange pthread_mutex_init issue on Lassen
#568
Comments
@rountree Comments from debugging on Jul 30. Todo:
Notes:
Don’t ask me to explain this, but if I declare your mlock as static in Power9.c, it appears to work:

    static pthread_mutex_t mlock;

    rountree@lassen709 ~/w/lassen/source/hw$ gcc -fsanitize=address -O0 -DVARIORUM_ONLY hw.c -Wall -Wextra -lvariorum -I${HOME}/w/lassen/install/variorum/include -L/usr/WS1/rountree/lassen/source/variorum/lee218build/variorum -Wl,-rpath=/usr/WS1/rountree/lassen/source/variorum/lee218build/variorum -o lee218 -g
    rountree@lassen709 ~/w/lassen/source/hw$ ./lee218
    =================================================================
    Direct leak of 100 byte(s) in 1 object(s) allocated from:
    SUMMARY: AddressSanitizer: 100 byte(s) leaked in 1 allocation(s).
Thanks @lee218llnl. I confirm that using [...]

To be clear, we have discovered at this stage that this is not a Variorum bug. There seems to be a strange naming conflict with [...]

Note that we get this segfault when using the [...]

I tried to reproduce with [...]

My suggestion at this point is to just rename the [...]
This was first discovered in Tre's testing of the updated integration of Variorum and Caliper, when we encountered a segmentation fault.

Note that this bug only applies to `lassen`, where we use pthreads for sampling power in the energy APIs. Thanks @tjeter for finding this sneaky issue!

Running the `variorum-get-energy-json-example` example without `-fsanitize=address`, it works correctly in our Variorum tests but fails in the Caliper integration.

With `-fsanitize=address` in our `CMakeLists.txt` in `src/examples`, we now get the same error as we see in the Caliper integration PR and are able to reproduce it at our end on Lassen. Rather strange that it doesn't occur at our end when we don't use the address sanitizer.

Currently, it seems like the fix is to declare the `pthread_mutex_t mlock` as `static` here. We need to discuss this more to see why this error occurs and where the memory is getting corrupted. @rountree and I spent significant time debugging with `gdb` and were unable to find the source of the corruption.

@tpatki's guess is that there is some strange naming conflict with the name `mlock`, as the error goes away if we (1) rename this variable to any other name, or (2) declare it as `static`. More investigation and understanding is needed here.