pdsh 2.32 segfault on OS X 10.11 #95

Closed
ilovezfs opened this issue Jun 23, 2017 · 9 comments · Fixed by #96

Comments

@ilovezfs

I'm getting a segfault with the 2.32 release on OS X 10.11:

iMac-TMP:~ joe$ sudo lldb /usr/local/Cellar/pdsh/2.32/bin/pdsh 
Password:
(lldb) target create "/usr/local/Cellar/pdsh/2.32/bin/pdsh"
Current executable set to '/usr/local/Cellar/pdsh/2.32/bin/pdsh' (x86_64).
(lldb) r -V
Process 76749 launched: '/usr/local/Cellar/pdsh/2.32/bin/pdsh' (x86_64)
pdsh@iMac-TMP: Unable to determine ownership of pdsh binary: Bad address
Process 76749 stopped
* thread #1: tid = 0x8af88a, 0x000000010001bb92 pdsh`_next_tok(sep=",", str=0x00007fff5fbffac0) + 34 at split.c:51, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xa00000020)
    frame #0: 0x000000010001bb92 pdsh`_next_tok(sep=",", str=0x00007fff5fbffac0) + 34 at split.c:51
   48  	    int level = 0;
   49  	
   50  	    /* push str past any leading separators */
-> 51  	    while (**str != '\0' && strchr(sep, **str) != NULL)
   52  	        (*str)++;
   53  	
   54  	    if (**str == '\0')
(lldb) bt
* thread #1: tid = 0x8af88a, 0x000000010001bb92 pdsh`_next_tok(sep=",", str=0x00007fff5fbffac0) + 34 at split.c:51, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xa00000020)
  * frame #0: 0x000000010001bb92 pdsh`_next_tok(sep=",", str=0x00007fff5fbffac0) + 34 at split.c:51
    frame #1: 0x000000010001bafb pdsh`list_split(sep=",", str="") + 75 at split.c:96
    frame #2: 0x0000000100005e57 pdsh`_mod_initialize_modules_by_name(names="", m=0x0000000100800600) + 55 at mod.c:348
    frame #3: 0x0000000100005a5e pdsh`mod_load_modules(dir="/usr/local/Cellar/pdsh/2.32/lib/pdsh", opt=0x00007fff5fbffb80) + 78 at mod.c:367
    frame #4: 0x0000000100001b3c pdsh`main(argc=2, argv=0x00007fff5fbffc98) + 236 at main.c:112
    frame #5: 0x00007fff8eea35ad libdyld.dylib`start + 1
    frame #6: 0x00007fff8eea35ad libdyld.dylib`start + 1
(lldb) 
@grondo
Member

grondo commented Jun 23, 2017

Thanks, what was the pdsh cmdline that caused the segfault?

@ilovezfs
Author

pdsh -V

@ilovezfs
Author

see the (lldb) r -V above 🙂

@grondo
Member

grondo commented Jun 23, 2017

Oh, got it, I didn't put it together that you were running under a debugger 😅

In order to update to latest autotools, I had dropped support for libltdl in favor of straight dlopen(3) -- I'm guessing this has something to do with the new OSX failure. I don't know anything about loading DSOs on OSX -- apparently dlopen is supported but I'm not sure if there is some extra magic required... I'll have to find a Mac on which to test.
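For readers unfamiliar with the "straight dlopen(3)" approach mentioned above, here is a minimal sketch of loading a shared object and resolving one symbol with the POSIX `dlfcn.h` interface, which macOS also provides. The helper name `load_module_symbol` and any path or symbol passed to it are hypothetical and used only for illustration; this is not pdsh's actual module loader.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from pdsh): open the shared object at `path`
 * and look up `symbol` in it. On success the symbol address is returned
 * and *handle_out holds the handle to pass to dlclose() later. On any
 * failure NULL is returned and *handle_out is set to NULL.
 */
void *load_module_symbol(const char *path, const char *symbol,
                         void **handle_out)
{
    void *handle;
    void *sym;

    *handle_out = NULL;

    /* RTLD_NOW resolves all undefined symbols up front, so a broken
     * module fails here rather than at first call. */
    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return NULL;
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    sym = dlsym(handle, symbol);
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s): %s\n", symbol,
                err != NULL ? err : "symbol resolved to NULL");
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

A caller would cast the returned pointer to the expected type and call `dlclose()` on the handle once the module is no longer needed. On some older Linux systems the program must also be linked with `-ldl`.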

@ilovezfs
Author

@grondo feel free to abuse our CI if you want.

url "https://github.com/grondo/pdsh.git", :revision => "943bf12f622a1165058ee8240044980b4594646a"
version "2.32.0.1"

to test an arbitrary commit as if it's a real version.

@grondo
Member

grondo commented Jun 23, 2017

Cool! Thanks, can you paste quick instructions?

Travis-CI also has OSX images for testing and I should have added that to the automated CI earlier. The only problem is it seems to take many minutes for an OSX image to be available to run the tests.

@ilovezfs
Author

@grondo
Member

grondo commented Jun 23, 2017

Thank you!

@grondo
Member

grondo commented Jun 25, 2017

Wow, using Travis as an OSX development environment was painful.

I think I did find the source of this crash though (I hope). The problem was that different source files in pdsh were seeing different definitions of the bool type with different sizes, causing bad offsets into structs, most notably the pdsh options struct opt_t.
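To illustrate the kind of mismatch described above, the following sketch shows how two source files that disagree on the size of `bool` end up computing different member offsets for what is meant to be the same struct. All type, field, and function names here are hypothetical stand-ins; the real `opt_t` has different members, and this is not the code from the osx-fixes branch.

```c
#include <stddef.h>

/* Two stand-ins for competing definitions of `bool` (hypothetical). */
typedef int  bool_as_int;   /* e.g. a project-local "typedef int bool;" */
typedef char bool_as_char;  /* e.g. a one-byte bool */

/* The "same" options struct as compiled in a file that sees the
 * int-sized bool ... */
struct opts_int_view {
    bool_as_int debug;
    bool_as_int verbose;
    const char *names;
};

/* ... and as compiled in a file that sees the one-byte bool. */
struct opts_char_view {
    bool_as_char debug;
    bool_as_char verbose;
    const char *names;
};

/* Offset of `verbose` according to each view of the struct. */
size_t verbose_offset_int_view(void)
{
    return offsetof(struct opts_int_view, verbose);
}

size_t verbose_offset_char_view(void)
{
    return offsetof(struct opts_char_view, verbose);
}

/* Returns 1 only if both views agree on overall size and on where
 * `verbose` and `names` live; otherwise 0. */
int layouts_agree(void)
{
    return sizeof(struct opts_int_view) == sizeof(struct opts_char_view)
        && offsetof(struct opts_int_view, verbose)
           == offsetof(struct opts_char_view, verbose)
        && offsetof(struct opts_int_view, names)
           == offsetof(struct opts_char_view, names);
}
```

When one file fills in the struct using one layout and another file reads it using the other, fields are read from the wrong bytes, which is how a pointer or string member can come back as garbage. A common way to avoid this class of bug is to have every source file take its `bool` from a single shared header.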

See

https://github.com/grondo/pdsh/tree/osx-fixes

This branch additionally has some warnings from clang on OSX fixed, and adds an OSX build to Travis-CI for pdsh.

There is one remaining problem, however. If I add genders to the Travis build for OSX then crashes start again whenever the genders module is used. I was not successful debugging this iteratively via Travis, so if anyone gets a good backtrace, that would help.
