Segfault with TLS and std.parallelism on macOS #2187

Closed
jacob-carlborg opened this issue Jul 1, 2017 · 12 comments

Comments

@jacob-carlborg
Contributor

jacob-carlborg commented Jul 1, 2017

I'm working on adding support for building DStep with LDC. Some of the tests are failing due to a segmentation fault, which seems related to using std.parallelism together with a TLS variable. The issue seems to occur only on macOS; at least, it doesn't occur in Travis CI in a Docker environment. Running the test in a debugger gives this stack trace:

Process 69261 launched: './bin/dstep' (x86_64)
Process 69261 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001001a6a27 dstep`_aaInX + 151
dstep`_aaInX:
->  0x1001a6a27 <+151>: movq   (%rcx,%rbx), %rax
    0x1001a6a2b <+155>: cmpq   %r12, %rax
    0x1001a6a2e <+158>: jne    0x1001a6a4e               ; <+190>
    0x1001a6a30 <+160>: movq   (%r15), %rax
(lldb) bt
* thread #2, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001001a6a27 dstep`_aaInX + 151
    frame #1: 0x00000001000498ce dstep`_D5dstep10translator14IncludeHandler14IncludeHandler14isKnownIncludeMFAyaZAya at IncludeHandler.d:214
    frame #2: 0x000000010001312b dstep`_D5dstep10translator14IncludeHandler14IncludeHandler9toImportsMFC5dstep10translator6Output6OutputZv at IncludeHandler.d:161
    frame #3: 0x0000000100011968 dstep`_D5dstep10translator10Translator10Translator17translateToStringMFZAya at Translator.d:118
    frame #4: 0x000000010001174e dstep`_D5dstep10translator10Translator10Translator9translateMFZv at Translator.d:73
    frame #5: 0x0000000100058e2b dstep`_D5dstep6driver11Application9ParseFile15startConversionMFZv at Application.d:198
    frame #6: 0x000000010005882d dstep`_D5dstep6driver11Application11Application16startParsingFileFxS5dstep13Configuration13ConfigurationAyaAyaZv at Application.d:116
    frame #7: 0x000000010005b7e7 dstep`_D3std11parallelism165__T4TaskS106_D5dstep6driver11Application11Application16startParsingFileFxS5dstep13Configuration13ConfigurationAyaAyaZvTS5dstep13Configuration13ConfigurationTAyaTAyaZ4Task4implFPvZv at parallelism.d:449
    frame #8: 0x000000010012fb19 dstep`_D3std11parallelism8TaskPool12doSingleTaskMFZv + 41
    frame #9: 0x000000010018be89 dstep`thread_entryPoint + 377
    frame #10: 0x00007fff9df2f93b libsystem_pthread.dylib`_pthread_body + 180
    frame #11: 0x00007fff9df2f887 libsystem_pthread.dylib`_pthread_start + 286
    frame #12: 0x00007fff9df2f08d libsystem_pthread.dylib`thread_start + 13

The segmentation fault occurs here [1]. std.parallelism is used here [2][3]. If I remove the usage of std.parallelism or make the TLS variable global, the segmentation fault does not occur.

The command I've been using to reproduce this is:

./bin/dstep test_files/clang-c/BuildSystem.h test_files/clang-c/CXCompilationDatabase.h test_files/clang-c/CXErrorCode.h test_files/clang-c/CXString.h test_files/clang-c/Documentation.h test_files/clang-c/Index.h test_files/clang-c/Platform.h -Itest_files --public-submodules --package clang.c -Iresources -o asd

Unfortunately, I haven't spent any time trying to reduce the test case, because I noticed that the variable is never written to, so making it immutable fixed the issue and made the code more correct as well.
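
To make the workaround concrete, here is a minimal sketch of the difference between a thread-local lookup table and an immutable one. It is not the DStep code: the names `tlsTable`, `sharedTable`, and `lookup`, and the table contents, are hypothetical and chosen only for illustration.

```d
import std.exception : assumeUnique;

// Module-level variables are thread-local by default in D: every thread
// gets its own copy, populated by `static this` when that thread starts.
string[string] tlsTable;

static this()
{
    tlsTable = ["stdio.h": "core.stdc.stdio"]; // hypothetical contents
}

// `immutable` module-level data is implicitly shared: there is a single
// instance for all threads, so no per-thread TLS storage is involved.
immutable string[string] sharedTable;

shared static this()
{
    string[string] tmp = ["stdio.h": "core.stdc.stdio"]; // hypothetical contents
    sharedTable = assumeUnique(tmp);
}

// Hypothetical lookup helper that reads only the immutable table.
string lookup(string header)
{
    if (auto p = header in sharedTable)
        return *p;
    return null;
}

void main()
{
    import std.stdio : writeln;
    writeln(lookup("stdio.h"));
}
```

Because the immutable table is initialized once in `shared static this` and never written afterwards, it can be read from any thread without synchronization.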

This issue might be the same as #666, but I'm not entirely sure since the issue did not occur in Travis CI on Linux using Docker. It does occur in Travis CI on macOS.

I've tried using both LDC 1.2.0 and 1.3.0-beta2, on macOS 10.12.5.

[1] https://github.com/jacob-carlborg/dstep/blob/cc83cbc4523878ed6cec500e581d8e46534223db/dstep/translator/IncludeHandler.d#L214

[2] https://github.com/jacob-carlborg/dstep/blob/cc83cbc4523878ed6cec500e581d8e46534223db/dstep/driver/Application.d#L84-L89

[3] https://github.com/jacob-carlborg/dstep/blob/cc83cbc4523878ed6cec500e581d8e46534223db/dstep/driver/Application.d#L92-L93

@dnadlinger
Member

Does your code use fibres that move across threads? If no, it's not related to #666.

@jacob-carlborg
Contributor Author

No fibers are used explicitly. The application creates a new task (std.parallelism.Task) for each input file, calls executeInNewThread on each task, and finally calls yieldForce on each task. I'm not sure how the internals of std.parallelism work, but searching for "fiber" in std.parallelism yields no hits.
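
The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the actual DStep code: `convert` and `convertAll` are hypothetical stand-ins for the per-file work, and the task returns a string only so that the result is easy to inspect.

```d
import std.parallelism : task;

// Hypothetical stand-in for the per-file translation work.
string convert(string path)
{
    return path ~ ".d";
}

string[] convertAll(string[] files)
{
    // task!convert(...) returns a pointer to a heap-allocated Task.
    alias TaskPtr = typeof(task!convert(""));
    TaskPtr[] tasks;

    foreach (file; files)
    {
        auto t = task!convert(file);
        t.executeInNewThread(); // runs the task in a newly created thread
        tasks ~= t;
    }

    string[] results;
    foreach (t; tasks)
        results ~= t.yieldForce; // blocks until the task has finished
    return results;
}

void main()
{
    import std.stdio : writeln;
    writeln(convertAll(["a.h", "b.h"]));
}
```

Any thread-local variable touched inside `convert` would be accessed from the newly created threads rather than from the main thread.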

@jacob-carlborg
Contributor Author

I got some help from Russel Winder to try this directly on Linux (not inside a Docker environment), and the problem does not occur there. So it seems to be macOS-specific.

@jacob-carlborg changed the title from "Segfault with TLS and std.parallelism" to "Segfault with TLS and std.parallelism on macOS" on Jul 2, 2017
@kinke
Member

kinke commented Jul 2, 2017

Are you using the shared or static libs? If shared, could you give the static ones a try and see if those are affected too?

@jacob-carlborg
Contributor Author

I'm using what's default from the release provided here on GitHub. Looks like that's the static libs.

@s-ludwig
Contributor

s-ludwig commented Oct 23, 2018

I got a similar problem on macOS (does not occur on other systems or with DMD). From a coarse look, it appears that the TLS section of the non-main thread in the repro case is not scanned correctly when the collection is triggered by the main thread: https://gist.github.com/s-ludwig/5b924673b826c4c8427c0a5460631ba9 (edit: LDC 1.12.0)

@s-ludwig
Contributor

For reference, related forum thread with a working fix: https://forum.dlang.org/post/mailman.6193.1546546742.29801.digitalmars-d-ldc@puremagic.com

It just wasn't clear yet under which circumstances the fix is actually needed.

@kinke
Member

kinke commented Apr 15, 2019

Proper (?) fix proposed by Iain upstream: dlang/druntime#2558

s-ludwig added a commit to vibe-d/vibe-core that referenced this issue Nov 22, 2019
@s-ludwig
Contributor

1.20.0-beta1 appears to work fine, so I think this can be closed, unless @jacob-carlborg actually hit a different issue.

@kinke
Member

kinke commented Jan 31, 2020

Ah yeah, I've forgotten about this - the upstream fix was cherry-picked into LDC v1.19.

@jacob-carlborg
Contributor Author

The code looks different now. It's now using std.parallelism.parallel instead of directly using the tasks. I cannot reproduce the issue with LDC 1.19 or 1.18.
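
For comparison, a parallel foreach over the input files might look roughly like the sketch below. Again, this is an assumption-laden illustration rather than the current DStep code: `convert` and `convertAll` are hypothetical names.

```d
import std.parallelism : parallel;

// Hypothetical stand-in for the per-file translation work.
string convert(string path)
{
    return path ~ ".d";
}

string[] convertAll(string[] files)
{
    auto results = new string[files.length];

    // Iterations are distributed over the worker threads of the default
    // task pool; each iteration writes only to its own slot.
    foreach (i, file; parallel(files))
        results[i] = convert(file);

    return results;
}

void main()
{
    import std.stdio : writeln;
    writeln(convertAll(["a.h", "b.h"]));
}
```

Unlike the one-thread-per-task approach, this reuses a fixed set of worker threads instead of creating a new thread for every input file.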

@kinke
Member

kinke commented Feb 4, 2020

Let's close this then, zombie issues aren't really helpful.
