Profiler segmentation fault #28648
Just a comment: the profiler works fine for other functions, for example the Basic Usage example in https://docs.julialang.org/en/v0.6.2/manual/profile/
Tried this on Linux and it works.
I can reproduce on Mac.
Same thing on Linux:
No segfault on Windows; the profiler works:
Not sure if this is related:
My Julia session gets silently killed at random, about 50% of the time, when running
I am seeing the same thing with both Julia 0.7 and Julia 1.0 on Mac (not using Juno). I never had any issues with Julia 0.6.4. Specifying a smaller number of instruction pointers using
This example manages to trigger it on both Mac and Linux on 1.0:
The above example fails for me too (Mac, 1.0). But if you put I got the same stack trace that @sverek posted above.
Can somebody who can reproduce this try running it with
I started Julia 1.0 on macOS with check-bounds enabled and ran the code by simonbyrne; it segfaults the REPL in the same way as without check-bounds.
I just encountered the same issue on Mac, although there's an extra warning. Not sure if it's the same bug.
I can reproduce this on Windows too, on Julia 1.0, with the example above. Same stack trace as posted above.
Perhaps this will help find the cause: some of the pointers returned by svec(:kwfunc, Symbol("./boot.jl"), 321, MethodInstance for kwfunc(::Any), false, false, Ptr{Nothing} @0x00007fb5617cba50) but pointer Ignoring these "off-by-one" addresses seems to yield plausible results.
Hmm, I can't reproduce that using Simon's example above (on a Mac on Julia 1.0). While some pointers are problematic and segfault when
I cannot reproduce the failures above. @RalphAS's observation makes it seem likely that this is a libunwind problem. There are some interesting observations in libunwind's README. It's also worth noting that several PRs have been committed to patch or work around libunwind failures (e.g., #28291, #24379, #4159, #24023, probably more). For starters, we should have folks report a fair amount of architecture detail:
Julia build type: from source
julia> versioninfo()
Julia Version 1.0.1-pre.139
Commit 9ee3f881b3* (2018-09-12 15:03 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
JULIAFUNCDIR = /home/tim/juliafunc
JULIA_CPU_THREADS = 2
I'm on the branch for #28764, and since I'm on Linux:
tim@diva:~$ ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27
tim@diva:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
Stepping: 4
CPU MHz: 1009.495
CPU max MHz: 3000.0000
CPU min MHz: 500.0000
BogoMIPS: 4788.76
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts flush_l1d
Is it worth having people run libunwind's tests and report the results? I navigated to
$ ./configure
# suppressed output
$ make check
# lots of build output, then
PASS: test-proc-info
PASS: test-static-link
PASS: test-strerror
PASS: Gtest-bt
PASS: Ltest-bt
PASS: Gtest-exc
PASS: Ltest-exc
PASS: Gtest-init
PASS: Ltest-init
PASS: Gtest-concurrent
PASS: Ltest-concurrent
../config/test-driver: line 107: 7003 Segmentation fault (core dumped) "$@" > $log_file 2>&1
FAIL: Gtest-resume-sig
../config/test-driver: line 107: 7024 Segmentation fault (core dumped) "$@" > $log_file 2>&1
FAIL: Ltest-resume-sig
../config/test-driver: line 107: 7044 Segmentation fault (core dumped) "$@" > $log_file 2>&1
FAIL: Gtest-resume-sig-rt
../config/test-driver: line 107: 7064 Segmentation fault (core dumped) "$@" > $log_file 2>&1
FAIL: Ltest-resume-sig-rt
XFAIL: Gtest-dyn1
XFAIL: Ltest-dyn1
PASS: Gtest-trace
PASS: Ltest-trace
PASS: test-async-sig
PASS: test-flush-cache
PASS: test-init-remote
PASS: test-mem
PASS: Ltest-varargs
PASS: Ltest-nomalloc
PASS: Ltest-nocalloc
PASS: Lrs-race
PASS: test-ptrace
PASS: test-setjmp
PASS: run-check-namespace
PASS: run-ptrace-mapper
PASS: run-ptrace-misc
PASS: run-coredump-unwind
============================================================================
Testsuite summary for libunwind 1.1
============================================================================
# TOTAL: 33
# PASS: 27
# SKIP: 0
# XFAIL: 2
# FAIL: 4
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
Please report to libunwind-devel@nongnu.org
============================================================================
I get the segfault all the time too, on Ubuntu, Julia 1.0.0. It's triggered by @simonbyrne's code above.
Is that a source build or a downloaded binary?
Oooh, interesting: I just tried a downloaded Julia binary and got the segfault, but my source build is fine. Is everyone who is experiencing this using a binary? versioninfo for the binary:
julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
Environment:
JULIAFUNCDIR = /home/tim/juliafunc
JULIA_CPU_THREADS = 2
Downloaded binary.
Downloaded binary (Mac, Linux, Windows).
Would be interesting to check whether nightly has the same problem.
I can reproduce this on nightly:
Julia Version 1.1.0-DEV.271
Commit 16516b5fbf (2018-09-17 12:51 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Crashes on nightly macOS for me too.
I'm on a downloaded binary (1.0.0, commit 5d4eaca, macOS).
This seems to be related to more recent architectures. I haven't been able to reproduce this if I set
My experiments described above (with consistent failures on sufficiently large profiling runs) were on a downloaded 1.0.0 binary, Linux x86_64 (Haswell). On a local source build of 1.1.0-DEV.281 on the same system, I'm not seeing any segfaults (or off-by-one pointers) so far. I do still see a few "stragglers" that are incorrectly printed outside the tree.
Built Julia from source: no crash with the example by @simonbyrne.
Seems pretty clearly related to static binaries, but dependent on architecture. CC @staticfloat in case he hasn't seen this.
I can also trigger this reliably using the official 1.0.0 binary. Interestingly, if I put a Secondly, I believe this is due to the sysimg multiversioning we do on the buildbots. I can reproduce the segfault locally if I set the makevars My from-source builds are from the latest
Note: I updated the above to note that
Yes, I'm on a Broadwell processor.
I see this too on Juno on a Mac with Skylake.
Bump. Is there any hope of fixing this?
Previously, with a multi-versioned system image, there might be additional entries at the end of the clone list that do not correspond to an actual method (such as jlplt thunks). Also some code cleanup for clarity. fix #28648
Nice collaborative work building the reduction, folks. Thanks!
I'm also seeing this on Linux with the v1.0.3 binary (Skylake CPU). Putting a I'm not sure if I'm facing the same error as those posted above. I didn't have the error a while ago. A workaround for me was to use Not sure if the following stacktrace helps:
When running the profiler in Julia 1.0.0, the REPL crashes on Profile.print().
I also tried the Juno profiler; it crashes the REPL while profiling.
Julia 1.0.0 prebuilt binaries.
macOS 10.13.6
Example:
https://gist.github.com/sverek/107e64a21eed660b273d0fd2f5d366e3