bitvector_next() returns incorrect results on Fedora armv7hf with system LLVM 3.7.0 #13752
@nalimilan I can give you access to a Scaleway ARM machine if a second machine would help you work through this. If so, please send me your public key and I will set it up.
If I put the same line twice in …:

```julia
ccall(:bitvector_next, UInt64, (Ptr{UInt32}, UInt64, UInt64), [0x00000002], 0, 10)
ccall(:bitvector_next, UInt64, (Ptr{UInt32}, UInt64, UInt64), [0x00000002], 0, 10)
```

I get a segfault:

Adding only one of the two …
@ViralBShah Thanks, but for now I'm not sure that would really help. I've found more than my share of things to debug on a single machine! :-) What I need most is guidance.
FWIW, I don't see these segfaults on Scaleway when running both ccalls.
Note that I only see them when setting …
The calling convention looks fine for this call (the ccall tests will segfault due to the warning, however):
@vtjnash I get the same result here:
What do you mean by "ccall tests will segfault due to the warning however"? Due to the …?

I've tried building the in-tree LLVM with the same options as the system package, but I can't reproduce the failure. What could be so special about the system LLVM, given that it's built the same way on the same hardware? For the record, the build command I used is:
I don't see this issue with the Arch Linux system LLVM 3.7 either. Could you attach the disassembly of the ccall (…)?
@yuyichao Sorry for the delay. Unfortunately, when I run this code from
I get
This is with (system) LLVM 3.7.1 plus Julia patches.
… I'm not sure if …

Can you make the following change and run it with gdb?

```diff
diff --git a/base/intset.jl b/base/intset.jl
index 5b610ea..ec38735 100644
--- a/base/intset.jl
+++ b/base/intset.jl
@@ -1,5 +1,12 @@
 # This file is a part of Julia. License is MIT: http://julialang.org/license
 
+function azerty()
+    ccall(:bitvector_next, UInt64, (Ptr{UInt32}, UInt64, UInt64),
+          reinterpret(Ptr{Int32}, 0x12345678),
+          0x2221111111, 0x4444441111)
+end
+azerty()
+
 abstract AbstractSet{T}
 
 type IntSet <: AbstractSet{Int}
```

```
gdb --args ../julia -Ccortex-a9 -f --output-ji /dev/null coreimg.jl
(gdb) start
(gdb) br bitvector_next
(gdb) c
# <when you hit the breakpoint>
(gdb) disassemble $pc
# <paste output>
(gdb) up
(gdb) info registers
# <paste output>
(gdb) x/32b $sp - 16
# <paste output>
(gdb) x/40i $pc - (25 * 4)
# <paste output; decrease the 25 if you get a "memory not accessible" error>
```

(Replace …)
Thanks for the instructions. Here's what I get:
Hmm, so there IS a problem with the ABI: the arguments should be passed in
…
but are instead passed in
…
So it seems that the alignment of … Is …?
Also cc @maleadt
I used …
Just a random guess: the calling-convention mismatch could be caused by LLVM somehow being configured to use the legacy ABI instead of EABI.
The build flags used by Fedora are listed here: http://pkgs.fedoraproject.org/cgit/rpms/llvm3.7.git/tree/llvm3.7.spec#n194
This includes …
The more recent packages have moved to CMake and no longer pass these flags, though, so I could try again.
Please reopen if this still happens.
I initially observed this in #10602. When building Julia with LLVM 3.7.0 and `USE_SYSTEM_LLVM=1`, I get a failure in inference, which I traced back to bugs in `IntSet`. The error happens in `first(s::IntSet)`, because `next(IntSet([1]), 0)[1]` returns apparently random values like `4690168797640785920 == 0x4116d3ac00000000`, all with the 4 low-order bytes equal to zero. This comes from these buggy results:

What I don't understand is how or where the conversion of these values to `Int64` produces the very large numbers that `next` returns.

The problem does not happen with `USE_SYSTEM_LLVM=0 LLVM_VER=3.7.0`. Could it have something to do with an incorrect ccall ABI, as suggested by the "ccall is defaulting to llvm ABI, since no platform ABI has been defined for this CPU/OS combination" warning?