ARM build failing during bootstrap on Raspberry Pi 2 #10235

Closed
sbromberger opened this issue Feb 18, 2015 · 87 comments
Labels
system:arm ARMv7 and AArch64

Comments

@sbromberger
Contributor

...
multidimensional.jl
primes.jl
reducedim.jl
ordering.jl
collections.jl
sort.jl
combinatorics.jl
version.jl
error during bootstrap:
LoadError(at "sysimg.jl" line 170: LoadError(at "version.jl" line 102: BoundsError(a=Array{Union, 1}[], i=(1,))))

Makefile:162: recipe for target '/home/seth/dev/julia/julia/usr/lib/julia/sys0.o' failed
make[1]: *** [/home/seth/dev/julia/julia/usr/lib/julia/sys0.o] Error 1
Makefile:76: recipe for target 'julia-sysimg-release' failed
make: *** [julia-sysimg-release] Error 2

ARM.inc:

BFORCE_ARMV7=1

override LLVM_ASSERTIONS=1
LLVM_FLAGS+="--with-cpu=cortex-a9 --with-float=hard --with-abi=aapcs-vfp --with-fpu=neon --enable-targets=arm --enable-optimized --enable-assertions --disable-compiler-version-checks"

override OPENBLAS_DYNAMIC_ARCH=0
override OPENBLAS_TARGET_ARCH=ARMV7
override USE_BLAS64=0

override LLVM_VER=3.5.1

override USE_SYSTEM_FFTW=1
override USE_SYSTEM_GMP=1
override USE_SYSTEM_MPFR=1

JCFLAGS += -fsigned-char

Make.user:

include /home/seth/dev/julia/julia/ARM.inc
override USE_SYSTEM_LIBM=1
@sbromberger
Contributor Author

ccing @ViralBShah for advice and the proper tag :)

@jakebolewski jakebolewski added the system:arm ARMv7 and AArch64 label Feb 18, 2015
@sbromberger
Contributor Author

As an additional data point: I did a completely clean build on the RPi2 and got the exact same error in the exact same place, so it's at least reproducible.

@ViralBShah
Member

There is a typo in the first line of ARM.inc. A B has snuck into the first line, as the first character. If you fix that, does that change anything? You may have to do a complete build from scratch.

@sbromberger
Contributor Author

Gah. Thanks, @ViralBShah - I don't know how I missed that. Rebuilding now.

@ViralBShah
Member

Turns out that line was not used anywhere, which is why nothing ever barfed. I have removed it now. Sorry about that.
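
A misspelled or unused variable in an include file is easy to miss, because make does not complain about assignments that nothing reads. As a rough sketch (the helper name `unreferenced_make_vars` is made up for illustration and is not part of the Julia build system), something like the following lists variables assigned in a file such as ARM.inc that no other file under a directory mentions:

```sh
# Sketch only: list variables assigned in an include file that are not
# mentioned in any other file under a directory. The helper name is made up.
unreferenced_make_vars() {
    inc_file=$1
    search_dir=$2
    # Extract the left-hand side of lines such as "override FOO=1" or "FOO += x".
    sed -n -E 's/^[[:space:]]*(override[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*[:+?]?=.*/\2/p' "$inc_file" |
    sort -u |
    while read -r name; do
        # -w matches whole words, so a search for FOO does not match FOOBAR.
        if ! grep -rqw --exclude="$(basename "$inc_file")" -- "$name" "$search_dir"; then
            echo "$name"
        fi
    done
}
```

It could be called as `unreferenced_make_vars ARM.inc .` from the top of the source tree. A name that only appears in a comment elsewhere still counts as "mentioned", so the output is a hint rather than proof.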

@sbromberger
Contributor Author

So - that seems to indicate that fixing it will NOT fix this error? :(

@sbromberger
Contributor Author

...and indeed it has no discernible effect. Same error.

@sbromberger
Contributor Author

@ViralBShah @StefanKarpinski - can you help me understand why defining a method dispatch would result in a BoundsError on its parameters, as is apparently happening here? The offending line (102) of version.jl is as follows:

typemax(::Type{VersionNumber}) = VersionNumber(typemax(Int),typemax(Int),typemax(Int),(),("",))

@sbromberger
Contributor Author

OK, so line 102 in the error message isn't really line 102. I added some debugging statements in base/version.jl that moved the method dispatch down a line, and it's still complaining about 102:

LoadError(at "sysimg.jl" line 170: LoadError(at "version.jl" line 102: BoundsError(a=Array{Union, 1}[], i=(1,))))

What version.jl is being used here?

@ihnorton
Member

I pulled the latest everything and left it building earlier today (Samsung ARM Chromebook). Built fine, but this is with llvm-svn.

@sbromberger
Contributor Author

@ihnorton do you think the issue I'm seeing is llvm-related?

@ihnorton
Member

No idea. I'm building with 3.5 to see. It was about halfway through sysimg last time I checked. Can't access it right now.

@ihnorton
Member

Worked fine.

LINK usr/lib/julia/sys.so
real    80m50.502s

@ViralBShah
Member

I am using llvm 3.5 just fine on my ARM chromebook. It is quite possible there is some memory corruption elsewhere, or that it is running out of memory. But if it was running out, I would have expected some other error.

@sbromberger
Contributor Author

Doubtful it's OOM. I've got a gig of RAM and configured a gig of swap. I was also monitoring and didn't see any usage above 800 megs or so.
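
For anyone wanting to repeat that kind of check, here is a minimal sketch of sampling memory use while a build runs. It assumes a Linux-style `/proc/meminfo`; the function names are illustrative and "used" is approximated as total minus free, buffers and page cache.

```sh
# Sketch only: report used RAM and swap in MB from a meminfo-style file.
# Defaults to /proc/meminfo (Linux); the function names are illustrative.
mem_used_mb() {
    awk '/^MemTotal:/ {t=$2} /^MemFree:/ {f=$2}
         /^Buffers:/ {b=$2} /^Cached:/ {c=$2}
         END {printf "%d\n", (t - f - b - c) / 1024}' "${1:-/proc/meminfo}"
}

swap_used_mb() {
    awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2}
         END {printf "%d\n", (t - f) / 1024}' "${1:-/proc/meminfo}"
}
```

Running something like `while sleep 10; do echo "$(date +%T) ram=$(mem_used_mb) swap=$(swap_used_mb)"; done >> mem.log &` before starting make would leave a log that can be inspected after a failure.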


@ViralBShah
Member

Try valgrind to see if something comes up?

@sbromberger
Contributor Author

@ihnorton I don't suppose the sys.so is interchangeable?

@ViralBShah
Member

No harm trying!

@sbromberger
Contributor Author

I don't know how to use valgrind.
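
As a starting point, here is a hedged sketch of wrapping a command in valgrind. The wrapper name, the `DRY_RUN` switch and the log file name are all made up for illustration. The exact bootstrap command to wrap is not shown in this thread, so the failing command printed by make would have to be substituted by hand.

```sh
# Sketch only: run a command under valgrind, or print the command line
# when DRY_RUN=1. The wrapper name and default log name are illustrative.
run_under_valgrind() {
    log=${VALGRIND_LOG:-valgrind.%p.log}    # valgrind expands %p to the PID
    # --smc-check=all-non-file: re-check generated code, useful for JITs
    # --track-origins=yes: report where uninitialised values came from
    set -- valgrind --smc-check=all-non-file --track-origins=yes \
           "--log-file=$log" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 run_under_valgrind ./julia --version` prints the full command without running it. Expect a run under valgrind to be many times slower than a normal one, which matters on a board where the build already takes over an hour.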


@sbromberger
Contributor Author

I tried changing ARM.inc to have --with-cpu=cortex-a7 in the LLVM_FLAGS but that didn't help.

@sbromberger
Contributor Author

PS: I'm happy to give access to my rpi (via ssh) if anyone here would like to take a shot at getting this build working...

@ermueller2000

I'm experiencing the same error trying to build on an oDroid:

version.jl
error during bootstrap:
LoadError(at "sysimg.jl" line 171: LoadError(at "version.jl" line 102: BoundsError(a=Array{Union, 1}[], i=(1,))))

make[1]: *** [/home/odroid/julia/usr/lib/julia/sys0.o] Error 1
make: *** [julia-sysimg-release] Error 2

@sbromberger
Contributor Author

I upgraded from wheezy to jessie, and now the build fails at a different point:

    CC src/debuginfo.o
debuginfo.cpp: In function ‘int jl_get_llvmf_info(uint64_t, uint64_t*, uint64_t*, const llvm::object::ObjectFile**)’:
debuginfo.cpp:812:30: error: ‘struct ObjectInfo’ has no member named ‘slide’
         *slide = fit->second.slide;
                              ^
Makefile:62: recipe for target 'debuginfo.o' failed
make[1]: *** [debuginfo.o] Error 1
Makefile:73: recipe for target 'julia-src-release' failed
make: *** [julia-src-release] Error 2

llvm as installed by apt:

ii libllvm3.5:armhf 1:3.5-8 armhf Modular compiler and toolchain technologies, runtime library

This was in the section

ar: creating libmojibake.a

@tkelman
Contributor

tkelman commented Mar 5, 2015

@sbromberger that one's actually not an ARM-specific problem, see #10376

@sbromberger
Contributor Author

@tkelman ah, cool - I guess I have to wait for that PR to be merged before I can retry?

@tkelman
Contributor

tkelman commented Mar 5, 2015

Yeah, unless you want to try building 3.6 on your rpi, or there's a deb/ppa for 3.6 on arm somewhere.

edit: you could also try checking out the commit immediately preceding 1d0b067

@sbromberger
Contributor Author

Thanks. 3.6 doesn't appear in an apt-cache search so I guess I'll wait.

@ViralBShah
Member

You could try changing the LLVM_VER in ARM.inc and leaving the build for a couple of days. :-)

@sbromberger
Contributor Author

Hah. The bug might well be fixed before the compile finishes :)

@sbromberger
Contributor Author

@ViralBShah is there a way to limit the number of julia processes spawned during make testall?

@tkelman
Contributor

tkelman commented Apr 22, 2015

make testall1 or JULIA_CPU_CORES=3 make testall
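
On a memory-constrained board the worker count can also be derived from available RAM instead of being picked by hand. The sketch below is only an illustration: the helper name is hypothetical and the per-worker memory budget is a guess, not a measured figure.

```sh
# Sketch only: choose a worker count that fits in RAM.
# The per-worker budget passed in is an assumption, not a measurement.
pick_workers() {
    # $1 = RAM in MB, $2 = assumed MB needed per worker, $3 = CPU cores
    n=$(( $1 / $2 ))
    if [ "$n" -gt "$3" ]; then n=$3; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}
```

With 1 GB of RAM, four cores and an assumed 400 MB per worker, `JULIA_CPU_CORES=$(pick_workers 1024 400 4) make testall` would start two workers.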

@sbromberger
Contributor Author

@tkelman thanks. The RPi2 ran out of memory (and swap) with normal testall.

@baruchel

Hi, could you give a link for downloading these Julia binaries for RPI2? I would be happy to test them. Thank you again, regards.

@sbromberger
Contributor Author

So close with testall:

seth@raspberrypi ~/dev/julia $ tail -f ~/testall.out
    JULIA test/all
     * linalg1              in 906.94 seconds
     * linalg2              in 800.58 seconds
     * linalg3              in 349.31 seconds
     * linalg4              in 120.78 seconds
     * linalg/lapack        in  57.05 seconds
     * linalg/triangular    in 5821.59 seconds
     * linalg/tridiag       in 244.85 seconds
     * linalg/bidiag        in 430.95 seconds
     * linalg/diagonal      in 258.59 seconds
     * linalg/pinv         exception on 1: ERROR: LoadError: assertion failed: |vecnorm(a * x - b) / vecnorm(b) - 0| <= 0.01
  vecnorm(a * x - b) / vecnorm(b) = 0.012504161841088008
  0 = 0
  difference = 0.012504161841088008 > 0.01
while loading linalg/pinv.jl, in expression starting on line 115
ERROR: LoadError: LoadError: assertion failed: |vecnorm(a * x - b) / vecnorm(b) - 0| <= 0.01
  vecnorm(a * x - b) / vecnorm(b) = 0.012504161841088008
  0 = 0
  difference = 0.012504161841088008 > 0.01
while loading linalg/pinv.jl, in expression starting on line 115
while loading /home/seth/dev/julia/test/runtests.jl, in expression starting on line 3

Makefile:9: recipe for target 'all' failed
make[1]: *** [all] Error 1
Makefile:492: recipe for target 'testall1' failed
make: *** [testall1] Error 2

@ihnorton
Member

Something is probably still not set correctly for OpenBLAS.

@sbromberger
Contributor Author

@baruchel - https://www.dropbox.com/s/6cuvzv2z1knbxah/julia-0.4-rpi2.tgz?dl=0 is the full (350+MB) build dir, and https://www.dropbox.com/s/l97tv8zm97vxcag/julia?dl=0 is the julia binary by itself (though I can't imagine it would work without supporting libs).

@physicsd00d

@baruchel
If you'd like to try to replicate this (in case the julia binaries don't work), I got the llvm binaries from here: http://llvm.org/releases/download.html
Copy all of the contents into your /usr/local directory or, like me, make a new directory and point your PATH and LD_LIBRARY_PATH to them. So for instance, I made a new directory /sw, copied the extracted contents of the llvm tarball to /sw and then added this to my .bashrc

export PATH=/sw/bin/:$PATH
export LD_LIBRARY_PATH=/sw/lib/:$LD_LIBRARY_PATH

@ViralBShah
This odroid XU3 is running Lubuntu.

odroid@odroid:~/tcolvin/julia$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.1 LTS
Release:    14.04
Codename:   trusty


odroid@odroid:~/tcolvin/julia$ clang -v
clang version 3.6.0 (tags/RELEASE_360/final)
Target: armv7l-unknown-linux-gnueabihf
Thread model: posix
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.7
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.7.3
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.8
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.8.2
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.9
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.9.0
Selected GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.8
Candidate multilib: .;@m32
Selected multilib: .;@m32


julia> versioninfo()
Julia Version 0.4.0-dev+4434
Commit 86ba78f* (2015-04-21 22:08 UTC)
Platform Info:
  System: Linux (arm-linux-gnueabihf)
  CPU: ARMv7 Processor rev 3 (v7l)
  WORD_SIZE: 32
  BLAS: libblas
  LAPACK: liblapack
  LIBM: libm
  LLVM: libLLVM-3.6.0

Unfortunately my testall failed as well, but with a different error from @sbromberger's. Is mine due to a missing module?

    JULIA test/all
exception on 6: ERROR: LoadError: error compiling naupd: could not load module libarpack: no error
while loading linalg/arnoldi.jl, in expression starting on line 6
error in running finalizer: InterruptException()
error in running finalizer: InterruptException()
error in running finalizer: InterruptException()
error in running finalizer: InterruptException()
error in running finalizer: InterruptException()
ERROR: LoadError: LoadError: error compiling naupd: could not load module libarpack: no error
while loading linalg/arnoldi.jl, in expression starting on line 6
while loading /home/odroid/tcolvin/julia/test/runtests.jl, in expression starting on line 3
WARNING: Forcibly interrupting busy workers
WARNING: Unable to terminate all workers
    From worker 8:       * linalg/tridiag       in  83.74 seconds
    From worker 6:       * linalg/lapack        in 130.61 seconds
    From worker 5:       * linalg4              in 143.60 seconds
    From worker 5:       * linalg/givens        in  49.48 seconds
    From worker 9:       * linalg/bidiag        in 285.58 seconds
    From worker 4:       * linalg3              in 342.58 seconds
    From worker 6:       * linalg/pinv          in 248.57 seconds
    From worker 8:       * linalg/diagonal      in 340.89 seconds
    From worker 4:       * linalg/symmetric     in 227.28 seconds
    From worker 9:       * linalg/lu            in 370.00 seconds
    From worker 5:       * linalg/cholesky      in 683.52 seconds
    From worker 3:       * linalg2              in 891.26 seconds
    From worker 8:       * core                 in 602.30 seconds
    From worker 2:       * linalg1              in 1046.53 seconds
    From worker 7:       * linalg/triangular    in 4483.00 seconds

    From worker 6:       * linalg/arnoldi      make[1]: *** [all] Error 1
make: *** [testall] Error 2
exception on 7: ERROR: InterruptException:

@sbromberger
It looks like my pinv test passed where yours had failed?

Since those tests took so long -- only to fail -- is it possible for me to comment out the ones that I think worked without messing up any other tests to be run downstream? I'd like to speed up subsequent testalls. I'll try to mess with it later tonight.

@ViralBShah
Member

Can someone submit a patch to Make.arm? We need to figure out what LLVM build flags are used with 3.6.0 and update those in Make.arm.

On tests, I usually run them one at a time. See the various arm-tagged issues, and file a new one if required. IIRC, on my setup, all linear algebra tests passed.
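
Running tests one at a time can be scripted so that a single failure does not abort a multi-hour run. The following is a sketch under assumptions: the helper name is made up, and the runner command passed as the first argument is whatever starts a single test locally. Whether the test Makefile accepts individual test names as targets (for example via `make -C test`) is an assumption worth checking first.

```sh
# Sketch only: run each test in its own process and carry on after failures.
# $1 is the command that starts a single test; the rest are test names.
run_tests_one_by_one() {
    runner=$1
    shift
    failed=0
    for t in "$@"; do
        log="testlog_$(echo "$t" | tr / _).txt"   # linalg/pinv -> testlog_linalg_pinv.txt
        # $runner is left unquoted on purpose so a multi-word command splits.
        if $runner "$t" > "$log" 2>&1; then
            echo "PASS $t"
        else
            echo "FAIL $t (see $log)"
            failed=1
        fi
    done
    return "$failed"
}
```

A call might look like `run_tests_one_by_one "make -C test" linalg/pinv linalg/arnoldi`, leaving one log file per test in the current directory. Tests that already passed can simply be left off the list.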

@ihnorton
Member

I updated the build on my chromebook yesterday. The current Make.arm works fine, modulo known test failures, with LLVM 3.6 on the intended architecture. If changes are made, please make them conditional on the architecture.


@ViralBShah
Member

Should we update Make.arm to use llvm 3.6.0?

@sbromberger
Contributor Author

It didn't work. I had to use the precompiled bins from llvm.org.


@ViralBShah
Member

@sbromberger Can you provide a PR to README.arm.md providing these alternate steps? I am building on a scaleway.com arm machine, and can't get sys.ji to build. I have built LLVM 3.6.0 from source. BTW, scaleway arm machines are free until May 1.

error during bootstrap:
LoadError(at "sysimg.jl" line 145: LoadError(at "primes.jl" line 76: InexactError()))

@vtjnash
Member

vtjnash commented Apr 24, 2015

that seems to be due to an autodetection issue with llvm. the build passes for me with the following patch:

JULIA_CPU_TARGET=cortex-a8
diff --git a/src/codegen.cpp b/src/codegen.cpp
index 3883359..4f7378d 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -5419,7 +5419,7 @@ extern "C" void jl_init_codegen(void)
 #ifdef USE_MCJIT
     jl_mcjmm = new SectionMemoryManager();
 #endif
-    const char *mattr[] = {
+    const char *mattr[] = { "-neon",
 #ifndef USE_MCJIT
         // Temporarily disable Haswell BMI2 features due to LLVM bug.
         "-bmi2", "-avx2",

@sbromberger
Contributor Author

@ViralBShah will propose a PR this morning. Stand by.

ETA: #10986

@daanhb
Contributor

daanhb commented Apr 27, 2015

I have successfully built Julia on a RPi2 using the instructions above. For completeness, make testall1 fails in the exact same way for me as for @sbromberger above, with a failed assertion on vecnorm in linalg/pinv.jl.

ViralBShah pushed a commit that referenced this issue Apr 27, 2015
Per #10235 (comment), provide build instructions for Raspberry Pi 2.

Update README.arm.md
@daanhb
Contributor

daanhb commented Apr 28, 2015

I did some more tests. As for existing issues, I can reproduce #10124 and #10127. I do not see JuliaLang/LinearAlgebra.jl#181 in my build. I am now trying something similar to #8402 to get a complete picture.

In the next few weeks we will be building a small cluster of Raspberry Pis, for demonstration purposes. It is a poor man's supercomputer... It would be cool to have julia working for some demos - that is my interest here.

@ViralBShah
Member

This is probably the same issue as #10917, where the CPU is getting detected wrongly.

@vtjnash Should we use the same architecture on RPi2 as RPi1?

@ihnorton
Member

RPi1 is ARMv6, RPi2 is ARMv7.

@vtjnash
Member

vtjnash commented May 12, 2015

true, but Viral is probably correct that it would work (as I understand it, the RPi2 system was designed to be highly compatible with the RPi1 system)

/proc/cpuinfo says RPi2 is a 0xc07
https://www.reddit.com/r/raspberry_pi/comments/2ulbv3/proccpuinfo_for_the_raspberry_pi_2/

llvm lib/Support/Host.cpp does not list this, so it gets detected as "generic"

llvm lib/Target/ARM/ARM.td reveals that "generic" does not have a vfp, which makes the calling convention incorrect (and the processor feature set selection suboptimal)
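
To illustrate the detection step described above, here is a small sketch that reads the "CPU part" field and maps it to an LLVM-style CPU name. The helper name is hypothetical and the table covers only a few well-known ARM part numbers; anything unlisted falls through to "generic", which is the failure mode discussed here.

```sh
# Sketch only: map the "CPU part" field of /proc/cpuinfo to an LLVM-style
# CPU name. The table is deliberately short and the helper name is made up.
arm_cpu_name() {
    part=$(sed -n 's/^CPU part[[:space:]]*:[[:space:]]*//p' "${1:-/proc/cpuinfo}" | head -n 1)
    case $part in
        0xb76) echo arm1176jzf-s ;;   # Raspberry Pi 1 (ARMv6)
        0xc07) echo cortex-a7 ;;      # Raspberry Pi 2 (ARMv7)
        0xc08) echo cortex-a8 ;;
        0xc09) echo cortex-a9 ;;
        0xc0f) echo cortex-a15 ;;
        *)     echo generic ;;
    esac
}
```

On an RPi2 this should print `cortex-a7`. The result could then be compared with, or used to set, JULIA_CPU_TARGET, the variable that appears next to the patch earlier in this thread; how that variable is best passed to the build is not confirmed here.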

@daanhb
Contributor

daanhb commented May 13, 2015

For completeness, I did run a lot of tests two weeks ago on a Pi 2. The output was similar to #8402, but slightly better. I found more errors by uncommenting tests that failed. In addition to errors that have been reported, I encountered at least the following: two segmentation faults (strings.jl and readdlm.jl), an LLVM error (bitarray.jl), various issues with rationalize() (numbers.jl), and a 'mismatch of non-finite elements' error (pinv.jl).
I did not want to start a flood of issues and I haven't had time since to do a debug build, but I did keep output of tests and can provide them if anybody is interested (if so, where?).
Btw, our Pi cluster has been assembled in the meantime, and with 17 nodes and 68 cores it is still cheaper than my laptop. (Not sure yet whether it is actually also faster.)

@ViralBShah
Member

Many of these are already filed as separate issues, with the arm tag. Anything else not already captured should probably be filed as a separate bug.

@baruchel

I am aware that you already, all of you, did a great job! Since Julia seems to be already usable for some simple tasks on the RPI2, do you think that some "release" package could be made available? Either some statically-linked binary or some archive containing only the binary with the necessary libraries (in some directory ready to be uncompressed into /opt/julia for instance)? That would be really useful. Thank you again to all, best regards, b.

@ViralBShah
Member

It is certainly on my TODO. However, the distribution package was crashing (even though the source was working). Will try again. Once I have something working, I will put it out.

I guess we can announce alpha support for arm in 0.4 as well.

mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015
Per JuliaLang#10235 (comment), provide build instructions for Raspberry Pi 2.

Update README.arm.md
@ViralBShah
Member

I suspect that the build should be a lot smoother now. It would be nice to see if make just works for RPi2, or if LLVM binaries are still required to get this to work.

@daanhb
Contributor

daanhb commented Sep 1, 2015

Just to confirm: a regular make just works on my Raspberry Pi 2 and leads to a functional REPL. LLVM 3.6.1 was built in the process.
