Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve arch/cpu detection/selection on ARM and AArch64 #18100

Merged
merged 2 commits into from
Aug 18, 2016
Merged

Conversation

yuyichao
Copy link
Contributor

// This is the most reliable way I can find
// `/proc/cpuinfo` changes between kernel versions
struct utsname name;
if (uname(&name) >= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we suggest this to llvm too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially by including this in the feature detection function. The LLVM cpu/feature separation seems to be inconsistent with other targets since the cpu list doesn't include any generic target (other than the "generic" target itself). This is possibly related to the difficulty to determine the base instruction set. I'll probably open an LLVM issue asking about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ViralBShah
Copy link
Member

ViralBShah commented Aug 18, 2016

Trying to build this branch on scaleway:

    JULIA usr/lib/julia/inference.ji
terminate called after throwing an instance of 'std::regex_error'
  what():  regex_error
Aborted

@ViralBShah ViralBShah added the system:arm ARMv7 and AArch64 label Aug 18, 2016
@yuyichao yuyichao force-pushed the yyc/threads/arm branch 2 times, most recently from dc36640 to 6e84766 Compare August 18, 2016 06:38
@yuyichao
Copy link
Contributor Author

The regex error is a gcc bug (IIUC it's regex support isn't complete until a later version, either 4.9 or 5.0) Should be fixed now since it's not using regex anymore.

@yuyichao
Copy link
Contributor Author

What platform do we want to build the ARM binary for?

According to https://build.julialang.org/builders/package_tarballarm/builds/700/steps/make/logs/stdio it's currently using the CFLAGS/CXXFLAGS -march=armv7-a so the binary is actually armv7a only and after this PR the sysimg and JIT will also require armv7 on that setup (after all the libjulia code can't run on older hardware anyway).

Personally I only care about armv7+ (possibly armv7-r in additional to armv7-a) but RPI0 and RPI1 B+ are both armv6.

@ViralBShah
Copy link
Member

I am ok with armv7. At least that is what all of us have access to test on. If there is a lot of demand for armv6, we can always do it later.

@yuyichao yuyichao force-pushed the yyc/threads/arm branch 2 times, most recently from c646bbb to 32f54b1 Compare August 18, 2016 12:29
@yuyichao
Copy link
Contributor Author

Fair enough. I updated the readme to mention that the ARM binary we provide is armv7-a only. Also adds auto detection of known A profile so that the arm buildbot should automatically build sysimg with A profile too.

yuyichao and others added 2 commits August 18, 2016 21:04
* Allow `cpu_target` to specify a generic arch, matching the behavior on x86
* Detect the CPU arch version with `uname`
* Require `armv6`

    Close #13270 (`armv5` is not supported)
    Fix #18042

* Remove warning about generic arch since it's not really useful

    Fix #17549

* Require at least the same ARM arch version and profile the C code is compiled with
@ViralBShah
Copy link
Member

Still crashes for me in the building of the system image like before now, but hopefully llvm 3.9 will fix that.

// armv8-m.base, armv8-m.main
//
// Supported AArch64 arch names on LLVM 3.8:
// armv8.1a, armv8.2a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's too bad llvm felt ProcDesc needed to be private: http://llvm.org/docs/doxygen/html/MCSubtargetInfo_8h_source.html#l00033

or we could have auto-generated this list for the user on-demand and provided useful help messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list is auto-generated with llc and is just listed here for better reference. We also support generating this list with julia -C help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, not having access to this list at runtime (other than letting LLVM printing a help message) mean that some of the logic above has to be hard coded instead of going through a fallback list and printing the cpu/feature not-recognized warning once instead of once every codegen....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize that worked. In llvm 3.3, it also called exit, but now it continues, so I guess we could intentionally create a TargetMachine("help") for the user. As it is now, we print the help several times, then error (probably because it fell back to "generic" arch)

@vtjnash vtjnash merged commit 04047f6 into master Aug 18, 2016
@vtjnash vtjnash deleted the yyc/threads/arm branch August 18, 2016 21:12
tkelman pushed a commit that referenced this pull request Aug 20, 2016
* Allow `cpu_target` to specify a generic arch, matching the behavior on x86
* Detect the CPU arch version with `uname`
* Require `armv6`

    Close #13270 (`armv5` is not supported)
    Fix #18042

* Remove warning about generic arch since it's not really useful

    Fix #17549

* Require at least the same ARM arch version and profile the C code is compiled with

(cherry picked from commit 760bc41)
ref #18100
tkelman added a commit that referenced this pull request Aug 20, 2016
(cherry picked from commit 5977167)
ref #18100
@tkelman
Copy link
Contributor

tkelman commented Aug 23, 2016

Bisecting on release-0.5, the backport of this is causing segfaults during bootstrap repeatably. Not always at the exact same place, but early on:

    JULIA usr/lib/julia/inference.ji
essentials.jl

signal (11): Segmentation fault
while loading essentials.jl, in expression starting on line 190
Allocations: 24337 (Pool: 24337; Big: 0); GC: 1
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl
generator.jl
reflection.jl
options.jl
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
bool.jl
number.jl
int.jl

signal (11): Segmentation fault
while loading int.jl, in expression starting on line 193
Allocations: 100302 (Pool: 100302; Big: 0); GC: 1
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl
generator.jl
reflection.jl
options.jl
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
bool.jl
number.jl
int.jl
operators.jl
pointer.jl
abstractarray.jl
array.jl

signal (11): Segmentation fault
while loading array.jl, in expression starting on line 167
Allocations: 206520 (Pool: 206519; Big: 1); GC: 2
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl

signal (11): Segmentation fault
while loading essentials.jl, in expression starting on line 190
Allocations: 24337 (Pool: 24337; Big: 0); GC: 1
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl
generator.jl
reflection.jl
options.jl
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
bool.jl
number.jl
int.jl

signal (11): Segmentation fault
while loading int.jl, in expression starting on line 193
Allocations: 100302 (Pool: 100302; Big: 0); GC: 1
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl
generator.jl
reflection.jl
options.jl
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
bool.jl
number.jl
int.jl
operators.jl
pointer.jl
abstractarray.jl
array.jl
hashing.jl
nofloat_hashing.jl
reduce.jl
intset.jl
dict.jl
iterator.jl
docs/core.jl
inference.jl

signal (11): Segmentation fault
while loading inference.jl, in expression starting on line 3524
Allocations: 340809 (Pool: 340805; Big: 4); GC: 3
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl

signal (11): Segmentation fault
while loading essentials.jl, in expression starting on line 190
Allocations: 24337 (Pool: 24337; Big: 0); GC: 1
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl
generator.jl
reflection.jl
options.jl
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
bool.jl
number.jl
int.jl

signal (11): Segmentation fault
while loading int.jl, in expression starting on line 193
Allocations: 100469 (Pool: 100469; Big: 0); GC: 1
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl
generator.jl
reflection.jl
options.jl
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
bool.jl
number.jl
int.jl

signal (11): Segmentation fault
while loading int.jl, in expression starting on line 330
Allocations: 135198 (Pool: 135197; Big: 1); GC: 1
Segmentation fault
make[1]: *** [/home/tkelman/julia-0.5/usr/lib/julia/inference.ji] Error 139
make: *** [julia-inference] Error 2
    JULIA usr/lib/julia/inference.ji
essentials.jl
generator.jl
reflection.jl
options.jl
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
bool.jl
number.jl
int.jl

signal (11): Segmentation fault
while loading int.jl, in expression starting on line 193
Allocations: 100469 (Pool: 100469; Big: 0); GC: 1
Segmentation fault

@yuyichao
Copy link
Contributor Author

This only affects the flags we pass to llvm so it's most likely LLVM bug.

Backtrace?

@tkelman
Copy link
Contributor

tkelman commented Aug 24, 2016

@yuyichao
Copy link
Contributor Author

Turn on KEEP_BODIES (in options.h), use julia-debug if possible

Dump the llvm ir of the segfaulting function with p jl_dump_llvm_value(jl_function_ptr_by_llvm_name("<function_name_in_gdb>"))
Dump the asm with disassemble $pc
Dump the AST and the arguments of the function call with p jl_(meth), p jl_(jl_uncompress_ast(meth, meth->code)) , p jl_(args[<0 to 2>]) in the jl_call_method_internal frame.

Last time I saw this issue on scaleway somehow passed the wrong argument for a expression splicing.

@tkelman
Copy link
Contributor

tkelman commented Aug 24, 2016

@tkelman
Copy link
Contributor

tkelman commented Aug 24, 2016

here's master, segfaults earlier https://gist.github.com/92a510dad180b42779e1e29677652d67

@vtjnash
Copy link
Member

vtjnash commented Aug 24, 2016

llvm must have relocated jl_get_ptls_states wrong? The IR looks a bit strange there. And it looks like we shouldn't have needed to emit the ptls either, but that's a separate issue

@vtjnash
Copy link
Member

vtjnash commented Aug 24, 2016

Oh wait, sorry. That's supposed to be the ssp

@yuyichao
Copy link
Contributor Author

As Jameson said, it seems that LLVM uses the wrong address for __stack_chk_guard, what's __stack_chk_guard and __stack_chk_fail? Also, maybe check with release build sincethat shouldn't have the stack protection....

@tkelman
Copy link
Contributor

tkelman commented Aug 24, 2016

release build of master https://gist.github.com/6a6b96e0cfdf50407810eb614e742ad1

@vtjnash
Copy link
Member

vtjnash commented Aug 24, 2016

In that one, it's clearly trying to load the GOT for jl_get_ptls_states from 0x81b300. Why isn't this a PIC value?

@yuyichao
Copy link
Contributor Author

This really look like the relocation/code model issue we need to workaround on ARM by turning off fastisel.

Maybe you can check what flags are we passing to LLVM?

diff --git a/src/codegen.cpp b/src/codegen.cpp
index 2789963..07406e3 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -5732,12 +5732,16 @@ static inline SmallVector<std::string,10> getTargetFeatures(std::string &cpu)
     }
 #endif

+    jl_safe_printf("CPU: %s\nFeatures: ", cpu.c_str());
+
     SmallVector<std::string,10> attr;
     for (StringMap<bool>::const_iterator it = HostFeatures.begin(); it != HostFeatures.end(); it++) {
         std::string att = it->getValue() ? it->getKey().str() :
                           std::string("-") + it->getKey().str();
+        jl_safe_printf("%s, ", att.c_str());
         attr.append(1, att);
     }
+    jl_safe_printf("\n");
     return attr;
 }

@tkelman
Copy link
Contributor

tkelman commented Aug 25, 2016

    LINK usr/bin/julia
    JULIA usr/lib/julia/inference.ji
CPU: generic
Features: v7, aclass, fp16, vfp2, vfp3,
essentials.jl

@yuyichao
Copy link
Contributor Author

Hmmm, why is fp16 and vfp3 in there. Isn't this the generic build on the buildbot? The auto detection shouldn't put these two options in there unless -C native is used.

@yuyichao
Copy link
Contributor Author

Also, after specifying -C armv6 maybe try using sth like

        if (att == "v7")
            continue;

in front of the jl_safe_printf("%s, ", att.c_str()); to see which option is causing the issue. Don't remove vfp2 and try not remove v6 unless none of the other ones have an effect.

@tkelman
Copy link
Contributor

tkelman commented Aug 25, 2016

-C native is being used. I don't think we have generic build options working properly on the arm buildbot yet. We definitely aren't doing the right old distro, new gcc thing that we do on x86/amd64 linux.

@tkelman
Copy link
Contributor

tkelman commented Aug 25, 2016

Ah my mistake, the above backtraces were from just a normal make, don't think I had anything in Make.user there. The buildbot is setting -C generic from JULIA_CPU_TARGET=generic MARCH=armv7-a on the command line.

@ViralBShah
Copy link
Member

Is the MARCH setting messing up the scaleway arm?

@tkelman
Copy link
Contributor

tkelman commented Aug 31, 2016

I'm going to revert this for now on release-0.5 until we figure out why it's causing problems.

@ViralBShah
Copy link
Member

I set the CPU to generic, and tried disabling aclass and also v7, but that just put the crash elsewhere.

 cd /home/viral/julia/base && /home/viral/julia/usr/bin/julia-debug -C generic --output-ji /home/viral/julia/usr/lib/julia/inference.ji --startup-file=no -g0 -O0 coreimg.jl
CPU: generic
Features: aclass, v7, vfp2, 
essentials.jl
generator.jl
reflection.jl
options.jl
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
bool.jl
number.jl
int.jl
operators.jl

signal (11): Segmentation fault
while loading operators.jl, in expression starting on line 872
Allocations: 154334 (Pool: 154333; Big: 1); GC: 2
Segmentation fault (core dumped)
Makefile:215: recipe for target '/home/viral/julia/usr/lib/julia/inference.ji' failed
make[1]: *** [/home/viral/julia/usr/lib/julia/inference.ji] Error 139

@ViralBShah
Copy link
Member

I successfully built on arm with 0.5-rc4 on scaleway, but can't build on master, which seems to be due to this PR - which @tkelman reverted on release-0.5.

@ViralBShah
Copy link
Member

Both, master (which has this PR) and release-0.5 (which does not) fail to build on my Cortex A15 Chromebook. The failure is always in the system image step, but at different points.

@ViralBShah
Copy link
Member

On the cortex A15, using JULIA_CPU_TARGET=native with release-0.5 is trouble, and using generic gets me further. If I force -O0 I can get a bit further.

staticfloat pushed a commit that referenced this pull request May 5, 2017
* Allow `cpu_target` to specify a generic arch, matching the behavior on x86
* Detect the CPU arch version with `uname`
* Require `armv6`

    Close #13270 (`armv5` is not supported)
    Fix #18042

* Remove warning about generic arch since it's not really useful

    Fix #17549

* Require at least the same ARM arch version and profile the C code is compiled with

(cherry picked from commit d9f5334 from PR #18100)
staticfloat pushed a commit that referenced this pull request May 5, 2017
* Allow `cpu_target` to specify a generic arch, matching the behavior on x86
* Detect the CPU arch version with `uname`
* Require `armv6`

    Close #13270 (`armv5` is not supported)
    Fix #18042

* Remove warning about generic arch since it's not really useful

    Fix #17549

* Require at least the same ARM arch version and profile the C code is compiled with

(cherry picked from commit 760bc41 and PR #18100)
staticfloat pushed a commit that referenced this pull request May 5, 2017
(cherry picked from commit 5977167 and PR #18100)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:arm ARMv7 and AArch64
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants