compilation with GCC-9 on OSX Catalina #2303

Closed
ungur opened this issue Nov 4, 2019 · 13 comments

ungur commented Nov 4, 2019

Compilation of v0.3.7 and of the latest develop branch (commit eb2eddf) is not successful on OSX 10.15.1 (Catalina) with Xcode 11.2.
GNU Fortran (MacPorts gcc9 9.2.0_1) 9.2.0

One important non-default option, "INTERFACE64 = 1", was set in Makefile.rule. Everything else was left at its default.
The following error appeared:

ranlib ../../../libopenblas_haswellp-r0.3.8.dev.a
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C utils
ar -ru ../../../libopenblas_haswellp-r0.3.8.dev.a lapacke_cgb_nancheck.o lapacke_cgb_trans.o lapacke_cge_nancheck.o lapacke_cge_trans.o lapacke_cgg_nancheck.o lapacke_cgg_trans.o lapacke_cgt_nancheck.o lapacke_chb_nancheck.o lapacke_chb_trans.o lapacke_che_nancheck.o lapacke_che_trans.o lapacke_chp_nancheck.o lapacke_chp_trans.o lapacke_chs_nancheck.o lapacke_chs_trans.o lapacke_c_nancheck.o lapacke_cpb_nancheck.o lapacke_cpb_trans.o lapacke_cpf_nancheck.o lapacke_cpf_trans.o lapacke_cpo_nancheck.o lapacke_cpo_trans.o lapacke_cpp_nancheck.o lapacke_cpp_trans.o lapacke_cpt_nancheck.o lapacke_csp_nancheck.o lapacke_csp_trans.o lapacke_cst_nancheck.o lapacke_csy_nancheck.o lapacke_csy_trans.o lapacke_ctb_nancheck.o lapacke_ctb_trans.o lapacke_ctf_nancheck.o lapacke_ctf_trans.o lapacke_ctp_nancheck.o lapacke_ctp_trans.o lapacke_ctr_nancheck.o lapacke_ctr_trans.o lapacke_dgb_nancheck.o lapacke_dgb_trans.o lapacke_dge_nancheck.o lapacke_dge_trans.o lapacke_dgg_nancheck.o lapacke_dgg_trans.o lapacke_dgt_nancheck.o lapacke_dhs_nancheck.o lapacke_dhs_trans.o lapacke_d_nancheck.o lapacke_dpb_nancheck.o lapacke_dpb_trans.o lapacke_dpf_nancheck.o lapacke_dpf_trans.o lapacke_dpo_nancheck.o lapacke_dpo_trans.o lapacke_dpp_nancheck.o lapacke_dpp_trans.o lapacke_dpt_nancheck.o lapacke_dsb_nancheck.o lapacke_dsb_trans.o lapacke_dsp_nancheck.o lapacke_dsp_trans.o lapacke_dst_nancheck.o lapacke_dsy_nancheck.o lapacke_dsy_trans.o lapacke_dtb_nancheck.o lapacke_dtb_trans.o lapacke_dtf_nancheck.o lapacke_dtf_trans.o lapacke_dtp_nancheck.o lapacke_dtp_trans.o lapacke_dtr_nancheck.o lapacke_dtr_trans.o lapacke_lsame.o lapacke_sgb_nancheck.o lapacke_sgb_trans.o lapacke_sge_nancheck.o lapacke_sge_trans.o lapacke_sgg_nancheck.o lapacke_sgg_trans.o lapacke_sgt_nancheck.o lapacke_shs_nancheck.o lapacke_shs_trans.o lapacke_s_nancheck.o lapacke_spb_nancheck.o lapacke_spb_trans.o lapacke_spf_nancheck.o lapacke_spf_trans.o lapacke_spo_nancheck.o lapacke_spo_trans.o lapacke_spp_nancheck.o 
lapacke_spp_trans.o lapacke_spt_nancheck.o lapacke_ssb_nancheck.o lapacke_ssb_trans.o lapacke_ssp_nancheck.o lapacke_ssp_trans.o lapacke_sst_nancheck.o lapacke_ssy_nancheck.o lapacke_ssy_trans.o lapacke_stb_nancheck.o lapacke_stb_trans.o lapacke_stf_nancheck.o lapacke_stf_trans.o lapacke_stp_nancheck.o lapacke_stp_trans.o lapacke_str_nancheck.o lapacke_str_trans.o lapacke_xerbla.o lapacke_zgb_nancheck.o lapacke_zgb_trans.o lapacke_zge_nancheck.o lapacke_zge_trans.o lapacke_zgg_nancheck.o lapacke_zgg_trans.o lapacke_zgt_nancheck.o lapacke_zhb_nancheck.o lapacke_zhb_trans.o lapacke_zhe_nancheck.o lapacke_zhe_trans.o lapacke_zhp_nancheck.o lapacke_zhp_trans.o lapacke_zhs_nancheck.o lapacke_zhs_trans.o lapacke_z_nancheck.o lapacke_zpb_nancheck.o lapacke_zpb_trans.o lapacke_zpf_nancheck.o lapacke_zpf_trans.o lapacke_zpo_nancheck.o lapacke_zpo_trans.o lapacke_zpp_nancheck.o lapacke_zpp_trans.o lapacke_zpt_nancheck.o lapacke_zsp_nancheck.o lapacke_zsp_trans.o lapacke_zst_nancheck.o lapacke_zsy_nancheck.o lapacke_zsy_trans.o lapacke_ztb_nancheck.o lapacke_ztb_trans.o lapacke_ztf_nancheck.o lapacke_ztf_trans.o lapacke_ztp_nancheck.o lapacke_ztp_trans.o lapacke_ztr_nancheck.o lapacke_ztr_trans.o lapacke_make_complex_float.o lapacke_make_complex_double.o
ranlib ../../../libopenblas_haswellp-r0.3.8.dev.a
touch libopenblas_haswellp-r0.3.8.dev.a
/Applications/Xcode.app/Contents/Developer/usr/bin/make -j 1 -C test all
gfortran -O2 -m128bit-long-double -Wall -frecursive -fno-optimize-sibling-calls -m64 -fdefault-integer-8  -mavx2  -o sblat1 sblat1.o ../libopenblas_haswellp-r0.3.8.dev.a -lpthread -lgfortran -lpthread -lgfortran -L/usr/local/lib  -lto_library -lSystem  /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.0/lib/darwin/libclang_rt.osx.a 
gfortran -O2 -m128bit-long-double -Wall -frecursive -fno-optimize-sibling-calls -m64 -fdefault-integer-8  -mavx2  -o dblat1 dblat1.o ../libopenblas_haswellp-r0.3.8.dev.a -lpthread -lgfortran -lpthread -lgfortran -L/usr/local/lib  -lto_library -lSystem  /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.0/lib/darwin/libclang_rt.osx.a 
gfortran -O2 -m128bit-long-double -Wall -frecursive -fno-optimize-sibling-calls -m64 -fdefault-integer-8  -mavx2  -o cblat1 cblat1.o ../libopenblas_haswellp-r0.3.8.dev.a -lpthread -lgfortran -lpthread -lgfortran -L/usr/local/lib  -lto_library -lSystem  /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.0/lib/darwin/libclang_rt.osx.a 
gfortran -O2 -m128bit-long-double -Wall -frecursive -fno-optimize-sibling-calls -m64 -fdefault-integer-8  -mavx2  -o zblat1 zblat1.o ../libopenblas_haswellp-r0.3.8.dev.a -lpthread -lgfortran -lpthread -lgfortran -L/usr/local/lib  -lto_library -lSystem  /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.0/lib/darwin/libclang_rt.osx.a 
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x10ce130d2
#1  0x10ce13870
#2  0x7fff63e83b1c
#3  0x10ced2900
#4  0x10cee2077
#5  0x10cdfe52e
#6  0x10ce059ce
make[1]: *** [level1] Segmentation fault: 11
make: *** [tests] Error 2
@martin-frbg
Collaborator

Were there any warnings earlier in the build, particularly around the building of the sgemm kernel? Is there any chance you could rebuild with the additional option DEBUG=1 (which should hopefully provide a better backtrace)?
I do not see any problems with gcc 9.1 on Linux (I have not gotten around to installing 9.2 yet).

@ungur
Author

ungur commented Nov 4, 2019

Please find attached the compilation log and the warnings and errors produced during compilation. I used the DEBUG=1 option, set in Makefile.rule.

The compilation looks OK, but the verification of the compiled library fails.
As an example, running the sblat2 test:

OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat2 < ./sblat2.dat

produces the following error:
At line 124 of file sblat2.f (unit = 6, file = 'stdout')
Fortran runtime error: Bad STATUS parameter in OPEN statement

Line 124 of sblat2.f is:
124 OPEN( NOUT, FILE = SUMMRY, STATUS = 'NEW' )

This line of Fortran looks OK to me. Removing the "STATUS = 'NEW'" option moves the error to a later position, and the run finally ends with a general segmentation fault.

The following error also occurs:
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x10ba270d2
#1 0x10ba27870
#2 0x7fff63e83b1c
#3 0x10bae6900
#4 0x10baf6077
#5 0x10ba066f1
#6 0x10ba06959
zsh: segmentation fault OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1

compilation.log
compilation.err.txt

@martin-frbg
Collaborator

STATUS='NEW' is valid Fortran; it generates a runtime error when the file already exists (SUMMRY is SBLAT2.SUMM here, and all the BLAS2 and BLAS3 tests produce corresponding .SUMM files). It is strange that the backtrace for the sblat1 crash does not contain function names with DEBUG=1, which adds the -g option to the compiler flags. It could be that all steps are inside libc or some other system library for which no debugging information is available, though.
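
Since STATUS='NEW' fails whenever the summary file already exists, one way to rule out stale files before rerunning a test is to delete leftover .SUMM files first. This is a minimal sketch: the function name `clean_summ_files` is hypothetical, and it assumes the tests write their .SUMM files into the directory they are run from.

```shell
# Remove leftover *.SUMM files below a test directory (default: the current
# directory) so that OPEN(..., STATUS='NEW') does not trip over an old file.
# The function name is illustrative and not part of OpenBLAS.
clean_summ_files() {
    dir="${1:-.}"
    # Only regular files matching *.SUMM are deleted; everything else is kept.
    find "$dir" -name '*.SUMM' -type f -exec rm -f {} +
}
```

For example, `clean_summ_files . && ./sblat2 < ./sblat2.dat` would rerun the test with no summary file left over from a previous run. In this thread the error persisted for another reason, so a clean directory only eliminates one possible cause.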

@martin-frbg
Collaborator

Not reproducible on Linux with gcc 9.2.0 either (and valgrind does not find any signs of illegal memory accesses there). Travis CI shows no suspicious build warnings, but unfortunately it does not appear to offer Haswell CPU features on its OSX instances. A dynamic_arch build with nehalem as the build host target runs all tests without fault, though some of the results are intermixed with runtime warnings about IEEE floating-point signals for division by zero. (I believe these are caused by generic test code in LAPACK that deliberately performs a division by zero to see if the system handles it in a standard-conforming way.)

@ungur
Author

ungur commented Nov 4, 2019

The latest OSX upgrade (Catalina) discontinues support for all 32-bit applications, so 32-bit applications and libraries no longer function correctly. This is why many well-known software packages (MS Word, Mathematica, etc.) released special updates to comply with this drastic change. I wonder if the reported issue is somehow connected with this change in the new OSX.
Another big change in this upgrade is that the default shell of the terminal is now "zsh" instead of "bash".

@martin-frbg
Collaborator

Thanks for the pointer, but I do not see how this could be related. (If anything, one might argue that it could make INTERFACE64 a requirement if it also implied changing all int to long.) BTW, does a build without INTERFACE64=1 work on your system?

@ungur
Author

ungur commented Nov 4, 2019

No. Leaving everything in Makefile.rule at its default gives a similar result.

Can it be related to AVX512?
I do not think my system has it; it reports AVX1.0 and AVX2 only.

machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 SMEP BMI2 ERMS INVPCID FPU_CSDS MDCLEAR IBRS STIBP L1DF SSBD
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT RDTSCP TSCI
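
A feature list like the one above can also be checked mechanically rather than by eye. The following is a small sketch, assuming the list is space-separated as shown; the helper name `has_cpu_feature` is hypothetical, and `AVX512F` is used only as an assumed example of how an AVX-512 capability would be spelled in such a list.

```shell
# Succeed if the feature named in $1 appears as a whole word in the
# space-separated feature list read from stdin.
# The helper name is illustrative and not part of any tool.
has_cpu_feature() {
    # Split the list into one feature per line, then require an exact,
    # literal match of the full line (-x, -F) so "AVX1" does not match "AVX1.0".
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Possible use on an Intel Mac (not executed here):
#   sysctl -n machdep.cpu.leaf7_features | has_cpu_feature AVX2 && echo "AVX2 present"
```

Matching whole words matters here because a plain substring search for `AVX` would also hit `AVX1.0` and `AVX2`.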

@martin-frbg
Collaborator

Could it be picking up a wrong dynamic library at runtime? (It is a long shot, but the sblat1 binary is linked dynamically against libgfortran.so etc.) Another thing to try would be specifying a TARGET like SANDYBRIDGE or NEHALEM to check if the problem is linked to specific gemm kernel features, e.g. AVX2. (I am now trying to build for Sandybridge on Travis in the hope that the virtual hardware is capable of running Sandybridge code.)

@martin-frbg
Collaborator

No, not AVX512 (that would be the "SkylakeX" target, i.e. a recent Xeon CPU). Yours appears to be correctly autodetected as having AVX2 (Haswell, Kaby Lake, or similar refreshes of the original Haswell architecture). I assume you are building on actual hardware, or is the build running in a virtual machine?

@ungur
Author

ungur commented Nov 4, 2019

No virtual machine; I am building it on my laptop.

Here are the libraries that are dynamically linked into the sblat1 and sblat2 tests:

otool -L sblat1
sblat1:
/usr/local/lib/libgfortran.3.dylib (compatibility version 4.0.0, current version 4.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
/usr/local/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/local/lib/libquadmath.0.dylib (compatibility version 1.0.0, current version 1.0.0)

Other test programs have the same dependencies.

I installed GCC 9 via MacPorts.

@martin-frbg
Collaborator

Hmm, I wonder about the libgfortran.3.dylib - normally gcc 9.x comes with libgfortran version 5 (at least on Linux, but I do not see why OSX would be different). Do you have a libgfortran.5.dylib somewhere on your system as well ?
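
The version question raised here can be answered directly from `otool -L` output. Below is a minimal sketch with a hypothetical helper, `libgfortran_major`, which assumes the library is named in the `libgfortran.N.dylib` form seen in this thread.

```shell
# Print the major version of the first libgfortran named in `otool -L`
# output read from stdin; prints nothing if no libgfortran is listed.
# The helper name is illustrative and not part of any tool.
libgfortran_major() {
    sed -n 's/.*libgfortran\.\([0-9][0-9]*\)\.dylib.*/\1/p' | head -n 1
}

# Possible use on macOS (not executed here):
#   otool -L sblat1 | libgfortran_major
```

For the sblat1 listing posted earlier in the thread this would print 3, whereas the expectation stated here for gcc 9.x is 5.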

@martin-frbg
Collaborator

TARGET=SANDYBRIDGE build completed on Travis just now, all tests passing without any warning (not even the "IEEE... is signalling" seen from the dynamic_arch build).

@ungur
Author

ungur commented Nov 4, 2019

> Hmm, I wonder about the libgfortran.3.dylib - normally GCC 9.x comes with libgfortran version 5 (at least on Linux, but I do not see why OSX would be different). Do you have a libgfortran.5.dylib somewhere on your system as well?

Oh, I think the problem is solved now. It was a problem on my side.
The libraries on my system were indeed wrongly linked, since I had an old gfortran installation (which I had completely forgotten about).
After the libgfortran.5.dylib that came with gcc9 was correctly linked, everything worked fine.
I was confused by the fact that other Fortran codes compiled and worked fine after the system upgrade. The library search location had to be changed to /opt/local/lib instead of /usr/local/lib (which contained old libraries in my case).

sblat1:
/opt/local/lib/libgcc/libgfortran.5.dylib (compatibility version 6.0.0, current version 6.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
/opt/local/lib/libgcc/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
/opt/local/lib/libgcc/libquadmath.0.dylib (compatibility version 1.0.0, current version 1.0.0)
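
A forgotten runtime library like the one described above can be spotted by listing every libgfortran copy under the candidate prefixes. This is a sketch only: the helper name `find_libgfortran` is hypothetical, and the two directories in the usage note are simply the ones mentioned in this thread.

```shell
# List every libgfortran dynamic library found below the given directories.
# Directories that do not exist are skipped silently.
# The helper name is illustrative and not part of any tool.
find_libgfortran() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -name 'libgfortran*.dylib' 2>/dev/null
    done
}
```

Running `find_libgfortran /usr/local/lib /opt/local/lib` on a setup like this one would be expected to show both the old and the new library, which makes the mismatch with the compiler version easy to see.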

The issue is solved and may be closed. Thank you for the support and for your time.
