AIX 7.1 with GCC/GFortran - Issue needing resolution #1803

Closed
Teej42 opened this issue Oct 9, 2018 · 65 comments

Comments

@Teej42

Teej42 commented Oct 9, 2018

I did review Issue 463.

I set up everything using the packages provided here. This includes the makefile, GCC and GFortran for AIX 7.1, Python 2.7, and so on. My objective is to get OpenBLAS set up to get SciPy module working (due to Lapack/BLAS dependency) for Python 2.7.

I updated Makefile.system to comment out the -m32/-m64 as noted in Issue 463:

#ifndef BINARY_DEFINED
#ifdef BINARY64
#CCOMMON_OPT += -m64
#else
#CCOMMON_OPT += -m32
#endif
#endif

I ran this:

/opt/freeware/bin/make FC=gfortran TARGET=POWER8 >result.txt 2>error.txt

But in the error.txt, I get this:

Assembler:
/home/fp/OpenBLAS-0.2.20/tmp/cc7zjwrP.s: line 71: Error In Syntax
/home/fp/OpenBLAS-0.2.20/tmp/cc7zjwrP.s: line 72: invalid opcode or pseudo-op
/home/fp/OpenBLAS-0.2.20/tmp/cc7zjwrP.s: line 73: Error In Syntax
/home/fp/OpenBLAS-0.2.20/tmp/cc7zjwrP.s: line 100: Error In Syntax
/home/fp/OpenBLAS-0.2.20/tmp/cc7zjwrP.s: line 101: Error In Syntax
make[1]: *** [Makefile.L1:622: sasum_k.o] Error 1
make[1]: *** Waiting for unfinished jobs....
Assembler:
/home/fp/OpenBLAS-0.2.20/tmp/ccVmE6OQ.s: line 63: Error In Syntax
/home/fp/OpenBLAS-0.2.20/tmp/ccVmE6OQ.s: line 64: invalid opcode or pseudo-op
/home/fp/OpenBLAS-0.2.20/tmp/ccVmE6OQ.s: line 65: Error In Syntax
/home/fp/OpenBLAS-0.2.20/tmp/ccVmE6OQ.s: line 85: Error In Syntax
/home/fp/OpenBLAS-0.2.20/tmp/ccVmE6OQ.s: line 86: Error In Syntax
make[1]: *** [Makefile.L1:667: scopy_k.o] Error 1
make: *** [Makefile:139: libs] Error 1

How exactly should I address this?

As an aside, how exactly should $(OSNAME) get defined, so this would be correctly set:

ifeq ($(OSNAME), AIX)
BINARY_DEFINED = 1
endif

@Teej42
Author

Teej42 commented Oct 9, 2018

Output/error/config.h/Makefile.conf as was requested in the other issue. Please let me know if you need anything else.

result.txt
error.txt
config.h.txt
Makefile.conf.txt

@martin-frbg
Collaborator

Is there a specific reason why you would want to build 0.2.20 (which predates the fixes mentioned at the end of #463)? The latest stable release is 0.3.3; you could also try a snapshot of the current "develop" branch. (And OSNAME gets defined by the c_check utility based on the output of gcc -E.)

@Teej42
Author

Teej42 commented Oct 9, 2018

I got 0.2.20 from this page - http://www.openblas.net - which I now realize is clearly no longer being maintained. I will pull the latest version and try. Will update this accordingly.

Also, I did try gcc -E just now, and it demands an input file. I will investigate the proper way to call gcc -E.

@martin-frbg
Collaborator

martin-frbg commented Oct 9, 2018

Sorry, gcc -E ctest.c is what c_check uses (and ctest.c is basically a bunch of ifdefs that prints out OS_AIX if the compiler defines _AIX). The situation with the openblas.net page is unfortunate indeed, I only have commit/release rights to the code repository.
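
The check described above can be approximated by hand. What follows is only a minimal sketch, assuming gcc is installed: the classify_os helper is hypothetical (it is not part of OpenBLAS or c_check) and merely scans a dump of predefined macros for _AIX, which is roughly what the ifdefs in ctest.c test for.

```shell
# Hypothetical helper, not part of OpenBLAS: read "#define" lines on stdin
# and report OS_AIX when the compiler predefines _AIX.
classify_os() {
    if grep -q -E '^#define[[:space:]]+_AIX([[:space:]]|$)'; then
        echo OS_AIX
    else
        echo OS_UNKNOWN
    fi
}

# Usage (assumes gcc is on the PATH): dump the predefined macros and classify.
# gcc -dM -E - </dev/null | classify_os
```

On a compiler that targets AIX this should print OS_AIX; anything else is reported as OS_UNKNOWN because the sketch knows about one macro only.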

@Teej42
Author

Teej42 commented Oct 9, 2018

Thank you. I confirmed that it recognizes OS_AIX, and interestingly enough ARCH_POWER, not POWER8, even though prtconf indicates that Processor Type is PowerPC_POWER8. So I tried:

/opt/freeware/bin/make FC=gfortran TARGET=POWER >result.txt 2>error.txt

It went in a different direction, and produced the following message:

zblat3.f:1447:0: Warning: 'REALPART_EXPR ' may be used uninitialized in this function [-Wmaybe-uninitialized]
ld: 0711-317 ERROR: Undefined symbol: .get_num_procs
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
collect2: error: ld returned 8 exit status
ld: 0711-317 ERROR: Undefined symbol: .get_num_procs
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
collect2: error: ld returned 8 exit status
make[1]: *** [Makefile:137: dblat1] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:134: sblat1] Error 1
make: *** [Makefile:124: tests] Error 2

Results: 32bitcompile.zip

I also tried:

/opt/freeware/bin/make BINARY=64 FC=gfortran TARGET=POWER >result.txt 2>error.txt

Got thrown with a whole bunch of this:

ar: 0707-126 dlasd3.o is not valid with the current object file mode.
Use the -X option to specify the desired object mode.

Results: 64bitcompile.zip

I will grab the latest dev package after lunch, and will also investigate both results closely then to hopefully find a workaround.

Hopefully this is something that has already been seen somewhere else.

@brada4
Contributor

brada4 commented Oct 9, 2018

zblat ... undefined symbol
You need to run "make clean" between compilation runs.

0707-126 ...
From the error format it seems that xlc or AIX ar is used where gcc expects GNU binutils ar.
While the object format is compatible, the GNU ar manpage states:

       -X32_64
           ar ignores an initial option spelt -X32_64, for compatibility with AIX.  The behaviour produced by this option is the default for GNU ar.  ar does not support any of the other -X options; in particular, it does not support -X32 which is the default for AIX ar.

EDIT: it seems that gcc does not add -X32_64 option so AIX ar does not work.

There are no AIX-specific fixes between 0.3.3 and develop, you can stay with release until any fix is posted in relation with this issue.

@Teej42
Author

Teej42 commented Oct 9, 2018

Re: make clean:

364 vi error.txt
365 /opt/freeware/bin/make clean
366 /opt/freeware/bin/make FC=gfortran TARGET=POWER >result.txt 2>error.txt
367 vi error.txt
368 ls
369 ls -altr
370 /opt/freeware/bin/make clean
371 /opt/freeware/bin/make BINARY=64 FC=gfortran TARGET=POWER >result.txt 2>error.txt
372 vi error.txt


I will look into whether we're referencing the right ar or even if we have a copy installed, and update this accordingly.

@brada4
Contributor

brada4 commented Oct 9, 2018

I see one in AIX RPM collection. Probably you need to prepend that bin/ path so that GNU tools override AIX ones.

@martin-frbg
Collaborator

At least in theory, identification of ARCH_POWER should lead to a call to cpuid_power that returns TARGET=POWER8, so specifying TARGET=POWER on the command line should not be necessary (and might even be detrimental). Unfortunately #463 was just lots of back and forth with the original poster leaving as soon as we got it to build "somehow", but the "-X 64" argument to IBM ar should already be generated by Makefile.power.

@brada4
Contributor

brada4 commented Oct 9, 2018

There is $CC invoked at the end twice, to link final shared library and to run linktest.

@martin-frbg
Collaborator

martin-frbg commented Oct 9, 2018

From your logs it looks as if ar -X 64 does get called already, perhaps this actually needs to be "-X64" without the intervening blank (line 62 of Makefile.power).
(And building with TARGET=POWER apparently led to a build for POWER5, which may not be what you want)

@Teej42
Author

Teej42 commented Oct 9, 2018

I can confirm that with the 0.3.3 release, the -x 64 issue is all properly resolved, so I did not need to do any modification to the Makefile.system.


I installed binutils, and ran the following commands:

$ export PATH=/opt/freeware/bin:$PATH

This is to trigger the use of GNU ar.

$ make clean
$ make

This threw an error for ar:

ar -X 32 -ru ../libopenblas_power5p-r0.3.3.a saxpy.o sswap.o scopy.o sscal.o [blahblahblah]
ar: invalid option -- X
Usage: ar [emulation options] [-]{dmpqrstx}[abcDfilMNoPsSTuvV] [--plugin ] [member-name] [count] archive-file file...
ar -M [<mri-script]
[information]
emulation options:
[-g] - 32 bit small archive
[-X32] - ignores 64 bit objects
[-X64] - ignores 32 bit objects
[-X32_64] - accepts 32 and 64 bit objects

So whitespace between -X and the digit is verboten for this particular AR install. Version:

$ ar -V
GNU ar (GNU Binutils) 2.25.1
Copyright (C) 2014 Free Software Foundation, Inc.

I am digging around to find out where the -X 32 is triggered, and see if I can fix the whitespace myself.
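
Locating and rewriting the spaced option can be sketched with grep and sed. This is only an illustration: both helper names are hypothetical, and the file to inspect (Makefile.power is the one named elsewhere in this thread) is passed in by the caller.

```shell
# Hypothetical helpers for illustration only.

# Print, with line numbers, every line that passes "-X 32", "-X 64" or
# "-X 32_64" with a blank between the flag and the mode.
find_spaced_x() {
    grep -n -E -- '-X (32|64)' "$@"
}

# Filter: rewrite the spaced spelling to the joined one (-X32, -X64, -X32_64).
join_x_flag() {
    sed -E 's/-X (32_64|32|64)/-X\1/g'
}

# Usage:
# find_spaced_x Makefile.power
# join_x_flag < Makefile.power > Makefile.power.new
```

Writing to a new file rather than editing in place keeps the original around for comparison.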


As I was writing this, Martin responded with the fix, which I applied, and I tried make again. Unfortunately, I still have to use FC=gfortran, as xlF continues to be referenced.

So in summary, things that AIX needs that I have identified so far:

  1. GCC and its dependencies.
  2. GFortran and its dependencies (xlF chokes unfortunately, even though we already have it installed).
  3. GNU binutils.
  4. GNU make (AIX make chokes on the Makefile)
  5. Makefile.power to have -X 64 and -X 32 changed to -X64 and -X32.

And it failed here:

ranlib ../../libopenblas_power5p-r0.3.3.a
ar: 0707-108 File ../../libopenblas_power5p-r0.3.3.a is not an archive file.
ranlib: 0654-601 Execution of ar failed
Usage: ranlib [-t] [-X {32|64|32_64}] [--] file ...
make[2]: *** [Makefile:557: ../../libopenblas_power5p-r0.3.3.a] Error 1
make[2]: Leaving directory '/medstat/fp/download/OpenBLAS/lapack-netlib/SRC'
make[1]: *** [Makefile:21: lapacklib] Error 2
make[1]: Leaving directory '/medstat/fp/download/OpenBLAS/lapack-netlib'
make: *** [Makefile:225: netlib] Error 2

Logs: 2-32bitcompile.zip

Running BINARY=64 option after cleaning.

@martin-frbg
Collaborator

What does file libopenblas_power5p-r0.3.3.a think it is, if it is "not an archive file"? Or does the numeric part of the error message suggest that ranlib still managed to invoke the AIX ar (which may not like the output that its GNU cousin prepared)? Not sure if we saw anything like that in #463.

@Teej42
Author

Teej42 commented Oct 9, 2018

I did a which ranlib which shows it's still referring to /usr/bin/ranlib, so you're on the button on this. I did try changing the symbolic link under /usr/bin/ar to the right version, but ranlib continues to error out with the wrong settings for ar:

$ ranlib libopenblas_power5p-r0.3.3.a
ar: invalid option -- X

Then I looked into /opt/freeware/bin/ and discovered there are several ranlib files:

$ ls -1 |grep ranlib
gcc-ranlib
granlib
powerpc-ibm-aix7.1.0.0-gcc-ranlib

granlib was set as read only, so I changed the permission to 755, and it ran:

$ granlib libopenblas_power5p-r0.3.3.a
$ echo $?
0
$

So we need to update the makefiles to point to granlib, NOT ranlib, on AIX if using GNU programs (gcc/gfortran/gnu make, et cetera). I am currently investigating how to do this.
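
One way to avoid hard-coding the tool name is to probe the PATH. A minimal sketch follows; the pick_ranlib helper is hypothetical, and passing RANLIB on the make command line is the override that comes up later in this thread rather than something the sketch verifies.

```shell
# Hypothetical helper: print the name of the first ranlib variant found on
# PATH, preferring the GNU-prefixed "granlib" over the system "ranlib".
pick_ranlib() {
    for candidate in granlib ranlib; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage:
# make FC=gfortran RANLIB="$(pick_ranlib)"
```

Note that command -v only reports files with execute permission, so a granlib installed read-only would be skipped.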

@martin-frbg
Collaborator

It could be that you just need to set RANLIB=/opt/freeware/bin/granlib (however things will get complicated if granlib does not have execute permissions by default).
On the other hand, the issue may just come from using GNU ar for generating the archive, I am not sure if there really is a reason to avoid the AIX ar (beyond brada4's initial interpretation of the error message from the misspelled option)

@brada4
Contributor

brada4 commented Oct 9, 2018

https://www.unix.com/aix/116056-ar-0707-126-a.html
Maybe setting variable IPO sneaking parameters helps?

@Teej42
Author

Teej42 commented Oct 9, 2018

Okay, I tried the following:

$ make FC=gfortran RANLIB=granlib 1>result.txt 2>error.txt

(running with PATH including /opt/freeware/bin first and using granlib and gnu ar)

3-32bitcompile.zip

$ /opt/freeware/bin/make FC=gfortran 1>result.txt 2>error.txt

(Running with default PATH, and using AIX ranlib and AIX ar).

4-AIXcompile.zip

I need to check out for the day, but will investigate the first run closely tomorrow morning - it's spitting something about collect2 not seeing the .so file as a COFF file:

collect2: fatal error: ../libopenblas_power5p-r0.3.3.a: not a COFF file

Many thanks for the assistance you two have given me today on this. Made a lot of progress, and hopefully will have a working SciPy solution tomorrow.

@martin-frbg
Collaborator

martin-frbg commented Oct 10, 2018

Could be that the GNU binutils create an ELF archive and you are invoking the AIX ld in the last step,
which apparently only supports COFF/XCOFF. You could try specifying LD=/path/to/gnu/ld as well. Alternatively we would need to find out why the AIX build appears to be missing get_num_procs(), which is defined in driver/others/memory.c. (This file contains various ifdef'd implementations, but offhand I do not see which of them would be used on AIX with or without the GNU environment. Perhaps the one using sysconf(_SC_NPROCESSORS_CONF) should be (if...) defined(OS_AIX) as well ?)

@Teej42
Author

Teej42 commented Oct 10, 2018

Current command line (with PATH setting to /opt/freeware/bin for the GNU stuff first):

make FC=gfortran RANLIB=granlib LD=gld 1>result.txt 2>error.txt

I did the following command:

$ gld libopenblas_power5p-r0.3.3.a
gld: warning: cannot find entry symbol __start; defaulting to 00000000100000e0

and

$ ld libopenblas_power5p-r0.3.3.a
ld: 0711-715 ERROR: File libopenblas_power5p-r0.3.3.a cannot be processed.
The file must be an object file, an import file, or an archive.

Time for me to crack out those Google skills to figure out this error, which still shows:

collect2: fatal error: ../libopenblas_power5p-r0.3.3.a: not a COFF file

@martin-frbg
Collaborator

martin-frbg commented Oct 10, 2018

gld error means that it fails to find some startup code (or more likely lacks LDFLAGS that would tell it where to get it from). ld error suggests that AIX ld can only handle its own variant of the COFF format and GNU ar produced something else (which is why I suggested trying to use the AIX ar one more time).

Most google hits I found were at least ten years old and at best referred to some switch to a "large" format that broke archive compatibility between AIX 4.3 and earlier. Typical hits tend to be question only, the best I got was in the GCC installation document https://gcc.gnu.org/install/specific.html#x-ibm-aix where it states that "large" format allows both 32 and 64bit objects (so possibly what -X32_64 does) and one could use the "-g" option with ar to make it write the original, 32bit files only format.
(Not sure if/how this can still be relevant in your AIX7 context)

@brada4
Contributor

brada4 commented Oct 10, 2018

Probably @TeejIBM would need to copy full gld line.
Another try is ln -s gld ld (and remember where to remove the artifact after)

@martin-frbg
Collaborator

Guessing that we are at the "static :" target in exports/Makefile, this would appear to be gld -r -o goto.o --whole-archive libopenblas_power5p-r0.3.3.a (where the libopenblas...a is subsequently removed and recreated from the intermediate "goto.o" by ar -cq libopenblas_power5p-r0.3.3.a goto.o)

@Teej42
Author

Teej42 commented Oct 16, 2018

I apologize for the radio silence. We continued to have issues with this particular package. With the advice of someone else within IBM, I was able to get lapack/BLAS to install successfully from the source with the following instruction:

https://www.ibm.com/developerworks/community/forums/html/topic?id=10b1cba0-ff19-49b2-b2cd-2e924ffda12a&ps=25

That worked for us. I do not know if the information provided is relevant here. When we find some time (dealing with deadlines here), we will circle back to this to try to close the loop on this.

Is there a way to set a reminder for this?

@martin-frbg
Collaborator

So if I read that thread correctly, you opted to go with the unoptimized netlib reference implementation of both BLAS and LAPACK instead, at least for now ? (This is also what OpenBLAS uses for - most of - its LAPACK component, but clearly the build system and dependencies are simpler with a fortran-only package).
Not sure how to set a reminder, except perhaps by bookmarking this page in your browser of choice.

@Teej42
Author

Teej42 commented Oct 30, 2018

Okay, sorry for the delay. It turns out that SciPy on AIX requires a 64-bit version of BLAS, and the reference BLAS/LAPACK are giving me issues in 64-bit compilation (missing pointers in shared object files). Back to OpenBLAS, then. I have replicated it on a new environment (same AIX 7.1), and ran this:

$ make clean CC=gcc

Because if I didn't I would get this:

$ make clean
In file included from cpuid_power.c:41,
from getarch.c:1051:
/usr/include/sys/vminfo.h:743: error: expected specifier-qualifier-list before 'id64_t'

Then I ran this:

$ make BINARY=64 CC=gcc FC=gfortran RANLIB=granlib LD=ld AR=ar >result.txt 2>error.txt

And I got the same error:

collect2: fatal error: ../libopenblas_power5p-r0.3.3.a: not a COFF file
compilation terminated.
make[1]: *** [Makefile:134: sblat1] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:124: tests] Error 2

See attached.

result.txt
error.txt

$ which ld
/opt/freeware/bin/ld
$ which ar
/opt/freeware/bin/ar
$ which ranlib
/usr/bin/ranlib
$ which granlib
/opt/freeware/bin/granlib
$ which gcc
/opt/freeware/bin/gcc
$ which gfortran
/opt/freeware/bin/gfortran

I am not quite sure what to do regarding the "static :" target in exports/Makefile comment. Guidance would be welcome.

@martin-frbg
Collaborator

Perhaps try adding FFLAGS="-O2 -maix64" to the make arguments - I see the -maix64 option gets set for gcc automatically "somehow", but seems to be missing from the gfortran command lines. (As gfortran is used for both compiling and linking the tests/sblat1 executable, maybe this is where it gets confused over which type of library to link)

@Teej42
Author

Teej42 commented Oct 31, 2018

$ make BINARY=64 CC=gcc FC=gfortran FFLAGS="-O2 -maix64" RANLIB=granlib LD=ld AR=ar >result.txt 2>error.txt

That produced this:

collect2: fatal error: ../libopenblas_power5p-r0.3.3.a: not a COcollect2FF file
: fatal error: .compilation terminated.
./libopenblas_power5p-r0.3.3.a: not a COFF file
compilation terminated.

DuckDuckGo yielded absolutely nothing for that error message. Google did not find anything either.

Here's the goods:

error.txt
result.txt

@ayappanec
Contributor

Do you have the binutils rpm installed? If so, please uninstall it and do a clean build.
The linker (ld) provided by binutils still has issues on AIX, so it's better not to use it.

@brada4
Contributor

brada4 commented Oct 31, 2018

Can you examine the .a file with the "file" command?
It might be either an ar archive (like on Linux) or a COFF32 or COFF64 file.
What is strange is that the same .a file was completely valid a few lines earlier, i.e. what is built as a big .a file is first used to build the _blat tests, then rejected as the wrong format by one tool later.
I think it should be the same format as gcc -static emits for other tools to make sense....

Could you run make but adding MAKE_NB_JOBS=1 with all other identical parameters in same build tree - it should re-create .a file and make tests again without compilation of all functions, and give sequenced output to see which command actually fails.

PS: One may record command outputs a bit better with script or >../out.txt 2>&1
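
For capturing both output streams in a single file, the order of the redirections matters in POSIX shells: the file redirection has to come before 2>&1. A minimal sketch, with run_logged as a hypothetical wrapper name:

```shell
# Hypothetical wrapper: run a command and store stdout and stderr together
# in the file named by the first argument.
run_logged() {
    logfile=$1
    shift
    # The file redirection comes first; with "2>&1 >file" stderr would
    # still go to the terminal.
    "$@" >"$logfile" 2>&1
}

# Usage:
# run_logged ../out.txt make MAKE_NB_JOBS=1
```

A single combined log keeps error messages next to the command lines that produced them, which is what makes a sequential build log readable.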

@martin-frbg
Collaborator

Probably best to follow ayappanec's suggestion and try with the AIX ld again. (I am not even sure file would tell anything other than current ar archive, and I do not see how disabling parallel make would help)

@martin-frbg
Collaborator

Err, I guess I should have looked at the code before mentioning this - cpuid_power.c just does a quick and dirty "return CPUID_POWER5" on AIX without bothering to find out the actual cpu model.

@martin-frbg
Collaborator

martin-frbg commented Oct 31, 2018

Seems the AIX version of install (or whatever tool got called by it) does not support setting permissions (to 644) in the same way as the GNU coreutils version does. Could be just an idiom issue - Makefile.install has it do "install -pm644" where the AIX install may expect a more formal "-p -m 644" or something ?

@brada4
Contributor

brada4 commented Oct 31, 2018

It is similar but different....
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.cmds3/install.htm
There is a note in the bottom regarding BSD-type install....

@martin-frbg
Collaborator

...and even bsdinstall does not understand the -p option it seems, so that section of Makefile.install will need to be copied to a special ifeq ($(OSNAME), AIX) that does install -M 644 instead of -pm644.

@Teej42
Author

Teej42 commented Oct 31, 2018

Okay, please let me know what I need to modify, so we can verify on our end, so you can complete your defect on this too.

@martin-frbg
Collaborator

Tentative solution in #1845

@martin-frbg
Collaborator

martin-frbg commented Nov 2, 2018

Hopefully resolved by #1849 now. (Still might want to look at cpu identification under AIX, though perhaps for your case a POWER5 - and presumably "up" - library is more appropriate than one that requires POWER8, at least as long as OpenBLAS does not have "dynamic_arch" cpu support on all platforms?)

@ayappanec
Contributor

@TeejIBM Maybe it's late, but let me explain some common things one has to do for any open source software build on AIX.
Many native tools in AIX are now old enough that they won't work with current software. For that reason, we have the required tools in the AIX Toolbox (coreutils, diffutils, findutils, make, automake, autoconf, etc.). It is always recommended to install these packages and set the PATH to /opt/freeware/bin:$PATH
In this case, if the coreutils rpm had been installed, the install issue would not have happened.
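
Putting the Toolbox directory first can be sketched as below. The prepend_dir helper is hypothetical, and the list of tools checked afterwards is only an example.

```shell
# Hypothetical helper: print a PATH value that has the given directory in
# front, leaving the value unchanged when the directory already comes first.
prepend_dir() {
    case "$2" in
        "$1"|"$1":*) echo "$2" ;;      # already first
        "")          echo "$1" ;;      # empty PATH
        *)           echo "$1:$2" ;;
    esac
}

# Usage: put the AIX Toolbox directory first, then see which tools resolve.
# PATH=$(prepend_dir /opt/freeware/bin "$PATH"); export PATH
# for tool in make ar ranlib install; do command -v "$tool"; done
```

Checking the resolved paths afterwards shows at a glance whether a native AIX tool is still being picked up for any of the names.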

@Teej42
Author

Teej42 commented Nov 4, 2018

@ayappanec - I am indeed setting /opt/freeware/bin first in my path. However, due to lack of knowledge and experience, I have not installed coreutils, findutils, automake, or autoconf (or anything else) for that matter. It would be great if there is a single core guideline for non-sysadmins like myself to gain a better insight on the recommended course of actions for installation.

Note: This will have to be described in a non-internet-accessible manner, as our production servers are still being firewalled. Part of my task involves documenting for our sysadmins to follow and set up environments for us in the future, including those production servers, so references I can send them to would be wonderful.

I will check on the #1849 fix in a moment.

@Teej42
Author

Teej42 commented Nov 4, 2018

Okay, one more issue:

$ sudo make install
Password: 
make -j 2 -f Makefile.install install
make[1]: Entering directory '/medstat/advantage/scratch/focal_point/download/OpenBLAS'
Generating openblas_config.h in /opt/OpenBLAS/include
Generating f77blas.h in /opt/OpenBLAS/include
Generating cblas.h in /opt/OpenBLAS/include
Copying LAPACKE header files to /opt/OpenBLAS/include
Copying the static library to /opt/OpenBLAS/lib
Copying the shared library to /opt/OpenBLAS/lib
installbsd: can't find libopenblas_power5p-r0.3.3.so.
make[1]: *** [Makefile.install:54: install] Error 1
make[1]: Leaving directory '/medstat/advantage/scratch/focal_point/download/OpenBLAS'
make: *** [Makefile:336: install] Error 2

This is because on AIX, all shared object files end in .a, not .so. My quick and dirty fix:

Makefile.system:1202 -
LIBSONAME = $(LIBNAME:.$(LIBSUFFIX)=.a)
instead of
LIBSONAME = $(LIBNAME:.$(LIBSUFFIX)=.so)

Obviously, this is not the best way to do this. Will have to do an ifdef AIX in a sense. After clean and make, then:

$ sudo make install
Password: 
make -j 2 -f Makefile.install install
make[1]: Entering directory '/medstat/advantage/scratch/focal_point/download/OpenBLAS'
Generating openblas_config.h in /opt/OpenBLAS/include
Generating f77blas.h in /opt/OpenBLAS/include
Generating cblas.h in /opt/OpenBLAS/include
Copying LAPACKE header files to /opt/OpenBLAS/include
Copying the static library to /opt/OpenBLAS/lib
Copying the shared library to /opt/OpenBLAS/lib
Generating openblas.pc in /opt/OpenBLAS/lib/pkgconfig
Generating OpenBLASConfig.cmake in /opt/OpenBLAS/lib/cmake/openblas
Generating OpenBLASConfigVersion.cmake in /opt/OpenBLAS/lib/cmake/openblas
Install OK!
make[1]: Leaving directory '/medstat/advantage/scratch/focal_point/download/OpenBLAS'

I need to test this to confirm everything is behaving correctly by installing NumPy and SciPy.
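
The OS-dependent choice between .a and .so described above can be illustrated in shell. This is a stand-in for the makefile conditional, not the change that went into OpenBLAS, and the shared_lib_name helper is hypothetical.

```shell
# Hypothetical helper: map a static library name to the shared library name
# for a given OS name ("AIX" gives .a, anything else gives .so).
shared_lib_name() {
    libname=$1      # e.g. libopenblas_power5p-r0.3.3.a
    osname=$2       # e.g. the output of "uname -s"
    base=${libname%.a}
    case "$osname" in
        AIX) echo "$base.a" ;;
        *)   echo "$base.so" ;;
    esac
}

# Usage:
# shared_lib_name libopenblas_power5p-r0.3.3.a "$(uname -s)"
```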

@Teej42
Author

Teej42 commented Nov 4, 2018

Okay, this may be getting out of bounds in terms of what this team could do to help me. However, I would like to give it a shot and see if anyone has an answer.

Installing Numpy 1.15.3 using the script - I tweaked the site.cfg file to enable OpenBLAS. Then I did a python_64 setup.py config and got a weird series of error messages, which is quite hard to interpret (did it succeed? Did it fail?). The output is attached here:
config.txt

In addition, even after python_64 setup.py build and pip_64 install ., and a successful run of the test script I found elsewhere (https://hunseblog.wordpress.com/2014/09/15/installing-numpy-and-openblas/), the results are a bit slow:

$ python_64 test.py
dotted two (1000,1000) matrices in 8933.6 ms
dotted two (4000) vectors in 38.62 us
SVD of (2000,1000) matrix in 15.931 s
Eigendecomp of (1500,1500) matrix in 37.237 s

In addition, when doing an install of Scipy using pip_64 install scipy, I get the following message:

numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

So, I am confused, is OpenBLAS successfully installed? How can I verify this outside Numpy/Scipy?

Thank you for all the help you've done for us.

@martin-frbg
Collaborator

In the numpy config.txt I think it found OpenBLAS initially, but then it apparently found an unrelated liblapack in /usr/lib64 (netlib reference implementation from your earlier build probably ?), and it did not find the OpenBLAS-provided cblas.h as it did not search /opt/OpenBLAS/include for headers.
Inside python/numpy, you can do python -c "import numpy; print(numpy.__config__.show())" as
shown in #1844 (comment) , outside you would need to compile/link some C or FORTRAN test code e.g. from the OpenBLAS benchmark directory.
Generally you will probably need to configure your system to search /opt/OpenBLAS/include
(and /opt/OpenBLAS/lib) in addition to the default search paths for headers (and libraries), or install to a commonly recognized PREFIX like /usr/local or /opt/freeware that is searched by default.
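
For the numpy build specifically, the search paths can be given in site.cfg. The sketch below assumes the [openblas] section layout from numpy's site.cfg.example; the openblas_site_cfg helper is hypothetical, and /opt/OpenBLAS is the install prefix used in this thread.

```shell
# Hypothetical helper: print an [openblas] section for numpy's site.cfg
# that points at an OpenBLAS install prefix (default /opt/OpenBLAS).
openblas_site_cfg() {
    prefix=${1:-/opt/OpenBLAS}
    cat <<EOF
[openblas]
libraries = openblas
library_dirs = $prefix/lib
include_dirs = $prefix/include
runtime_library_dirs = $prefix/lib
EOF
}

# Usage, from the numpy source directory (this overwrites an existing site.cfg):
# openblas_site_cfg /opt/OpenBLAS > site.cfg
```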

@Teej42
Author

Teej42 commented Nov 5, 2018

The clue on the /usr/lib64 actually solved a lot of the messy messages, and it narrowed down to ld not being able to spot certain references. Particularly:

ld: 0711-317 ERROR: Undefined symbol: .logf
ld: 0711-317 ERROR: Undefined symbol: .lroundf
ld: 0711-317 ERROR: Undefined symbol: .omp_get_num_threads
ld: 0711-317 ERROR: Undefined symbol: .omp_get_thread_num
ld: 0711-317 ERROR: Undefined symbol: .omp_set_num_threads
ld: 0711-317 ERROR: Undefined symbol: .GOMP_parallel
ld: 0711-317 ERROR: Undefined symbol: .omp_in_parallel
ld: 0711-317 ERROR: Undefined symbol: .omp_get_max_threads

I wonder if I need to just go ahead and install coreutils, et al. that @ayappanec recommended, even though I do not understand how it relates to this at this time. Updated output:

config.txt

Edit: Output from the requested command:

$ python -c "import numpy; print(numpy.__config__.show())"
lapack_info:
  NOT AVAILABLE
lapack_opt_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
openblas_clapack_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
lapack_src_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
accelerate_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
blis_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
None

Note: I changed python to point to python_64.

@martin-frbg
Collaborator

logf and lroundf are probably in libgfortran (or its companion libquadmath), the various omp functions are in libgomp (GNU OpenMP) or whatever other implementation of OpenMP you have available.

@ayappanec
Contributor

libgomp is available in AIX Toolbox

@Teej42
Author

Teej42 commented Nov 5, 2018

I can confirm that libgomp-6.3.0-1.aix7.1.ppc.rpm is already installed:

$ rpm -qa libgomp
libgomp-6.3.0-1.ppc

@martin-frbg
Collaborator

Doing rpm -ql libgomp should display what (and where) is installed as part of this package, but it looks as if the python configure just needs to be told "somehow" that programs linking to openblas should also do -lgomp -lgfortran to satisfy its dependencies.

@Teej42
Author

Teej42 commented Nov 5, 2018

Update: Found a way, just update CC env var to the proper settings (export CC='gcc -pthread -lm -lgcc -lgomp -lgfortran -maix64'), and it got everything. Had to manually build, instead of using pip_64, and everything looks good:

$ python_64 -c "import numpy; print(numpy.__config__.show())"
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
blis_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    runtime_library_dirs = ['/opt/OpenBLAS/lib']
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
None

Performance seems to be unchanged:

$ python_64 test.py
dotted two (1000,1000) matrices in 8905.2 ms
dotted two (4000) vectors in 38.45 us
SVD of (2000,1000) matrix in 15.946 s
Eigendecomp of (1500,1500) matrix in 37.419 s

Pip_64 installation of SciPy failed with same error as NumPy, so looks like I will have to do the same thing for SciPy too. Will do that tomorrow morning. Got a family to take care of.

@Teej42
Author

Teej42 commented Nov 8, 2018

Okay, with further tests and work (several issues were found with Numpy and Scipy and reported (or will be reported)) - I believe that OpenBLAS is mostly ready for AIX, except for the fix I noted in this update - #1803 (comment) - where Makefile.system needs to be modified to point to .a instead of .so for LIBSONAME when OS_AIX is true. Once a working defect is opened on this, I'll close this ticket.

Might also want to consider another ticket on pointing to the right Power CPU -- I am willing to be the guinea pig for this, if it helps optimize the performance for OpenBLAS.

Thank you so much for the help you have given us on this!

@martin-frbg
Collaborator

The fix for Makefile.system is merged now; an untested solution for the cpu detection is now in #1868

@martin-frbg
Collaborator

Err, and in view of recent events (#1844) you may want to rebase your AIX binary on current "develop" and also set USE_SIMPLE_THREADED_LEVEL3=1 in Makefile.rule (#1851) unless a more elegant solution is found for the latter in the next couple of days.

@Teej42
Author

Teej42 commented Nov 13, 2018

Thank you for the heads-up. I will stand by for the time being before I need to remake this. I presume that we can go ahead and remake it without having to remake any dependents downstream, so this is a fairly simple task for us to do. BTW, if your timeline for 0.3.4 is by end of November, we can just absorb that then and rebuild with the Makefile.rule set (unless a fix is provided for that).

@martin-frbg
Collaborator

Closing as all the fixes required for building are in 0.3.4, and any remaining issues with installation can be handled in #1896


4 participants