Build of OpenBLAS 0.3.8 breaks on AIX #2671

Closed
trex58 opened this issue Jun 16, 2020 · 53 comments

Comments

@trex58

trex58 commented Jun 16, 2020

Hi,
Between v0.3.7 and v0.3.8, a change was made to kernel/Makefile.L3, and as a result v0.3.8 fails to build in my AIX 7.2 environment, whereas v0.3.7 builds and passes its tests.

The following lines were added at line 529 of Makefile.L3:

ifeq ($(OS), AIX)
	$(CC) $(CFLAGS) -UDOUBLE -UCOMPLEX -E $< -o cgemm_itcopy.s
	m4 cgemm_itcopy.s > cgemm_itcopy_nomacros.s
	$(CC) $(CFLAGS) -c -UDOUBLE -UCOMPLEX cgemm_itcopy_nomacros.s -o $@
	rm cgemm_itcopy.s cgemm_itcopy_nomacros.s

Which generates:

gcc -maix64 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -mpowerpc64 -maix64 -DASMNAME=cgemm_itcopy -DASMFNAME=cgemm_itcopy_ -DNAME=cgemm_itcopy_ -DCNAME=cgemm_itcopy -DCHAR_NAME=\"cgemm_itcopy_\" -DCHAR_CNAME=\"cgemm_itcopy\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -UDOUBLE -UCOMPLEX -E ../kernel/power/../generic/zgemm_tcopy_2.c -o cgemm_itcopy.s

m4 cgemm_itcopy.s > cgemm_itcopy_nomacros.s

...
gcc -maix64 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -mpowerpc64 -maix64 -DASMNAME=cgemm_itcopy -DASMFNAME=cgemm_itcopy_ -DNAME=cgemm_itcopy_ -DCNAME=cgemm_itcopy -DCHAR_NAME=\"cgemm_itcopy_\" -DCHAR_CNAME=\"cgemm_itcopy\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -c -UDOUBLE -UCOMPLEX cgemm_itcopy_nomacros.s -o cgemm_itcopy.o
Assembler:
cgemm_itcopy_nomacros.s: line 12: invalid opcode or pseudo-op
cgemm_itcopy_nomacros.s: line 12: Error In Syntax
...............

File cgemm_itcopy.s starts with:

# 1 "../kernel/power/../generic/zgemm_tcopy_2.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "../kernel/power/../generic/zgemm_tcopy_2.c"
# 39 "../kernel/power/../generic/zgemm_tcopy_2.c"
# 1 "/home2/freeware/lib/gcc/powerpc-ibm-aix7.2.0.0/8/include-fixed/stdio.h" 1 3 4
# 11 "/home2/freeware/lib/gcc/powerpc-ibm-aix7.2.0.0/8/include-fixed/stdio.h" 3 4
........

This is due to the "-E" option used in line:

$(CC) $(CFLAGS) -UDOUBLE -UCOMPLEX -E $< -o cgemm_itcopy.s

This option tells the C compiler (gcc here) to run the C preprocessor only. Thus the generated file cgemm_itcopy.s contains C source code, not assembler.

I guess that the correct option to use is -S.

Replacing -E with -S in all places (all for AIX) does fix the issue. OpenBLAS 0.3.8 now builds... up to some other error I have to study.

This patch is WRONG. Do not use it.
openblas-0.3.8-CC-E-S.patch.txt

@trex58
Author

trex58 commented Jun 16, 2020

The second issue appears within:

gcc -maix64 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.10\" -mpowerpc64 -maix64 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=sgemm_kernel -DASMFNAME=sgemm_kernel_ -DNAME=sgemm_kernel_ -DCNAME=sgemm_kernel -DCHAR_NAME=\"sgemm_kernel_\" -DCHAR_CNAME=\"sgemm_kernel\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -S -UDOUBLE -UCOMPLEX  ../kernel/power/gemm_kernel_power6.S -o sgemm_kernel.s

still in the code of kernel/Makefile.L3 (version 0.3.10):

656: ifeq ($(OS), AIX)
657:        $(CC) $(CFLAGS) -S -UDOUBLE -UCOMPLEX  $< -o sgemm_kernel$(TSUFFIX).s

@martin-frbg
Collaborator

martin-frbg commented Jun 16, 2020

Weird, the relevant change is from #2338 and according to #1997 earlier versions of OpenBLAS would not have been expected to build on AIX. Having -S here instead of -E does make sense; perhaps the branch used for the PR was a prettied-up (but not tested again) version of the actual work? (Somehow interconverting the occurrences of -S and -E perhaps?)

@trex58
Author

trex58 commented Jun 16, 2020

I don't know. 8:30PM here. Will see tomorrow about the 2nd issue. The patch I've attached previously is not the final one.

@trex58
Author

trex58 commented Jun 17, 2020

Hi Martin,
I'm now studying the code in order to understand what it is meant to do.

About -E: I now understand that it is used to expand some macros in the ./kernel/generic/zgemm_tcopy_2.c C file.
So, my first comment seems to be wrong.
However, the generated file (cgemm_itcopy.s) is named .s even though it contains C code (?!).
Then m4 is used as a macro processor, and there are a few differences between cgemm_itcopy.s and cgemm_itcopy_nomacros.s (see details below).
Finally, gcc is called on cgemm_itcopy_nomacros.s in order to generate cgemm_itcopy.o.

Summary:

cd kernel
gcc -E ../generic/zgemm_tcopy_2.c  ---> cgemm_itcopy.s
m4 cgemm_itcopy.s ----> cgemm_itcopy_nomacros.s
gcc cgemm_itcopy_nomacros.s ---> cgemm_itcopy.o

However, GCC (v8.4), based on the .s suffix of cgemm_itcopy_nomacros.s, assumes that the file contains assembler code. It therefore tries to assemble it, and says:

    cgemm_itcopy_nomacros.s: line 12: invalid opcode or pseudo-op
    cgemm_itcopy_nomacros.s: line 12: Error In Syntax

on the line:

    12: typedef __builtin_va_list __gnuc_va_list;

which clearly is C code.

Renaming cgemm_itcopy_nomacros.s to cgemm_itcopy_nomacros.c does fix this issue.
However, the build then breaks at:

In file included from ../common.h:83,
                 from ../kernel/power/../generic/zgemm_tcopy_2.c:40:
/home2/freeware/lib/gcc/powerpc-ibm-aix7.2.0.0/8/include-fixed/stdlib.h:538:13: error: two or more data types in declaration specifiers
  extern int  mkstemp(char *);
             ^~~~
/home2/freeware/lib/gcc/powerpc-ibm-aix7.2.0.0/8/include-fixed/stdlib.h:538:19: error: expected identifier or '(' before '-' token
  extern int  mkstemp(char *);
                   ^
In file included from ../common.h:84,
                 from ../kernel/power/../generic/zgemm_tcopy_2.c:40:
/home2/freeware/lib/gcc/powerpc-ibm-aix7.2.0.0/8/include-fixed/string.h:345:22: error: expected identifier or '(' before '-' token
         extern char     *index(const char *, int);
                      ^

So this code was probably built correctly with GCC v6 but breaks with GCC v8.4?!
I'll try to find a machine with GCC v6 installed in order to check.

Moreover, it seems that there are optimizations that may depend on the kind of processor.
I'm now using a Power8 VM.

Details about the difference between cgemm_itcopy.s and cgemm_itcopy_nomacros.s :

# diff cgemm_itcopy.s cgemm_itcopy_nomacros.s
2038c2038
<  extern int mkstemp(char *);
---
>  extern int char *kZwqea;
3066c3066
<         extern char *index(const char *, int);
---
>         extern char *-1;

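The two changed lines in the diff above are consistent with how m4 treats its input: `index` and `mkstemp` are m4 builtin macros, so when either name is followed by an opening parenthesis in preprocessed C, m4 expands it instead of copying it through. A throwaway Makefile target can reproduce the effect on the `index` prototype. This is only an illustration: the target name is made up, and it assumes an m4 with GNU-like behaviour on PATH.

```make
# Hypothetical demo target, not part of OpenBLAS.
# m4 evaluates index(const char *, int) as "position of the second string
# inside the first", which is -1 because "int" does not occur in "const char *".
.PHONY: m4-demo
m4-demo:
	@printf '%s\n' 'extern char *index(const char *, int);' | m4
```

With GNU m4, `make m4-demo` prints `extern char *-1;`, the same mangled line as in the diff. This suggests the m4 step is only safe on inputs that do not pull in C headers, such as the assembly kernels.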
@martin-frbg
Collaborator

This looks more like a bug in your GCC installation or include paths to me - types.h not matching the expectations of stdlib.h or something like that, or your m4 is misbehaving.

@trex58
Author

trex58 commented Jun 17, 2020

Yes. Something has probably changed.
Before I lost Internet for half an hour, I was able to check that removing the m4 step for this file does fix the issue in the cgemm_itcopy case.
Investigating. I'll check on a GCC v6 machine.

@trex58
Author

trex58 commented Jun 17, 2020

No. I have the same behavior with GCC 6.3 on AIX 6.1 :

# gcc --version
gcc (GCC) 6.3.0

+ gcc -maix64 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION="0.3.8" -mpowerpc64 -maix64 -DASMNAME=cgemm_itcopy -DASMFNAME=cgemm_itcopy_ -DNAME=cgemm_itcopy_ -DCNAME=cgemm_itcopy -DCHAR_NAME="cgemm_itcopy_" -DCHAR_CNAME="cgemm_itcopy" -DNO_AFFINITY -I.. -UDOUBLE -DCOMPLEX -UDOUBLE -UCOMPLEX \
        -E ../kernel/power/../generic/zgemm_tcopy_2.c -o cgemm_itcopy.s
+ m4 cgemm_itcopy.s
+ 1> cgemm_itcopy_nomacros.s
+ gcc -maix64 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION="0.3.8" -mpowerpc64 -maix64 -DASMNAME=cgemm_itcopy -DASMFNAME=cgemm_itcopy_ -DNAME=cgemm_itcopy_ -DCNAME=cgemm_itcopy -DCHAR_NAME="cgemm_itcopy_" -DCHAR_CNAME="cgemm_itcopy" -DNO_AFFINITY -I.. -UDOUBLE -DCOMPLEX -c -UDOUBLE -UCOMPLEX \
        cgemm_itcopy_nomacros.s -o cgemm_itcopy.o
Assembler:
cgemm_itcopy_nomacros.s: line 12: invalid opcode or pseudo-op
cgemm_itcopy_nomacros.s: line 12: Error In Syntax

@martin-frbg
Collaborator

Does it work when you rename the .s to .c again (or add -x c to the gcc options) ? I.e., is the error introduced by m4 present there as well ?

@brada4
Contributor

brada4 commented Jun 17, 2020

Could you try producing the m4 output with both the AIX m4 (the old one shipped with their sendmail) and GNU m4 from the Linux toolbox, then run diff between the two files? Please upload the diff output (make it *.txt, and edit out private paths if any).
It might be that something here has only ever been tested with GNU m4 and is not POSIX enough for POSIX tools to handle.

@trex58
Author

trex58 commented Jun 18, 2020

I know very little about how OpenBLAS is built. Anyway, after some experiments, I think that things have been done for AIX that are not required in my AIX environment.
It seems to me that the kernel/Makefile.L3 of v0.3.8 is used for building on AIX.
Looking at it, many lines like:

$(KDIR)$(SGEMMOTCOPYOBJ) : $(KERNELDIR)/$(SGEMMOTCOPY)
ifeq ($(OS), AIX)
        $(CC) $(CFLAGS) -E -UDOUBLE -UCOMPLEX $< -o sgemmotcopy.s
        m4 sgemmotcopy.s > sgemmotcopy_nomacros.s
        $(CC) $(CFLAGS) -c -UDOUBLE -UCOMPLEX sgemmotcopy_nomacros.s -o $@
        rm sgemmotcopy.s sgemmotcopy_nomacros.s
else
        $(CC) $(CFLAGS) -c -UDOUBLE -UCOMPLEX $< -o $@
endif

appear in kernel/Makefile.L3 . They were added for Power8 I think.

The .spec file I'm using comes from IBM. It defines:

PATH=/opt/freeware/bin:/usr/local/bin/:/usr/vac/bin:/opt/freeware/bin:/usr/linux/bin:/usr/local/bin:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java8_64/jre/bin:/usr/java8_64/bin:/usr/samples/kernel:.
+ export CC=gcc -maix64
+ export CXX=g++ -maix64
+ export FC=gfortran -maix64
+ make BINARY=64 TARGET=POWER6

I'm using a Power8 VM but, for compatibility with our customers' older machines, the library was compiled with TARGET=POWER6, whereas the changes made in #2338 were clearly intended for POWER8.

By simply replacing "AIX" with "AIX-POWER8" (which does not exist) everywhere in kernel/Makefile.L3, the else part of "ifeq ($(OS), AIX)" is executed, and everything works perfectly, up to and including the tests, which all PASSED.

However, changing the .spec file to use make BINARY=64 TARGET=POWER8, WITHOUT replacing AIX with AIX-POWER8, does not succeed: the build still breaks at:

gcc -maix64 -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP `
-fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT
 -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2
 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.8\" -mpowerpc64 -maix64
 -DASMNAME=sgemm_kernel -DASMFNAME=sgemm_kernel_ -DNAME=sgemm_kernel_
 -DCNAME=sgemm_kernel -DCHAR_NAME=\"sgemm_kernel_\"
 -DCHAR_CNAME=\"sgemm_kernel\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -S -UDOUBLE
 -UCOMPLEX  ../kernel/power/sgemm_kernel_16x8_power8.S -o sgemm_kernel.s
# 1 "../kernel/power/sgemm_kernel_16x8_power8.S"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "../kernel/power/sgemm_kernel_16x8_power8.S"
# 75 "../kernel/power/sgemm_kernel_16x8_power8.S"

So, maybe there are more options to be set before building, starting with version 0.3.8?
Reading README.md, I see some other options, like: HOSTCC= , CC= , FC= , DEBUG=1 , USE_MASS=1 .
The IBM MASS accelerator seems to be available only for PPC little-endian VMs. AIX is big-endian.
More about Power:

#### PPC/PPC64
- **POWER8**: Optimized BLAS, only for PPC64LE (Little Endian), only with `USE_OPENMP=1`
- **POWER9**: Optimized Level-3 BLAS (real) and some Level-1,2. PPC64LE with OpenMP only.
### Supported OS:
- **AIX**: Supported on PPC up to POWER8

However, looking at kernel/power, there are :

KERNEL.POWER3
KERNEL.POWER4
KERNEL.POWER5
KERNEL.POWER6
KERNEL.POWER8
KERNEL.POWER9

so the README.md file needs an update.

So, my expectation was that the lines in kernel/Makefile.L3 were added for AIX & POWER8 only, breaking the build for POWER6. But the build still breaks when TARGET=POWER8.

@martin-frbg
Collaborator

I have requested an account on the gcc compile farm to be able to debug such issues in the future.
Right now I can only guess what makes "your" AIX different from kavanabhat's installation (where I trust these additions in Makefile.L3 will have worked). Given the nature of the changes, it seems likely that they are/were required to work around limitations in some native AIX assembler rather than the GNU as that you are probably using.
As an aside, for "historical" reasons the README.md only lists cpu models that are supported in addition to those known to the original GotoBLAS.

@trex58
Author

trex58 commented Jun 18, 2020

I'll discuss this with the IBM people (ayappanec) this afternoon. Maybe I am not launching the build correctly. However, looking at their latest version of the .spec file, for OpenBLAS v0.3.6, they seem to do the same as I do, but for both power8 and power6. So, they may not be aware of the more recent changes that appeared in v0.3.8.
I am using AIX 7.2, GCC 8.4, and a VM on Power8 HW.

Last minute: when experimenting with the POWER8 target, I forgot to remove the patch replacing -E with -S, which is wrong when compiling for POWER8. It's building now. If it succeeds, that means that the changes made for Power8, guarded by "ifeq ($(OS), AIX)", are not compatible with building for Power6. So the test should be more specific: "if OS==AIX and TARGET>=POWER8", or something like this.

@trex58
Author

trex58 commented Jun 18, 2020

Yes. The build with TARGET=POWER8 and without my previous wrong patch (-E --> -S) goes further, close to running the tests (Updated: all OK). So, I think that the new POWER8 code for AIX, which appeared in version 0.3.8, broke building with TARGET=POWER6 on AIX. The OS=AIX test must be refined so that it does not apply when TARGET=POWER6.

@martin-frbg
Collaborator

Probably something like (untested)

ifeq ($(OS)$(filter-out($(TARGET),POWER8 POWER9 POWER10), AIX)

instead of the ifeq($(OS),AIX)

@trex58
Author

trex58 commented Jun 18, 2020

Thx! I have not mastered this syntax.
I'll experiment right now.
Moreover, I'll use a .spec file which builds for both TARGET=POWER6 and TARGET=POWER8, so that we'll know.

@trex58
Author

trex58 commented Jun 18, 2020

That does not work. Some issue with "(" and " " that I have fixed, but it does not do what we expect. I'm trying to find a solution.

@trex58
Author

trex58 commented Jun 18, 2020

I think I have a solution, split between kernel/Makefile and kernel/Makefile.L3 .
Checking now.

@trex58
Author

trex58 commented Jun 18, 2020

The idea is:

  • For kernel/Makefile
 include Makefile.L2

+ifeq ($(OSNAME), AIX)
+# AIXPOWER8910 is 1 if HW >= Power8
+# Needed since Assembler is used starting with Power8
+       AIXPOWER8910 = 0
+       ifeq ($(TARGET), POWER8)
+               AIXPOWER8    = 1
+               AIXPOWER8910 = 1
+       endif
+       ifeq ($(TARGET), POWER9)
+               AIXPOWER9    = 1
+               AIXPOWER8910 = 1
+       endif
+       ifeq ($(TARGET), POWER10)
+               AIXPOWER10   = 1
+               AIXPOWER8910 = 1
+       endif
+endif
 include Makefile.L3
  • For kernel/Makefile.L3 :
 $(KDIR)$(SGEMMOTCOPYOBJ) : $(KERNELDIR)/$(SGEMMOTCOPY)
-ifeq ($(OS), AIX)
+ifeq ($(AIXPOWER8910), 1)

It works fine. Tested with Power6 and Power8.

Maybe the variable AIXPOWER8910 could be named AIX_POWER_GE8 instead. Your choice.

I have a patch for v0.3.8, if you want it.
I'll now check version 0.3.10 for other issues and build a patch. I do not know whether your master is very different from v0.3.10 and would require a specific patch.

@trex58
Author

trex58 commented Jun 18, 2020

Hummm Still an issue in Power8 now... Grrr

Corrected: I applied the patch only for the power8 case, not for the power6 case (the RPM .spec file makes use of 2 different directories and was applying patches for power8 only. I was stupid and added my patch in the wrong place). So, that should work. Anyway, I'll have to try your latest suggestion (3 comments below), which is simpler than mine.

@martin-frbg
Collaborator

martin-frbg commented Jun 18, 2020

Apart from the excess braces I could almost claim I was close, but the filter was reversed. Can you try this please:

ifneq ($(OS)$(filter $(TARGET), POWER8 POWER9 POWER10),AIX)

EDIT: ugh, sorry, scratch that. it does not fulfill the condition that we want anything non-AIX to take the else branch as well

@trex58
Author

trex58 commented Jun 18, 2020

Yes. I had already tried this. But it compares AIXPOWER8 to AIX, or AIXPOWER9 to AIX.
My proposal worked when I experimented with a script, and it seemed OK for both the POWER6 and POWER8 cases. However, when trying the full build by means of a .spec file, I had an issue. The build takes so long to reach the AIX-Power point that I cannot easily find out when and where it goes wrong.
I'll continue tomorrow.

@martin-frbg
Collaborator

ifeq ($(findstring AIXPOW, $(MYOS)$(filter $(MYTARGET), POWER8 POWER9 POWER10)),AIXPOW)

think this matches the original intention (but my brain is totally twisted now)

@trex58
Author

trex58 commented Jun 19, 2020

Correction to the failure I described four comments above: "I applied the patch only to the power8 case, not for the power6 case (the RPM .spec file makes use of 2 different directories and is applying patches for the power8 only. I was stupid and added my patch in the wrong place). So, that should work. Anyway, I'll have to try your last suggestion which is simpler than mine."

However, after discussing this with my colleague Clément, we still do not understand why gcc -E is used on a .c file to produce a .s file. That looks like nonsense. Clément plans to comment.

Moreover, though I have not written Makefiles in ages, I think that there are ways to avoid repeating the same code so many times (as was done to handle this AIX/Power8 case), using some (more complex) features of Make. However, I do not remember how.
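One GNU Make feature that fits this is a multi-line variable (a "canned recipe") invoked through `$(call ...)`. The fragment below is a minimal sketch of that idea, not what OpenBLAS actually does. The helper name `aix_m4_compile` and the intermediate file names are made up. The rule and its variables are copied from the Makefile.L3 excerpt quoted earlier in this thread, and the helper assumes the source is an assembly (.S) kernel.

```make
# Hypothetical helper, not part of OpenBLAS.
# $(1) holds the per-kernel preprocessor flags; $< and $@ are resolved
# when the recipe runs. Each line below becomes its own recipe line.
define aix_m4_compile
$(CC) $(CFLAGS) -E $(1) $< -o $(basename $@).pre.s
m4 $(basename $@).pre.s > $(basename $@).nomacros.s
$(CC) $(CFLAGS) -c $(1) $(basename $@).nomacros.s -o $@
rm -f $(basename $@).pre.s $(basename $@).nomacros.s
endef

# One of the repeated rules from kernel/Makefile.L3, rewritten to use it.
# The ifeq condition is the one in use at this point in the discussion.
$(KDIR)$(SGEMMOTCOPYOBJ) : $(KERNELDIR)/$(SGEMMOTCOPY)
ifeq ($(OS), AIX)
	$(call aix_m4_compile,-UDOUBLE -UCOMPLEX)
else
	$(CC) $(CFLAGS) -c -UDOUBLE -UCOMPLEX $< -o $@
endif
```

Deriving the temporary names from `$@` rather than hard-coding them per rule also means two kernels can never collide on the same intermediate file.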

@Helflym

Helflym commented Jun 19, 2020

Hi guys,
Using the same command line $(CC) $(CFLAGS) -E -UDOUBLE -UCOMPLEX $< -o filename.s for all inputs seems wrong to me. If the input file is a C file (i.e. ending with .c), -E will generate another C file. But naming it .s tells GCC that it's an ASM file, so GCC will not be able to parse it.
That's what is wrong with gcc -maix64 ... -E ../kernel/power/../generic/zgemm_tcopy_2.c -o cgemm_itcopy.s; the m4 stuff doesn't matter there.
Note that this is not AIX related: you can try creating a C file and renaming it to .s, and it fails on every platform AFAIK.
Older GCC versions or other compilers might allow such things, but I'm not sure.

In order to understand the underlying issue, I think two questions must first be answered:

  • @kavanabhat could you share with us the setup you had when testing your PR? GCC/XLC? Compiler version? AIX version? etc.
  • Is zgemm_tcopy_2.c also compiled when building for Power8? I haven't checked, but it's possible that this algorithm was implemented in ASM for Power8 but is still using C code for Power6.

My guess is that a true fix would be to remove these if OS == AIX parts when the input files are C files. I think a bit of factorization there would be much appreciated too.
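One way to express "skip the AIX branch for C inputs" in GNU Make is to test the kernel's file suffix together with the OS name. The standalone sketch below only illustrates that idea and is not taken from OpenBLAS. The variables `KERNEL_SRC` and `PATHWAY` and the `show` target are hypothetical, and `OSNAME` stands in for the value OpenBLAS sets itself.

```make
# Hypothetical standalone sketch, not part of OpenBLAS.
OSNAME     ?= AIX
KERNEL_SRC ?= zgemm_tcopy_2.c

# $(suffix ...) yields ".c" or ".S". Only assembly kernels on AIX
# should take the preprocess + m4 path.
ifeq ($(OSNAME)$(suffix $(KERNEL_SRC)),AIX.S)
PATHWAY := preprocess, m4, assemble
else
PATHWAY := plain compile
endif

.PHONY: show
show:
	@echo "$(KERNEL_SRC) on $(OSNAME): $(PATHWAY)"
```

Running `make show` with the defaults reports the plain compile path, while `make show KERNEL_SRC=sgemm_tcopy_8_power8.S` reports the m4 path.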

@brada4
Contributor

brada4 commented Jun 19, 2020

@trex58 it would be nice if you could provide your RPM SPEC file, because it is not produced by OpenBLAS and interferes with diagnostics. Please also check compilation by simply typing make in the OpenBLAS source tree.

@trex58
Author

trex58 commented Jun 19, 2020

Hi @martin-frbg The exact line which seems to work is:

ifeq ($(findstring AIXPOW, $(OSNAME)$(filter $(TARGET), POWER8 POWER9 POWER10)), AIXPOW)

I'm now rebuilding all, from scratch and with the RPM .spec file. It will take some time. Then, I'll provide the patch (just replacing: ($(OS), AIX) with: ($(findstring AIXPOW, $(OSNAME)$(filter $(TARGET), POWER8 POWER9 POWER10)), AIXPOW) ).
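To see why this form selects only AIX builds for POWER8 and later, the condition can be tried outside OpenBLAS. The standalone sketch below is an illustration, not the actual patch. The file name cond.mk, the `BRANCH` variable and the `show` target are made up, while `OSNAME` and `TARGET` stand in for the values OpenBLAS sets itself.

```make
# Hypothetical standalone file (cond.mk), not part of OpenBLAS.
OSNAME ?= AIX
TARGET ?= POWER8

# $(filter ...) returns $(TARGET) only if it is one of the listed cores,
# otherwise nothing. Appended to $(OSNAME) this gives "AIXPOWER8", "AIX",
# "LinuxPOWER8" and so on. $(findstring AIXPOW,...) returns "AIXPOW" only
# when the OS is AIX and the target is in the list.
ifeq ($(findstring AIXPOW,$(OSNAME)$(filter $(TARGET),POWER8 POWER9 POWER10)),AIXPOW)
BRANCH := m4
else
BRANCH := direct
endif

.PHONY: show
show:
	@echo "$(OSNAME) $(TARGET) -> $(BRANCH)"
```

`make -f cond.mk show` prints `AIX POWER8 -> m4`, `make -f cond.mk show TARGET=POWER6` prints `AIX POWER6 -> direct`, and `make -f cond.mk show OSNAME=Linux` prints `Linux POWER8 -> direct`, so every non-AIX build falls into the else branch as well.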

Once OK for v0.3.8, I'll move to 0.3.10 (more changes I think).

@brada4 The RPM .spec file is open source. I have attached the one I am currently using for v0.3.8 to this comment, if you want to have a look at it. It was written by IBM for v0.3.6 and I have quickly adapted it to v0.3.8. We'll very probably make changes for the final version. Moreover, since we need both the 64bit and 32bit versions, in addition to the power6 and power8 versions, it builds and tests the OpenBLAS code 4 times.
Since a .spec file makes use of .patch files (3 for this version), I'll have to provide them to you too, if you want to build OpenBLAS in your own AIX environment. Let me know. Anyway, wait for the final version, if you can. Moreover, any comment about this .spec file is welcome (since we only port the software and do not know how it is meant to be built, we may miss important things).
Patch0: openblas-Dont_cross_compile.patch
Patch1: openblas-0.3.8-power8-aix.patch
Patch3: openblas-0.3.8-AIX-TARGETS.patch

openblas-0.3.8-1.spec.txt

@martin-frbg
Collaborator

@trex58 sorry, the "MY" parts were leftovers from my experimenting with made-up make variables on x86
@Helflym the $(CC) -E to generate a .s file certainly looks strange, maybe I should not have accepted that PR without (at the time) being able to test it. xlc looks to be the most likely explanation.

@trex58
Author

trex58 commented Jun 19, 2020

@brada4 The .spec file I gave you is perfectly OK (on my AIX 7.2 VM). It allowed me to build and test 100% for both Power6 and Power8, and it generated the RPM files:

Wrote: /opt/freeware/src/packages/SRPMS/openblas-0.3.8-1.src.rpm
Wrote: /opt/freeware/src/packages/RPMS/ppc/openblas-0.3.8-1.aix7.2.ppc.rpm

If you want to experiment with it, you need to have RPM v4.13 installed on your AIX machine. And you need to remove the ".txt" suffix I had to add to the 4 files.

openblas-0.3.8-AIX-TARGETS.patch.txt
openblas-0.3.8-CC-E-S.patch.txt
openblas-0.3.8-power8-aix.patch.txt

@brada4
Contributor

brada4 commented Jun 19, 2020

I have not had AIX for a while.
The line containing AIXPOW that you suggest changing is actually introduced by their patch (TARGETS), so try without that patch (yes, just comment out one line in the spec).

@martin-frbg
Collaborator

Hmm. That CC -E thingy does not work with CC set to xlc either. With assembly input, CC gets unhappy and just dumps to stdout rather than writing the new file where the -o option wants it; with a C source, the subsequent compile step chokes on the wrong extension. Just dropping the entire ifdef AIX seems to work best (at least on AIX 7.2 POWER8, gcc 7.2, xlc 13.1.3; maybe things were different with earlier versions). (Need to experiment a bit more though, seems the default build on that machine is 32bit and there are some options not getting passed to the fortran compiler when trying to do a 64bit build.)

@kavanabhat
Contributor

I see that the fix for #2338 fails for POWER6. Till the fix is available, please run the below commands in the OpenBLAS directory and then initiate the build:

sed -i "s/^.AIX./ifdef AIX_POWER8/g" kernel/Makefile.L3

sed -i "4iifeq ($(CORE), POWER8)\nifeq ($(OS), AIX)\nAIX_POWER8 = 1\nendif\nendif\n" kernel/Makefile.L3

@martin-frbg
Collaborator

@kavanabhat could you comment on what AIX and compiler versions your fix was based on? We have come up with more versatile variations of your changed #ifdef already, but what is really unclear is how the $(CC) -E applied to kernels written in either C or assembly was expected to work.

@kavanabhat
Contributor

Agreed, the extension of the output file for "CC -E ." should have been ".c". These changes worked on POWER8, AIX 7.2 with gcc version "8.1.0". It was even tested for Redhat on POWER8.

@kavanabhat
Contributor

Was able to successfully build the changes even now with the above configuration.

@trex58
Author

trex58 commented Jun 19, 2020

@kavanabhat Good! I'll wait for a patch to be available, for testing in my environment.

@martin-frbg
Collaborator

martin-frbg commented Jun 20, 2020

Alright, part of my rushed test yesterday was flawed because target autodetection is/was not working correctly on AIX, leading to a power5 build instead of power8. What to make of this however (this is with 0.3.8 for reference), seems to be missing a -B 16384 to require a larger m4 buffer here and in a few other cases:

m4 dtrsm_kernel_lt.s > dtrsm_kernel_lt_nomacros.s

m4:dtrsm_kernel_lt.s:688 1252-219 The parameters to the specified macro
        cannot contain more than 4096 bytes of text.

define(SOLVE_LT_16x4,
(etc.)

Also seen
cgemm_kernel_n_nomacros.s: line 20831: 1252-082 Use more parameters for the instruction.
(line 20831 is an addi 11, 224 that seems to have been present in the form of addi r11 224 already before m4 processing). Do we want to require GNU as here (apparently not preinstalled on the gcc compile farm machine) - the AIX as reports its version as 7.2 ?

@brada4
Contributor

brada4 commented Jun 20, 2020

1252-219 The parameters to the specified macro cannot contain more than 4096 bytes of text.

https://www.ibm.com/support/knowledgecenter/ssw_aix_72/m_commands/m4.html
m4 -B 1000000

That is what I was trying to figure out between AIX and GNU m4: it seems that exceeding the 4k expansion buffer somehow does not abort the conversion, but generates faulty output.

@brada4
Contributor

brada4 commented Jun 20, 2020

More:
gnu m4 just ignores all buffer-size parameters "for compatibility", so they are safe to apply even for general case.
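If the larger buffer were passed on every m4 call, one way to do it would be through a flags variable, so that the value can be changed in one place. This is a minimal sketch under assumptions: `M4`, `M4FLAGS` and the pattern rule are hypothetical and not part of OpenBLAS' Makefiles, and 16384 is simply the value mentioned in this thread.

```make
# Hypothetical variables, not part of OpenBLAS.
M4      ?= m4
# -B raises the 4096-byte limit reported by the AIX m4 error above.
# According to this thread, GNU m4 accepts and ignores the option.
M4FLAGS ?= -B 16384

# Expand m4 macros in an already preprocessed assembly file.
%_nomacros.s: %.s
	$(M4) $(M4FLAGS) $< > $@
```

A caller could then override the size without editing the rules, for example `make M4FLAGS="-B 1000000"`.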

@martin-frbg
Collaborator

I'd already updated my comment to mention -B 16384, going to even larger values does not appear to affect the subsequent as error (nor would I expect it to)

@kavanabhat
Contributor

kavanabhat commented Jun 21, 2020

I haven't hit these m4 issues. What I observe is that the '-S' option of gcc outputs the code to stdout when there are still unresolved macros, and otherwise to a '.s' file. Forcing it to always output to stdout and then using m4 to expand the macros is the solution. This works for all power versions. @trex58, please use the attached patch and let me know if there are any issues. Thanks.
openblas-0.3.8-AIX-patch.txt.

@martin-frbg
Collaborator

martin-frbg commented Jun 21, 2020

Might actually depend on the version of m4 in use - I suspect my test picked up an original AIX tool in /usr/bin rather than its GNU counterpart from /opt/freeware/bin. Are you calling the GNU m4 ?
EDIT: of course this is a problem for the system m4 only - and the "use more parameters" error with a few of the assembly kernels is resolved by switching to GNU as as well. (Not quite clear to me why that would not be preinstalled on a GCC compilefarm machine, but then I have only just started using their equipment for testing). Other issues I have come across and will fix tomorrow (after merging the already pending changes) - BINARY64 does not get passed to gfortran automatically and the dynamic version of the library does not get built by default.

@kavanabhat
Contributor

GNU m4 is being invoked.

@trex58
Author

trex58 commented Jun 22, 2020

I'm quite lost with what should be done...

On my side, I just can show what happens in my environment (AIX 7.2, GCC 8.4):

  • make BINARY=64 TARGET=POWER8
gcc -maix64 -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.10\" -mpowerpc64 -maix64 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=sgemm_otcopy -DASMFNAME=sgemm_otcopy_ -DNAME=sgemm_otcopy_ -DCNAME=sgemm_otcopy -DCHAR_NAME=\"sgemm_otcopy_\" -DCHAR_CNAME=\"sgemm_otcopy\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -E -UDOUBLE -UCOMPLEX **../kernel/power/sgemm_tcopy_8_power8.S** -o sgemmotcopy.s
m4 sgemmotcopy.s > sgemmotcopy_nomacros.s
gcc -maix64 -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.10\" -mpowerpc64 -maix64 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=sgemm_otcopy -DASMFNAME=sgemm_otcopy_ -DNAME=sgemm_otcopy_ -DCNAME=sgemm_otcopy -DCHAR_NAME=\"sgemm_otcopy_\" -DCHAR_CNAME=\"sgemm_otcopy\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -c -UDOUBLE -UCOMPLEX sgemmotcopy_nomacros.s -o sgemm_otcopy.o
rm sgemmotcopy.s sgemmotcopy_nomacros.s

File ../kernel/power/sgemm_tcopy_8_power8.S is made of #define and #include and of assembler code.

  • make BINARY=64 TARGET=POWER6
gcc -maix64 -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION=\"0.3.10\" -mpowerpc64 -maix64 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=sgemm_otcopy -DASMFNAME=sgemm_otcopy_ -DNAME=sgemm_otcopy_ -DCNAME=sgemm_otcopy -DCHAR_NAME=\"sgemm_otcopy_\" -DCHAR_CNAME=\"sgemm_otcopy\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -c -UDOUBLE -UCOMPLEX ../kernel/power/gemm_tcopy_4.S -o sgemm_otcopy.o

File : ./kernel/power/gemm_tcopy_4.S is made of some #define and of assembler code.

If I replace the lines:

ifeq ($(findstring AIXPOW, $(OSNAME)$(filter $(TARGET), POWER8 POWER9 POWER10)), AIXPOW)
       $(CC) $(CFLAGS) -E -UDOUBLE -UCOMPLEX $< -o sgemmotcopy.s
       m4 sgemmotcopy.s > sgemmotcopy_nomacros.s
       $(CC) $(CFLAGS) -c -UDOUBLE -UCOMPLEX sgemmotcopy_nomacros.s -o $@
       rm sgemmotcopy.s sgemmotcopy_nomacros.s

by:

ifeq ($(findstring AIXPOW, $(OSNAME)$(filter $(TARGET), POWER8 POWER9 POWER10)), AIXPOW)
        $(CC) $(CFLAGS) -c -UDOUBLE -UCOMPLEX $< -o $@

That breaks as:

 gcc -maix64 -Ofast -mcpu=power8 -mtune=power8 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=2 -DMAX_PARALLEL_NUMBER=1 -DVERSION="0.3.10" -mpowerpc64 -maix64 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=sgemm_otcopy -DASMFNAME=sgemm_otcopy_ -DNAME=sgemm_otcopy_ -DCNAME=sgemm_otcopy -DCHAR_NAME="sgemm_otcopy_" -DCHAR_CNAME="sgemm_otcopy" -DNO_AFFINITY -I.. -UDOUBLE -UCOMPLEX -c -UDOUBLE -UCOMPLEX ../kernel/power/sgemm_tcopy_8_power8.S -o sgemm_otcopy.o
Assembler:
/tmp//cceKS9Rd.s: line 56: Invalid source character 0x60
/tmp//cceKS9Rd.s: line 56: Error In Syntax
...

However, if I reuse this script and replace ../kernel/power/sgemm_tcopy_8_power8.S with ../kernel/power/gemm_tcopy_4.S, that works!!
So, there is something different in these 2 files, which makes compiling the first one break while the second one compiles OK.
Looking at the generated assembler, I see that the line "define(`COPY_4x8...":

# 42 "../kernel/power/sgemm_tcopy_macros_8_power8.S"
define(`COPY_4x8', `
....

is not accepted.
Original code:

#if defined(_AIX)
define(`COPY_4x8', `
#else
.macro COPY_4x8
#endif

        lxvw4x          vs32,   o0,     A0
        lxvw4x          vs33,   o16,    A0
...

So, I'm not an assembler expert, but it really seems that, on AIX, using m4 is required for this Power8-specific file.

@kavanabhat
Contributor

@trex58, please use the below patch and let me know if you are able to compile for both power6 and power8. Ensure GNU m4 command is invoked.
https://github.com/xianyi/OpenBLAS/files/4809319/openblas-0.3.8-AIX-patch.txt

@martin-frbg
Collaborator

@kavanabhat how about the assembler - I assume now GNU as is required ?
(Apart from that, my tests now suggest that it is not strictly necessary to replace the CC -E by CC -S plus redirection of the output - it must have been primarily the unwanted application of that call on POWER6 that created the initial confusion, coupled with the build quietly misdetecting POWER8 hardware as POWER5 when no TARGET is supplied.)

@trex58
Author

trex58 commented Jun 22, 2020

I'm in several meeting calls this afternoon. Will come back later.

@martin-frbg
Collaborator

Another question for those familiar with AIX - is this statement (from #1803 (comment)) correct at all ?

 on AIX, all shared object files ends in .a, not .so.

@edelsohn

AIX native shared objects are archives of shared objects. The normal archive file contains a shared object (shr.o, libopenblas.1.so, or whatever). The AIX linker has a mode to look for bare shared objects, but that can introduce other problems.

@edelsohn

GNU Assembler does not work on AIX. It is not required.

One can use .S CPP-preprocessed assembly language.

The AIX assembler does not understand registers with letter prefixes, like v0, fr0. Only numbers.

@edelsohn

@trex58 It's not OpenBLAS's job to debug the AIX assembler.

@edelsohn

The Makefile rules that create ".c" or ".s" pre-processed file are explicit. It seems like a bug / typo that ".s" was chosen. One can change the (AIX-specific) rule to use the correct suffixes.

ifeq ($(OS), AIX)
	$(CC) $(CFLAGS) -UDOUBLE -UCOMPLEX -E $< -o cgemm_itcopy.c
	m4 cgemm_itcopy.c > cgemm_itcopy_nomacros.c
	$(CC) $(CFLAGS) -c -UDOUBLE -UCOMPLEX cgemm_itcopy_nomacros.c -o $@
	rm cgemm_itcopy.c cgemm_itcopy_nomacros.c

The assembly file definitely is relying upon m4 (possibly requiring GNU m4) and requires that pre-processing step.

Torbjorn Granlund uses m4 to pre-process PowerPC assembly code in GMP library to work on both Linux and AIX, so there are other working examples.

I don't know exactly what produced

        lxvw4x          vs32,   o0,     A0

but vs32, o0, and A0 are not valid operands for the AIX assembler.

@brada4
Contributor

brada4 commented Jun 23, 2020

AIX m4 (and Solaris amd64 m4, which gives the impression that this behaviour comes from POSIX) requires enlarging one of three buffers to process bigger macros. There are simply fewer bug reports if the build works perfectly with "standard" tools, no matter how old that standard is.

kavanabhat added a commit to kavanabhat/OpenBLAS that referenced this issue Jun 24, 2020
@martin-frbg
Collaborator

Hopefully all fixed in 0.3.11 now

6 participants