-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build of OpenBLAS 0.3.8 breaks on AIX #2671
Comments
The second issue appears within:
still in code of kernel/Makefile.L3 (version 0.3.10) :
|
Weird, the relevant change is from #2338 and according to #1997 earlier versions of OpenBLAS would not have been expected to build on AIX. Having -S here instead of -E does make sense, perhaps the branch used for the PR was a prettied-up (but not tested again) version of the actual work ? (Somehow interconverting the occurences of -S and -E perhaps ?) |
I don't know. 8:30PM here. Will see tomorrow about the 2nd issue. The patch I've attached previously is not the final one. |
Hi Martin, About -E , I now understand that it is used to replace some macro in the ./kernel/generic/zgemm_tcopy_2.c C file. Summary:
However, GCC (v8.4) - based on the suffix .s of cgemm_itcopy_nomacros.s - thinks that it is an assembler code. Thus it tries to compile it as an assembler code, and says:
on the line:
which clearly is C code. Renaming cgemm_itcopy_nomacros.s as cgemm_itcopy_nomacros.c does fix this issue.
So, probably that this code was correctly built with GCC v6 but breaks with GCC v8.4 ?! Moreover, it seems that there are optimizations that may depend on the kind of processor. Details about the difference between cgemm_itcopy.s and cgemm_itcopy_nomacros.s :
|
This looks more like a bug in your GCC installation or include paths to me - types.h not matching the expectations of stdlib.h or something like that, or your m4 is misbehaving. |
Yes. Probably that something has changed. |
No. I have the same behavior with GCC 6.3 on AIX 6.1 :
|
Does it work when you rename the .s to .c again (or add |
Could you try to produce m4 output with AIX m4 (something old shipped with their sendmail) and GNU m4 from linux toolbox, then run diff between files. Please help with the diff output (make it *.txt, edit out private paths if any) uploaded. |
I know very very few about how openblas is built. Anyway, after some experiment, I think that things have been done for AIX that are not required for my AIX environment.
appear in kernel/Makefile.L3 . They were added for Power8 I think. The .spec file I'm using comes from IBM. It defines:
I'm using a Power8 VM, but, for compatibility with older machines for our customers, it was compiled with TARGET=POWER6 . Though changes done within: #2338 clearly have been done for POWER8. By simply replacing "AIX" by "AIX-POWER8" (which does not exist) everywhere in kernel/Makefile.L3 , the else part of "ifeq ($(OS), AIX)" is executed, and everything works perfectly, till the tests, which are all PASSED. However, then, changing the .spec file : make BINARY=64 TARGET=POWER8 , WITHOUT changing AIX by AIX-POWER8, does not succeed: I still have the code breaking at:
So, maybe that there are more options to be set before building, starting with version 0.3.8 ?
However, looking at kernel/power, there are :
so the README.md file needs some update. So, my expectation was that the lines in kernel/Makefile.L3 were added for AIX & POWER8 only, breaking building for POWER6. But that still break when TARGET=POWER8 . |
I have requested an account on the gcc compile farm to be able to debug such issues in the future. |
I'll discuss of this with IBM people, ayappanec, this afternoon. Maybe I am not launching the build correctly. However, looking at their last version of the .spec file, for OpenBlas v0.3.6, they seem to do the same I do, but for power8 and for power6. So, they may not be aware of these more recent changes that appeared in v0.3.8 . Last Minute: When experimenting with POWER8 target, I forgot to remove the patch changing -E by -S , which is wrong when compiling with POWER 8. It's building now. If OK, that means that the changes made for Power8, and named as "ifeq ($(OS), AIX)" are not compatible when building with Power6. So the test should be more detailed: "if OS==AIX and TARGET>=POWER8", something like this. |
Yes. The build with TARGET=POWER8 and without my wrong previous patch -E --> -S goes further, close to running the tests (Updated: all OK). So, I think that the addition of new stuff for POWER8 for AIX , appearing with version 0.3.8, did break building with TARGET=POWER6 for AIX. The test if OS=AIX must be improved to not apply when TARGET=POWER6 . |
Probably something like (untested)
instead of the |
Thx! I do not master this syntax. |
That does not work. Some issue with "(" and " " that I have fixed, but it does not do what we expect. I'm trying to find a solution. |
I think I have a solution, split between kernel/Makefile and kernel/Makefile.L3 . |
The idea is:
It works fine. Tested with Power6 and Power8. Maybe that the variable AIXPOWER8910 could be: AIX_POWER_GE8 rather. Your choice. I have a patch for v0.3.8 . If you want it. |
Hummm Still an issue in Power8 now... Grrr Corrected: I applied the patch only for the power8 case, not for the power6 case (the RPM .spec file makes use of 2 different directories and were applying patches for the power8 only. I was stupid and added my patch in the wrong place). So, that should work. Anyway, I'll have to try your last suggestion (3 comments below) which is simpler than mine. |
Apart from the excess braces I could almost claim I was close, but the filter was reversed. Can you try this please:
EDIT: ugh, sorry, scratch that. it does not fulfill the condition that we want anything non-AIX to take the else branch as well |
Yes. I had already tried this. But it compares AIXPOWER8 to AIX, or AIXPOWER9 to AIX. |
think this matches the original intention (but my brain is totally twisted now) |
Correction to my failure described four comments upper: "I applied the patch only to the power8 case, not for the power6 case (the RPM .spec file makes use of 2 different directories and is applying patches for the power8 only. I was stupid and added my patch in the wrong place). So, that should work. Anyway, I'll have to try your last suggestion which is simpler than mine." However, discussing of this with my colleague Clément, we still do not understand why gcc -E is used on a .c file to produce a .s file. That looks nonsense. Clément plans to comment. Moreover, though I did not write Makefiles since ages, I think that there are some ways to not repeat the same code so many times (as done when managing this AIX/Power8 case), using some (more complex) feature of Make. However, I do not remember how. |
Hi guys, In order to understand the issue behind, I think two questions must first be answered:
My guess is that a true fix, would be to removed these |
@trex58 would be nice if you provide your RPM SPEC file, because it is not produced by OpenBLAS, and interferes with diagnostics, and also check compilation by typing |
Hi @martin-frbg The exact line which seems to work is:
I'm now rebuilding all, from scratch and with the RPM .spec file. It will take some time. Then, I'll provide the patch (just replacing: ($(OS), AIX) with: ($(findstring AIXPOW, Once OK for v0.3.8, I'll move to 0.3.10 (more changes I think). @brada4 The RPM .spec file is OpenSource. I have attached the current one I am using for v0.3.8 within this comment, if you want to have a look at. It has been written by IBM for v0.3.6 and I have quickly adapted it to v0.3.8 . We'll very probably do changes for the final version. Moreover, since we need both the 64bit and 32bit versions, in addition to build power6 and power8 versions, it will manage to build and test the openblas code 4 times. |
@trex58 sorry, the "MY" parts were leftovers from my experimenting with made-up make variables on x86 |
@brada4 The .spec file I gave you is perfectly OK (on my AIX 7.2 VM). It enabled to build and test 100% for both Power6 and 8, and it generated the RPM files:
If you want to experiment with it, you need to have RPM v4.13 installed on your AIX machine. And you need to remove the ".txt" suffix I had to add to the 4 files. openblas-0.3.8-AIX-TARGETS.patch.txt |
I dont have AIX for a while. |
Hmm. That |
I see that the fix for #2338 fails for POWER6. Till the fix is available, please run the below commands in the OpenBLAS directory and then initiate the build: sed -i "s/^.AIX./ifdef AIX_POWER8/g" kernel/Makefile.L3 sed -i "4iifeq ($(CORE), POWER8)\nifeq ($(OS), AIX)\nAIX_POWER8 = 1\nendif\nendif\n" kernel/Makefile.L3 |
@kavanabhat could you comment on what AIX and compiler versions your fix was based on ? We have come up with more versatile variations of your changed |
Agreed the extension of the output file for "CC -E ." should have been ".c". These changes worked on POWER8, AIX 7.2 with gcc version "8.1.0". It was even tested for Redhat on POWER8 |
Was able to successfully build the changes even now with the above configuration. |
@kavanabhat Good! I'll wait for a patch been available, for testing in my environment. |
Alright, part of my rushed test yesterday was flawed because target autodetection is/was not working correctly on AIX, leading to a power5 build instead of power8. What to make of this however (this is with 0.3.8 for reference), seems to be missing a
Also seen |
https://www.ibm.com/support/knowledgecenter/ssw_aix_72/m_commands/m4.html It is what I tried to figure between aix and gnu m4, it seems that expansion buffer of 4k somehow does not abort conversion, but generates faulty output. |
More: |
I'd already updated my comment to mention -B 16384, going to even larger values does not appear to affect the subsequent |
I haven't hit these m4 issues. What I observe is, '-S' option of gcc outputs the code to stdout when there are still unresolved macros and if not to a '.s' file. Forcing it to output to stdout always and then using m4 to have the macros expanded is the solution. This works for all power versions. @trex58, please use the attached patch and let me know if any issues. Thanks. |
Might actually depend on the version of |
GNU m4 is being invoked. |
I'm quite lost with what should be done... On my side, I just can show what happens in my environment (AIX 7.2, GCC 8.4):
File ../kernel/power/sgemm_tcopy_8_power8.S is made of #define and #include and of assembler code.
File : ./kernel/power/gemm_tcopy_4.S is made of some #define and of assembler code. If I replace the lines:
by:
That breaks as:
However, if I reuse this script and replace ../kernel/power/sgemm_tcopy_8_power8.S by ../kernel/power/gemm_tcopy_4.S , that works !!
is not accepted.
So, I'm not an expert of assembler, but it really seems that, on AIX, using m4 is required in case of this Power8-specific file. |
@trex58, please use the below patch and let me know if you are able to compile for both power6 and power8. Ensure GNU m4 command is invoked. |
@kavanabhat how about the assembler - I assume now GNU |
I'm in several meeting calls this afternoon. Will go back later. |
Another question for those familiar with AIX - is this statement (from #1803 (comment)) correct at all ?
|
AIX native shared objects are archives of shared objects. The normal archive file, contains a shared object (shr.o or libopenblas.1.so) or whatever. AIX linker has a mode to look for bare shared objects, but that can introduce other problems. |
GNU Assembler does not work on AIX. It is not required. One can use .S CPP-preprocessed assembly language. AIX assembler does not understand register with letter prefixes, like v0, fr0. Only numbers. |
@trex58 It's not OpenBLAS job to debug AIX assembler. |
The Makefile rules that create ".c" or ".s" pre-processed file are explicit. It seems like a bug / typo that ".s" was chosen. One can change the (AIX-specific) rule to use the correct suffixes.
The assembly file definitely is relying upon m4 (possibly requiring GNU m4) and requires that pre-processing step. Torbjorn Granlund uses m4 to pre-process PowerPC assembly code in GMP library to work on both Linux and AIX, so there are other working examples. I don't know exactly what produced
but |
AIX m4 (and solaris amd64 gives impression it comes from POSIX) requires extending one of three buffers to process bigger macros, just that it is less bug reports if it builds perfectly using "standard" tools, no matter how old that standard is. |
Hopefully all fixed in 0.3.11 now |
Hi,
Between v0.3.7 and v0.3.8, a change was done in file kernel/Makefile.L3 , and thus building v0.3.8 breaks in my AIX 7.2 environment though building v0.3.7 builds/tests OK.
The following lines were added in line 529 of Makefile.L3 :
Which generates:
File cgemm_itcopy.s starts with:
This is due to the "-E" option used in line:
This option requires the C compiler (gcc here) to run the C Preprocessor only. Thus the generated file cgemm_itcopy.s is kind of C source code but not assembler.
I guess that the correct option to use is: -S .
Changing -E by -S in all places (all for AIX) does fix the issue. OpenBlas 0.3.8 now builds... up to some other error I have to study.
This patch is WRONG. Do not use it.
openblas-0.3.8-CC-E-S.patch.txt
The text was updated successfully, but these errors were encountered: