Regression on PPC970, resulting in [some] dependents of OpenBLAS being unusable #5034

Closed
barracuda156 opened this issue Dec 29, 2024 · 45 comments · Fixed by #5045 or #5048
Labels
Bug in other software Compiler, Virtual Machine, etc. bug affecting OpenBLAS

Comments

@barracuda156
Contributor

barracuda156 commented Dec 29, 2024

I have built OpenBLAS from the current master 17803e7 and it turns out the library is broken on Darwin PowerPC.
Updated. Release 0.3.28 works fine without native optimizations, but exhibits the same problem when optimizations are enabled.

Specifically, I tried building py-scipy against the new OpenBLAS, and it failed at configure. Meson's output was not really helpful, but eventually I found exactly where configure fails and ran the same command outside of the build:

/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 -c 'import os; os.chdir(".."); import numpy; print(numpy.get_include())'

It led to a bus error:

Process:         Python [1164]
Path:            /opt/local/Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
Identifier:      Python
Version:         ??? (???)
Code Type:       PPC (Native)
Parent Process:  bash [313]

Date/Time:       2024-12-29 11:54:28.182 +0800
OS Version:      Mac OS X 10.6.8 (10K549)
Report Version:  6

Anonymous UUID:                      BC8371F6-179E-4C81-B2D5-052FA5025D07

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000003
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   libopenblas.0.dylib           	0x01c200c0 dgeqr2_ + 316

Thread 1:
0   libSystem.B.dylib             	0x92554300 __semwait_signal + 12

Thread 2:
0   libSystem.B.dylib             	0x92554300 __semwait_signal + 12

Thread 3:
0   libSystem.B.dylib             	0x92554300 __semwait_signal + 12

Thread 0 crashed with PPC Thread State 32:
  srr0: 0x01c200c0  srr1: 0x0200f030   dar: 0x00000003 dsisr: 0x40000000
    r0: 0x01c20160    r1: 0xbfffc980    r2: 0x00000001    r3: 0x00000000
    r4: 0x00000008    r5: 0x00000660    r6: 0x0631d8a0    r7: 0x0213b704
    r8: 0x0631d8c8    r9: 0x00000008   r10: 0x0631d918   r11: 0xbfffc980
   r12: 0x48244402   r13: 0x00000003   r14: 0x0631d8d0   r15: 0x0213ff94
   r16: 0x00000004   r17: 0x00000002   r18: 0xbfffc9c8   r19: 0x00000004
   r20: 0x00000002   r21: 0x00000003   r22: 0x00000005   r23: 0x0213b708
   r24: 0x00000030   r25: 0xbfffcd28   r26: 0x0213b704   r27: 0x0631d8a0
   r28: 0xbfffcaa8   r29: 0x06bc4408   r30: 0x0631d8f8   r31: 0x01c1ff94
    cr: 0x82244444   xer: 0x20000000    lr: 0x01c20160   ctr: 0x00000000
vrsave: 0xc3fc0000

Binary Images:
    0x1000 -     0x2fff +org.python.python 3.12.8 (3.12.8) <D0AEA538-7475-6388-5F1B-2AD80948C399> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
    0x5000 -   0x2dbffb +org.python.python 3.12.8, (c) 2001-2023 Python Software Foundation. (3.12.8) <65C58188-686B-2103-DFA0-4803AD869824> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/Python
  0x5ba000 -   0x5cdfff +libintl.8.dylib ??? (???) <1D434055-DFEA-0613-38DF-82D9591B78F2> /opt/local/lib/libintl.8.dylib
  0x5d3000 -   0x6d8fff +libiconv.2.dylib ??? (???) <1AE293B0-F61B-7FC5-D35C-A157C464B293> /opt/local/lib/libiconv.2.dylib
  0x7e4000 -   0x7eaff7 +libgcc_s.1.1.dylib ??? (???) <5935AE2A-933C-B949-4655-7E4EFEEB13B3> /opt/local/lib/libgcc/libgcc_s.1.1.dylib
  0x7ed000 -   0x7f7fff +math.cpython-312-darwin.so ??? (???) <9974D2E5-98CA-3583-98A1-9A85E453385B> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/math.cpython-312-darwin.so
  0xae0000 -   0xeeeffb +_multiarray_umath.cpython-312-darwin.so ??? (???) <EF61165A-0B54-DC2B-83B9-884FF2FC42DF> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/core/_multiarray_umath.cpython-312-darwin.so
  0xf70000 -   0xf7effb +_datetime.cpython-312-darwin.so ??? (???) <F95B693A-4709-341F-2CBF-CD791C65F5BE> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_datetime.cpython-312-darwin.so
  0xfc9000 -   0xfcffff +_struct.cpython-312-darwin.so ??? (???) <3E52A171-E5E7-78CF-CDD1-3F7B0E136DBA> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_struct.cpython-312-darwin.so
  0xfd6000 -   0xfe7fff +_pickle.cpython-312-darwin.so ??? (???) <0FA4D399-8FDB-90A5-1597-D44CE36191D1> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_pickle.cpython-312-darwin.so
  0xff0000 -   0xff0ffd +_contextvars.cpython-312-darwin.so ??? (???) <CA525B06-2B7F-CA6F-319B-8050E85054BD> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_contextvars.cpython-312-darwin.so
  0xff3000 -   0xff8ff7 +libffi.8.dylib ??? (???) <8F9EBF03-F8E7-5AD2-C402-3C6BC4159824> /opt/local/lib/libffi.8.dylib
  0xffb000 -   0xffcfff +_random.cpython-312-darwin.so ??? (???) <CEFE7584-06C8-6FED-CB93-19A636264F80> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_random.cpython-312-darwin.so
 0x1800000 -  0x2159fef +libopenblas.0.dylib ??? (???) <C4508703-E162-747D-458C-7CE0870CBAA8> /opt/local/lib/libopenblas.0.dylib
 0x21c0000 -  0x229efff +libgfortran.5.dylib ??? (???) <47073CEF-25C9-8824-5B40-235211C61203> /opt/local/lib/libgcc/libgfortran.5.dylib
 0x62ee000 -  0x62fafff +_pocketfft_internal.cpython-312-darwin.so ??? (???) <76FE4F65-472D-285B-9569-F995B452BFF3> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/fft/_pocketfft_internal.cpython-312-darwin.so
 0x62fd000 -  0x62fdffe +_opcode.cpython-312-darwin.so ??? (???) <74041AE0-157D-31E2-7535-38B593EF734C> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_opcode.cpython-312-darwin.so
 0x64e2000 -  0x64e6fff +binascii.cpython-312-darwin.so ??? (???) <3A7F7D74-80F1-7253-F579-2D454065776D> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/binascii.cpython-312-darwin.so
 0x64ea000 -  0x64fcffc +libz.1.dylib ??? (???) <A7D052E5-99BB-A5E8-2179-2F973818B7B0> /opt/local/lib/libz.1.dylib
 0x66e5000 -  0x66ffffb +_multiarray_tests.cpython-312-darwin.so ??? (???) <0CB4ADED-CB1F-C4C5-0B05-C688B6D0C7EB> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/core/_multiarray_tests.cpython-312-darwin.so
 0x6786000 -  0x6799ffb +_ctypes.cpython-312-darwin.so ??? (???) <B7E19F64-4174-D641-46B3-6F262DB84F66> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_ctypes.cpython-312-darwin.so
 0x67e7000 -  0x67eeffc +_hashlib.cpython-312-darwin.so ??? (???) <E17B6DC3-2569-29CB-2A93-BDD10BCC9CC6> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_hashlib.cpython-312-darwin.so
 0x67f5000 -  0x67f7ffd +_bisect.cpython-312-darwin.so ??? (???) <19DFDC81-D870-737B-B133-BF80B2CEDE44> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_bisect.cpython-312-darwin.so
 0x7040000 -  0x706affb +_umath_linalg.cpython-312-darwin.so ??? (???) <7C5B0CC5-EEA7-561B-DB07-5C70A52FD71E> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/linalg/_umath_linalg.cpython-312-darwin.so
 0x7131000 -  0x71cffff +mtrand.cpython-312-darwin.so ??? (???) <9791A75F-CCB1-504E-EBD4-A9600F6D59D8> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/mtrand.cpython-312-darwin.so
 0x71dc000 -  0x7207ff7 +bit_generator.cpython-312-darwin.so ??? (???) <FE7AD0F1-8C54-95D8-820B-B53A6329BA77> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/bit_generator.cpython-312-darwin.so
 0x7211000 -  0x7247fff +_common.cpython-312-darwin.so ??? (???) <B41A4952-873F-D0EF-BBEF-280C462F60B7> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/_common.cpython-312-darwin.so
 0x728e000 -  0x7518feb +libcrypto.3.dylib ??? (???) <9CEBA2D6-5BF6-7348-22EA-8617A351B3C7> /opt/local/libexec/openssl3/lib/libcrypto.3.dylib
 0x75fd000 -  0x7605fff +_blake2.cpython-312-darwin.so ??? (???) <3CC5A3A2-FD20-B285-234B-7EB415D90DA1> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_blake2.cpython-312-darwin.so
 0x760a000 -  0x7618ffc +_sha2.cpython-312-darwin.so ??? (???) <7833E3B3-77F7-3E27-DE4B-AF6987744D1A> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_sha2.cpython-312-darwin.so
 0x761e000 -  0x7668fff +_bounded_integers.cpython-312-darwin.so ??? (???) <351D33E9-5FE9-BBA5-AE49-414CA4A5A508> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/_bounded_integers.cpython-312-darwin.so
 0x766e000 -  0x7683ff7 +_mt19937.cpython-312-darwin.so ??? (???) <F256145F-8235-89E0-1C8D-45ADD06F1990> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/_mt19937.cpython-312-darwin.so
 0x7688000 -  0x769afff +_philox.cpython-312-darwin.so ??? (???) <15E10E6B-2ACA-E3CD-FBF3-8C0511812F17> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/_philox.cpython-312-darwin.so
 0x769f000 -  0x76b7fff +_pcg64.cpython-312-darwin.so ??? (???) <42E581C4-8C67-67A6-CAE4-94CF53EFD231> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/_pcg64.cpython-312-darwin.so
 0x76bd000 -  0x76c7fff +_sfc64.cpython-312-darwin.so ??? (???) <F821E00E-A55B-B8AC-E88A-8FDED4690F34> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/_sfc64.cpython-312-darwin.so
 0x76cc000 -  0x7794ff7 +_generator.cpython-312-darwin.so ??? (???) <4F23FF47-D8BE-63CF-314B-2A5192F79A42> /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/random/_generator.cpython-312-darwin.so
0x8fe00000 - 0x8fe3d7c7  dyld 132.1 (???) <25642EAE-381E-F8E4-65D7-202623802C57> /usr/lib/dyld
0x90003000 - 0x9002bff3  com.apple.DictionaryServices 1.1.2 (1.1.2) <28D18429-6770-B127-886A-D632AEC97FB0> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/DictionaryServices.framework/Versions/A/DictionaryServices
0x90656000 - 0x907e4ff3  libicucore.A.dylib ??? (???) <3E465D6F-535F-E51D-93F1-7464C8AA934C> /usr/lib/libicucore.A.dylib
0x90886000 - 0x90893fff  libkxld.dylib ??? (???) <7C1EE264-5B42-1FBA-AA8B-B0F011D476EB> /usr/lib/system/libkxld.dylib
0x9094e000 - 0x9098bfff  com.apple.SystemConfiguration 1.10.8 (1.10.2) <40364E85-F915-7EA1-98A4-4AFC1FFB2D3D> /System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemConfiguration
0x9098c000 - 0x909b8fff  libxslt.1.dylib ??? (???) <88189D5E-6F3D-BC47-0D34-8EA6465D0277> /usr/lib/libxslt.1.dylib
0x909b9000 - 0x909c2fff  com.apple.DiskArbitration 2.3 (2.3) <F5681EE4-97D8-4145-CCAD-51F78D899D80> /System/Library/Frameworks/DiskArbitration.framework/Versions/A/DiskArbitration
0x909c3000 - 0x909d1fff  libz.1.dylib ??? (???) <3B22B5A3-9A90-EF48-84D9-07E6D0A1EDBF> /usr/lib/libz.1.dylib
0x9109a000 - 0x910d3ff3  com.apple.AE 496.5 (496.5) <D9E8C3F3-37C1-6BFA-FB1F-5D8C884D84CE> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/AE.framework/Versions/A/AE
0x910d4000 - 0x911e5fff  libxml2.2.dylib ??? (???) <F89A83D2-BF18-C82B-717B-6FCD96B8B7D0> /usr/lib/libxml2.2.dylib
0x911e6000 - 0x911ebffe  libmathCommon.A.dylib ??? (???) <C74C35F1-C121-4418-6BFF-503AA9D0B889> /usr/lib/system/libmathCommon.A.dylib
0x9146c000 - 0x91521ffb  com.apple.CFNetwork 454.12.4 (454.12.4) <F25C54B7-3D47-E79D-6443-F8595690907B> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CFNetwork.framework/Versions/A/CFNetwork
0x9156e000 - 0x916dbff7  com.apple.CoreFoundation 6.6.6 (550.44) <01239ACC-BF77-CD89-FB11-62612ADA6BBE> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
0x91c00000 - 0x91c43ff3  libauto.dylib ??? (???) <7F36ED55-D49F-3E2C-6330-6F0F7F645AFA> /usr/lib/libauto.dylib
0x91c52000 - 0x91cb2fff  com.apple.framework.IOKit 2.0 (???) <71C8FB5C-FE64-8001-2DA1-EF8BC7E3273A> /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit
0x91cd3000 - 0x91d88ffb  libsqlite3.dylib ??? (???) <BC14CA88-E9DA-5E7E-B624-D36846C54574> /usr/lib/libsqlite3.dylib
0x91dc4000 - 0x9210dffb  com.apple.CoreServices.CarbonCore 861.39 (861.39) <5F30A629-B51D-8D46-7323-E3F368CCBE61> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore
0x92253000 - 0x92267fff  libbsm.0.dylib ??? (???) <AE1E2515-4ED9-9AFD-0DAF-ADFA421D39C5> /usr/lib/libbsm.0.dylib
0x92268000 - 0x922dbff7  libstdc++.6.dylib ??? (???) <3B1C6DCD-B165-49B8-060D-9FCC4C09FBDE> /usr/lib/libstdc++.6.dylib
0x922dc000 - 0x9238eff3  libobjc.A.dylib ??? (???) <37861DF9-840C-8F1C-57E9-2E4D4CF4A5D3> /usr/lib/libobjc.A.dylib
0x924b5000 - 0x9268dffb  libSystem.B.dylib ??? (???) <45298DD5-31D9-AE52-CCE7-D02224C22208> /usr/lib/libSystem.B.dylib
0x926c5000 - 0x92769ff3  com.apple.LaunchServices 362.3 (362.3) <8EEC6B1A-5174-443D-942A-C253358D206C> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/LaunchServices
0x927a2000 - 0x92828fff  com.apple.SearchKit 1.3.0 (1.3.0) <FDCCACEC-1294-9871-DBDD-DCC0C53F8169> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/SearchKit.framework/Versions/A/SearchKit
0x9289d000 - 0x928abffe  com.apple.NetFS 3.2.2 (3.2.2) <1B167948-4975-38B7-492A-E1540FE665D7> /System/Library/Frameworks/NetFS.framework/Versions/A/NetFS
0x92940000 - 0x92bb6ff7  com.apple.security 6.0 (36910) <F73CE32C-B154-86BC-2B13-966B9DB0CD44> /System/Library/Frameworks/Security.framework/Versions/A/Security
0x92bb7000 - 0x92bfeff7  com.apple.Metadata 10.6.3 (507.15) <6AB3D3B7-BFEA-6024-8BA6-F639E3920670> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Metadata
0x92c02000 - 0x92ce0ff7  com.apple.CoreServices.OSServices 359.2 (359.2) <6A759D29-CB94-342E-45A7-5F1684A0FEEC> /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/OSServices.framework/Versions/A/OSServices
0x92d93000 - 0x92d93ffa  com.apple.CoreServices 44 (44) <EEDF28CB-C144-9A7B-7467-32EB629CC620> /System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices
0xffff8000 - 0xffff9703  libSystem.B.dylib ??? (???) <45298DD5-31D9-AE52-CCE7-D02224C22208> /usr/lib/libSystem.B.dylib

Model: PowerMac11,2, BootROM 5.2.7f1, 4 processors, PowerPC G5 (1.1), 2.5 GHz, 16 GB, SMC 
Graphics: NVIDIA Quadro FX 4500, Quadro FX 4500, PCIe, 512 MB
Memory Module: DIMM0/J6700, 2 GB, DDR2 SDRAM, PC2-3200U-288
Memory Module: DIMM1/J6800, 2 GB, DDR2 SDRAM, PC2-3200U-288
Memory Module: DIMM2/J6900, 2 GB, DDR2 SDRAM, PC2-4200U-444
Memory Module: DIMM3/J7000, 2 GB, DDR2 SDRAM, PC2-4200U-444
Memory Module: DIMM4/J7100, 2 GB, DDR2 SDRAM, PC2-3200U-288
Memory Module: DIMM5/J7200, 2 GB, DDR2 SDRAM, PC2-3200U-288
Memory Module: DIMM6/J7300, 2 GB, DDR2 SDRAM, PC2-3200U-288
Memory Module: DIMM7/J7400, 2 GB, DDR2 SDRAM, PC2-3200U-288
Network Service: Ethernet 1, Ethernet, en0
PCI Card: Quadro FX 4500, Display, SLOT-1
PCI Card: Apple 5714, network, GIGE
PCI Card: Apple 5714, network, GIGE
PCI Card: pci16b8,6a02, AHCI Controller, SLOT-4
Serial ATA Device: WDC WD2500JS-41MVB1, 232.89 GB
Serial ATA Device: VO0600ECHPP, 558.91 GB
Parallel ATA Device: HL-DT-ST DVD-RW GWA-4165B, 744.6 MB
USB Device: USB Keyboard, 0x04f2  (Chicony Electronics Co., Ltd.), 0x1516, 0x0b200000
USB Device: HP USB Optical Mouse, 0x03f0  (Hewlett Packard), 0x134a, 0x2b200000
FireWire Device: unknown_device, unknown_value, Unknown
FireWire Device: unknown_device, unknown_value, Unknown
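
For reading the report above, a minimal sketch of the address arithmetic, using only values copied from the crash log (the crashing PC in `srr0`, the `dgeqr2_ + 316` frame, and the load address of `libopenblas.0.dylib` from the Binary Images list). The helper name `locate_crash` is hypothetical. The resulting image offset is only directly comparable with a disassembly of the dylib under the assumption that its preferred load address is 0.

```python
# Values copied from the crash report above.
CRASH_PC = 0x01C200C0       # srr0, also the top frame of thread 0
SYMBOL_OFFSET = 316         # from "dgeqr2_ + 316"
IMAGE_BASE = 0x01800000     # load address of libopenblas.0.dylib
FAULT_ADDRESS = 0x00000003  # dar, also the KERN_PROTECTION_FAILURE address


def locate_crash(pc, symbol_offset, image_base):
    """Return (start address of the symbol, offset of the PC inside the image)."""
    return pc - symbol_offset, pc - image_base


symbol_start, image_offset = locate_crash(CRASH_PC, SYMBOL_OFFSET, IMAGE_BASE)
print(f"dgeqr2_ starts at {symbol_start:#010x}")
print(f"crash is at image offset {image_offset:#x}")
# A fault address of 3 is not a multiple of 4, so it is not an aligned word address.
print(f"fault address is 4-byte aligned: {FAULT_ADDRESS % 4 == 0}")
```

The register dump also shows `r13` and `r21` holding the same value as the fault address (`0x00000003`), which is an observation from the log, not a diagnosis.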

Switching to 0.3.28, I was able to configure py-scipy normally, and the build runs now with no issues. Nothing else changed, I did not even have to rebuild py-numpy, only switch OpenBLAS from 17803e7 to the release.

Importantly, MacPorts builds OpenBLAS-devel (the one tracking upstream) with native optimizations, while releases are built without those.

@barracuda156
Contributor Author

barracuda156 commented Dec 29, 2024

@martin-frbg Could you please take a look? I can do bisecting if needed, but it would be helpful to have an idea of what is likely to have gone wrong. Otherwise it will take forever, OpenBLAS is not a quick thing to build on a G5.

Update. I will try to rebuild 0.3.28 with native opts to check whether the regression in fact happened after it or just went unnoticed.

@barracuda156 changed the title from "Regression on PowerPC, resulting in dependents of OpenBLAS being unusable" to "Regression on PowerPC, resulting in [some] dependents of OpenBLAS being unusable" on Dec 29, 2024
@barracuda156
Contributor Author

@martin-frbg Turns out 0.3.28 is also broken if built with optimizations. So it is not something after 0.3.28 for sure.

It could be an optimization-specific bug in OpenBLAS, or it could be a bug in GCC.

@martin-frbg
Collaborator

Would you happen to know which older version definitely worked for you? This is probably an older bug, if it is a bug in OpenBLAS at all - last time I changed something for the old G-series was 0.3.26 I think, and most BLAS kernels are likely original GotoBLAS material from 15 years ago. Any new "POWERPC" code is targeting at least POWER8, and I do not have any G4/G5 era hardware.

@barracuda156
Contributor Author

@martin-frbg We had this issue earlier, related to optimizations: #4376
Not sure if it is related to this issue.

It is hard to say which version worked for sure (i.e. worked with optimizations enabled), since the criterion is unclear (I mean, the bug may have been present but was unnoticed). Everything was building against both versions of OpenBLAS (with the exception of the issue linked), and I was using OpenBLAS-devel quite often, I think. However that in itself does not guarantee that it was built correctly.

If this is a possible bug in GCC, I will bring it to upstream, but I need some meaningful and minimal test case to reproduce the problem.

Could you advise what to check specifically, given the context we have?

@martin-frbg
Collaborator

OK, that would be the change in 0.3.26 that I remembered - but I do not think I will be able to be of much help here. If the error is intermittent, maybe it depends on register use or data alignment - I can look into that, but the ppc970 is an ancient and unfamiliar cpu to me.

@martin-frbg changed the title from "Regression on PowerPC, resulting in [some] dependents of OpenBLAS being unusable" to "Regression on PPC970, resulting in [some] dependents of OpenBLAS being unusable" on Dec 29, 2024
@barracuda156
Contributor Author

I can certainly build 2–3 releases with +native on G5 to see if at least some of them do not exhibit the error. If you recall any specific commits that may be problematic, I can try bisecting those.

@martin-frbg
Collaborator

The thing is that I do not think any recent commit could have been particularly problematic, as all ongoing powerpc development is focused on current server cpus. If there is a way for you to figure out from your Python backtrace which BLAS function is causing the crash, I can take a closer look - but there should be minimal overlap, if any, between the optimized kernels on a POWER9 or 10 cpu and what gets invoked on a 20+ year old desktop machine.
If MacPorts manages to provide binaries, at least it seems to be something specific enough that it does not cause any of the build-time tests to blow up - unless they disable those, or are doing cross-builds without later testing on actual G5 hardware.

@martin-frbg
Collaborator

That said, what (I) changed in 0.3.28 was the SCAL kernel in #4807 - introducing an additional variable and conditional into the assembly code that is used across multiple ppc cpus. But if that commit is broken, it should have terminated the build on more than one target, no matter if "native" optimization is enabled or not...

@barracuda156
Contributor Author

@martin-frbg Thank you. The log seems to mention dgeqr2, though I have no idea whether it was the actual reason for a crash.

P. S. MacPorts does not build anything at all on PowerPC, buildbots are not there. I am building some ports (not in a reference environment), but it is unfeasible to test everything which is being built. I try to test important stuff though or when needed to debug something, and OpenBLAS is important certainly.

@martin-frbg
Collaborator

thanks - DGEQR2 is (unoptimized Fortran) LAPACK but the call graph at https://www.netlib.org/lapack/explore-html/d6/da5/group__geqr2_ga0ff91490bc2e246cabb8fe02f3f1da97.html touches DSCAL so it could have been me (if the new variable lives at a different stack location on MacOS, which I have no way of testing). Reverting #4807 should cause a number of test failures unless you disable the SCAL-related utests - this is an unfortunate trade-off between fixing the NaN handling in user-generated calls vs. the naive behaviour of SCAL when called by other BLAS functions, often as a quick way to zero an array.
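
The trade-off described above can be illustrated with a small pure-Python sketch. It is neither OpenBLAS' assembly kernel nor the actual change in #4807; the two function names are hypothetical. One variant treats `alpha == 0` as "overwrite with zeros", the other always multiplies, so NaN and Inf in the input survive the scaling.

```python
import math


def scal_shortcut(alpha, x):
    """Scale x in place; alpha == 0 overwrites x with zeros without reading it."""
    if alpha == 0.0:
        for i in range(len(x)):
            x[i] = 0.0
    else:
        for i in range(len(x)):
            x[i] *= alpha
    return x


def scal_multiply(alpha, x):
    """Scale x in place by always multiplying, so NaN and Inf propagate."""
    for i in range(len(x)):
        x[i] *= alpha
    return x


zeroed = scal_shortcut(0.0, [1.0, math.nan, math.inf])
propagated = scal_multiply(0.0, [1.0, math.nan, math.inf])
print(zeroed)      # every element is 0.0
print(propagated)  # 0.0 followed by two NaNs (0 * NaN and 0 * Inf are NaN)
```

A caller that uses SCAL with `alpha = 0` to clear an uninitialized buffer relies on the first behaviour, while a user checking IEEE NaN propagation expects the second.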

@martin-frbg
Collaborator

Looking at this again (and on more than just a phone screen), it occurs to me that (1) if the DSCAL changes were wrong, they should blow up both with and without native optimizations - if by native optimizations you mean compiler flags (?)
and (2) the backtrace ending in dgeqr2 rather than any function called from it suggests that the problem already occurs in whatever optimized machine code the Fortran compiler creates from Reference-LAPACK's dgeqr2 code that OpenBLAS includes. Perhaps you could try including DEBUG=1 (equivalent to the -g compiler flag) in your build flags for OpenBLAS (if that is not already there) to see if we can get a more exact failure location ?

@martin-frbg martin-frbg added the Bug in other software Compiler, Virtual Machine, etc. bug affecting OpenBLAS label Dec 29, 2024
@martin-frbg
Collaborator

Maybe time to bring GCC's Iain Sandoe back into the loop (though I think he'd want a very small and self-contained reproducer while we're not even sure where we are exactly) ? Or try dropping the altivec from the compile flags now too, although it seemed not to matter when you tried in #4376 ?
Given the experience with the earlier issue, I'm labeling this as a compiler issue now until there is evidence that OpenBLAS' code is actually at fault here.
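
As a possible starting point for a smaller reproducer, here is a sketch that calls `dgeqr2_` directly through `ctypes`, bypassing NumPy. It rests on several assumptions that are not confirmed in this thread: that the library exports the Fortran symbol `dgeqr2_`, that it was built with 32-bit integers (no INTERFACE64), and that `ctypes.util.find_library` can locate it (for a MacPorts install, an explicit path such as the `/opt/local/lib/libopenblas.0.dylib` shown in the crash report may be needed). The function name `call_dgeqr2` and the 5x3 test matrix are hypothetical.

```python
import ctypes
import ctypes.util


def call_dgeqr2(m=5, n=3, lib_path=None):
    """Call dgeqr2_ once; return INFO, or None if the library or symbol is unavailable."""
    path = lib_path or ctypes.util.find_library("openblas")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        dgeqr2 = lib.dgeqr2_
    except (OSError, AttributeError):
        return None
    dgeqr2.restype = None

    # Fortran passes every argument by reference; A is stored column-major.
    m_c, n_c, lda_c = ctypes.c_int(m), ctypes.c_int(n), ctypes.c_int(m)
    values = [float((i * 7 + 3) % 11 + 1) for i in range(m * n)]
    a = (ctypes.c_double * (m * n))(*values)
    tau = (ctypes.c_double * min(m, n))()
    work = (ctypes.c_double * n)()
    info = ctypes.c_int(-999)

    # DGEQR2(M, N, A, LDA, TAU, WORK, INFO)
    dgeqr2(ctypes.byref(m_c), ctypes.byref(n_c), a, ctypes.byref(lda_c),
           tau, work, ctypes.byref(info))
    return info.value


result = call_dgeqr2()
print("library not found" if result is None else f"dgeqr2 returned INFO = {result}")
```

If this crashes in the same way, the Python and NumPy layers can probably be left out of a report to the compiler developers; if it does not, the input or calling context in the NumPy case would need a closer look.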

@barracuda156
Contributor Author

I will try your suggestion, thanks.
Is the likely bug in gfortran specifically? I can ask in gfortran mailing list about the issue too.

@barracuda156
Contributor Author

@martin-frbg OpenBLAS from 20240704 with native opts (which adds -mtune=native -maltivec) built with gcc14 does not seem to exhibit the problem. I verified this on another installation where I had that version pre-built, so it still does not exclude the possibility that the bug is in gcc or as. I will rebuild now on the same system to make sure.

@martin-frbg
Collaborator

gfortran would be implied as the culprit if the crash is in DGEQR2, simply because dgeqr2 is LAPACK code copied from the Reference-LAPACK project, which is entirely written in Fortran.
I assume -mtune=970 and -mtune=native will be identical when compiling on G5, but it is not clear to me what influence they could have on code that is already written in assembly.

@barracuda156
Contributor Author

I am running the build of OpenBLAS from e1eef56 (20240704) now on a system where the failure was observed, so soon I will know if it is broken or not.
You are right, -mtune=970 is added, but indeed those are identical, AFAIK (on a G5).

On another test system where 20240704 worked it was built with gcc 14.1.0, so we got different gcc versions. I just wondered whether the gcc config matters: I have an optimized version which was built with --with-tune-cpu=G5 and a generic one.

So there are a number of variables here which may affect the result. I will run several builds to exclude, hopefully, some possibilities, at least those which do not require rebuilding gcc itself.
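
For checking each rebuilt combination, a sketch of a smoke test that runs the failing command from the first comment in a child process and reports whether it exited normally or was killed by a signal. The helper names `run_snippet` and `describe` are hypothetical; the interpretation of a negative return code as a signal number follows the documented POSIX behaviour of `subprocess`.

```python
import signal
import subprocess
import sys


def run_snippet(code, python=sys.executable):
    """Run a Python snippet in a child process and return its return code."""
    proc = subprocess.run([python, "-c", code], capture_output=True, text=True)
    return proc.returncode


def describe(returncode):
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child died from that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


# The same import that crashed during the py-scipy configure step.
verdict = describe(run_snippet("import numpy; print(numpy.get_include())"))
print(f"numpy import: {verdict}")
```

Running this after each switch of the active OpenBLAS port gives a one-line result per build, which is easier to tabulate than a full crash report.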

@barracuda156
Contributor Author

@martin-frbg Okay, e1eef56 +native works fine with gcc 14.2.0 built with --with-tune-cpu=G5. So the only possibility is an OpenBLAS regression, IMO.

36-39:~ svacchanda$ port -v installed gcc14
The following ports are currently installed:
  gcc14 @14.2.0_1+G5+stdlib_flag (active) requested_variants='+G5+stdlib_flag-universal' platform='darwin 10' archs='ppc' date='2024-12-26T10:49:25+0800'

36-39:~ svacchanda$ port -v installed OpenBLAS-devel
The following ports are currently installed:
  OpenBLAS-devel @20240704-e1eef56e_0+gcc14+lapack+native (active) requested_variants='' platform='darwin 10' archs='ppc' date='2024-12-30T17:07:17+0800'
  OpenBLAS-devel @20241225-17803e79_0+gcc14+lapack+native requested_variants='' platform='darwin 10' archs='ppc' date='2024-12-29T02:13:21+0800'
--->  Building py312-scipy
Executing:  cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1" && /opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 -m build --no-isolation --wheel --outdir /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work -Csetup-args=-Dblas=openblas -Csetup-args=-Dlapack=openblas -Csetup-args=-Dpkg_config_path=/opt/local/lib/pkgconfig 
* Getting build dependencies for wheel...
* Building wheel...
+ meson-3.12 setup /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/.mesonpy-ekhal5ad -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md -Dblas=openblas -Dlapack=openblas -Dpkg_config_path=/opt/local/lib/pkgconfig --native-file=/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/.mesonpy-ekhal5ad/meson-python-native-file.ini
The Meson build system
Version: 1.6.1
Source dir: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1
Build dir: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/.mesonpy-ekhal5ad
Build type: native build
Project name: scipy
Project version: 1.14.1
C compiler for the host machine: /opt/local/bin/gcc-mp-14 (gcc 14.2.0 "gcc-mp-14 (MacPorts gcc14 14.2.0_1+G5+stdlib_flag) 14.2.0")
C linker for the host machine: /opt/local/bin/gcc-mp-14 ld64 97.17
C++ compiler for the host machine: /opt/local/bin/g++-mp-14 (gcc 14.2.0 "g++-mp-14 (MacPorts gcc14 14.2.0_1+G5+stdlib_flag) 14.2.0")
C++ linker for the host machine: /opt/local/bin/g++-mp-14 ld64 97.17
Cython compiler for the host machine: cython (cython 3.0.11)
Host machine cpu family: ppc
Host machine cpu: power macintosh
Program python found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12)
Found pkg-config: YES (/opt/local/bin/pkg-config) 0.29.2
Run-time dependency python found: YES 3.12
Program cython found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/cython)
Compiler for C supports arguments -Wno-unused-but-set-variable: YES 
Compiler for C supports arguments -Wno-unused-function: YES 
Compiler for C supports arguments -Wno-conversion: YES 
Compiler for C supports arguments -Wno-misleading-indentation: YES 
Library m found: YES
Fortran compiler for the host machine: /opt/local/bin/gfortran-mp-14 (gcc 14.2.0 "GNU Fortran (MacPorts gcc14 14.2.0_1+G5+stdlib_flag) 14.2.0")
Fortran linker for the host machine: /opt/local/bin/gfortran-mp-14 ld64 97.17
Compiler for Fortran supports arguments -Wno-conversion: YES 
Compiler for C supports link arguments -Wl,-dead_strip: NO 
Checking if "-Wl,--version-script" : links: NO 
Program tools/generate_f2pymod.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/tools/generate_f2pymod.py)
Program scipy/_build_utils/tempita.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/scipy/_build_utils/tempita.py)
Program pythran found: YES 0.17.0 0.17.0 (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/pythran)
Found CMake: /opt/local/bin/cmake (3.31.2)
WARNING: CMake Toolchain: Failed to determine CMake compilers state
Run-time dependency xsimd found: NO (tried pkgconfig, framework and cmake)
Run-time dependency threads found: YES
Library npymath found: YES
pybind11-config found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/pybind11-config) 2.13.6
Run-time dependency pybind11 found: YES 2.13.6
Program f2py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/f2py)
Run-time dependency scipy-openblas found: NO (tried pkgconfig)
Run-time dependency openblas found: YES 0.3.27.dev
Dependency openblas found: YES 0.3.27.dev (cached)
Program ../tools/version_utils.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/scipy/../tools/version_utils.py)
Compiler for C supports arguments -Wno-maybe-uninitialized: YES 
Compiler for C supports arguments -Wno-discarded-qualifiers: YES 
Compiler for C supports arguments -Wno-empty-body: YES 
Compiler for C supports arguments -Wno-implicit-function-declaration: YES 
Compiler for C supports arguments -Wno-parentheses: YES 
Compiler for C supports arguments -Wno-switch: YES 
Compiler for C supports arguments -Wno-unused-label: YES 
Compiler for C supports arguments -Wno-unused-result: YES 
Compiler for C supports arguments -Wno-unused-variable: YES 
Compiler for C++ supports arguments -Wno-bitwise-instead-of-logical: NO 
Compiler for C++ supports arguments -Wno-cpp: YES 
Compiler for C++ supports arguments -Wno-class-memaccess: YES 
Compiler for C++ supports arguments -Wno-deprecated-declarations: YES 
Compiler for C++ supports arguments -Wno-deprecated-builtins: NO 
Compiler for C++ supports arguments -Wno-format-truncation: YES 
Compiler for C++ supports arguments -Wno-non-virtual-dtor: YES 
Compiler for C++ supports arguments -Wno-sign-compare: YES 
Compiler for C++ supports arguments -Wno-switch: YES 
Compiler for C++ supports arguments -Wno-terminate: YES 
Compiler for C++ supports arguments -Wno-unused-but-set-variable: YES 
Compiler for C++ supports arguments -Wno-unused-function: YES 
Compiler for C++ supports arguments -Wno-unused-local-typedefs: YES 
Compiler for C++ supports arguments -Wno-unused-variable: YES 
Compiler for C++ supports arguments -Wno-int-in-bool-context: YES 
Compiler for Fortran supports arguments -Wno-argument-mismatch: YES 
Compiler for Fortran supports arguments -Wno-conversion: YES (cached)
Compiler for Fortran supports arguments -Wno-intrinsic-shadow: YES 
Compiler for Fortran supports arguments -Wno-maybe-uninitialized: YES 
Compiler for Fortran supports arguments -Wno-surprising: YES 
Compiler for Fortran supports arguments -Wno-uninitialized: YES 
Compiler for Fortran supports arguments -Wno-unused-dummy-argument: YES 
Compiler for Fortran supports arguments -Wno-unused-label: YES 
Compiler for Fortran supports arguments -Wno-unused-variable: YES 
Compiler for Fortran supports arguments -Wno-tabs: YES 
Compiler for Fortran supports arguments -Wno-argument-mismatch: YES (cached)
Compiler for Fortran supports arguments -Wno-conversion: YES (cached)
Compiler for Fortran supports arguments -Wno-maybe-uninitialized: YES (cached)
Compiler for Fortran supports arguments -Wno-unused-dummy-argument: YES (cached)
Compiler for Fortran supports arguments -Wno-unused-label: YES (cached)
Compiler for Fortran supports arguments -Wno-unused-variable: YES (cached)
Compiler for Fortran supports arguments -Wno-tabs: YES (cached)
Checking if "Check atomic builtins without -latomic" : links: NO 
Library atomic found: YES
Checking if "Check atomic builtins with -latomic" with dependency -latomic: links: YES 
Configuring __config__.py using configuration
Checking for function "open_memstream" : NO 
Configuring messagestream_config.h using configuration
Program _generate_pyx.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/scipy/special/_generate_pyx.py)
Program _generate_pyx.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/scipy/linalg/_generate_pyx.py)
Program ../_build_utils/echo.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/scipy/linalg/../_build_utils/echo.py)
Compiler for Fortran supports arguments -w: YES 
Program ../_generate_sparsetools.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/scipy/sparse/sparsetools/../_generate_sparsetools.py)
Checking for size of "void*" : 4 
Compiler for Fortran supports arguments -w: YES (cached)
Build targets in project: 197

scipy 1.14.1

  User defined options
    Native files   : /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/.mesonpy-ekhal5ad/meson-python-native-file.ini
    b_ndebug       : if-release
    b_vscrt        : md
    blas           : openblas
    buildtype      : release
    lapack         : openblas
    pkg_config_path: /opt/local/lib/pkgconfig

Found ninja-1.12.1 at /opt/local/bin/ninja
+ /opt/local/bin/ninja
[1/1383] Generating scipy/linalg/cython_linalg with a custom command
[2/1383] Module scanner.
. . .

@barracuda156
Contributor Author

Let me try bisecting from here. We know that e1eef56 works but 0.3.28 does not. There are perhaps not too many rebuilds needed to identify a breaking commit.
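As a rough illustration of why only a few rebuilds are needed, here is a minimal sketch of the bisection idea in Python. The commit list is hypothetical: only e1eef56, 15c53dd, fb7c53c and the 0.3.28 release are names from this thread, their relative order is assumed, and the `c1`..`c4` entries are placeholders. The `is_good` callback stands in for "rebuild OpenBLAS and check whether importing numpy still works".

```python
def first_bad(commits, is_good):
    """Return the first commit for which is_good() is False.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and the breakage persists once introduced.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical history; c1..c4 are placeholders, not real commits.
history = ["e1eef56", "c1", "c2", "15c53dd", "fb7c53c", "c3", "c4", "0.3.28"]
broken_from = history.index("fb7c53c")
tested = []


def is_good(commit):
    # Stand-in for an actual rebuild and test of this commit.
    tested.append(commit)
    return history.index(commit) < broken_from


culprit = first_bad(history, is_good)
print(culprit, "found after", len(tested), "simulated rebuilds")
```

The number of rebuilds grows with the logarithm of the number of candidate commits, which is why a range of a few dozen commits can be narrowed down in a handful of builds.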

@barracuda156
Contributor Author

@martin-frbg Bisecting was quick. It is indeed PR #4807 that broke it. 15c53dd is still fine, but fb7c53c fails:

--->  Building py312-scipy
Executing:  cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1" && /opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 -m build --no-isolation --wheel --outdir /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work -Csetup-args=-Dblas=openblas -Csetup-args=-Dlapack=openblas -Csetup-args=-Dpkg_config_path=/opt/local/lib/pkgconfig 
* Getting build dependencies for wheel...
* Building wheel...
+ meson-3.12 setup /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/.mesonpy-6l3dtfxi -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md -Dblas=openblas -Dlapack=openblas -Dpkg_config_path=/opt/local/lib/pkgconfig --native-file=/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/.mesonpy-6l3dtfxi/meson-python-native-file.ini
The Meson build system
Version: 1.6.1
Source dir: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1
Build dir: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/.mesonpy-6l3dtfxi
Build type: native build
Project name: scipy
Project version: 1.14.1
C compiler for the host machine: /opt/local/bin/gcc-mp-14 (gcc 14.2.0 "gcc-mp-14 (MacPorts gcc14 14.2.0_1+G5+stdlib_flag) 14.2.0")
C linker for the host machine: /opt/local/bin/gcc-mp-14 ld64 97.17
C++ compiler for the host machine: /opt/local/bin/g++-mp-14 (gcc 14.2.0 "g++-mp-14 (MacPorts gcc14 14.2.0_1+G5+stdlib_flag) 14.2.0")
C++ linker for the host machine: /opt/local/bin/g++-mp-14 ld64 97.17
Cython compiler for the host machine: cython (cython 3.0.11)
Host machine cpu family: ppc
Host machine cpu: power macintosh
Program python found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12)
Found pkg-config: YES (/opt/local/bin/pkg-config) 0.29.2
Run-time dependency python found: YES 3.12
Program cython found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/cython)
Compiler for C supports arguments -Wno-unused-but-set-variable: YES 
Compiler for C supports arguments -Wno-unused-function: YES 
Compiler for C supports arguments -Wno-conversion: YES 
Compiler for C supports arguments -Wno-misleading-indentation: YES 
Library m found: YES
Fortran compiler for the host machine: /opt/local/bin/gfortran-mp-14 (gcc 14.2.0 "GNU Fortran (MacPorts gcc14 14.2.0_1+G5+stdlib_flag) 14.2.0")
Fortran linker for the host machine: /opt/local/bin/gfortran-mp-14 ld64 97.17
Compiler for Fortran supports arguments -Wno-conversion: YES 
Compiler for C supports link arguments -Wl,-dead_strip: NO 
Checking if "-Wl,--version-script" : links: NO 
Program tools/generate_f2pymod.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/tools/generate_f2pymod.py)
Program scipy/_build_utils/tempita.py found: YES (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/scipy/_build_utils/tempita.py)
Program pythran found: YES 0.17.0 0.17.0 (/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/pythran)
Found CMake: /opt/local/bin/cmake (3.31.2)
WARNING: CMake Toolchain: Failed to determine CMake compilers state
Run-time dependency xsimd found: NO (tried pkgconfig, framework and cmake)
Run-time dependency threads found: YES

../scipy/meson.build:48:17: ERROR: Command `/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12 -c 'import os
os.chdir(os.path.join("..", "tools"))
import numpy as np
try:
  incdir = os.path.relpath(np.get_include())
except Exception:
  incdir = np.get_include()
print(incdir)
  '` failed with status -10.

A full log can be found at /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_python_py-scipy/py312-scipy/work/scipy-1.14.1/.mesonpy-6l3dtfxi/meson-logs/meson-log.txt

ERROR Backend subprocess exited when trying to invoke build_wheel
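The `failed with status -10` in the log above is most likely a negative subprocess return code, which on POSIX systems means the child was terminated by a signal. The following is a minimal sketch of decoding such a value; `describe_status` is a hypothetical helper name, and the assumption that meson reports the raw return code of the Python subprocess is not confirmed in the thread. Signal numbers are platform-specific: 10 is SIGBUS on Darwin, which agrees with the bus error in the crash report, while the same number maps to a different signal on Linux.

```python
import signal


def describe_status(returncode):
    """Describe a subprocess return code in the POSIX convention."""
    if returncode >= 0:
        return f"exited normally with status {returncode}"
    signum = -returncode
    try:
        # The name lookup depends on the platform running this script.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown"
    return f"killed by signal {signum} ({name})"


print(describe_status(-10))
```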

@barracuda156
Contributor Author

@martin-frbg Since you have no older powerpc hardware, can I assist with finding more specifically what causes the breakage? Rebuilds on the Quad are quick, so if you can make a branch with possible fix, I can try that, or w/e else may be needed.

@martin-frbg
Collaborator

#4807 would be about the only relevant change between July 4 and July 25 - at least I don't think #4796 could break anything. Quickest workaround would probably be to drop

SSCALKERNEL=../arm/scal.c
DSCALKERNEL=../arm/scal.c
CSCALKERNEL=../arm/zscal.c
ZSCALKERNEL=../arm/zscal.c

into kernel/power/KERNEL.PPC970

@barracuda156
Contributor Author

Yes, it is exactly fb7c53c which introduced the breakage, since immediately previous commit works fine. I will try the suggestion.

@martin-frbg
Collaborator

> @martin-frbg Since you have no older powerpc hardware, can I assist with finding more specifically what causes the breakage? Rebuilds on the Quad are quick, so if you can make a branch with possible fix, I can try that, or w/e else may be needed.

Thank you - but frankly the problem is that I'm not really sure I know what I'm doing on these old machines. Looks like I'm either using a register that the OS or compiler wants to have for different things, or the stack address I'm reading the flag value from is not the correct one when running MacOS. Guess I would need a stack dump from inside a debugger to figure out at least the second option. Not sure if that is worth the trouble, performance is probably not that much worse with the simple C kernels and I suspect nobody is using 20-year-old hardware for performance reasons anyway.

What might be good to know is if the same problem exists on the PPC440 or PPCG4 targets with current develop, as I butchered the scal_ppc440.S in similar ways in early October.

@barracuda156
Contributor Author

barracuda156 commented Dec 30, 2024

@martin-frbg Regarding registers, if you tell me in a couple of words how R2, R11, R12, R13 are used, I can verify with ABI docs or we can ask Iain to help here. Few things to note:

  1. Darwin ppc and ppc64 differ somewhat with how some registers are used. That may or may not be consequential for a specific code.
  2. No TOC on Darwin, R2 is available.
  3. R13 is a normal register in ppc ABI, but reserved in ppc64 ABI.
  4. There is something specific regarding R11 vs R12, which I do not understand well, but in practice I have seen that ELF and Darwin use those in reverse (i.e. it was done that way in credible examples which were known to work).
    Details in a concise form can be found here: http://personal.denison.edu/~bressoud/cs281-s07/MacOSXLowLevelABI.pdf

I will check G4 build. Rebuilding now from the develop branch with added patch which you suggested to try.

P. S. It is worth fixing this, IMO, as long as it does not take huge effort, for at least two reasons. Of course, nobody will use OpenBLAS commercially on a G5, but a) it is used on a number of systems (all three BSDs run on PowerPC Macs, some Linux distros do, so it is not just Apple-relevant), and users may end up with a broken OpenBLAS once they build cpu-specific code; b) some computation-heavy stuff depends on OpenBLAS, and testing that takes time, so the performance reason is still there.

@barracuda156
Contributor Author

@martin-frbg OpenBLAS +native from 36b0fb3 works fine on G5 with this patch added:

--- kernel/power/KERNEL.PPC970.orig	2024-12-30 20:49:18.000000000 +0800
+++ kernel/power/KERNEL.PPC970	2024-12-30 20:49:30.000000000 +0800
@@ -83,6 +83,10 @@
 CTRSMKERNEL_RN	=  ztrsm_kernel_LT.S
 CTRSMKERNEL_RT	=  ztrsm_kernel_RT.S
 
+SSCALKERNEL  = ../arm/scal.c
+DSCALKERNEL  = ../arm/scal.c
+CSCALKERNEL  = ../arm/zscal.c
+ZSCALKERNEL  = ../arm/zscal.c
 
 SROTKERNEL   = ../arm/rot.c
 DROTKERNEL   = ../arm/rot.c

@barracuda156
Contributor Author

To test the G4 kernel, do I need to physically build that on a G4, or can I just force the build to use G4-specific options while running on the G5? And also, should I start with the same patch or without it?

@martin-frbg
Collaborator

I think simply building on your G5 with TARGET=PPC440 or PPCG4 should be sufficient, thank you.

Regarding register use, r12 is simply used to take an integer that (I assume) is passed on the stack, and compare that to see if it is zero or one. I have used r13 instead of r12 in a branch that I think does not get executed at all in your case, for DSCAL in ppc mode (I was blindly following the register use in existing code there).

In principle, any other available register would do to serve as temporary storage for the FLAG value, I saw r12 used in existing code e.g. gemv so assumed it would be safe.

@martin-frbg
Collaborator

From the ABI document, I still think my assumption that the added parameter would be in SP+120 on the stack is correct.
Perhaps we could try #define FLAG r10 (or any r value beyond 13, say r14, if the G4/G5 actually have that many general purpose registers) in all of the four instances currently defining FLAG as r11,r12,r13 and r12 respectively in kernel/power/scal.S ? That would at least let the code stay clear of all general-purpose registers that could have a dual use on Mac.

@barracuda156
Contributor Author

@martin-frbg Sorry, I missed that earlier: what MacPorts does by default (as an allegedly non-optimized build) is just forcing G4 kernel for ppc:

--->  Configuring OpenBLAS
        (using ccache)
Executing:  cd "/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_OpenBLAS/OpenBLAS/work/build" && /opt/local/bin/cmake -G "CodeBlocks - Unix Makefiles" -DCMAKE_BUILD_TYPE=MacPorts -DCMAKE_INSTALL_PREFIX="/opt/local" -DCMAKE_INSTALL_NAME_DIR="/opt/local/lib" -DCMAKE_SYSTEM_PREFIX_PATH="/opt/local;/usr" -DCMAKE_C_COMPILER_LAUNCHER=/opt/local/bin/ccache -DCMAKE_CXX_COMPILER_LAUNCHER=/opt/local/bin/ccache -DCMAKE_Fortran_COMPILER_LAUNCHER=/opt/local/bin/ccache -DCMAKE_OBJC_COMPILER_LAUNCHER=/opt/local/bin/ccache -DCMAKE_OBJCXX_COMPILER_LAUNCHER=/opt/local/bin/ccache -DCMAKE_ISPC_COMPILER_LAUNCHER=/opt/local/bin/ccache -DCMAKE_C_COMPILER="$CC" -DCMAKE_CXX_COMPILER="$CXX" -DCMAKE_OBJC_COMPILER="$CC" -DCMAKE_OBJCXX_COMPILER="$CXX" -DCMAKE_POLICY_DEFAULT_CMP0025=NEW -DCMAKE_POLICY_DEFAULT_CMP0060=NEW -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_COLOR_MAKEFILE=ON -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_MAKE_PROGRAM=/usr/bin/make -DCMAKE_MODULE_PATH="/opt/local/share/cmake/Modules" -DCMAKE_PREFIX_PATH="/opt/local/share/cmake/Modules" -DCMAKE_BUILD_WITH_INSTALL_RPATH:BOOL=OFF -DCMAKE_INSTALL_RPATH="/opt/local/lib" -Wno-dev -DCOMMON_PROF=-pg -DNUM_THREADS=56 -DTARGET=PPCG4 -DNO_AVX=ON -DNO_AVX2=ON -DNO_AVX512=ON -DCMAKE_AR=/opt/local/bin/ar -DCMAKE_NM=/opt/local/bin/nm -DCMAKE_OBJDUMP=/opt/local/bin/objdump -DCMAKE_RANLIB=/opt/local/bin/ranlib -DCMAKE_STRIP=/opt/local/bin/strip -DCMAKE_LINKER=/opt/local/bin/ld -DBUILD_SHARED_LIBS=ON -DCMAKE_OSX_ARCHITECTURES="ppc" -DCMAKE_OSX_DEPLOYMENT_TARGET="10.6" -DCMAKE_OSX_SYSROOT="/" /opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_OpenBLAS/OpenBLAS/work/OpenBLAS-0.3.28

So no need to test that, we know that this build worked fine. Only PPC970 was affected.

Allow me to look into the ABI tomorrow, it was a long day.

P. S. By the way, why did adding those from the arm dir fix the problem?

+SSCALKERNEL  = ../arm/scal.c
+DSCALKERNEL  = ../arm/scal.c
+CSCALKERNEL  = ../arm/zscal.c
+ZSCALKERNEL  = ../arm/zscal.c

@martin-frbg
Collaborator

Good to know that G4 is unaffected. Adding the "generic" C kernels to the build configuration overrides the default from kernel/Makefile.L1 which specifies SSCALKERNEL=scal.S and DSCALKERNEL=scal.S . (Overriding the defaults for CSCAL and ZSCAL should indeed be unnecessary, as the new flag parameter for NaN handling was only added to the real-space functions. The fact that a lot of the generic BLAS kernels ended up in "arm" or "mips" rather than "generic" is just a historic oddity - not sure if OpenBLAS inherited this from GotoBLAS already, or if it crept in later as more architectures got added).
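The override behaviour described above can be modelled roughly as "a default applies only when the target file has not already set the variable". This is a simplified sketch in Python rather than make syntax; the `resolve` helper is hypothetical, and the defaults dictionary lists only the two entries named in this comment (the real kernel/Makefile.L1 defines many more).

```python
# Defaults as described in the thread for kernel/Makefile.L1
# (only the two entries mentioned there).
DEFAULTS = {
    "SSCALKERNEL": "scal.S",
    "DSCALKERNEL": "scal.S",
}

# Settings added to kernel/power/KERNEL.PPC970 by the workaround patch.
PPC970_OVERRIDES = {
    "SSCALKERNEL": "../arm/scal.c",
    "DSCALKERNEL": "../arm/scal.c",
    "CSCALKERNEL": "../arm/zscal.c",
    "ZSCALKERNEL": "../arm/zscal.c",
}


def resolve(target_settings, defaults):
    """Return the effective kernel selection for one target."""
    effective = dict(target_settings)
    for name, value in defaults.items():
        # A default is used only if the target did not set the variable.
        effective.setdefault(name, value)
    return effective


before = resolve({}, DEFAULTS)
after = resolve(PPC970_OVERRIDES, DEFAULTS)
print("without workaround:", before["DSCALKERNEL"])
print("with workaround:   ", after["DSCALKERNEL"])
```

With the workaround in place, the assembly `scal.S` is no longer selected for SSCAL and DSCAL on this target, so the code path that reads the new flag parameter from the stack is never built.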
Thank you very much for your help so far - I guess you must be very close to the New Year already, if you are anywhere near the location in your profile. I hope it will be a good year...

@barracuda156
Contributor Author

Thank you, and the same to you! (Yes, I am in TW at the moment, so time is accurate.)
I will look at the code tomorrow, and perhaps we could ask GCC upstream if they could assist us too.

@barracuda156
Contributor Author

barracuda156 commented Jan 4, 2025

@martin-frbg Yes, I did not forget, sorry for a delay. In plans for this weekend.

@martin-frbg
Collaborator

No problem, I just wanted to make the temporary fix public, and perhaps release the much-delayed 0.3.29 with it while I have free time.

@barracuda156
Contributor Author

> No problem, I just wanted to make the temporary fix public, and perhaps release the much-delayed 0.3.29 with it while I have free time.

@martin-frbg Hold on, let’s give it a try to fix it now. Since the issue is localized now, we can probably figure out the correct code for Darwin ABI just by trying a number of possible options. I do not need to rebuild the whole of OpenBLAS for that, it is four files, so should be a matter of less than an hour to try everything which can make sense.
Could you suggest a few changes to try? I will build the current master and then build variants with just those changes per build.

@martin-frbg
Collaborator

Easiest would be changing the register for "FLAG" in scal.S to r10 or r14 (if r14 exists on PPC970):

diff --git a/kernel/power/scal.S b/kernel/power/scal.S
index 5e92a88aa..28f07dbda 100644
--- a/kernel/power/scal.S
+++ b/kernel/power/scal.S
@@ -47,11 +47,11 @@
 #ifndef __64BIT__
 #define X r6
 #define INCX r7
-#define FLAG r11
+#define FLAG r14
 #else
 #define X r7
 #define INCX r8
-#define FLAG r12
+#define FLAG r14
 #endif
 #endif
 
@@ -59,11 +59,11 @@
 #if !defined(__64BIT__) && defined(DOUBLE)
 #define X r8
 #define INCX r9
-#define FLAG r13
+#define FLAG r14
 #else
 #define X r7
 #define INCX r8
-#define FLAG r12
+#define FLAG r14
 #endif
 #endif

@barracuda156
Contributor Author

I will try a build with this patch now and let you know soon.

@barracuda156
Contributor Author

Ok, the patch with r14 from #5034 (comment) did not work:

Process:         Python [5228]
Path:            /opt/local/Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
Identifier:      Python
Version:         ??? (???)
Code Type:       PPC (Native)
Parent Process:  Python [4787]

Date/Time:       2025-01-04 22:33:14.829 +0800
OS Version:      Mac OS X 10.6.8 (10K549)
Report Version:  6

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000003
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   libopenblas.0.dylib           	0x01c20160 dgeqr2_ + 476

Thread 1:
0   libSystem.B.dylib             	0x92554300 __semwait_signal + 12

Thread 2:
0   libSystem.B.dylib             	0x92554300 __semwait_signal + 12

Thread 3:
0   libSystem.B.dylib             	0x92554300 __semwait_signal + 12

Thread 0 crashed with PPC Thread State 32:
  srr0: 0x01c20160  srr1: 0x0200f030   dar: 0x00000003 dsisr: 0x42000000
    r0: 0x01c20160    r1: 0xbfffba60    r2: 0x00000001    r3: 0x00000000
    r4: 0x00000008    r5: 0x00000660    r6: 0x0631d8a0    r7: 0x0213b704
    r8: 0x0631d8c8    r9: 0x00000008   r10: 0x0631d918   r11: 0xbfffba60
   r12: 0x48244402   r13: 0xbfffbb8c   r14: 0x00000003   r15: 0x0213ff94
   r16: 0x00000004   r17: 0x00000001   r18: 0xbfffbaa8   r19: 0xffffffff
   r20: 0x00000002   r21: 0x00000003   r22: 0x00000005   r23: 0x0213b708
   r24: 0x00000030   r25: 0xbfffbe08   r26: 0x0213b704   r27: 0x0631d8a0
   r28: 0xbfffbb88   r29: 0x06bc4400   r30: 0x0631d8c8   r31: 0x01c1ff94
    cr: 0x22244444   xer: 0x20000000    lr: 0x01c20160   ctr: 0x00000000
vrsave: 0xc3fc0000

I will try r10.
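One way to read a crash report like this is to check which general-purpose registers hold the faulting address. The sketch below parses the thread state quoted above and lists the registers whose value equals `dar` (the data address register, which matches the `KERN_PROTECTION_FAILURE at 0x...03` line). The helper name `parse_thread_state` is made up for this example.

```python
import re

# Thread state copied from the crash report above.
DUMP = """
  srr0: 0x01c20160  srr1: 0x0200f030   dar: 0x00000003 dsisr: 0x42000000
    r0: 0x01c20160    r1: 0xbfffba60    r2: 0x00000001    r3: 0x00000000
    r4: 0x00000008    r5: 0x00000660    r6: 0x0631d8a0    r7: 0x0213b704
    r8: 0x0631d8c8    r9: 0x00000008   r10: 0x0631d918   r11: 0xbfffba60
   r12: 0x48244402   r13: 0xbfffbb8c   r14: 0x00000003   r15: 0x0213ff94
   r16: 0x00000004   r17: 0x00000001   r18: 0xbfffbaa8   r19: 0xffffffff
   r20: 0x00000002   r21: 0x00000003   r22: 0x00000005   r23: 0x0213b708
   r24: 0x00000030   r25: 0xbfffbe08   r26: 0x0213b704   r27: 0x0631d8a0
   r28: 0xbfffbb88   r29: 0x06bc4400   r30: 0x0631d8c8   r31: 0x01c1ff94
    cr: 0x22244444   xer: 0x20000000    lr: 0x01c20160   ctr: 0x00000000
vrsave: 0xc3fc0000
"""


def parse_thread_state(text):
    """Map each 'name: 0x...' pair in the dump to an integer value."""
    pairs = re.findall(r"(\w+):\s*(0x[0-9a-fA-F]+)", text)
    return {name: int(value, 16) for name, value in pairs}


state = parse_thread_state(DUMP)
fault_address = state["dar"]
# Keep only general-purpose registers (r0..r31) equal to the fault address.
suspects = [
    name
    for name, value in state.items()
    if re.fullmatch(r"r\d+", name) and value == fault_address
]
print(hex(fault_address), suspects)
```

Here r14 is among the registers holding the faulting address, and the crash is reported in `dgeqr2_` rather than in the scal kernel itself. Since r14 is a callee-saved register in the PowerPC ABIs, one possible reading (an assumption, not something confirmed in this thread) is that using it as scratch for FLAG without saving and restoring it disturbed a value the caller still needed.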

@barracuda156
Contributor Author

I have an idea, let me try something else.

@barracuda156
Contributor Author

Less than an hour was an overly optimistic estimate LOL

Comparing G4 kernel code for scal vs G5 code, besides FLAG setting, on G4 load insns are bitness-dependent, while on G5 they are not, as I understand (I extract relevant pieces, these are not immediately sequential, of course). Why is that?

G4 version:

#if defined(OS_AIX) || defined(OS_DARWIN)
#ifndef __64BIT__
#define FRAMESLOT(X) (((X) * 4) + 56)
#else
#define FRAMESLOT(X) (((X) * 8) + 112)
#endif
#endif

li	PRE, 3 * 16 * SIZE

lwz     FLAG, FRAMESLOT(0)(SP)

G5 version:

#define L1_PREFETCHSIZE (96 + 128 * 12)

li	PREA, L1_PREFETCHSIZE

ld      FLAG,    48+64+8(SP)

Also, shouldn't it be LDLONG and not ld in the second case (so that lwz is used for 32 bits)?
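To make the two differences concrete, here is a small Python sketch. It mirrors the `FRAMESLOT(X)` arithmetic quoted from the G4 kernel, compares it with the fixed `48+64+8` offset in the G5 code, and then uses a toy big-endian buffer to show how a 4-byte load (like `lwz`) and an 8-byte load (like `ld`) from the same address see different values. The buffer contents are invented for illustration, and this does not claim to reproduce the actual crash or the fix that was eventually merged.

```python
import struct


def frameslot(x, is_64bit):
    """Mirror of the FRAMESLOT(X) macro quoted from the G4 kernel."""
    return x * 8 + 112 if is_64bit else x * 4 + 56


# Fixed offset used by the quoted G5 `ld FLAG, 48+64+8(SP)`.
G5_OFFSET = 48 + 64 + 8

for is_64bit in (False, True):
    label = "64-bit" if is_64bit else "32-bit"
    print(label, "FRAMESLOT(0) =", frameslot(0, is_64bit), "vs G5 offset", G5_OFFSET)

# Toy big-endian memory: a 4-byte flag with value 1, followed by four
# unrelated bytes that happen to sit next to it (invented data).
memory = struct.pack(">iI", 1, 0xDEADBEEF)

as_word = struct.unpack_from(">i", memory, 0)[0]        # 4-byte load, like lwz
as_doubleword = struct.unpack_from(">q", memory, 0)[0]  # 8-byte load, like ld

print("4-byte load:", as_word)
print("8-byte load:", hex(as_doubleword))
```

In this model the G4 macro places slot 0 at a different offset for each bitness, while the G5 offset is the same in both, and an 8-byte load of a 4-byte big-endian value also pulls in the neighbouring bytes. Either effect alone would be enough for a comparison of the flag against zero or one to misbehave in a 32-bit build.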

@barracuda156
Contributor Author

Oh, I have fixed it by the way.

@martin-frbg
Collaborator

To be honest, I have no idea - this is very old code and a platform I'm neither familiar with nor have available for testing. My addition was mostly a guess based on what worked on other platforms and what the other code for G4/G5 looked like. Thanks for fixing this.

@martin-frbg martin-frbg reopened this Jan 4, 2025
@martin-frbg martin-frbg linked a pull request Jan 4, 2025 that will close this issue
@barracuda156
Contributor Author

@martin-frbg Did you or someone test the new code on 32-bit AIX and/or Linux? I know ABI differs from Darwin, but loads are bitness-conditional for all OS in ppc440 but are fixed for all OS in ppc970, so that would probably be sufficient in this context.

@martin-frbg
Collaborator

All I can do for this old hardware is try to compile-test on Power7/AIX (Gcc Compile Farm machine provided by OSUOSL). I guess there would eventually be a bug report like yours from some Linux distribution still catering to these old systems.

@barracuda156
Contributor Author

@glaubitz may have a G5 with Debian ppc32 and @pkubaj might be able to test it on FreeBSD. I could try asking on the OpenBSD ppc mailing list if someone could test this on OpenBSD; there are some users with G5 machines, hopefully.

@barracuda156
Contributor Author

@martin-frbg Thanks for merging!
