
Two tests fail on macOS PPC: dbcsr_unittest2, dbcsr_unittest3 #645

Open · barracuda156 opened this issue Dec 30, 2022 · 32 comments
@barracuda156 commented Dec 30, 2022

I am bringing dbcsr to MacPorts, where we support the full range of macOS versions, including old ones (at least 10.5+).
Two tests fail on 10.6.8 Rosetta (I cannot test native PPC at the moment, being away from PPC hardware): dbcsr_unittest2 and dbcsr_unittest3.

--->  Testing dbcsr
Executing:  cd "/opt/local/var/macports/build/_opt_PPCRosettaPorts_math_dbcsr/dbcsr/work/build" && ctest test 
Test project /opt/local/var/macports/build/_opt_PPCRosettaPorts_math_dbcsr/dbcsr/work/build
      Start  1: dbcsr_perf:inputs/test_H2O.perf
 1/19 Test  #1: dbcsr_perf:inputs/test_H2O.perf .......................   Passed  581.03 sec
      Start  2: dbcsr_perf:inputs/test_rect1_dense.perf
 2/19 Test  #2: dbcsr_perf:inputs/test_rect1_dense.perf ...............   Passed    1.06 sec
      Start  3: dbcsr_perf:inputs/test_rect1_sparse.perf
 3/19 Test  #3: dbcsr_perf:inputs/test_rect1_sparse.perf ..............   Passed   10.55 sec
      Start  4: dbcsr_perf:inputs/test_rect2_dense.perf
 4/19 Test  #4: dbcsr_perf:inputs/test_rect2_dense.perf ...............   Passed    1.01 sec
      Start  5: dbcsr_perf:inputs/test_rect2_sparse.perf
 5/19 Test  #5: dbcsr_perf:inputs/test_rect2_sparse.perf ..............   Passed    8.58 sec
      Start  6: dbcsr_perf:inputs/test_singleblock.perf
 6/19 Test  #6: dbcsr_perf:inputs/test_singleblock.perf ...............   Passed    0.74 sec
      Start  7: dbcsr_perf:inputs/test_square_dense.perf
 7/19 Test  #7: dbcsr_perf:inputs/test_square_dense.perf ..............   Passed    0.73 sec
      Start  8: dbcsr_perf:inputs/test_square_sparse.perf
 8/19 Test  #8: dbcsr_perf:inputs/test_square_sparse.perf .............   Passed    3.15 sec
      Start  9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
 9/19 Test  #9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ...   Passed    4.10 sec
      Start 10: dbcsr_perf:inputs/test_square_sparse_rma.perf
10/19 Test #10: dbcsr_perf:inputs/test_square_sparse_rma.perf .........   Passed    2.90 sec
      Start 11: dbcsr_unittest1
11/19 Test #11: dbcsr_unittest1 .......................................   Passed  158.55 sec
      Start 12: dbcsr_unittest2
12/19 Test #12: dbcsr_unittest2 .......................................***Failed    0.78 sec
      Start 13: dbcsr_unittest3
13/19 Test #13: dbcsr_unittest3 .......................................***Failed   13.93 sec
      Start 14: dbcsr_unittest4
14/19 Test #14: dbcsr_unittest4 .......................................   Passed    0.62 sec
      Start 15: dbcsr_tensor_unittest
15/19 Test #15: dbcsr_tensor_unittest .................................   Passed   10.30 sec
      Start 16: dbcsr_tas_unittest
16/19 Test #16: dbcsr_tas_unittest ....................................   Passed    4.37 sec
      Start 17: dbcsr_test_csr_conversions
17/19 Test #17: dbcsr_test_csr_conversions ............................   Passed    4.73 sec
      Start 18: dbcsr_test
18/19 Test #18: dbcsr_test ............................................   Passed    0.53 sec
      Start 19: dbcsr_tensor_test
19/19 Test #19: dbcsr_tensor_test .....................................   Passed    0.84 sec

89% tests passed, 2 tests failed out of 19

@alazzaro Suggestions on how to fix this are greatly appreciated.

Environment:
macOS 10.6.8 Rosetta (ppc32)
gcc 12.2.0
mpich-gcc12 @4.0.2+fortran
cmake-devel 20221130-3.25.1
ninja @1.11.1
OpenBLAS @0.3.21+gcc12+lapack+native
python310 @3.10.9
py-fypp @3.1

Portfile used: https://github.com/macports/macports-ports/blob/6e401b768cff5631fba66cca8ef346600a175c5a/math/dbcsr/Portfile

@barracuda156 (Author)

Complete log for tests:

tests_log.txt

@alazzaro (Member) commented Jan 3, 2023

We never tested such old versions of OSX, so I have no idea what the error can be.
Some notes:

  1. From your log, it seems you are not running under MPI. I can see several repetitions in the log, e.g.

     DBCSR| CPU Multiplication driver                                           BLAS (D)
     DBCSR| CPU Multiplication driver                                           BLAS (D)
     DBCSR| CPU Multiplication driver                                           BLAS (D)
     DBCSR| CPU Multiplication driver                                           BLAS (D)

     It seems there are 4 simultaneous instances (which is what you are running with "/usr/bin/mpiexec" "-n" "4"), but then DBCSR reports

     DBCSR| MPI: Number of processes                                               1

     so there is something wrong...
     I assume you should set the cmake flags:

     -DMPIEXEC_EXECUTABLE="mpirun" \
     -DTEST_MPI_RANKS="1" \

  2. I assume you are using BLAS for the block multiplications. Actually, on OSX we used to test with the Accelerate framework, so I wonder if it can introduce some issues here... This is an example. Could you confirm which BLAS library is used? You can add:

     -DBLAS_FOUND=ON -DBLAS_LIBRARIES="<path>" \
     -DLAPACK_FOUND=ON -DLAPACK_LIBRARIES="<path>" \

     to be more specific.
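
For concreteness, a combined configure invocation along these lines could look like the following — a sketch only; the launcher and library paths are placeholders for the MacPorts setup, not verified values:

# Sketch: substitute the real mpiexec and BLAS/LAPACK paths for your setup
cmake -DMPIEXEC_EXECUTABLE=/opt/local/bin/mpiexec-mpich-gcc12 \
      -DTEST_MPI_RANKS=1 \
      -DBLAS_FOUND=ON -DBLAS_LIBRARIES="/opt/local/lib/libopenblas.dylib" \
      -DLAPACK_FOUND=ON -DLAPACK_LIBRARIES="/opt/local/lib/libopenblas.dylib" \
      ..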

@barracuda156 (Author)

@alazzaro I spent quite some time on this today, but I cannot force it to use the correct MPI settings for some reason. It still uses:

Command: "/usr/bin/mpiexec" "-n" "4" "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_math_dbcsr/dbcsr/work/build/tests/dbcsr_perf" "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_math_dbcsr/dbcsr/work/dbcsr-2.5.0/tests/inputs/test_H2O.perf"
Directory: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_math_dbcsr/dbcsr/work/build/tests
"dbcsr_perf:inputs/test_H2O.perf" start time: Jan 12 05:18 WIT

This is despite my passing -DMPIEXEC_EXECUTABLE=${prefix}/bin/mpiexec-mpich-gcc12 and -DTEST_MPI_RANKS="1" to CMake. I tried a variety of ways, with no effect whatsoever.
Where does this /usr/bin/mpiexec even come from?
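
One way to see which launcher CMake actually cached is to inspect the cache from the build directory (standard CMake cache inspection, nothing dbcsr-specific):

grep -i '^mpiexec' CMakeCache.txt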

I will try Accelerate, but I suspect that on old macOS OpenBLAS is a better bet.

@barracuda156 (Author) commented Jan 12, 2023

@alazzaro So, with Accelerate it seems to work better indeed (at least on 10.6.8; I cannot check on native PPC right now), but now dbcsr_unittest1 times out (it was fine before with OpenBLAS):

--->  Testing dbcsr
Executing:  cd "/opt/local/var/macports/build/_opt_PPCRosettaPorts_math_dbcsr/dbcsr/work/build" && ctest test 
Test project /opt/local/var/macports/build/_opt_PPCRosettaPorts_math_dbcsr/dbcsr/work/build
      Start  1: dbcsr_perf:inputs/test_H2O.perf
 1/19 Test  #1: dbcsr_perf:inputs/test_H2O.perf .......................   Passed  802.29 sec
      Start  2: dbcsr_perf:inputs/test_rect1_dense.perf
 2/19 Test  #2: dbcsr_perf:inputs/test_rect1_dense.perf ...............   Passed    7.54 sec
      Start  3: dbcsr_perf:inputs/test_rect1_sparse.perf
 3/19 Test  #3: dbcsr_perf:inputs/test_rect1_sparse.perf ..............   Passed   35.68 sec
      Start  4: dbcsr_perf:inputs/test_rect2_dense.perf
 4/19 Test  #4: dbcsr_perf:inputs/test_rect2_dense.perf ...............   Passed    6.45 sec
      Start  5: dbcsr_perf:inputs/test_rect2_sparse.perf
 5/19 Test  #5: dbcsr_perf:inputs/test_rect2_sparse.perf ..............   Passed   34.73 sec
      Start  6: dbcsr_perf:inputs/test_singleblock.perf
 6/19 Test  #6: dbcsr_perf:inputs/test_singleblock.perf ...............   Passed    1.20 sec
      Start  7: dbcsr_perf:inputs/test_square_dense.perf
 7/19 Test  #7: dbcsr_perf:inputs/test_square_dense.perf ..............   Passed    2.05 sec
      Start  8: dbcsr_perf:inputs/test_square_sparse.perf
 8/19 Test  #8: dbcsr_perf:inputs/test_square_sparse.perf .............   Passed   10.71 sec
      Start  9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
 9/19 Test  #9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ...   Passed    5.60 sec
      Start 10: dbcsr_perf:inputs/test_square_sparse_rma.perf
10/19 Test #10: dbcsr_perf:inputs/test_square_sparse_rma.perf .........   Passed    9.74 sec
      Start 11: dbcsr_unittest1
sh: /bin/ps: Operation not permitted
11/19 Test #11: dbcsr_unittest1 .......................................***Timeout 1499.97 sec
      Start 12: dbcsr_unittest2
12/19 Test #12: dbcsr_unittest2 .......................................   Passed  370.93 sec
      Start 13: dbcsr_unittest3
13/19 Test #13: dbcsr_unittest3 .......................................   Passed  153.45 sec
      Start 14: dbcsr_unittest4
14/19 Test #14: dbcsr_unittest4 .......................................   Passed    1.60 sec
      Start 15: dbcsr_tensor_unittest
15/19 Test #15: dbcsr_tensor_unittest .................................   Passed   19.65 sec
      Start 16: dbcsr_tas_unittest
16/19 Test #16: dbcsr_tas_unittest ....................................   Passed    9.14 sec
      Start 17: dbcsr_test_csr_conversions
17/19 Test #17: dbcsr_test_csr_conversions ............................   Passed   23.92 sec
      Start 18: dbcsr_test
18/19 Test #18: dbcsr_test ............................................   Passed    1.34 sec
      Start 19: dbcsr_tensor_test
19/19 Test #19: dbcsr_tensor_test .....................................   Passed    1.97 sec

95% tests passed, 1 tests failed out of 19

Total Test time (real) = 2998.21 sec

The following tests FAILED:
	 11 - dbcsr_unittest1 (Timeout)

I used these args:

configure.args-append \
                    -DBLAS_FOUND=ON \
                    -DBLAS_LIBRARIES=/usr/lib/libblas.dylib \
                    -DLAPACK_FOUND=ON \
                    -DLAPACK_LIBRARIES=/usr/lib/libLAPACK.dylib

if {[string match *gcc* ${configure.compiler}]} {
    configure.cflags-append \
                    -flax-vector-conversions
}

Complete log from tests:
tests_log_with_Accelerate.txt

@alazzaro (Member)

This was my suspicion: on OSX we always assume that Accelerate is used... I will try to install a Vagrant machine with OSX and try to fix this problem so that OpenBLAS can be made available.

@barracuda156 (Author)

> This was my suspicion: on OSX we always assume that Accelerate is used... I will try to install a Vagrant machine with OSX and try to fix this problem so that OpenBLAS can be made available.

Thank you very much!

P.S. By the way, why does dbcsr_unittest1 time out now?

@alazzaro (Member)

> This was my suspicion: on OSX we always assume that Accelerate is used... I will try to install a Vagrant machine with OSX and try to fix this problem so that OpenBLAS can be made available.
>
> Thank you very much!
>
> P.S. By the way, why does dbcsr_unittest1 time out now?

I can assume Accelerate is not really optimized, not sure though...

@barracuda156 (Author)

> I can assume Accelerate is not really optimized, not sure though...

Well, it is old, and on earlier systems it will perhaps be worse (and it cannot be updated, being a system component).
If building with OpenBLAS is fixed, that would be great.

@barracuda156 (Author) commented Jan 12, 2023

@alazzaro I actually do not see what is wrong there: everything passes in dbcsr_unittest1, but then it reports failure:

 **********************************************************************
  -- TESTING dbcsr_multiply (T, N,            7 , S, S, N) ............... PASSED !
 **********************************************************************
<end of output>
Test time = 1499.97 sec
----------------------------------------------------------
Test Failed.
"dbcsr_unittest1" end time: Jan 13 00:52 WIT
"dbcsr_unittest1" time elapsed: 00:24:59

So not all tests were run because of the time limit being set?

@barracuda156 (Author)

P.S. I have determined this is not what causes the test errors, but -DMPIEXEC_EXECUTABLE= keeps being ignored, and the ancient version from the system prefix is used instead of the one passed to CMake.

@alazzaro (Member)

So, could you run the tests via

env CTEST_OUTPUT_ON_FAILURE=1 make test ARGS="--timeout 2000" 

?
I'm guessing, I really need to reproduce it on my side...

@dev-zero (Contributor)

To detect MPI we are using FindMPI from CMake.
According to the documentation (and from what I remember having tested), setting MPIEXEC_EXECUTABLE should be the correct way to override the MPI detection.

In my case I have both MPICH and OpenMPI installed via Homebrew and MPICH is the default (linked):

dbcsr/build.default-mpi on  develop [$] ❯ cmake ..
[...]
-- Found MPI_C: /usr/local/Cellar/mpich/4.0.3/lib/libmpi.dylib (found version "4.0")
-- Found MPI_CXX: /usr/local/Cellar/mpich/4.0.3/lib/libmpicxx.dylib (found version "4.0")
-- Found MPI_Fortran: /usr/local/Cellar/mpich/4.0.3/lib/libmpifort.dylib (found version "4.0")
-- Found MPI: TRUE (found version "4.0") found components: C CXX Fortran
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
Tests will run with 8 MPI ranks and 2 OpenMP threads each
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/tiziano/Projects/cp2k/dbcsr/build.default-mpi

Running in a different/fresh build directory:

dbcsr/build.openmpi on  develop [$?] ❯ cmake -DMPIEXEC_EXECUTABLE=/usr/local/Cellar/open-mpi/4.1.4_2/bin/mpiexec -DTEST_MPI_RANKS=1 ..
[...]
-- Found MPI_C: /usr/local/Cellar/open-mpi/4.1.4_2/lib/libmpi.dylib (found version "3.1")
-- Found MPI_CXX: /usr/local/Cellar/open-mpi/4.1.4_2/lib/libmpi.dylib (found version "3.1")
-- Found MPI_Fortran: /usr/local/Cellar/open-mpi/4.1.4_2/lib/libmpi_usempif08.dylib (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: C CXX Fortran
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
Tests will run with 1 MPI ranks and 2 OpenMP threads each
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/tiziano/Projects/cp2k/dbcsr/build.openmpi

So, at least here it seems that CMake is following the instructions correctly.
I assume you are aware of this, but for many variables (like MPIEXEC_EXECUTABLE) you have to pass --fresh to reconfigure the full tree. TEST_MPI_RANKS gets picked up on a simple reconfigure, though.
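
For illustration, a forced full reconfigure (CMake ≥ 3.24) would look like this; the launcher path is just an example:

cmake --fresh -DMPIEXEC_EXECUTABLE=/opt/local/bin/mpiexec-mpich-gcc12 -DTEST_MPI_RANKS=1 ..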

@dev-zero (Contributor)

For future reference, can you please also post the complete CMake configure log? @barracuda156

@barracuda156 (Author)

@dev-zero OK, with TEST_MPI_RANKS the problem was the quote marks, haha. -DTEST_MPI_RANKS=2 works fine.
I will update soon on the tests; let me try something.

@barracuda156 (Author)

@alazzaro Until OpenBLAS linking on Apple is fixed, maybe it is worth adding a note in the docs, like you have for Power9?
https://cp2k.github.io/dbcsr/develop/page/2-user-guide/1-installation/index.html

Or is it actually a PowerPC-specific issue and not an Apple-specific one?

@barracuda156 (Author)

I have finally figured out how to force MacPorts MPICH to be used in the tests, but it seems to have broken everything: the very first test fails with a timeout:

Test project /opt/local/var/macports/build/_opt_PPCRosettaPorts_math_dbcsr/dbcsr/work/build
      Start  1: dbcsr_perf:inputs/test_H2O.perf
 1/19 Test  #1: dbcsr_perf:inputs/test_H2O.perf .......................***Timeout 1500.03 sec

This is just for the record. I will look for a combination of settings that works optimally.

@dev-zero (Contributor)

@barracuda156 The timeout with MPICH in test_H2O.perf on macOS is something we are aware of, but we haven't had time to track it down yet. Linking to OpenBLAS on macOS is doable; it's just that by default CMake finds Accelerate first until told to look elsewhere:

cmake -DCMAKE_PREFIX_PATH="/usr/local/opt/openblas" ..

So far I've been able to successfully run all the tests on macOS with OpenBLAS+OpenMPI. The GitHub Actions runners fail, though.

@barracuda156 (Author)

@dev-zero There is no problem linking to OpenBLAS; in fact I did that originally. The problem is that two tests fail, at least on 10.6.8 ppc32 (which was the reason for this ticket in the first place). We can use different linear-algebra libs on different systems and/or archs, but that is harder to maintain.

> The timeout with MPICH in test_H2O.perf on macOS is something we are aware of, but we haven't had time to track it down yet.

Hopefully that can be fixed. While on 10.6 the old system MPI works better than the new MPICH, I am not sure it is going to work on 10.5, which we also want to support.
I vaguely recall OpenMPI is somewhat broken either on PPC or on old macOS, but I need to try testing it with this port.

@barracuda156 (Author)

Build and test log (multiple builds):
misc_build_test.txt

Conclusions for now:

  1. Use either Accelerate directly or vecLibFort (supposedly needed when BLAS is called from Fortran); apparently either works. OpenBLAS does not, failing badly with wrong results.
  2. Running tests with MPICH 4.0.3 is broken. The system MPI more or less works; only one test times out.

P.S. The sh: /bin/ps: Operation not permitted error is solved like this (adding it to the Portfile):

pre-test {
    # test infrastructure uses /bin/ps, which is forbidden by sandboxing
    append portsandbox_profile " (allow process-exec (literal \"/bin/ps\") (with no-profile))"
}

You may consider fixing it in the source code though.

@barracuda156 (Author)

> So, could you run the tests via
>
> env CTEST_OUTPUT_ON_FAILURE=1 make test ARGS="--timeout 2000"
>
> ? I'm guessing, I really need to reproduce it on my side...

@alazzaro How do I pass this to ctest? This has no effect: ctest test --timeout=2000. (With make it does not work, complaining about a missing target for test.)
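
For reference, CTest documents the timeout as a space-separated argument rather than the = form, and CTEST_OUTPUT_ON_FAILURE can be replaced by the corresponding flag, e.g.:

ctest --output-on-failure --timeout 2000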

@barracuda156 (Author)

@alazzaro @dev-zero Interestingly, when linking to OpenBLAS, the first unit test passes quickly, while unit tests 2 and 3 fail badly with wrong results:

      Start 11: dbcsr_unittest1
11/19 Test #11: dbcsr_unittest1 .......................................   Passed  158.55 sec
      Start 12: dbcsr_unittest2
12/19 Test #12: dbcsr_unittest2 .......................................***Failed    0.78 sec
      Start 13: dbcsr_unittest3
13/19 Test #13: dbcsr_unittest3 .......................................***Failed   13.93 sec

When linking to vecLibFort or Accelerate, unit tests 2 and 3 pass, but test 1 times out (it takes more than 10 times longer!), though with no wrong results.

@barracuda156 (Author) commented Jan 14, 2023

I have noticed that -D__ACCELERATE is not picked up automatically – neither with Accelerate nor with vecLibFort. I made a patch to src/CMakeLists and am testing again.

UPD. No difference in the test results whatsoever. Whether this flag does something or not, its absence was not causing the problem.

@barracuda156 (Author)

Other tests run at full load on every core; dbcsr_unittest1 is barely doing anything:
[screenshot: CPU load while dbcsr_unittest1 is running]

Likely this is the reason why it takes 10 times longer (than when run with OpenBLAS) and eventually times out.

@dev-zero (Contributor)

> You may consider fixing it in the source code though.

AFAIK we're not calling ps directly, so this must originate from either MPI or ctest and is therefore unlikely to be fixed by us. Also, I would consider ps a rather essential tool to which a test system should have access.

@dev-zero (Contributor)

> I have noticed that -D__ACCELERATE is not picked up automatically – neither with Accelerate nor with vecLibFort. I made a patch to src/CMakeLists and am testing again.
>
> UPD. No difference in the test results whatsoever. Whether this flag does something or not, its absence was not causing the problem.

That's not good: macOS' Accelerate has a slightly different LAPACK API (returning double precision for single-precision calls). Not defining it can therefore lead to wrong results.

We have this:

if (APPLE)
  # fix /proc/self/statm can not be opened on macOS
  target_compile_definitions(dbcsr PRIVATE __NO_STATM_ACCESS)

  if (BLAS_LIBRARIES MATCHES "Accelerate")
    target_compile_definitions(dbcsr PRIVATE __ACCELERATE)
  endif ()
endif ()

This doesn't get triggered, since you're setting BLAS_LIBRARIES to a generic path (as per the logs you supplied), rather than the usual /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/System/Library/Frameworks/Accelerate.framework I see in my logs.

I wonder, maybe we should check against BLA_VENDOR instead.
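
A sketch of what that check might look like — hypothetical, and note that BLA_VENDOR is an input hint the user passes to FindBLAS, so this only helps when it is actually set (e.g. by MacPorts' linear_algebra portgroup); "Apple" and "NAS" are the FindBLAS vendor names for the Accelerate/vecLib frameworks:

if (APPLE AND BLA_VENDOR MATCHES "Apple|NAS")
  # same effect as the path-based check above, but keyed on the vendor hint
  target_compile_definitions(dbcsr PRIVATE __ACCELERATE)
endif ()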

@barracuda156 (Author)

AFAIK we're not calling ps directly, therefore this must originate from either MPI or ctest and therefore unlikely to get fixed by us. Also, I would consider ps a rather essential tool to which a test system should have access.

Got it. We added a fix in MacPorts. From what I understand, it is a Mac sandboxing issue.

@barracuda156 (Author) commented Jan 16, 2023

> That's not good: macOS' Accelerate has a slightly different LAPACK API (returning double precision for single-precision calls). Not defining it can therefore lead to wrong results.
>
> This doesn't get triggered, since you're setting BLAS_LIBRARIES to a generic path (as per the logs you supplied), rather than the usual /Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/System/Library/Frameworks/Accelerate.framework I see in my logs.
>
> I wonder, maybe we should check against BLA_VENDOR instead.

@dev-zero The standard MacPorts handling of this is via the linear_algebra PG: https://github.com/macports/macports-ports/blob/92f50e73e1ae04fc6ced3ddec56e088edf99ce86/_resources/port1.0/group/linear_algebra-1.0.tcl#L63
So it looks like BLA_VENDOR should work (and it is a common definition, not MacPorts-specific).
It is perhaps desirable to expand the check to vecLibFort too: https://github.com/mcg1969/vecLibFort – which is used instead of Accelerate directly when Fortran support is needed.

P.S. The BLAS and LAPACK dylibs in /usr/lib are symlinks to Accelerate, but the specific path to Accelerate differs depending on the OS version.

@dev-zero (Contributor)

> It is perhaps desirable to expand the check to vecLibFort too: https://github.com/mcg1969/vecLibFort – which is used instead of Accelerate directly when Fortran support is needed.

According to its documentation, this is exactly what vecLibFort is supposed to be fixing.
I wonder whether we shouldn't just start bundling vecLibFort as recommended by the project; then we can forget about our __ACCELERATE guards. @alazzaro what do you think? This might have to be synchronized with CP2K.
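
A minimal sketch of what bundling could look like, assuming CMake's FetchContent and relying on vecLibFort being a single C source file (upstream ships no CMakeLists, hence Populate rather than MakeAvailable; the guard condition is illustrative):

include(FetchContent)
FetchContent_Declare(veclibfort
  GIT_REPOSITORY https://github.com/mcg1969/vecLibFort.git
  GIT_TAG        master) # pin a release tag in practice
FetchContent_Populate(veclibfort)

if (APPLE AND BLAS_LIBRARIES MATCHES "Accelerate")
  # compile the shim into dbcsr instead of defining __ACCELERATE
  target_sources(dbcsr PRIVATE "${veclibfort_SOURCE_DIR}/vecLibFort.c")
endif ()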

@barracuda156 (Author)

> According to its documentation, this is exactly what vecLibFort is supposed to be fixing. I wonder whether we shouldn't just start bundling vecLibFort as recommended by the project; then we can forget about our __ACCELERATE guards. @alazzaro what do you think? This might have to be synchronized with CP2K.

@dev-zero MacPorts defaults to vecLibFort unless the port explicitly asks not to use it. Do I get it right that we then do not need the __ACCELERATE macros at all? (If they are not needed but do not cause issues with vecLibFort, I will keep them for now, just to avoid making yet another PR to MacPorts for the same port.)

If you decide to use vecLibFort directly, please consider allowing an external one as well. Many users will already have it installed, whether via MacPorts or otherwise.

@barracuda156 (Author)

@dev-zero @alazzaro I am not closing the issue, since the above discussion may be relevant for further improvements (fixing OpenBLAS on Mac, using vecLibFort), but I got all tests passing on PPC:

--->  Testing dbcsr
Executing:  cd "/opt/local/var/macports/build/_opt_PPCRosettaPorts_math_dbcsr/dbcsr/work/build" && ctest test 
Test project /opt/local/var/macports/build/_opt_PPCRosettaPorts_math_dbcsr/dbcsr/work/build
      Start  1: dbcsr_perf:inputs/test_H2O.perf
 1/19 Test  #1: dbcsr_perf:inputs/test_H2O.perf .......................   Passed  329.59 sec
      Start  2: dbcsr_perf:inputs/test_rect1_dense.perf
 2/19 Test  #2: dbcsr_perf:inputs/test_rect1_dense.perf ...............   Passed    2.69 sec
      Start  3: dbcsr_perf:inputs/test_rect1_sparse.perf
 3/19 Test  #3: dbcsr_perf:inputs/test_rect1_sparse.perf ..............   Passed   12.66 sec
      Start  4: dbcsr_perf:inputs/test_rect2_dense.perf
 4/19 Test  #4: dbcsr_perf:inputs/test_rect2_dense.perf ...............   Passed    2.35 sec
      Start  5: dbcsr_perf:inputs/test_rect2_sparse.perf
 5/19 Test  #5: dbcsr_perf:inputs/test_rect2_sparse.perf ..............   Passed    9.97 sec
      Start  6: dbcsr_perf:inputs/test_singleblock.perf
 6/19 Test  #6: dbcsr_perf:inputs/test_singleblock.perf ...............   Passed    0.71 sec
      Start  7: dbcsr_perf:inputs/test_square_dense.perf
 7/19 Test  #7: dbcsr_perf:inputs/test_square_dense.perf ..............   Passed    0.94 sec
      Start  8: dbcsr_perf:inputs/test_square_sparse.perf
 8/19 Test  #8: dbcsr_perf:inputs/test_square_sparse.perf .............   Passed    3.30 sec
      Start  9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
 9/19 Test  #9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ...   Passed    2.78 sec
      Start 10: dbcsr_perf:inputs/test_square_sparse_rma.perf
10/19 Test #10: dbcsr_perf:inputs/test_square_sparse_rma.perf .........   Passed    3.53 sec
      Start 11: dbcsr_unittest1
11/19 Test #11: dbcsr_unittest1 .......................................   Passed  1235.95 sec
      Start 12: dbcsr_unittest2
12/19 Test #12: dbcsr_unittest2 .......................................   Passed  333.61 sec
      Start 13: dbcsr_unittest3
13/19 Test #13: dbcsr_unittest3 .......................................   Passed  191.42 sec
      Start 14: dbcsr_unittest4
14/19 Test #14: dbcsr_unittest4 .......................................   Passed    1.14 sec
      Start 15: dbcsr_tensor_unittest
15/19 Test #15: dbcsr_tensor_unittest .................................   Passed   11.51 sec
      Start 16: dbcsr_tas_unittest
16/19 Test #16: dbcsr_tas_unittest ....................................   Passed    9.60 sec
      Start 17: dbcsr_test_csr_conversions
17/19 Test #17: dbcsr_test_csr_conversions ............................   Passed   22.04 sec
      Start 18: dbcsr_test
18/19 Test #18: dbcsr_test ............................................   Passed    0.53 sec
      Start 19: dbcsr_tensor_test
19/19 Test #19: dbcsr_tensor_test .....................................   Passed    1.10 sec

100% tests passed, 0 tests failed out of 19

Total Test time (real) = 2175.63 sec

I used TEST_MPI_RANKS=2, which prevented the time-out.

@alazzaro (Member)

@barracuda156 Thanks for your findings, this is good news! Yes, the plan is to make vecLibFort a requirement (and drop the __ACCELERATE macro in the first instance) and add OpenBLAS support for macOS. It will take a while though... I agree, we can keep this ticket open as a reminder...

@barracuda156 (Author)

> Yes, the plan is to make vecLibFort a requirement (and drop the __ACCELERATE macro in the first instance) and add OpenBLAS support for macOS. It will take a while though... I agree, we can keep this ticket open as a reminder...

@alazzaro Sounds good!

By the way, are there any opinions on why there are issues with the latest MPICH on Mac? I was thinking it was just semi-broken on PPC, but it seems the problem is not limited to one arch: #645 (comment)
