Convert blas/lapack/mkl to new cmake style #5972
@@ -11,4 +11,3 @@ endif()
if (MPILIB STREQUAL mpi-serial AND NOT compile_threaded)
  set(PFUNIT_PATH "$ENV{SEMS_PFUNIT_ROOT}")
endif()
string(APPEND SLIBS " -lblas -llapack")
Is melvin still around? I thought it was no longer online.
Same thing for anlworkstation (already replaced by anlgce). Time to clean up these deprecated machines?
Yes, we badly need to go through all our configs and purge unused machines. I think we still have a lot of NCAR machine configs that we never use.
Look at the CircleCI failure.
@rljacob, I thought blas and lapack were required packages for E3SM? It looks like the CircleCI container doesn't install them? @lukaszlacinski
That's a build-only test, so it must have been possible to build without them.
The CircleCI failure is because the container doesn't have blas/lapack. We need to redo the container or switch to GH Actions (with a newer container).
Force-pushed from 8f04835 to 6c0a316, then from 6c0a316 to 9f6e496.
The containers do …
@jgfouca, perhaps we need to set the correct BLA_VENDOR variable in … We could also temporarily hack the singularity build to cat e3sm.bldlog if case.build fails. That may show more context from the cmake log.
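A sketch of how the configure step could surface that context by itself, assuming it runs after `project()` has enabled a language (FindBLAS/FindLAPACK need C, CXX, or Fortran). `CMAKE_FIND_DEBUG_MODE` (CMake 3.17+) makes the find calls log every path they try; the status lines and the error text are illustrative, not existing E3SM output.

```cmake
# Log every path the find_* calls inside FindBLAS/FindLAPACK search (CMake >= 3.17).
set(CMAKE_FIND_DEBUG_MODE TRUE)
find_package(BLAS)
find_package(LAPACK)
set(CMAKE_FIND_DEBUG_MODE FALSE)

# Record what was resolved so it ends up in the cmake log.
message(STATUS "BLA_VENDOR (CMake variable): ${BLA_VENDOR}")
message(STATUS "BLA_VENDOR (environment):    $ENV{BLA_VENDOR}")
message(STATUS "BLAS_FOUND=${BLAS_FOUND}  BLAS_LIBRARIES=${BLAS_LIBRARIES}")
message(STATUS "LAPACK_FOUND=${LAPACK_FOUND}  LAPACK_LIBRARIES=${LAPACK_LIBRARIES}")

# Fail at configure time with a readable message instead of at link time.
if(NOT BLAS_FOUND OR NOT LAPACK_FOUND)
  message(FATAL_ERROR "BLAS/LAPACK not found; set BLA_VENDOR and/or BLAS_ROOT/LAPACK_ROOT for this machine.")
endif()
```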
Happy to help debug, but I am not yet clear about the goals of the cmake changes...
Re this, we could make our logic more sophisticated: handle all the available vendors and give them an order of precedence if desired (e.g., this).
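A rough sketch of such a precedence-based search. The vendor names are real FindBLAS values, but the preference list and its order are hypothetical. FindBLAS caches some `find_library` results, so this loop may behave differently across CMake versions and could need `unset()` calls between iterations.

```cmake
# Hypothetical precedence list: the first vendor that FindBLAS finds wins.
set(_blas_vendor_preference Intel10_64_dyn OpenBLAS Generic)

foreach(_vendor IN LISTS _blas_vendor_preference)
  set(BLA_VENDOR ${_vendor})
  find_package(BLAS QUIET)
  if(BLAS_FOUND)
    message(STATUS "Using BLAS vendor ${BLA_VENDOR}: ${BLAS_LIBRARIES}")
    break()
  endif()
endforeach()

if(NOT BLAS_FOUND)
  message(FATAL_ERROR "None of the preferred BLAS vendors were found: ${_blas_vendor_preference}")
endif()

# Search LAPACK with the same vendor that won for BLAS.
find_package(LAPACK REQUIRED)
```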
One more comment: you could also link against a "generic" BLAS/LAPACK and then switch the implementation at runtime. Maybe that is the solution you were hoping to find?
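A minimal sketch of the "generic" route. `BLA_VENDOR Generic` makes FindBLAS/FindLAPACK look for the reference `libblas`/`liblapack`. How an optimized implementation is then swapped in at runtime depends on the system; Debian/Ubuntu alternatives or FlexiBLAS are given only as examples and are not something this PR sets up.

```cmake
# Link against the reference ("Generic") BLAS/LAPACK interface libraries
# (typically libblas / liblapack) rather than a specific optimized vendor.
set(BLA_VENDOR Generic)
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

# Assumption: the optimized implementation is selected outside of CMake when
# the program runs, e.g. via the system's libblas.so.3 alternatives or a
# wrapper library such as FlexiBLAS. Nothing here enforces that.
message(STATUS "Generic BLAS:   ${BLAS_LIBRARIES}")
message(STATUS "Generic LAPACK: ${LAPACK_LIBRARIES}")
```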
Some of your edits require CMake 3.18 (or higher). Could you at least set cmake_minimum_required(VERSION 3.18) where appropriate as part of this PR?
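For reference, a minimal sketch of what that could look like. One likely reason for the 3.18 floor is that FindBLAS/FindLAPACK only provide the imported targets `BLAS::BLAS` and `LAPACK::LAPACK` starting in CMake 3.18. The project and target names (`blas_lapack_example`, `e3sm_example`, `main.F90`) are hypothetical.

```cmake
# Require a CMake new enough for the BLAS::BLAS / LAPACK::LAPACK imported targets (3.18+).
cmake_minimum_required(VERSION 3.18)
project(blas_lapack_example LANGUAGES Fortran)

find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

# Hypothetical target, for illustration only.
add_executable(e3sm_example main.F90)

# The imported targets carry library paths and link flags, replacing
# hand-written " -lblas -llapack" entries in SLIBS.
target_link_libraries(e3sm_example PRIVATE LAPACK::LAPACK BLAS::BLAS)
```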
Force-pushed from b94f5bc to d936aee.
For completeness, here are all the edits implemented in the new image. This takes me much closer to automating things, but I still need time. Notables:
(This way, we also don't need a new machine in config_machines; we can just use the singularity one, or rename it.)
…lapack
* origin/master:
  Fix nvidia_pm-cpu diffs for some tests
  Fix PIO dep on mpi-serial
  add ghp preview
Convert blas/lapack/mkl to new cmake style

This was a more challenging task than I thought it would be. FindBLAS / FindLAPACK are built into CMake and can also handle MKL stuff via the BLA_VENDOR setting (if you set it to Intel... you get various flavors of MKL). For machines that looked like they were trying to use MKL in their cache files, I added set(BLA_VENDOR Intel10_64_dyn). I'm not 100% sure this is what we want in all cases, but it should at least get us building and running. Machine POCs should refine the settings on their machines using https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors as a guide. I suspect many of these cache files were copy/pasted without a ton of consideration.

The documentation indicates that the find modules for blas may be smart enough to auto-detect the BLA_VENDOR:

  BLA_VENDOR
  Set to one of the [BLAS/LAPACK Vendors] to search for BLAS only from the specified vendor. If not set, all vendors are considered.

Once again, I am impressed at how much clutter is removed when we do things the "right" way (using find_package instead of packing stuff into SLIBS).

[BFB]
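A minimal sketch of the before/after for one machine cache file, assuming an MKL machine. `Intel10_64_dyn` is the value named in the commit message and selects MKL's single dynamic library (`libmkl_rt`); the right vendor for each machine is still for its POC to decide. The `set()` only has an effect if it runs before `find_package()`.

```cmake
# Old style: raw link flags appended to SLIBS.
#   string(APPEND SLIBS " -lblas -llapack")

# New style: let FindBLAS/FindLAPACK locate the libraries.
# Intel10_64_dyn selects MKL's single dynamic library (libmkl_rt).
set(BLA_VENDOR Intel10_64_dyn)

find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

# Results are exposed as variables (and, with CMake >= 3.18, as imported targets).
message(STATUS "BLAS libraries:   ${BLAS_LIBRARIES}")
message(STATUS "LAPACK libraries: ${LAPACK_LIBRARIES}")
```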
Get rid of all the BLA_VENDOR stuff in the macros. It had no impact because deps are loaded before macros are processed. We should prefer using the environment to set this stuff, as we do for the other libs. pm-cpu_nvidia seems to be the one platform sensitive to the lapack/blas choice, so it can be used as an example of how to specify these libraries.
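A sketch of why the environment route works without any BLA_VENDOR logic in the macros. FindBLAS/FindLAPACK read `BLA_VENDOR` from the environment, and in recent CMake versions a non-empty environment value appears to take precedence over a CMake variable of the same name. With policy CMP0074 set to NEW (CMake 3.12+), `find_package(BLAS)` / `find_package(LAPACK)` also search the prefixes given in the `BLAS_ROOT` / `LAPACK_ROOT` environment variables. The status messages are illustrative only.

```cmake
# <PackageName>_ROOT environment variables become search prefixes under CMP0074
# (NEW by default when cmake_minimum_required is 3.12 or higher).
cmake_policy(SET CMP0074 NEW)

# No BLA_VENDOR is set here: the find modules pick it up from the environment,
# e.g. as exported by a machine's <environment_variables> block.
if(DEFINED ENV{BLA_VENDOR})
  message(STATUS "BLA_VENDOR from environment: $ENV{BLA_VENDOR}")
endif()
if(DEFINED ENV{BLAS_ROOT})
  message(STATUS "BLAS_ROOT from environment: $ENV{BLAS_ROOT}")
endif()

find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)
```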
Merge 2 for this PR.
* jgfouca/cmake_blas_lapack:
  blas/lapack settings should not depend on mpilib
  Refactor blas and lapack stuff a bit more
Convert blas/lapack/mkl to new cmake style

This was a more challenging task than I thought it would be. FindBLAS / FindLAPACK are built into CMake and can also handle MKL stuff via the BLA_VENDOR setting (if you set it to Intel... you get various flavors of MKL). The find_package for blas/lapack is pretty smart and should do the right thing most of the time. Machine POCs should refine the settings on their machines using https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors as a guide. I suspect many of these cache files were copy/pasted without a ton of consideration.

As an example, pm-cpu_nvidia is one platform where you will get DIFFs if you don't use the right blas/lapack. To set this up, I added the following to the environment block for that machine/compiler combo:

  <environment_variables compiler="nvidia">
    <env name="BLAS_ROOT">$SHELL{if [ -z "$BLAS_ROOT" ]; then echo /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers; else echo "$BLAS_ROOT"; fi}</env>
    <env name="LAPACK_ROOT">$SHELL{if [ -z "$LAPACK_ROOT" ]; then echo /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers; else echo "$LAPACK_ROOT"; fi}</env>
    <env name="BLA_VENDOR">NVHPC</env>
  </environment_variables>

The documentation indicates that the find modules for blas may be smart enough to auto-detect the BLA_VENDOR:

  BLA_VENDOR
  Set to one of the [BLAS/LAPACK Vendors] to search for BLAS only from the specified vendor. If not set, all vendors are considered.

Once again, I am impressed at how much clutter is removed when we do things the "right" way (using find_package instead of packing stuff into SLIBS).

[BFB]
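A small guard that could accompany the pm-cpu_nvidia settings, assuming (per the FindBLAS vendor table) that the `NVHPC` vendor value is only recognized from CMake 3.21 onward. The check and its message are a hypothetical addition, not part of this PR.

```cmake
# BLA_VENDOR=NVHPC (NVIDIA HPC SDK) is only understood by FindBLAS/FindLAPACK in CMake >= 3.21.
if(CMAKE_VERSION VERSION_LESS 3.21 AND "$ENV{BLA_VENDOR}" STREQUAL "NVHPC")
  message(FATAL_ERROR "BLA_VENDOR=NVHPC requires CMake 3.21 or newer (found ${CMAKE_VERSION}).")
endif()

# BLAS_ROOT / LAPACK_ROOT from the environment point the search at the HPC SDK.
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)
message(STATUS "pm-cpu_nvidia BLAS:   ${BLAS_LIBRARIES}")
message(STATUS "pm-cpu_nvidia LAPACK: ${LAPACK_LIBRARIES}")
```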