Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert blas/lapack/mkl to new cmake style #5972

Merged
merged 6 commits into from
Oct 12, 2023
Merged

Conversation

jgfouca
Copy link
Member

@jgfouca jgfouca commented Oct 5, 2023

This was a more challenging task than I thought it would. FindBLAS / FindLAPACK are built in to cmake and can also handle MKL stuff via the BLA_VENDOR setting (if you set it to Intel... you get various flavors of MKL). The find_package for blas/lapack is pretty smart and should do the right thing most of the time. Machine POCs should refine the settings on their machines using https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors as as guide. I suspect many of these cache files were copy/pasted without a ton of consideration.

As an example, pm-cpu_nvidia is one platform where you will get DIFFs if you don't use the right blas/lapack. To set this up, I added the following to the environment block for that machine/compiler combo:

    <environment_variables compiler="nvidia">
      <env name="BLAS_ROOT">$SHELL{if [ -z "$BLAS_ROOT" ]; then echo /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers; else echo "$BLAS_ROOT"; fi}</env>
      <env name="LAPACK_ROOT">$SHELL{if [ -z "$LAPACK_ROOT" ]; then echo /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers; else echo "$LAPACK_ROOT"; fi}</env>
      <env name="BLA_VENDOR">NVHPC</env>
    </environment_variables>

The documentation indicates that the find modules for blas may be smart enough to auto detect the bla_vendor:

BLA_VENDOR
Set to one of the [BLAS/LAPACK Vendors] to search for BLAS only from the specified vendor.
If not set, all vendors are considered.

Once again, I am impressed at how much clutter is removed when we do things t "right" (using find_package instead of packing stuff into SLIBS) way.

[BFB]

@jgfouca jgfouca self-assigned this Oct 5, 2023
@jgfouca jgfouca requested review from bartgol and dqwu October 5, 2023 22:08
@@ -11,4 +11,3 @@ endif()
if (MPILIB STREQUAL mpi-serial AND NOT compile_threaded)
set(PFUNIT_PATH "$ENV{SEMS_PFUNIT_ROOT}")
endif()
string(APPEND SLIBS " -lblas -llapack")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is melvin still around? I thought it was no longer online.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing for anlworkstation (already replaced by anlgce). Time to clean up these deprecated machines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we badly need to go through all our configs and purge unused machines. I think we still have a lot of NCAR machine configs that we never use.

Copy link
Member

@rljacob rljacob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at CircleCI fail.

@jgfouca
Copy link
Member Author

jgfouca commented Oct 6, 2023

@rljacob , I thought blas and lapack were required packages for E3SM? It looks like the circleci container doesn't install them? @lukaszlacinski

@rljacob
Copy link
Member

rljacob commented Oct 6, 2023

That's a build-only test so it must have been possible to build without them.

Copy link
Member

@rljacob rljacob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circle CI fail is because container doesn't have blas/lapack. Need to redo container or switch to GH actions (with a newer container).

@jgfouca jgfouca mentioned this pull request Oct 5, 2023
49 tasks
@mahf708 mahf708 force-pushed the jgfouca/cmake_blas_lapack branch from 8f04835 to 6c0a316 Compare October 7, 2023 16:27
mahf708

This comment was marked as outdated.

@mahf708 mahf708 force-pushed the jgfouca/cmake_blas_lapack branch from 6c0a316 to 9f6e496 Compare October 7, 2023 16:41
@mahf708
Copy link
Contributor

mahf708 commented Oct 7, 2023

Circle CI fail is because container doesn't have blas/lapack.

The containers do likelycertainly have blas/lapack already. The cmake logic cannot find them for some reason :/

@bartgol
Copy link
Contributor

bartgol commented Oct 8, 2023

@jgfouca perhaps we need to set the correct BLA_VENDOR variable in gnu_signularity.cmake?

We could also temporarily hack make the singularity build, to cat the e3sm.bldlog if case.build fails. That may show some more context from the cmake log.

@mahf708
Copy link
Contributor

mahf708 commented Oct 8, 2023

Happy to help debug, but I am not really clear about the goals with the cmake changes yet...

FindBLAS / FindLAPACK are built in to cmake and can also handle MKL stuff via the BLA_VENDOR setting (if you set it to Intel... you get various flavors of MKL). For machines that looked like they were trying to use MKL in their cache files, I added set(BLA_VENDOR Intel10_64_dyn). I'm not 100% sure this is what we want in all cases, but it should at least get us building and running.

re this, could sophisticate our logic to handle all sorts of available ones and have precedence if desired (e.g., this)

@mahf708
Copy link
Contributor

mahf708 commented Oct 8, 2023

One more comment: You could also link against "generic" BLAS/LAPACK and then switch the impl at runtime. That may be the solution you were hoping to find?

I'm not 100% sure this is what we want in all cases, but it should at least get us building and running.

Copy link
Contributor

@mahf708 mahf708 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of your edits require cmake 3.18 (or higher). Could you at least set cmake_minimum_required(VERSION 3.18) where appropriate as part of this PR?

@github-actions
Copy link

github-actions bot commented Oct 9, 2023

PR Preview Action v1.4.4
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-5972/
on branch gh-pages at 2023-10-11 21:36 UTC

@mahf708
Copy link
Contributor

mahf708 commented Oct 9, 2023

For completeness, here are all the edits implemented in the new image. This takes me much closer to automating things, but I still need time.

mahf708/e3sm-imgs@de16688

notables:

  1. add cmake to container
  2. abstract stuff
  3. remove sources

(this way, we also don't need a new machine in config_machines, we will just use the singularity one or we can rename it)

…lapack

* origin/master:
  Fix nvidia_pm-cpu diffs for some tests
  Fix PIO dep on mpi-serial
  add ghp preview
jgfouca added a commit that referenced this pull request Oct 11, 2023
Convert blas/lapack/mkl to new cmake style

This was a more challenging task than I thought it would. FindBLAS /
FindLAPACK are built in to cmake and can also handle MKL stuff via the
BLA_VENDOR setting (if you set it to Intel... you get various flavors
of MKL). For machines that looked like they were trying to use MKL in
their cache files, I added set(BLA_VENDOR Intel10_64_dyn). I'm not
100% sure this is what we want in all cases, but it should at least
get us building and running. Machine POCs should refine the settings
on their machines using
https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors
as as guide. I suspect many of these cache files were copy/pasted
without a ton of consideration.

The documentation indicates that the find modules for blas may be smart enough to auto detect the bla_vendor:
BLA_VENDOR
Set to one of the [BLAS/LAPACK Vendors] to search for BLAS only from the specified vendor.
If not set, all vendors are considered.

Once again, I am impressed at how much clutter is removed when we do
things t "right" (using find_package instead of packing stuff into
SLIBS) way.

[BFB]
Get rid of all the BLA_VENDOR stuff in the macros. It had no impact
because deps are loading before macros are processed. We should prefer
using the environment to set this stuff like we are doing for the other
libs. pm-cpu_nvidia seems to be the one platform sensitive to lapack/blas
so can be used as an example of how to specify these libraries.
jgfouca added a commit that referenced this pull request Oct 11, 2023
Merge 2 for this PR.

* jgfouca/cmake_blas_lapack:
  blas/lapack settings should not depend on mpilib
  Refactor blas and lapack stuff a bit more
jgfouca added a commit that referenced this pull request Oct 12, 2023
jgfouca added a commit that referenced this pull request Oct 12, 2023
Convert blas/lapack/mkl to new cmake style

This was a more challenging task than I thought it would. FindBLAS /
FindLAPACK are built in to cmake and can also handle MKL stuff via the
BLA_VENDOR setting (if you set it to Intel... you get various flavors
of MKL). The find_package for blas/lapack is pretty smart and should
do the right thing most of the time. Machine POCs should refine the
settings on their machines using
https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors
as as guide. I suspect many of these cache files were copy/pasted
without a ton of consideration.

As an example, pm-cpu_nvidia is one platform where you will get DIFFs
if you don't use the right blas/lapack. To set this up, I added the
following to the environment block for that machine/compiler combo:

    <environment_variables compiler="nvidia">
      <env name="BLAS_ROOT">$SHELL{if [ -z "$BLAS_ROOT" ]; then echo /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers; else echo "$BLAS_ROOT"; fi}</env>
      <env name="LAPACK_ROOT">$SHELL{if [ -z "$LAPACK_ROOT" ]; then echo /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers; else echo "$LAPACK_ROOT"; fi}</env>
      <env name="BLA_VENDOR">NVHPC</env>
    </environment_variables>

The documentation indicates that the find modules for blas may be
smart enough to auto detect the bla_vendor:

BLA_VENDOR
Set to one of the [BLAS/LAPACK Vendors] to search for BLAS only from the specified vendor.
If not set, all vendors are considered.
Once again, I am impressed at how much clutter is removed when we do things t "right" (using find_package instead of packing stuff into SLIBS) way.

[BFB]
@jgfouca jgfouca merged commit 26170af into master Oct 12, 2023
2 checks passed
@jgfouca jgfouca deleted the jgfouca/cmake_blas_lapack branch October 12, 2023 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants