WRF4DVAR fails with intel LLVM compilers #1957

Closed
HathewayWill opened this issue Dec 26, 2023 · 32 comments

@HathewayWill

HathewayWill commented Dec 26, 2023

Describe the bug
Building WRFDA with the Intel LLVM compilers fails to produce all of the required executables.

libbufr fails to build.

To Reproduce
fails.zip

Using option 40 (Intel LLVM, dmpar).

Expected behavior
Expected 43 executables in /var/da
Expected 1 executable in /var/obsproc/src

Got 42 executables in /var/da
Got 0 executables in /var/obsproc/src

Attachments
works.zip

Fix: add the following flags to CFLAGS_LOCAL for the LLVM compilers:

CFLAGS_LOCAL = -w -O3 -ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types #-xHost -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -no-multibyte-chars # -DRSL0_ONLY

@weiwangncar
Collaborator

@HathewayWill We are aware of the compilation issue for DA and Chem code using the newest Intel compiler.

@HathewayWill
Author

Wasn't sure. Didn't see it in the GitHub discussion.

@weiwangncar

@weiwangncar
Collaborator

@HathewayWill I have added a note in the release note.

@HathewayWill
Author

I'll try to find a solution. I have a lot of free time.

@HathewayWill
Author

HathewayWill commented Dec 27, 2023

@weiwangncar

Good morning,

I was able to get Intel LLVM to build WRFPLUS/WRF4DVAR successfully by running the following commands:

        sed -i '144s|-ip|-ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types |g' $WRF_FOLDER/WRFPLUS/configure.wrf
        sed -i '145s|-ip|-ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types |g' $WRF_FOLDER/WRFPLUS/configure.wrf

        sed -i '144s|-ip|-ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types |g' $WRF_FOLDER/WRFDA/configure.wrf
        sed -i '145s|-ip|-ip -Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types |g' $WRF_FOLDER/WRFDA/configure.wrf
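
For illustration, the same edit can be sketched without relying on line numbers 144 and 145, which may differ between WRF versions. The Python sketch below assumes the flags belong on the CFLAGS_LOCAL assignment (as in the fix at the top of this issue); the `variables` parameter and the function name `add_flags` are hypothetical and not part of WRF.

```python
import re

# Flags taken from the sed commands above.
EXTRA_FLAGS = "-Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types"


def add_flags(configure_text, variables=("CFLAGS_LOCAL",)):
    """Insert EXTRA_FLAGS after the first -ip on each matching assignment line.

    `variables` is an assumption: adjust it to whichever variables sit on
    lines 144 and 145 of your own configure.wrf.
    """
    out = []
    for line in configure_text.splitlines():
        name = line.split("=", 1)[0].strip()
        # Skip lines that were already patched so the edit can be re-run safely.
        if "=" in line and name in variables and EXTRA_FLAGS not in line:
            line = re.sub(r"(?<!\S)-ip(?!\S)", "-ip " + EXTRA_FLAGS, line, count=1)
        out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read configure.wrf into a string, pass it through `add_flags`, and write the result back. Unlike the sed commands, running it twice leaves the file unchanged.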

One thing I did notice: for WRFPLUS and 4DVAR there is a big memory leak somewhere during installation.

I have 64GB of RAM and 64GB of swap, and the build was maxing out my physical RAM and then half of my swap. I didn't get to see which module was causing it, but I think I saw @islas mention somewhere in the documentation for the release notes that there was a process that needed -j 1.

Hope this helps you.

@HathewayWill
Author

HathewayWill commented Dec 27, 2023

Here's the output during the memory leak.
Memory Leak.log

This also affects WRF-Chem.

@liujake
Contributor

liujake commented Jan 2, 2024

@HathewayWill Can you make a PR with your fixes for WRFDA/WRFPlus compilation with Intel-OneAPI compiler?

@HathewayWill
Author

@liujake @weiwangncar

I don't know how to do a PR so I was letting NCAR staff look at my comments and files and let them do it.

@HathewayWill
Author

@liujake the memory leak is another problem I don't know how to fix but it's documented in the zip file.

@islas
Collaborator

islas commented Jan 6, 2024

@HathewayWill I'm not sure I follow how adding these flags:
-Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types
solves the compilation issues as all they are doing is suppressing warnings, unless WRFDA or the new Intel icx standard used treats those as errors. If that is the case the fix is to actually fix that code rather than suppress them.

It sounds like these flag additions were independent of the memory leak issue, is that correct?

@HathewayWill
Author

@islas I'm currently sick with COVID and in isolation for 2 weeks. Let me get back to you when I feel better.

@HathewayWill
Author

@islas

Yes, the memory leak is a separate issue. I will rerun the installation without the added flags to show the errors that popped up.

@HathewayWill
Author

@islas

Here are two log files from WRF v4.5.2, which doesn't compile correctly when those flags are not included.
compile.wrf1.log
compile.wrf2.log

@islas
Collaborator

islas commented Jan 8, 2024

Thanks @HathewayWill
The first log seems to fail because module_gfs_machine is being compiled in parallel with module_bl_mynn_common, not before:

 2165  rm -f module_gfs_machine.G module_gfs_machine.bb
 2166  rm -f module_bl_qnsepbl.G module_bl_qnsepbl.bb
 2167  time mpiifx -o module_cam_error_function.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_cam_error_function.f90
 2168  rm -f module_bl_acm.G module_bl_acm.bb
 2169  rm -f module_bl_mrf.G module_bl_mrf.bb
 2170  time mpiifx -o complex_number_module.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  complex_number_module.f90
 2171  time mpiifx -o module_bl_ysu.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_ysu.f90
 2172  rm -f module_bl_fogdes.G module_bl_fogdes.bb
 2173  rm -f module_bl_mynn_common.G module_bl_mynn_common.bb
 2174  rm -f module_bl_myjurb.G module_bl_myjurb.bb
 2175  time mpiifx -o module_cam_shr_kind_mod.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_cam_shr_kind_mod.f90
 2176  time mpiifx -o module_bl_shinhong.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_shinhong.f90
 2177  time mpiifx -o module_gfs_machine.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_gfs_machine.f90
 2178  time mpiifx -o module_bl_qnsepbl.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_qnsepbl.f90
 2179  time mpiifx -o module_bl_acm.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_acm.f90
 2180  time mpiifx -o module_bl_mrf.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_mrf.f90
 2181  rm -f module_bl_gwdo_gsl.G module_bl_gwdo_gsl.bb
 2182  rm -f module_bl_myjpbl.G module_bl_myjpbl.bb
 2183  time mpiifx -o module_bl_mynn_common.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_mynn_common.f90
 2184  rm -f module_bl_boulac.G module_bl_boulac.bb
 2185  rm -f module_bl_gwdo.G module_bl_gwdo.bb
 2186  time mpiifx -o module_bl_fogdes.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_fogdes.f90
 2187  time mpiifx -o module_bl_myjurb.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_myjurb.f90
 2188  time mpiifx -o module_bl_gwdo_gsl.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_gwdo_gsl.f90
 2189  time mpiifx -o module_bl_myjpbl.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_myjpbl.f90
 2190  time mpiifx -o module_bl_boulac.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_boulac.f90
 2191  time mpiifx -o module_bl_gwdo.o -c -O3 -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian    -I../dyn_em  -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/esmf_time_f90  -I/home/workhorse/WRF_Intel/WRFV4.5.2/main -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_netcdf -I/home/workhorse/WRF_Intel/WRFV4.5.2/external/io_int -I/home/workhorse/WRF_Intel/WRFV4.5.2/frame -I/home/workhorse/WRF_Intel/WRFV4.5.2/share -I/home/workhorse/WRF_Intel/WRFV4.5.2/phys -I/home/workhorse/WRF_Intel/WRFV4.5.2/wrftladj -I/home/workhorse/WRF_Intel/WRFV4.5.2/chem -I/home/workhorse/WRF_Intel/WRFV4.5.2/inc -I/home/workhorse/WRF_Intel/Libs/NETCDF/include  -real-size 32 -i4  module_bl_gwdo.f90
 2192  module_bl_mynn_common.f90(21): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [MODULE_GFS_MACHINE]
 2193    use module_gfs_machine,  only : kind_phys
 2194  ------^
 2195  module_bl_mynn_common.f90(21): error #6580: Name in only-list does not exist or is not accessible.   [KIND_PHYS]
 2196    use module_gfs_machine,  only : kind_phys
 2197  ----------------------------------^
 2198  compilation aborted for module_bl_mynn_common.f90 (code 1)

It may be a little difficult to tell, but a sure way to identify it is to look at when the real compile command happens vs when the rm command happens with respect to other files (rm is the first command in the WRF makerule for these files):

  • rm for module_gfs_machine starts at line 2165
  • rm for module_bl_mynn_common starts at line 2173
    • However, the compile command for module_gfs_machine happens at line 2177, meaning the rule for module_bl_mynn_common started before module_gfs_machine had even finished compiling
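
The comparison above can be sketched in Python as a rough heuristic. It assumes the log lines carry a leading line number as in the excerpt, that each rule begins with `rm -f NAME.G NAME.bb`, and that compile commands look like `time <compiler> -o NAME.o -c ...`. The function names are hypothetical, and the compile flags in the sample are abbreviated.

```python
import re

LINE_RE = re.compile(r"^\s*(\d+)\s+(.*)$")                 # "<number>  <command>"
RM_RE = re.compile(r"^rm -f (\w+)\.G ")                    # first command of a rule
COMPILE_RE = re.compile(r"^time \S+ -o (\w+)\.o -c ")      # the real compile command


def rule_events(numbered_log_lines):
    """Map module name -> {'rm': line_no, 'compile': line_no} (first occurrence of each)."""
    events = {}
    for raw in numbered_log_lines:
        numbered = LINE_RE.match(raw)
        if not numbered:
            continue
        number, command = int(numbered.group(1)), numbered.group(2)
        for kind, pattern in (("rm", RM_RE), ("compile", COMPILE_RE)):
            found = pattern.match(command)
            if found:
                events.setdefault(found.group(1), {}).setdefault(kind, number)
    return events


def started_too_early(events, dependent, dependency):
    """True if the dependent's rule began before the dependency's compile command was issued."""
    return events[dependent]["rm"] < events[dependency]["compile"]


excerpt = [
    " 2165  rm -f module_gfs_machine.G module_gfs_machine.bb",
    " 2173  rm -f module_bl_mynn_common.G module_bl_mynn_common.bb",
    " 2177  time mpiifx -o module_gfs_machine.o -c -O3 -ip module_gfs_machine.f90",
    " 2183  time mpiifx -o module_bl_mynn_common.o -c -O3 -ip module_bl_mynn_common.f90",
]
events = rule_events(excerpt)
print(started_too_early(events, "module_bl_mynn_common", "module_gfs_machine"))  # True
```

Output from parallel jobs can interleave, so a True result is a hint to check the dependency list rather than proof of a race.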

PR #1950 aims to fix these issues. If you search depend.common, you can see that the dependency of module_bl_mynn_common on module_gfs_machine is clearly missing, but it exists in the edits for that PR.
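
A quick way to check for a missing prerequisite like this is sketched below in Python. It only handles simple `target: prerequisites` rules with backslash continuations, and the two sample fragments are hypothetical illustrations, not the actual contents of depend.common or of the PR.

```python
def parse_rules(makefile_text):
    """Return {target: set(prerequisites)} for simple 'target: deps' rules."""
    # Join lines that end in a backslash so each rule sits on one line.
    joined = makefile_text.replace("\\\n", " ")
    rules = {}
    for line in joined.splitlines():
        # Skip recipe lines (leading tab), comments, and lines without a colon.
        if line.startswith("\t") or line.lstrip().startswith("#") or ":" not in line:
            continue
        target, _, deps = line.partition(":")
        if "=" in target:  # variable assignment, not a rule
            continue
        rules.setdefault(target.strip(), set()).update(deps.split())
    return rules


def has_dependency(rules, target, prerequisite):
    """True if `prerequisite` is listed for `target`."""
    return prerequisite in rules.get(target, set())


# Hypothetical fragments for illustration only.
without_dep = "module_bl_mynn_common.o:\n"
with_dep = "module_bl_mynn_common.o: \\\n\t\tmodule_gfs_machine.o\n"

print(has_dependency(parse_rules(without_dep), "module_bl_mynn_common.o", "module_gfs_machine.o"))  # False
print(has_dependency(parse_rules(with_dep), "module_bl_mynn_common.o", "module_gfs_machine.o"))     # True
```

Without the prerequisite, a parallel make is free to schedule both objects at once, which is consistent with the ordering seen in the log.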

Log 2 is a little more confusing. I'm not too sure what the issue is there and I can't definitively rule out an environment issue. However, the flags in question should not affect whether or not MPI could be found.

In the end, I am confident log 1 is not truly remedied by the flags and is most likely just adjusting the compilation race condition that exists (i.e. getting lucky) and log 2 shouldn't be affected by those flags.

@HathewayWill
Author

@islas

Okay, that's good information. So the flags that were turning errors into warnings were just a lucky guess, then?

@HathewayWill
Author

HathewayWill commented Jan 8, 2024

@islas

#1967

could that also be part of the problem with WRF CHEM too?

@islas
Collaborator

islas commented Jan 9, 2024

Yes. Though I haven't taken a look at the logs you posted in that issue, the problem described matches the erroneous behavior pretty well. I suspect it may help, though #1950 only affects the WRF core objects, so if dependencies under chem or da are missing, those will still be issues.

@HathewayWill
Author

@islas so that will involve more detailed dives into the logs. Let me know which tests I can do to help because I think those log files for DA and Chem used the flags to make it work.

I can always rerun it without them

@HathewayWill
Author

Here are the log files @islas without any flags added for WRFDA 4DVAR

Failure_WRFDA.zip

@HathewayWill
Author

@islas

Any updates on these issues plaguing llvm?
#1992
#1981
#1967
#1957

I think they are all related

@islas
Collaborator

islas commented Mar 18, 2024

Are you seeing these issues on either the latest updates from develop (9e265af) or the current release candidate (release-v4.6.0)?

These now include build dependency fixes and syntax/flag updates for the new Intel oneAPI compilers for WRF, WRFDA, and WRF-Chem.

@HathewayWill
Author

@islas

Is there a .tar file for these? I'm not really familiar with how to pull with github.

@weiwangncar
Collaborator

@HathewayWill Do this:
git clone https://github.com/wrf-model/WRF.git
cd WRF/
git checkout release-v4.6.0

@HathewayWill
Author

Thank you @weiwangncar I'll try it today

@HathewayWill
Author

HathewayWill commented Mar 20, 2024

Good morning @weiwangncar @islas @kkeene44 @mgduda

Here are the log files for each issue and their update.

Tested on Ubuntu 22.04.4 with 64GB of physical RAM and 64GB of swap, using release candidate 4.6.0.

#1992
WRF_4.6.0_intel_LLVM.zip
(PASS)

#1981
WRFCHEM_4.6.0_intel_LLVM_memory_leak.zip
(FAIL, Memory Leak)

#1967
wrfchemda_4.6_intel_llvm.zip
(PASS)

#1957
WRFPLUS_4.6.0_intel_LLVM_memory_leak.zip
(FAIL, Memory Leak)

The memory leaks in WRF-Chem and WRFPLUS maxed out my 128GB of combined RAM and swap and shut down the computer. It happens at the exact same spot on each compilation, which is confusing to me.

@islas
Collaborator

islas commented Mar 22, 2024

I've taken a closer look at the issue, and as far as I can tell this is not a memory leak (though it does take an exorbitant amount of memory) on the WRF-side of things. I can't say for certain whether it is an ifx memory "issue" per se, but the problem can be isolated to just the compilation of large files (>10k lines of code). I suspect the parsing and basic compiler optimizations are taking the most memory, as disabling all possible optimizations and outputting diagnostic info didn't yield anything of note.

Unfortunately, this is a fundamental "feature" of the WRF autogenerated code from the registry and will require splitting some of the larger includes into separable files. I've already attempted this with Fortran submodules with limited success, as not all WRF-supported compilers implement this well, so a better approach for splitting the code up would need to be investigated.

For reference, attached are two outputs : one from the make build and another from the cmake build - both of which show massive spikes in compilation of module_domain.F, which has ~23K lines. Both methods are done with the Intel oneAPI compilers and -j 1, and top sorted by memory usage to show that the process consuming the memory is xfortcom (the llvm compiler under the hood).
Screenshot from 2024-03-20 11-06-54
Screenshot from 2024-03-20 10-43-11
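
To look for other candidates above the roughly 10k-line mark mentioned here, a small Python sketch such as the following could be used. The function name, the default threshold, and the suffix list are assumptions chosen for illustration. It counts raw lines only and does not expand includes, so scanning the preprocessed .f90 intermediates after a build attempt is likely more representative than scanning the .F sources.

```python
from pathlib import Path


def large_sources(root, min_lines=10_000, suffixes=(".F", ".f90", ".F90")):
    """Return (line_count, path) pairs for files with more than min_lines lines, largest first."""
    found = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Read in binary mode so odd encodings cannot break the count.
            with path.open("rb") as handle:
                count = sum(1 for _ in handle)
            if count > min_lines:
                found.append((count, path))
    return sorted(found, reverse=True)
```

For example, `large_sources("frame")` run from the top of a WRF checkout would list any oversized files under frame/, which is where module_domain.F lives.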

@HathewayWill
Author

@weiwangncar @islas

Anything I can do on my side to help this?

@HathewayWill
Author

@islas

You tried setting FCFLAGS and CFLAGS with no -O3, correct?

@islas
Collaborator

islas commented Mar 22, 2024

Correct, -O0 -no-ip and ensuring the MPI compiler wrapper does not sneak any flags in as well

@HathewayWill
Author

I have also tested different versions of the MPI compiler commands

mpiifx
mpiifort -fc=ifx
mpif90 -fc=ifx

all of them do the same thing with and without optimizations @islas

@HathewayWill
Author

Quick question.

Why would having the warnings suppressed allow the WRF to install without any issue?

@islas closed this as completed in a9de8d2 on May 9, 2024

@HathewayWill
Author

HathewayWill commented May 12, 2024

Reopening with a new issue. @islas @weiwangncar @mgduda

See the attached log files for the errors; there are too many to list. The build crashed the PC.
configure.log
configure.wrf.txt
wrfplus1.compile.log
