Bufrlib C code fix for WRFDA build with intel oneAPI compiler and a run-time segfault Fortran bugfix #1972

Merged (8 commits) on Jan 10, 2024

Contributor

@mos3r3n mos3r3n commented Jan 5, 2024

Bufrlib C code fix for WRFDA build with intel oneAPI compiler and a run-time segfault Fortran bugfix

TYPE: bugfix

KEYWORDS: WRFDA, Intel OneAPI, segfault

SOURCE: Tao Sun (NCAR)

ISSUE: For use when this PR closes an issue.
Fixes #1957

LIST OF MODIFIED FILES:
M var/da/da_obs/da_fill_obs_structures.inc
M var/external/bufr/preproc.sh
M var/external/bufr/stseq.c
M var/external/bufr/bufrlib.h

TESTS CONDUCTED:

  1. Successfully compiled and tested on Derecho with the intel oneAPI compiler.

@mos3r3n mos3r3n requested review from a team as code owners January 5, 2024 17:48
Review thread on arch/configure.defaults (outdated, resolved)
@weiwangncar
Collaborator

The regression test results:

Test Type              | Expected  | Received |  Failed
= = = = = = = = = = = = = = = = = = = = = = = =  = = = =
Number of Tests        : 23          24
Number of Builds       : 60          57
Number of Simulations  : 158         150         0
Number of Comparisons  : 95          86          0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

@weiwangncar
Collaborator

weiwangncar commented Jan 7, 2024

@mos3r3n I asked the contributor who made PR-1823, which corrected a lot of C code in WRF to make it compatible with icx/ifx. His suggestion is not to use '-Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types' but to fix the actual C code. See here.

@weiwangncar
Collaborator

@mos3r3n Please also see the discussion regarding Issue-1957.

@liujake liujake requested review from liujake and junmeiban and removed request for a team January 8, 2024 18:11
@liujake
Contributor

liujake commented Jan 8, 2024

@weiwangncar We'd prefer to go with the current flag solution given resource/time limitations. We could look at a code fix in the future.

@liujake liujake requested a review from weiwangncar January 8, 2024 18:13
liujake previously approved these changes Jan 8, 2024
junmeiban previously approved these changes Jan 8, 2024
@weiwangncar
Collaborator

@islas @mgduda Do you want to comment on this PR?

@mgduda
Collaborator

mgduda commented Jan 9, 2024

In my opinion, it would be much better to fix the code than to add compiler flags (-Wno-implicit-function-declaration -Wno-incompatible-function-pointer-types) to turn errors into warnings. I do appreciate that it's hard to know how much work might be involved in fixing the code, and it's therefore difficult to commit to fixing it on any particular timeline. Nonetheless, the addition of these flags does affect more than just WRFDA, so perhaps it would be good to at least make an attempt at fixing the code?
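
For readers unfamiliar with the two diagnostics, here is a minimal sketch of what a source-level fix (as opposed to suppressing them with the -Wno-... flags) typically looks like. It is illustrative only and is not the BUFRLIB change made in this PR: `clamp_to_byte`, `compare_ints`, and `sort_and_clamp` are hypothetical names invented for the example. Declaring the prototype before first use addresses the implicit-function-declaration case, and giving the comparator the exact signature that `qsort` expects addresses the incompatible-function-pointer-types case.

```c
#include <stdlib.h>

/* Hypothetical helper, not taken from BUFRLIB. Declaring the prototype
 * before the first call (normally in a shared header) means the compiler
 * never has to assume an implicit declaration. */
int clamp_to_byte(int value);

/* Comparator with exactly the signature qsort expects:
 * int (*)(const void *, const void *). No cast and no warning
 * suppression is needed when passing it to qsort. */
static int compare_ints(const void *a, const void *b)
{
    const int lhs = *(const int *)a;
    const int rhs = *(const int *)b;
    return (lhs > rhs) - (lhs < rhs);   /* avoids overflow of lhs - rhs */
}

/* Sorts the array in ascending order, then clamps each element to 0..255. */
void sort_and_clamp(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
    for (size_t i = 0; i < count; ++i) {
        values[i] = clamp_to_byte(values[i]);   /* prototype is visible here */
    }
}

/* Definition matches the prototype declared above. */
int clamp_to_byte(int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return value;
}
```

Calling `sort_and_clamp` on `{300, -5, 42, 7}` leaves the array as `{0, 7, 42, 255}`. In a real library the prototype would normally live in a shared header so that every caller sees the same declaration.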

@liujake
Contributor

liujake commented Jan 9, 2024

@weiwangncar @mgduda It is perhaps hard to fix this at the code level in a short time. It is fine with me to leave this out for now and inform users about it.

@liujake
Contributor

liujake commented Jan 9, 2024

Or is there a way to add these flags only for the WRFDA build and exclude them from the WRF build?

@liujake
Contributor

liujake commented Jan 9, 2024

Tao Sun @mos3r3n told me that there is not too much C code in the BUFR lib. He will look into it.

@mos3r3n mos3r3n dismissed stale reviews from junmeiban and liujake via d8ede5c January 9, 2024 20:30
@mos3r3n
Contributor Author

mos3r3n commented Jan 9, 2024

I have modified bufrlib.h and stseq.c in var/external/bufr. WRFDA now compiles successfully on Derecho with the Intel oneAPI compiler. I also ran two simple 3DVAR tests that assimilate conventional data from little_r format and from prepbufr format, respectively. Both tests succeeded.

@liujake liujake requested a review from mgduda January 9, 2024 21:01
@liujake
Contributor

liujake commented Jan 9, 2024

@mos3r3n Thank you for the quick fixes! I will leave it to @islas and @mgduda to review the C code changes.

@islas islas self-requested a review January 9, 2024 22:57
@Bayu-Risanto

I used the three modified files and compiled WRFDA with Intel oneAPI on Derecho, yet I got this error:
cpp: error: CONFIGURE_D_CTSM: No such file or directory
cpp: warning: ‘-x c’ after last input file has no effect
cpp: error: CONFIGURE_D_CTSM: No such file or directory
cpp: warning: ‘-x c’ after last input file has no effect
cpp: fatal error: no input files
compilation terminated.

Am I missing something here?

Review thread on var/external/bufr/bufrlib.h (resolved)
Review thread on var/da/da_obs/da_fill_obs_structures.inc (outdated, resolved)
Review thread on var/da/da_obs/da_fill_obs_structures.inc (resolved)
@liujake
Contributor

liujake commented Jan 9, 2024

Can you provide more info about exactly what you did? E.g., what module environment you loaded, followed by which configure/compile steps?

islas previously approved these changes Jan 9, 2024
@islas islas self-requested a review January 9, 2024 23:22
@Bayu-Risanto

I replaced the original files in WRFDA with the modified preproc.sh, da_fill_obs_structures.inc, and configure.defaults from this GitHub PR. I loaded intel-oneAPI to replace the intel compiler. The other modules I have are ncarenv/23.06, craype/2.7.20, hdf5/1.12.2, netcdf/4.9.2, cray-mpich/8.1.25, ncarcompilers/1.00. I cleaned up the WRFDA directory using ./clean -aa. Then I ran the configuration, i.e., ./configure wrfda with option #50, followed by the compilation, i.e., ./compile all_wrfvar >& compile.out.

@liujake
Contributor

liujake commented Jan 9, 2024

Are you compiling the latest develop branch along with the 3 modified files, or an older version?

@Bayu-Risanto

I am using WRFDA from WRF 4.2.2, which is an older version.

@mos3r3n
Contributor Author

mos3r3n commented Jan 9, 2024

These are the modules I am using for the develop branch:

  1) ncarenv/23.06 (S)
  2) craype/2.7.20
  3) hdf5-mpi/1.12.2
  4) netcdf-mpi/4.9.2
  5) ncl/6.6.2
  6) nco/5.1.4
  7) cray-mpich/8.1.25
  8) ncarcompilers/1.0.0
  9) intel-oneapi/2023.0.0

When running configure, I chose option 78 instead of 50. Option 50 is not for Intel oneAPI; you need to switch to intel-classic to use option 50.

Collaborator

@mgduda mgduda left a comment

The code changes look good to me. I'd only suggest filling in the details in the PR description template.

@weiwangncar
Collaborator

The last change passed the regression test:

Test Type              | Expected  | Received |  Failed
= = = = = = = = = = = = = = = = = = = = = = = =  = = = =
Number of Tests        : 23          24
Number of Builds       : 60          57
Number of Simulations  : 158         150         0
Number of Comparisons  : 95          86          0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

@weiwangncar
Collaborator

@Bayu-Risanto If you're using code prior to 4.5.2, you should not use oneAPI. Try module load intel-classic/2023.0.0, and you should be able to compile the WRFDA code. There are other changes in 4.5.2 that may affect your use of oneAPI compiler.

@liujake liujake self-requested a review January 10, 2024 02:58
@liujake liujake merged commit a9de8d2 into wrf-model:develop Jan 10, 2024
1 of 3 checks passed
@liujake liujake changed the title from "bugfix of WRFDA for intel oneAPI compiler" to "Bufrlib C code fix for WRFDA build with intel oneAPI compiler and a run-time segfault Fortran bugfix" Jan 10, 2024
@Bayu-Risanto

Okay, thank you. I installed WRFDA from WRF v4.5.2 successfully, though I still have a problem running da_wrfvar.exe. Some ensemble members still crashed with segmentation faults, while others completed successfully. I am not sure what went wrong.

@Bayu-Risanto

Bayu-Risanto commented Jan 12, 2024

Hello, I am using da_wrfvar.exe to perturb my model (HRRR) initialization. However, for some of the members I get output like the following: rsl.error.0000 reports that the run completed, but it also reports a segmentation fault. Can somebody help and tell me what is happening here? Thank you.

`taskid: 0 hostname: dec1644
module_io_quilt_old.F 2931 T
Namelist logging not found in namelist.input. Using registry defaults for varia
bles in logging.
Ntasks in X 8 , ntasks in Y 8


Parent domain
ids,ide,jds,jde 1 457 1 392
ims,ime,jms,jme -4 64 -4 56
ips,ipe,jps,jpe 1 57 1 49


DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 268992700 bytes allocated
Input data is acceptable to use: fg
Tile Strategy is not specified. Assuming 1D-Y
WRF TILE 1 IS 1 IE 57 JS 1 JE 13
WRF TILE 2 IS 1 IE 57 JS 14 JE 25
WRF TILE 3 IS 1 IE 57 JS 26 JE 37
WRF TILE 4 IS 1 IE 57 JS 38 JE 49
WRF NUMBER OF TILES = 4
--------------------------- WARNING ---------------------------
WARNING FROM FILE: da_scan_obs_ascii.inc LINE: 69
Error 29 opening gts obs file ob.ascii

--------------------------- WARNING ---------------------------
WARNING FROM FILE: da_read_obs_ascii.inc LINE: 107
Error 29 opening gts obs file ob.ascii

*** WRF-Var completed successfully ***
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread-2.31.s 000014E4339978C0 Unknown Unknown Unknown
libhdf5.so.200.2. 000014E42DCCB874 H5F_get_nrefs Unknown Unknown
libhdf5.so.200.2. 000014E42DF31FF6 H5VL__native_file Unknown Unknown
libhdf5.so.200.2. 000014E42DF1CBE1 H5VL_file_close Unknown Unknown
libhdf5.so.200.2. 000014E42DCC8D9B Unknown Unknown Unknown
libhdf5.so.200.2. 000014E42DD405A9 H5I_clear_type Unknown Unknown
libhdf5.so.200.2. 000014E42DCC02D8 H5F_term_package Unknown Unknown
libhdf5.so.200.2. 000014E42DC0EC17 H5_term_library Unknown Unknown
libc-2.31.so 000014E430399AE9 Unknown Unknown Unknown
libc-2.31.so 000014E430399C7A Unknown Unknown Unknown
libc-2.31.so 000014E4303812A4 __libc_start_main Unknown Unknown
da_wrfvar.exe 00000000004102FA Unknown Unknown Unknown`
