bug: model size is sometimes i8 sometimes i4 #278

hkershaw-brown · 2021-08-27T17:50:17Z


Originally reported by a user trying to run a very large WRF model.

Describe the bug

List the steps someone needs to take to reproduce the bug.
Use the WRF model_mod with the model size declared as i8.
Run perfect_model_obs.
What was the expected outcome?
I think a user should be able to have an i8 model size.
What actually happened?
The run bombs in direct_netcdf_mod.f90. For example, subroutine read_transpose declares the running state index as a default (i4) integer at line 169:
integer, intent(inout) :: dart_index

There may be other places where a default integer is used instead of integer(i8).
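A minimal sketch of the kind of declaration change implied here, assuming the running state-vector index is advanced by each variable's element count as the variables are read. The module and subroutine names (index_sketch_mod, advance_dart_index) are hypothetical and not part of DART; the only point is that both the index and the increment are integer(i8).

```fortran
! Hypothetical sketch, not DART source: keep the running state-vector
! index in integer(i8) so it can pass huge(1) = 2,147,483,647.
module index_sketch_mod
implicit none
private
public :: i8, advance_dart_index

integer, parameter :: i8 = selected_int_kind(13)

contains

! Move dart_index past one variable holding nelements values.
subroutine advance_dart_index(dart_index, nelements)
integer(i8), intent(inout) :: dart_index
integer(i8), intent(in)    :: nelements

dart_index = dart_index + nelements
end subroutine advance_dart_index

end module index_sketch_mod
```

With a default-integer dart_index the same addition would overflow once the total passes 2,147,483,647; Fortran leaves that undefined, and in practice compilers typically wrap to a negative value.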

Which model(s) are you working with?

WRF

Version of DART

v9.11.8

Have you modified the DART code?

Yes, to add i8 to the WRF model_mod.f90.


hkershaw-brown · 2021-08-30T20:16:08Z

Big POP run with i8?

nancycollins · 2021-08-30T20:24:55Z

Testing with 1/10th-degree POP should smoke out any i8/i4 issues introduced into the main DART code. But if the WRF model_mod code has i8/i4 issues with the state vector size or with indices into the state vector, you'll have to find or construct a large WRF state to test that.

hkershaw-brown · 2021-08-30T20:26:23Z

Do you know where the 1/10th-degree POP case is? Is this it: https://github.com/NCAR/DART_development/tree/static_data

nancycollins · 2021-08-30T20:32:00Z

The code? It's just the standard models/POP.

The data? Good question. Possibly on image.ucar.edu in the data area, or on campaign storage, or lost to the sands of time.

hkershaw-brown · 2021-08-31T22:28:44Z

The big POP test case, as is, is too small to catch the i4 vs i8 problems in direct_netcdf_mod.f90.

Big POP test case:
/glade/p/cisl/dares/DART_test_cases/pop/pop_tx0.1v2/

| | # elements |
|---|---:|
| model size | 2,151,360,000 |
| 3D variable size | 535,680,000 |
| 2D variable size | 8,640,000 |
| i4 limit (2^31 - 1) | 2,147,483,647 |

 PE 0: verify_state_variables variable  1 is SALT_CUR, QTY_SALINITY, UPDATE
 PE 0: verify_state_variables variable  2 is TEMP_CUR, QTY_POTENTIAL_TEMPERATURE, UPDATE
 PE 0: verify_state_variables variable  3 is UVEL_CUR, QTY_U_CURRENT_COMPONENT, UPDATE
 PE 0: verify_state_variables variable  4 is VVEL_CUR, QTY_V_CURRENT_COMPONENT, UPDATE
 PE 0: verify_state_variables variable  5 is PSURF_CUR, QTY_SEA_SURFACE_PRESSURE

The four 3D variables total 2,142,720,000 elements, so there are only 4,763,647 to spare before hitting i4 vs i8 problems when keeping track of the DART index in the read/write calls.
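The arithmetic behind that headroom figure, using only the element counts from the table above (the last line reproduces the listed model size):

```math
\begin{aligned}
4 \times 535{,}680{,}000 &= 2{,}142{,}720{,}000 \\
(2^{31} - 1) - 2{,}142{,}720{,}000 &= 2{,}147{,}483{,}647 - 2{,}142{,}720{,}000 = 4{,}763{,}647 \\
2{,}142{,}720{,}000 + 8{,}640{,}000 &= 2{,}151{,}360{,}000
\end{aligned}
```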

Same deal with the offset calculation in state_structure_mod.f90 (lines 1394 and 1401), where offset is a default integer:

integer :: ndims, offset
offset = get_index_start(dom_id, var_id)
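A minimal sketch of an offset calculation kept in integer(i8), assuming a variable's start index is 1 plus the element counts of all preceding variables in the domain. index_start is a hypothetical stand-in for get_index_start, not its actual implementation.

```fortran
! Hypothetical sketch, not state_structure_mod source.
module offset_sketch_mod
implicit none
private
public :: i8, index_start

integer, parameter :: i8 = selected_int_kind(13)

contains

! First state-vector index of variable var_id, given the number of
! elements in each variable of the domain.
pure function index_start(var_sizes, var_id) result(offset)
integer(i8), intent(in) :: var_sizes(:)
integer,     intent(in) :: var_id
integer(i8) :: offset

! Sum in i8: an i4 accumulator would overflow once the running
! total passes huge(1) = 2,147,483,647. For var_id = 1 the sum
! is over an empty slice and is 0.
offset = 1_i8 + sum(var_sizes(1:var_id-1))
end function index_start

end module offset_sketch_mod
```

Under this assumption, with the big-POP sizes the fifth variable still starts inside the i4 range (at 2,142,720,001), so an i4 offset would only go wrong once another variable is added after it.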

So the i4/i8 issues in the netCDF and state_structure code may have been there all along, because no one has run anything big enough to hit them.

-> Add more variables to big POP, or try mega-WRF.


hkershaw-brown · 2021-09-02T21:56:14Z

Side note on WRF: some nice cleanup here that did not make it to main:
9609475

hkershaw-brown · 2021-09-08T14:47:08Z

The model_mod.f90 docs (template and default) list the convert_vertical_obs arguments (loc_types(:)) under convert_vertical_state instead of the convert_vertical_state arguments (loc_indx(:)):
https://docs.dart.ucar.edu/en/latest/models/utilities/default_model_mod.html?highlight=convert_vertical
https://docs.dart.ucar.edu/en/latest/models/template/readme.html?highlight=convert_vertical

hkershaw-brown · 2021-09-08T20:12:25Z

How does the itemcount check in mpi_utilities work when srcarray has more than 2^31-1 elements?

subroutine send_to(dest_id, srcarray, time, label)
integer :: itemcount
itemcount = size(srcarray)

DART/assimilation_code/modules/utilities/mpi_utilities_mod.f90, line 155 in 62ecfa2:

integer, parameter :: SNDRCV_MAXSIZE = 2 * 1000 * 1000 * 1000

DART/assimilation_code/modules/utilities/mpi_utilities_mod.f90, line 574 in 62ecfa2:

if (itemcount <= SNDRCV_MAXSIZE) then
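One way to make this check robust, sketched under the assumption that the caller takes the element count with size(srcarray, kind=i8) and would split anything larger than SNDRCV_MAXSIZE into several messages. The module and function names are hypothetical; this is not the mpi_utilities_mod implementation.

```fortran
! Hypothetical sketch, not mpi_utilities_mod source: do the size check
! in integer(i8) so an array with more than huge(1) elements cannot
! wrap to a negative count and slip under the limit.
module chunk_sketch_mod
implicit none
private
public :: i8, SNDRCV_MAXSIZE, fits_one_message, num_chunks

integer,     parameter :: i8 = selected_int_kind(13)
! Same value as the SNDRCV_MAXSIZE quoted above, held as integer(i8).
integer(i8), parameter :: SNDRCV_MAXSIZE = 2_i8 * 1000 * 1000 * 1000

contains

! True if itemcount (e.g. from size(srcarray, kind=i8)) fits in one message.
pure logical function fits_one_message(itemcount)
integer(i8), intent(in) :: itemcount
fits_one_message = (itemcount <= SNDRCV_MAXSIZE)
end function fits_one_message

! Number of messages of at most SNDRCV_MAXSIZE items (ceiling division).
pure function num_chunks(itemcount) result(n)
integer(i8), intent(in) :: itemcount
integer(i8) :: n
n = (itemcount + SNDRCV_MAXSIZE - 1_i8) / SNDRCV_MAXSIZE
end function num_chunks

end module chunk_sketch_mod
```

For the 3,120,000,000-element array used in the tests in this thread, fits_one_message returns .false. and num_chunks returns 2.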

hkershaw-brown · 2021-09-08T21:41:46Z

ifort is not so good: it compiles this without complaint, even with -warn, and itemcount silently wraps:

hkershaw@cisl-fisher test-mpi $ cat big_itemcount.f90 
program itemcount_test

integer, parameter :: i8 = SELECTED_INT_KIND(13)
integer, parameter :: SNDRCV_MAXSIZE = 2 * 1000 * 1000 * 1000 
integer(i8), parameter :: length = 3120000000_i8

double precision :: srcarray(length)
integer :: itemcount

srcarray(:) = 5.d0

itemcount = size(srcarray)
print*, 'itemcount, size(srcarray)', itemcount, size(srcarray)

print*, '(itemcount <= SNDRCV_MAXSIZE)', (itemcount <= SNDRCV_MAXSIZE)
print*, srcarray(3120000000_i8)

end program itemcount_test
hkershaw@cisl-fisher test-mpi $ ifort -warn big_itemcount.f90 
hkershaw@cisl-fisher test-mpi $ ./a.out 
 itemcount, size(srcarray) -1174967296 -1174967296
 (itemcount <= SNDRCV_MAXSIZE) T
   5.00000000000000     
hkershaw@cisl-fisher test-mpi $

Note: you will need ifort -mcmodel=medium big_itemcount.f90 to compile this on Cheyenne, since the static srcarray is far larger than 2 GB.
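Where the printed -1174967296 comes from, assuming the compiler simply wraps the true count into a 32-bit two's-complement default integer (the Fortran standard does not define this overflow, so this is observed compiler behavior):

```math
3{,}120{,}000{,}000 - 2^{32} = 3{,}120{,}000{,}000 - 4{,}294{,}967{,}296 = -1{,}174{,}967{,}296
```

That negative value is below SNDRCV_MAXSIZE = 2,000,000,000, which is why the check prints T.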

gfortran/11.1.0 will catch this at compile time:

hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ gfortran  big_itemcount.f90 
big_itemcount.f90:12:17:

   12 | itemcount = size(srcarray)
      |                 1
Error: Result of SIZE overflows its kind at (1)
big_itemcount.f90:13:53:

   13 | print*, 'itemcount, size(srcarray)', itemcount, size(srcarray)
      |                                                     1
Error: Result of SIZE overflows its kind at (1)

With allocatable arrays, gfortran has no chance to catch it at compile time:

hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ gfortran  big_itemcount_alloc.f90
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ./a.out 
 itemcount, size(srcarray) -1174967296 -1174967296
 (itemcount <= SNDRCV_MAXSIZE) T
   5.0000000000000000     
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ cat big_itemcount_alloc.f90 
program itemcount_test

integer, parameter :: i8 = SELECTED_INT_KIND(13)
integer, parameter :: SNDRCV_MAXSIZE = 2 * 1000 * 1000 * 1000 
integer(i8), parameter :: length = 3120000000_i8

double precision, allocatable :: srcarray(:)
integer :: itemcount

allocate(srcarray(length))
srcarray(:) = 5.d0

itemcount = size(srcarray)
print*, 'itemcount, size(srcarray)', itemcount, size(srcarray)

print*, '(itemcount <= SNDRCV_MAXSIZE)', (itemcount <= SNDRCV_MAXSIZE)
print*, srcarray(3120000000_i8)

deallocate(srcarray)

end program itemcount_test

i8 version:

hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ifort big_itemcount8_alloc.f90 
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ./a.out 
 itemcount, size(srcarray)            3120000000            3120000000
 (itemcount <= SNDRCV_MAXSIZE) F
   5.00000000000000     
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ cat big_itemcount8_alloc.f90 
program itemcount_test

integer, parameter :: i8 = SELECTED_INT_KIND(13)
integer, parameter :: SNDRCV_MAXSIZE = 2 * 1000 * 1000 * 1000 
integer(i8), parameter :: length = 3120000000_i8

double precision, allocatable :: srcarray(:)
integer(i8) :: itemcount

allocate(srcarray(length))
srcarray(:) = 5.d0

itemcount = size(srcarray,KIND=i8)
print*, 'itemcount, size(srcarray)', itemcount, size(srcarray,KIND=i8)
print*, '(itemcount <= SNDRCV_MAXSIZE)', (itemcount <= SNDRCV_MAXSIZE)

print*, srcarray(3120000000_i8)
deallocate(srcarray)

end program itemcount_test

hkershaw-brown · 2021-09-10T22:32:17Z

Noticed this when testing:

direct_netcdf_mod.f90 creates netCDF files in 64-bit offset mode:

DART/assimilation_code/modules/io/direct_netcdf_mod.f90, line 1617 in 62ecfa2:

create_mode = ior(NF90_CLOBBER, NF90_64BIT_OFFSET)

netcdf_utilities_mod.f90 creates netCDF files in classic mode:

DART/assimilation_code/modules/utilities/netcdf_utilities_mod.f90, line 2409 in 62ecfa2:

function nc_create_file(filename, context)

Do we care about this? I don't know. Just making a note. May move this to a separate issue.

In perfect_model_obs:
The input state file is netCDF-4, with time as an unlimited dimension.
The file DART creates from perfect_model_obs has a time dimension, but it is not unlimited, and the file is 64-bit offset.

If you create the DART file as netCDF-4 (same as the template file), you get:

message: nf90_enddef end define mode: NetCDF: One or more variable sizes violate format constraints

So the 64-bit offset format is getting around not having the unlimited dimension.

hkershaw-brown added the Bug (Something isn't working) label Aug 27, 2021

hkershaw-brown added a commit that referenced this issue Sep 2, 2021

i8 variables for calculating state vector indices.

91e36fb

see issue #278 why we did not catch this when running big-POP.

This was referenced Sep 16, 2021:

- note on netcdf 64bit vs netcdf-4 file creation DART #287 (Open)
- Model size i8 fix for state vector io and mpi_utilities #288 (Merged)
- Feature request: Support for large model sizes #292 (Closed)

hkershaw-brown closed this as completed in #288 Sep 30, 2021
