bug: model size is sometimes i8 sometimes i4 #278

Closed
hkershaw-brown opened this issue Aug 27, 2021 · 10 comments · Fixed by #288
Labels
Bug Something isn't working

Comments

@hkershaw-brown
Member


Originally reported by a user trying to run a very large WRF model.

Describe the bug

  1. List the steps someone needs to take to reproduce the bug.
    Use the WRF model_mod with the model size declared as i8.
    Run perfect_model_obs.

  2. What was the expected outcome?
    I think a user should be able to have an i8 model size.

  3. What actually happened?
    It crashes in direct_netcdf_mod.f90. For example, in subroutine read_transpose:
    169 integer, intent(inout) :: dart_index

There may be other locations where a default integer is being used instead of integer(i8).

Which model(s) are you working with?

WRF

Version of DART

v9.11.8

Have you modified the DART code?

Yes, to add i8 to the WRF model_mod.f90.

@hkershaw-brown hkershaw-brown added the Bug Something isn't working label Aug 27, 2021
@hkershaw-brown
Member Author

big pop run with i8?

@nancycollins
Collaborator

testing with 1/10th degree pop should smoke out any i8/i4 issues introduced into the main dart code. but if the WRF model_mod code has i8/i4 issues with the state vector size or indices into the state vector, you'll have to find/construct a large wrf state to test that.

@hkershaw-brown
Member Author

do you know where 1/10th degree pop is? Is this it: https://github.com/NCAR/DART_development/tree/static_data

@nancycollins
Collaborator

the code? it's just the standard models/POP

the data? good question. possibly on image.ucar.edu in the data area, or on campaign storage, or lost to the sands of time.

@hkershaw-brown
Member Author

The big POP test case, as is, is too small to catch the i4 vs i8 problem in direct_netcdf_mod.f90.

big pop test case:
/glade/p/cisl/dares/DART_test_cases/pop/pop_tx0.1v2/

# elements
model size          2,151,360,000
3D variable size      535,680,000
2D variable size        8,640,000
i4 (max value)      2,147,483,647
 PE 0: verify_state_variables variable  1 is SALT_CUR, QTY_SALINITY, UPDATE
 PE 0: verify_state_variables variable  2 is TEMP_CUR, QTY_POTENTIAL_TEMPERATURE, UPDATE
 PE 0: verify_state_variables variable  3 is UVEL_CUR, QTY_U_CURRENT_COMPONENT, UPDATE
 PE 0: verify_state_variables variable  4 is VVEL_CUR, QTY_V_CURRENT_COMPONENT, UPDATE
 PE 0: verify_state_variables variable  5 is PSURF_CUR, QTY_SEA_SURFACE_PRESSURE

The four 3D variables total 2,142,720,000 elements, so you have 4,763,647 to spare before you hit i4 vs i8 problems keeping track of the dart index in the read/write calls.

same deal with state_structure_mod.f90 offset calculation:

1394 integer :: ndims, offset
1401 offset = get_index_start(dom_id, var_id)

So the i4/i8 mismatch in the netcdf and state_structure code might have been there forever, since no one has run anything big enough to hit problems.

-> Add more variables to big-pop or try mega-WRF.
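
The arithmetic above can be checked with a short standalone program. This is a minimal sketch, not DART code: the program and variable names are made up for illustration, and the sixth variable is a hypothetical extra 3D field standing in for "add more variables to big-pop". The first five sizes are the ones quoted for pop_tx0.1v2. It tracks each variable's start index in integer(i8) and reports whether that index would still fit in a 32-bit integer.

```fortran
program index_start_check
use, intrinsic :: iso_fortran_env, only : int32, int64
implicit none

integer, parameter :: i4 = int32, i8 = int64
integer, parameter :: nvars = 6   ! five real variables plus one hypothetical extra

integer(i8) :: var_size(nvars)
integer(i8) :: index_start
integer     :: ivar

! Four 3D variables, one 2D variable, then a hypothetical extra 3D variable
var_size = [ 535680000_i8, 535680000_i8, 535680000_i8, 535680000_i8, &
             8640000_i8, 535680000_i8 ]

index_start = 1_i8
do ivar = 1, nvars
   ! Compare in i8 so the test itself cannot overflow
   print *, 'variable', ivar, 'starts at', index_start, &
            'fits in i4:', index_start <= int(huge(1_i4), i8)
   index_start = index_start + var_size(ivar)
end do

print *, 'total size', index_start - 1_i8

end program index_start_check
```

With these sizes the start indices of the five existing variables all fit in i4 (the fifth starts at 2,142,720,001), and only the hypothetical sixth variable starts past 2,147,483,647.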

hkershaw-brown added a commit that referenced this issue Sep 2, 2021
see issue #278 why we did not catch this when running big-POP.
@hkershaw-brown
Member Author

hkershaw-brown commented Sep 2, 2021

side note on wrf: some nice clean up here that did not make it to main:
9609475

@hkershaw-brown
Member Author

The model_mod.f90 docs (template and default) list the convert_vertical_obs arguments (loc_types(:)) under convert_vertical_state instead of the convert_vertical_state arguments (loc_indx(:)):
https://docs.dart.ucar.edu/en/latest/models/utilities/default_model_mod.html?highlight=convert_vertical
https://docs.dart.ucar.edu/en/latest/models/template/readme.html?highlight=convert_vertical

@hkershaw-brown
Member Author

How does the itemcount check work in mpi_utilities when you have a srcarray bigger than 2^31-1?

subroutine send_to(dest_id, srcarray, time, label)
integer :: itemcount
itemcount = size(srcarray)

integer, parameter :: SNDRCV_MAXSIZE = 2 * 1000 * 1000 * 1000

if (itemcount <= SNDRCV_MAXSIZE) then

@hkershaw-brown
Member Author

hkershaw-brown commented Sep 8, 2021

ifort does not catch the overflow, even with -warn:

hkershaw@cisl-fisher test-mpi $ cat big_itemcount.f90 
program itemcount_test

integer, parameter :: i8 = SELECTED_INT_KIND(13)
integer, parameter :: SNDRCV_MAXSIZE = 2 * 1000 * 1000 * 1000 
integer(i8), parameter :: length = 3120000000_i8

double precision :: srcarray(length)
integer :: itemcount

srcarray(:) = 5.d0

itemcount = size(srcarray)
print*, 'itemcount, size(srcarray)', itemcount, size(srcarray)

print*, '(itemcount <= SNDRCV_MAXSIZE)', (itemcount <= SNDRCV_MAXSIZE)
print*, srcarray(3120000000_i8)

end program itemcount_test
hkershaw@cisl-fisher test-mpi $ ifort -warn big_itemcount.f90 
hkershaw@cisl-fisher test-mpi $ ./a.out 
 itemcount, size(srcarray) -1174967296 -1174967296
 (itemcount <= SNDRCV_MAXSIZE) T
   5.00000000000000     
hkershaw@cisl-fisher test-mpi $ 

note you will need ifort -mcmodel=medium big_itemcount.f90 to compile this on Cheyenne

gfortran/11.1.0 will catch this at compile time:

hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ gfortran  big_itemcount.f90 
big_itemcount.f90:12:17:

   12 | itemcount = size(srcarray)
      |                 1
Error: Result of SIZE overflows its kind at (1)
big_itemcount.f90:13:53:

   13 | print*, 'itemcount, size(srcarray)', itemcount, size(srcarray)
      |                                                     1
Error: Result of SIZE overflows its kind at (1)

gfortran with allocatable arrays has no chance of catching it at compile time:

hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ gfortran  big_itemcount_alloc.f90
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ./a.out 
 itemcount, size(srcarray) -1174967296 -1174967296
 (itemcount <= SNDRCV_MAXSIZE) T
   5.0000000000000000     
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ cat big_itemcount_alloc.f90 
program itemcount_test

integer, parameter :: i8 = SELECTED_INT_KIND(13)
integer, parameter :: SNDRCV_MAXSIZE = 2 * 1000 * 1000 * 1000 
integer(i8), parameter :: length = 3120000000_i8

double precision, allocatable :: srcarray(:)
integer :: itemcount

allocate(srcarray(length))
srcarray(:) = 5.d0

itemcount = size(srcarray)
print*, 'itemcount, size(srcarray)', itemcount, size(srcarray)

print*, '(itemcount <= SNDRCV_MAXSIZE)', (itemcount <= SNDRCV_MAXSIZE)
print*, srcarray(3120000000_i8)

deallocate(srcarray)

end program itemcount_test
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ gfortran  big_itemcount_alloc.f90
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ./a.out 
 itemcount, size(srcarray) -1174967296 -1174967296
 (itemcount <= SNDRCV_MAXSIZE) T
   5.0000000000000000     

i8 version:

hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ifort big_itemcount8_alloc.f90 
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ./a.out 
 itemcount, size(srcarray)            3120000000            3120000000
 (itemcount <= SNDRCV_MAXSIZE) F
   5.00000000000000     
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ cat big_itemcount8_alloc.f90 
program itemcount_test

integer, parameter :: i8 = SELECTED_INT_KIND(13)
integer, parameter :: SNDRCV_MAXSIZE = 2 * 1000 * 1000 * 1000 
integer(i8), parameter :: length = 3120000000_i8

double precision, allocatable :: srcarray(:)
integer(i8) :: itemcount

allocate(srcarray(length))
srcarray(:) = 5.d0

itemcount = size(srcarray,KIND=i8)
print*, 'itemcount, size(srcarray)', itemcount, size(srcarray,KIND=i8)
print*, '(itemcount <= SNDRCV_MAXSIZE)', (itemcount <= SNDRCV_MAXSIZE)

print*, srcarray(3120000000_i8)
deallocate(srcarray)

end program itemcount_test
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ifort big_itemcount8_alloc.f90 
hkershaw@cheyenne6:/glade/scratch/hkershaw/Temp/big_itemcount$ ./a.out 
 itemcount, size(srcarray)            3120000000            3120000000
 (itemcount <= SNDRCV_MAXSIZE) F
   5.00000000000000     
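
Once itemcount is held in integer(i8), the comparison against SNDRCV_MAXSIZE is meaningful, and an array larger than the limit can be split into pieces whose individual counts fit in a default integer. The sketch below only computes the chunk bounds and makes no MPI calls. It is an illustration under stated assumptions, not the code in mpi_utilities: the names nchunks, ichunk, first and last are hypothetical, and SNDRCV_MAXSIZE is declared here as i8 purely to keep the arithmetic in one kind.

```fortran
program chunk_bounds
use, intrinsic :: iso_fortran_env, only : int64
implicit none

integer, parameter :: i8 = int64
integer(i8), parameter :: SNDRCV_MAXSIZE = 2000000000_i8
integer(i8), parameter :: itemcount = 3120000000_i8   ! size used in the tests above

integer(i8) :: nchunks, ichunk, first, last

! Ceiling division, done entirely in i8
nchunks = (itemcount + SNDRCV_MAXSIZE - 1_i8) / SNDRCV_MAXSIZE

do ichunk = 1_i8, nchunks
   first = (ichunk - 1_i8) * SNDRCV_MAXSIZE + 1_i8
   last  = min(ichunk * SNDRCV_MAXSIZE, itemcount)
   ! Each chunk holds at most SNDRCV_MAXSIZE elements
   print *, 'chunk', ichunk, 'elements', first, 'to', last, &
            'count', last - first + 1_i8
end do

end program chunk_bounds
```

For the 3,120,000,000 element array this gives two chunks, with counts of 2,000,000,000 and 1,120,000,000.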

@hkershaw-brown
Member Author

hkershaw-brown commented Sep 10, 2021

noticed this when testing:

direct_netcdf_mod.f90 creates netcdf files in 64bit offset mode:

create_mode = ior(NF90_CLOBBER, NF90_64BIT_OFFSET)

netcdf_utilities_mod.f90 creates netcdf files in classic mode

function nc_create_file(filename, context)

Do we care about this? I don't know. Just making a note. May move to a separate issue.

With perfect_model_obs:
The input state file is netCDF-4 with time as the unlimited dimension.
The DART-created file from perfect_model_obs has a time dimension, but it is not unlimited. The file is 64-bit offset.

If you create the DART file as netCDF-4 (same as the template file) you get:

message: nf90_enddef end define mode: NetCDF: One or more variable sizes violate format constraints

so the 64 bit offset is getting around not having the unlimited dimension.
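
For reference, the create mode is just a flag passed to nf90_create. The following is a rough sketch using the netCDF Fortran 90 API, not the DART routines: the file names and the check helper are hypothetical, and it needs the netcdf-fortran library to build. It creates one file in 64-bit offset mode and one in netCDF-4 mode, each with time defined as the unlimited dimension, so the resulting headers can be compared with ncdump -k and ncdump -h.

```fortran
program create_mode_sketch
use netcdf
implicit none

integer :: ncid, dimid

! 64-bit offset, matching the create_mode line quoted from direct_netcdf_mod.f90
call check(nf90_create('sketch_64bit_offset.nc', &
                       ior(NF90_CLOBBER, NF90_64BIT_OFFSET), ncid))
call check(nf90_def_dim(ncid, 'time', NF90_UNLIMITED, dimid))
call check(nf90_enddef(ncid))
call check(nf90_close(ncid))

! netCDF-4, the format of the input state file described above
call check(nf90_create('sketch_netcdf4.nc', &
                       ior(NF90_CLOBBER, NF90_NETCDF4), ncid))
call check(nf90_def_dim(ncid, 'time', NF90_UNLIMITED, dimid))
call check(nf90_enddef(ncid))
call check(nf90_close(ncid))

contains

! Hypothetical helper: stop with the library's message on any error
subroutine check(status)
integer, intent(in) :: status
if (status /= NF90_NOERR) then
   print *, trim(nf90_strerror(status))
   stop 1
end if
end subroutine check

end program create_mode_sketch
```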
