C/CD restarts #684
Comments
Ok. I just found something. The strocnx and strocny terms are written at the T points for the B grid. I am not writing these at the N and E points.
The forcing is fine just after the scatter to Tair_data, etc. The problem appears to be in the call to interpolate_data. However, I do not understand how this could be impacted by the choice of grid_ice. Maybe we have an out of bounds issue? This is the forcing debugging before writing the restart:
(JRA55_data)fdbg read recnum = 82 2
This is after the restart:
(JRA55_data)fdbg read recnum = 41 1
Also, I had to add the following to icepack_shortwave.F90: ``
`` This should have crashed previously in debug tests, but I guess we never had a case where exp_min went to zero. This is safer though.
So, it is in ice_forcing.F90 where it looks like the _data fields are interpolated to the internal fields. For some reason, it looks like it is only done on the master task. I'm going to try to run with 1 block per processor. Dave
Could ice_read_nc be calling the wrong interface?
No, it looks like it is calling ice_read_nc_xy as it should be.
This is so weird. I can't explain what is going on. So, here is a summary so far.
fsw_data 652.99182128906250 1
I changed this test so it is one block per processor to simplify things. All blocks have valid values at 1,1,1. The second number is the task. I guess task/block 2 has a zero value.
So, only the master task has valid data. Here is the debug from ice_read_nc_xy for the global array:
(ice_read_nc_xy) fid= 131072, lnrec = 41, varname = glbrad
Then the global min/max of the fsw array is:
So, it is somehow getting negative shortwave. There is no interpolation of the shortwave here:
fsw(:,:,:) = fsw_data(:,:,1,:)
So, what is happening? It is related to the restart of the CD/C grid. So, are the arrays not allocated for the CD grid on a restart? The sequence for a restart is:
call input_data <- sets grid_ice
So, everything should be allocated by the time init_forcing_atm is called. The only difference between the initial and restart is that the call to init_rest reads the initial file and init_shortwave is not called. So, why would that impact the reading of the JRA55 data? Also, why would it only fail for CD/C? Dave
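The per-task check described above (find which tasks hold unphysical shortwave values) can be sketched in a few lines. This is only an illustration in Python, not CICE code: the helper names are hypothetical and the sample values are invented for the example, not taken from the run.

```python
# Illustrative only: per-task min/max report and a "negative shortwave" flag.
# Helper names are hypothetical and the sample values below are invented.

def minmax_by_task(values_by_task):
    """Return {task: (min, max)} for the local values held by each task."""
    return {task: (min(vals), max(vals)) for task, vals in values_by_task.items()}

def tasks_with_negative(values_by_task):
    """Return the sorted task ids whose local values include a negative number."""
    return sorted(task for task, vals in values_by_task.items() if min(vals) < 0.0)

# Invented sample: task 0 looks physical, tasks 1 and 2 do not.
fsw_by_task = {
    0: [0.0, 120.5, 310.25],
    1: [-4.0, 0.0, 15.0],
    2: [0.0, -0.5, 0.0],
}

report = minmax_by_task(fsw_by_task)
bad_tasks = tasks_with_negative(fsw_by_task)
for task in sorted(report):
    print(task, report[task])
print("tasks with negative shortwave:", bad_tasks)
```

A report like this separates "the global file read is wrong" from "the values are wrong only on some tasks", which is the distinction the debugging above is trying to make.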
Just for interest's sake, I removed the reads for all of the CD/C grid variables and the restart works fine.
Are field_loc_Nface and field_loc_Eface reversed in ice_gather_scatter.F90? if (this_block%tripoleTFlag) then
I guess not. It's curious that the cell center is 1,1 in ice_gather_scatter.F90, but it is 0,0 in ice_boundary.F90.
Just FYI, I have on my task list to set up a unit test driver to test the halo updates and gather/scatters. That would be comprehensive in the sense that I'd test various grids, all field_loc values, and all other options in all combinations as best as I could. Again, one of those situations where we have some confidence in the calls/combinations we use, but not so much in the calls/combinations we don't. It doesn't surprise me that something that hasn't been used before may not be working properly. @dabail10, have you found the problem in the current implementation or are you still looking? Thanks for your efforts!
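As a rough idea of what such a driver could look like, here is a minimal Python sketch of a scatter/gather round-trip test run over every combination of a few grid sizes and block sizes. Everything in it is an assumption for illustration: the `scatter` and `gather` helpers are simple stand-ins, not the routines in ice_gather_scatter.F90, and the sketch does not model halos, tripole grids, or field_loc handling.

```python
import itertools

def scatter(global_field, nx_block, ny_block):
    """Split a 2-D field (list of rows) into blocks keyed by (ib, jb)."""
    ny, nx = len(global_field), len(global_field[0])
    blocks = {}
    for jb, j0 in enumerate(range(0, ny, ny_block)):
        for ib, i0 in enumerate(range(0, nx, nx_block)):
            rows = global_field[j0:j0 + ny_block]
            blocks[(ib, jb)] = [row[i0:i0 + nx_block] for row in rows]
    return blocks

def gather(blocks, nx, ny, nx_block, ny_block):
    """Reassemble the global field from its blocks."""
    out = [[None] * nx for _ in range(ny)]
    for (ib, jb), block in blocks.items():
        for dj, row in enumerate(block):
            for di, value in enumerate(row):
                out[jb * ny_block + dj][ib * nx_block + di] = value
    return out

def round_trip_ok(nx, ny, nx_block, ny_block):
    """True if gather(scatter(field)) reproduces the field exactly."""
    field = [[j * nx + i for i in range(nx)] for j in range(ny)]
    blocks = scatter(field, nx_block, ny_block)
    return gather(blocks, nx, ny, nx_block, ny_block) == field

# Hypothetical grid and block sizes; a real driver would add field_loc,
# boundary types, and data types to the product below.
grid_sizes = [(8, 6), (5, 7)]
block_sizes = [(1, 1), (2, 3), (4, 4), (8, 7)]

results = {
    (grid, block): round_trip_ok(grid[0], grid[1], block[0], block[1])
    for grid, block in itertools.product(grid_sizes, block_sizes)
}
failures = [combo for combo, ok in results.items() if not ok]
print(len(results), "combinations tested,", len(failures), "failures")
```

Filling each cell with its own global index makes a misplaced or unset value show up as a direct mismatch, and running the full product covers the combinations that normal runs never exercise.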
I am still looking. I removed the CD/C restart fields from the read and it runs fine. When I add just one back in, it aborts. So, I am suspicious of a scatter here. However, could it just be a memory issue? I'm completely puzzled.
I just did a 64x1 case, so two nodes and 64 tasks. This still has the "same" problem:
(JRA55_data)fdbg JRA55_bulk_data
While fsw is no longer negative, Qa is negative. Plus, the uatm component has exceptionally large values. So, I am back to thinking it is the scattering.
Debugging now. There was a problem in the query_field implementation. Only the master task was getting a valid value. I have no idea how the restart read didn't deadlock on the scatter, but it didn't, and the non-master values were left unset. I have fixed that and am trying to do some further validation. I may not get exact restart working, but I am hoping to at least validate that the fields are being read correctly. I will PR a fix to cgridDEV soon.
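The failure mode described here (a query result that is valid only on the master task) can be illustrated with a small Python simulation in which a list stands in for the set of tasks. The function names are hypothetical, and the broadcast shown is just one common remedy for this pattern; it is an assumption, not a description of the actual fix.

```python
# Illustrative simulation: index i of each list is the value seen by task i.
UNSET = None  # stand-in for an uninitialized variable on a task

def query_field_master_only(ntasks, value_on_master, master=0):
    """Buggy pattern: only the master task ends up with the queried value."""
    return [value_on_master if rank == master else UNSET for rank in range(ntasks)]

def broadcast(values, master=0):
    """Copy the master task's value to every task (stand-in for an MPI broadcast)."""
    return [values[master]] * len(values)

def query_field_all_tasks(ntasks, value_on_master, master=0):
    """Query on the master task, then share the result with all tasks."""
    local = query_field_master_only(ntasks, value_on_master, master)
    return broadcast(local, master)

buggy = query_field_master_only(4, True)
fixed = query_field_all_tasks(4, True)
print("master only:", buggy)
print("after broadcast:", fixed)
```

In the master-only case, any task that branches on the unset value can take a different code path from the master, which would be consistent with the non-master values being left unset.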
Fixed with apcraig#66. Restarts are now bit-for-bit for C/CD in the current gridsys test suite. We may need to expand testing as we move forward. If new issues arise, new issues can be created.
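For readers unfamiliar with the term, "bit-for-bit" means the values are identical in their binary representation, which is stricter than agreeing to within a tolerance. The Python sketch below shows the idea on plain lists of floats; it is only an illustration with hypothetical helper names, not how the test suite performs the comparison.

```python
import struct

def bits(x):
    """Return the 8-byte IEEE 754 double representation of x."""
    return struct.pack("<d", x)

def bit_for_bit(a, b):
    """True only if both sequences have identical length and identical bits."""
    return len(a) == len(b) and all(bits(x) == bits(y) for x, y in zip(a, b))

same = bit_for_bit([1.5, 2.25, -3.0], [1.5, 2.25, -3.0])
roundoff = bit_for_bit([0.1 + 0.2], [0.3])   # differs in the last bit
signed_zero = bit_for_bit([0.0], [-0.0])     # equal under ==, different bits
print(same, roundoff, signed_zero)
```

A tolerance-based comparison would accept the round-off case, whereas a bit-for-bit check does not, which is why it is a sensitive test for restart reproducibility.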
The atmospheric fields for the gx3 grid are not initialized properly when restarting for the C/CD grids.