Potential problem writing parallel with compression? #264

Closed
edwardhartnett opened this issue Jun 16, 2020 · 13 comments

Comments

@edwardhartnett
Contributor

The NOAA model is having trouble with netCDF writing some variables in parallel with compression.

Is this a netCDF problem? That is the question. I will add a test to mimic the situation and we will see if we can catch a bug, or confirm that netCDF is doing what it should.

@junwang-noaa

Here is the piece of code:
if (lm > 1) then
  call add_dim(ncid, "pfull", pfull_dimid, wrtgrid, rc)
  call add_dim(ncid, "phalf", phalf_dimid, wrtgrid, rc)
end if
....
subroutine add_dim(ncid, dim_name, dimid, grid, rc)
  integer, intent(in) :: ncid
  character(len=*), intent(in) :: dim_name
  integer, intent(inout) :: dimid
  type(ESMF_Grid), intent(in) :: grid
  integer, intent(out) :: rc
  ...
  ncerr = nf90_def_dim(ncid, trim(dim_name), n, dimid); NC_ERR_STOP(ncerr)
  ...
  ncerr = nf90_def_var(ncid, dim_name, NF90_REAL4, dimids=(/dimid/), varid=dim_varid); NC_ERR_STOP(ncerr)
  ncerr = nf90_var_par_access(ncid, dim_varid, NF90_INDEPENDENT)
  allocate(valueListR4(n))
  call ESMF_AttributeGet(grid, convention="NetCDF", purpose="FV3", &
                         name=trim(dim_name), valueList=valueListR4, rc=rc); ESMF_ERR_RETURN(rc)
  ncerr = nf90_enddef(ncid=ncid); NC_ERR_STOP(ncerr)
  ncerr = nf90_put_var(ncid, dim_varid, values=valueListR4); NC_ERR_STOP(ncerr)
  ncerr = nf90_redef(ncid=ncid); NC_ERR_STOP(ncerr)
  deallocate(valueListR4)
  ...

The code writes pfull correctly, but writes phalf as all zeros. However, the value array valueListR4 holds the correct values for both pfull and phalf.

@edwardhartnett
Contributor Author

edwardhartnett commented Jun 16, 2020

What is the actual value of wrtgrid?

And what is the value of n? It should be equal to wrtgrid, correct?

Also, what is the ESMF_AttributeGet call doing? Is it getting an attribute value?

@edwardhartnett
Contributor Author

Also how many processors is this running on?

@edwardhartnett
Contributor Author

In your code there is only one nf90_put_var() call writing valueListR4. You should have two calls, one to write phalf and one to write pfull. You are putting the varids for both into dim_varid, but they are separate and different values. There are two vars, phalf and pfull, and you must write to both of them.

Or are you actually writing both of them in some kind of loop here?

@junwang-noaa

junwang-noaa commented Jun 16, 2020 via email

@edwardhartnett
Contributor Author

edwardhartnett commented Jun 16, 2020

What is the value of n please in these calls:
ncerr = nf90_def_dim(ncid, trim(dim_name), n, dimid); NC_ERR_STOP(ncerr)

That is, the lengths of the two dimensions.

@junwang-noaa

junwang-noaa commented Jun 16, 2020 via email

@edwardhartnett
Contributor Author

I need the actual values of n. The lengths of the dimensions.

Can you do an ncdump -h -s of the file you expect to get (i.e., the case where it works)?

@junwang-noaa

junwang-noaa commented Jun 16, 2020 via email

@edwardhartnett
Contributor Author

OK, thanks, let me work on that for a bit...

@edwardhartnett
Contributor Author

@junwang-noaa I have a draft PR up, please look and answer the questions there.

Also, where is the I/O code for this? That is, what repo, what subdirectory/files, and approximately what lines of code? I would like to look at it directly...

@junwang-noaa

Please check: https://github.com/NOAA-EMC/fv3atm/blob/develop/io/module_write_netcdf_parallel.F90

@edwardhartnett
Contributor Author

OK, this turned out to be an install issue, not a netcdf-fortran issue. NetCDF is writing parallel with compression just fine. ;-)

I have added a test to netcdf-fortran to confirm this, and as a starting point for the next time someone raises an issue like this.

I will close this issue.
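
For readers who want to reproduce the check themselves, the following is a minimal sketch of what such a test could look like. It is not the test that was added to netcdf-fortran. The file name, the dimension lengths (64 and 65), the extra "rank" dimension, and the "data" variable are all assumptions made for illustration, since the thread does not state the real values. The sketch writes the pfull and phalf coordinate variables plus one compressed variable in parallel, then reads phalf back and compares it with what was written. It sets every variable to collective access for simplicity, whereas the code quoted above uses NF90_INDEPENDENT for the coordinate variables.

```fortran
program tst_par_compress_sketch
  use mpi
  use netcdf
  implicit none

  ! Hypothetical file name and dimension lengths (not taken from the thread).
  character(len=*), parameter :: file_name = "tst_par_compress_sketch.nc"
  integer, parameter :: npfull = 64, nphalf = 65

  integer :: ncid, pfull_dimid, phalf_dimid, rank_dimid
  integer :: pfull_varid, phalf_varid, data_varid
  integer :: my_rank, nprocs, ierr, i
  real :: pfull(npfull), phalf(nphalf), phalf_in(nphalf), column(npfull)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Made-up coordinate values; every rank holds the same arrays.
  pfull = (/ (real(i) - 0.5, i = 1, npfull) /)
  phalf = (/ (real(i), i = 1, nphalf) /)
  column = real(my_rank + 1)

  ! Create a netCDF-4 file for parallel access.
  call check(nf90_create(file_name, ior(NF90_CLOBBER, NF90_NETCDF4), ncid, &
       comm=MPI_COMM_WORLD, info=MPI_INFO_NULL))

  call check(nf90_def_dim(ncid, "pfull", npfull, pfull_dimid))
  call check(nf90_def_dim(ncid, "phalf", nphalf, phalf_dimid))
  call check(nf90_def_dim(ncid, "rank", nprocs, rank_dimid))

  ! Two separate coordinate variables, each with its own varid.
  call check(nf90_def_var(ncid, "pfull", NF90_REAL4, (/ pfull_dimid /), pfull_varid))
  call check(nf90_def_var(ncid, "phalf", NF90_REAL4, (/ phalf_dimid /), phalf_varid))

  ! One compressed (deflate level 1) variable; each rank writes one column.
  call check(nf90_def_var(ncid, "data", NF90_REAL4, &
       (/ pfull_dimid, rank_dimid /), data_varid, deflate_level=1))

  ! Collective access for all variables in this sketch.
  call check(nf90_var_par_access(ncid, pfull_varid, NF90_COLLECTIVE))
  call check(nf90_var_par_access(ncid, phalf_varid, NF90_COLLECTIVE))
  call check(nf90_var_par_access(ncid, data_varid, NF90_COLLECTIVE))

  call check(nf90_enddef(ncid))

  call check(nf90_put_var(ncid, pfull_varid, pfull))
  call check(nf90_put_var(ncid, phalf_varid, phalf))
  call check(nf90_put_var(ncid, data_varid, column, &
       start=(/ 1, my_rank + 1 /), count=(/ npfull, 1 /)))

  call check(nf90_close(ncid))

  ! Reopen the file and confirm that phalf holds the values that were written.
  call check(nf90_open(file_name, NF90_NOWRITE, ncid, &
       comm=MPI_COMM_WORLD, info=MPI_INFO_NULL))
  call check(nf90_inq_varid(ncid, "phalf", phalf_varid))
  call check(nf90_get_var(ncid, phalf_varid, phalf_in))
  call check(nf90_close(ncid))

  if (any(phalf_in /= phalf)) then
    print *, "phalf read back does not match what was written"
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if
  if (my_rank == 0) print *, "phalf read back matches"

  call MPI_Finalize(ierr)

contains

  ! Abort all ranks with the netCDF error message if a call fails.
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= NF90_NOERR) then
      print *, trim(nf90_strerror(status))
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
    end if
  end subroutine check

end program tst_par_compress_sketch
```

This assumes a netCDF installation built against parallel HDF5 and recent enough to support parallel writes with compression. It would typically be compiled with an MPI Fortran wrapper using the flags reported by nf-config --fflags and nf-config --flibs, then launched on a few ranks with mpiexec. If a sketch like this passes while the model output is still wrong, that points toward the build or install rather than the library, which is where this issue ended up.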


2 participants