Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix indexing issues with correlated error in V16.x #233

Closed
KristenBathmann opened this issue Oct 15, 2021 · 21 comments
Closed

Fix indexing issues with correlated error in V16.x #233

KristenBathmann opened this issue Oct 15, 2021 · 21 comments

Comments

@KristenBathmann
Copy link
Contributor

A recent update to release/gfsda.v16.x allowed for flexibility in channel selection in the satinfo while preserving correlated error. Testing with GDAS was successful, but testing with GFS failed. This is because the gfsanal and gdasanal use different channel indices. The gfsanal completely ignores monitored channels and generates a satinfo file that does not contain them.

Each covariance file is generated as
open(26,file=trim(cov_file),form='unformatted')
write(26) nch_active, nctot, reclen
write(26) indR
write(26) Rcov
close(26)
nch_active is the number of active channels, for example 174 for IASI
nctot is the total number of channels, for example 616 for IASI
indR are satinfo relative indices, for example 1, 5, 9, 11... for IASI. These would correspond to channels 16, 38, 49 and 51.
Rcov is the covariance matrix.

The correlated obsmod in the master has logic in various routines to handle the inconsistency in channel indexing between the gfsanal and the gdasanal. The update to the set_routine in correlated_obsmod.F90 build a covariance matrix from the file based on the channels in the satinfo. Because of the way the channel indices are specified in the covariance file (1,5,9,11 rather than 16,38,49,51) there is no way for set_ to know which channel has been removed or added in the satinfo, and won't know which entries in Rcov to use. Fixing this issue will therefore require changing the indices in the covariance file (to 16,38,49,51,...), changing correlated_obsmod.F90 and updating the covariance utility util/Correlated_Obs to write the files correctly.

To make matters more complicated, GMAO does not have this indexing issue in their code, and the logical GMAO_ObsErrorCov will need to be used in set_.

@KristenBathmann
Copy link
Contributor Author

This work is being done in my fork KristenBathmann/GSI, in release/gfsda.v16.x
The util/Correlated_Obs directory in here is very out of date with the master. Care will need to be taken when merging this code into the master.

@RussTreadon-NOAA
Copy link
Contributor

RussTreadon-NOAA commented Oct 15, 2021 via email

@ADCollard
Copy link
Contributor

ADCollard commented Oct 15, 2021

So just so I am clear what you are suggesting, @RussTreadon-NOAA .

Should we

  1. Merge the master into v16.x
  2. Merge 16.x back into the master
  3. Introduce the latest changes that Kristen has introduced today into the master at this point?

I will work on Step number 3 once the other two are complete.

@KristenBathmann
Copy link
Contributor Author

I have put the potential fix into my branch. It works for gdasanal, but I don't have a script to test gfsanal. This will also need to be tested after adding and removing channels from the satinfo, for both gdasanal and gfsanal. You can test by grepping "cond=" in the stdout. It should get smaller when removing channels, and larger when adding, and obviously there should be no warnings about correlated error getting turned off.
This is what you should get when the channel selection is unchanged:
Rcov(Evals) for Instrument: iasi_metop-b:sea cond= 6.8648778493E+01
Rcov(Evals) for Instrument: iasi_metop-b:land cond= 1.5388534575E+02
Rcov(Evals) for Instrument: cris-fsr_n20:sea cond= 3.6394468822E+01

For the updates to the util, I changed how schind is defined in readsatobs.f90, and created an array in here called indRf, which is written to the covariance file in cov_calc.f90. It shouldn't be too difficult to merge this into the master.

@RussTreadon-NOAA
Copy link
Contributor

RussTreadon-NOAA commented Oct 15, 2021 via email

@RussTreadon-NOAA
Copy link
Contributor

My comments are obsolete. Kristen's changes are already in her forked copy of release/gfsda.v16.x as of 33e46e6.

This branch has been cloned, built, and run on Venus. The global_gsi.x runs past the point of previous failure for the 2021101500 gfs case. stdout does not contain the Rcov(Evals) values Kristen mentions in her comment. Instead, stdout contains

Rcov(Evals) for Instrument: iasi_metop-b:sea  cond=      9.1821883788E+01
Rcov(Evals) for Instrument: iasi_metop-b:land  cond=      5.3391529276E+01
Rcov(Evals) for Instrument: cris-fsr_n20:sea  cond=      1.5226357845E+02

This run is using Rcov files taken from the master. Is this the problem?

@KristenBathmann
Copy link
Contributor Author

Please point to these covariance files:
/gpfs/dell2/emc/modeling/noscrub/Kristen.Bathmann/GSI/fix/Rcov*

These are updated with different indices.
I can create a branch from the current master and just merge in the changes to the util/Correlated_Obs directory if that makes things easier. When you merge release/gfsda.v16.x into the master, it would then look identical. But I don't want to actually do a PR for the master right now.

@RussTreadon-NOAA
Copy link
Contributor

Test rerun with Rcov files from /gpfs/dell2/emc/modeling/noscrub/Kristen.Bathmann/GSI/fix/. Now see the following in stdout

Rcov(Evals) for Instrument: iasi_metop-b:sea  cond=      6.8648778493E+01
Rcov(Evals) for Instrument: iasi_metop-b:land  cond=      1.5388534575E+02
Rcov(Evals) for Instrument: cris-fsr_n20:sea  cond=      3.6394468822E+01

The updated Rcov files need to be committed.

@KristenBathmann
Copy link
Contributor Author

I created a branch to reflect what util/Correlated_Obs should look like when release/gfsda.v16.x is merged into the master. It is called v16.x_corrutil

I updated this fix directory with covariance files for IASI-A (I only updated the IASI-B and CrIS previously). These Rcov files do need to committed.

@RussTreadon-NOAA
Copy link
Contributor

Reran the 2021101500 gdas case using Kristen's forked copy of release/gfsda.v16.x as of 33e46e6. Use the updated Rcov files Kristen provided. Also run 2021101500 gdas using authoritative release/gfsda.v16.x at 23abb36 with updated Rcov files from Kristen. All other fix files came from Kristen's release/gfsda.v16.x.

I thought indexing was only a problem in gfs runs. I expected initial J terms to be identical between the two global_gsi.x executables for the 2021101500 gdas case. This is not true. The initial J for radiances differ.

The authoritative gfsda.v16.x has 6.2847632847397670E+05.
Kristen's fork has 8.3162092822719424E+05

@KristenBathmann
Copy link
Contributor Author

Using my updated fork copy requires the updated Rcov files, and this should work for both gdas and gfs. Using the current authoritative release/gfsda.v16.x requires the old Rcov files, but will only work for gdas.

@RussTreadon-NOAA
Copy link
Contributor

RussTreadon-NOAA commented Oct 15, 2021

So we should expect differences in both gfs and gdas runs when using your updated fork with update Rcov files. This is what I see.

@KristenBathmann
Copy link
Contributor Author

Yes, you should expect the differences that you observed. If you rerun the gdasanal with the authoritative release/gfsda.v16.x and the old Rcov files, the result should be identical to using my fork with the new covariance files.

@RussTreadon-NOAA
Copy link
Contributor

I ran a clean test for 2021101600 gfs and gdas.   Two GSI versions were checked out and installed on Mars:

For all runs fixgsi was set to the fix checked out with NOAA-EMC gfsda.v16.x.   The only exception is that runs using Kristen's gfsda.v16.x used the Rcov files Kristen pointed us at (/gpfs/dell2/emc/modeling/noscrub/Kristen.Bathmann/GSI/fix/Rcov*).

For 2021101600 gfs, the NOAA-EMC global_gsi.x seg faulted.  Kristen's global_gsi.x ran to completion.   This is expected.

For 2021101600 gdas, the NOAA-EMC and Kristen's global_gsi.x both ran to completion.  The numbers printed for the first outer loop Initial cost function and Initial gradient norm are identical for all 19 printed digits.  

NOAA-EMC

Initial cost function =  2.125944993387707043E+06
Initial gradient norm =  5.296622843764786921E+03

Kristen's fork

Initial cost function =  2.125944993387707043E+06
Initial gradient norm =  5.296622843764786921E+03

However, the initial step sizes differs in the last few digits

NOAA-EMC:           2.571934798076587203E+00
Kristen's fork:     2.571934798076594753E+00

Could this reflect different ordering of information in the vectors processed by pcgsoi and routines called from pcgsoi?

At the start of the second outer loop the cost and gradient norm differ, though the leading digits are identical

NOAA-EMC

Initial cost function =  1.529773474892185070E+06
Initial gradient norm =  1.343542338745532106E+03

Kristen's fork

Initial cost function =  1.529773474891685881E+06
Initial gradient norm =  1.343562729594621942E+03 

The final cost and grad numbers are comparable
NOAA-EMC
cost,grad,step,b,step? = 2 150 1.468026390140016563E+06 7.947329347877215788E+00 1.155097915181486412E+00 9.449180669965929757E-01 good

Kristen's fork
cost,grad,step,b,step? = 2 150 1.468026362812372390E+06 7.949759601992846925E+00 1.155496369801310763E+00 9.476025479407522711E-01 good

compare_ncfile.py of the two siginc.nc files shows differences to be relatively small. compare_ncfile.py returns similar differences for sigi03.nc and sigi09.nc. siginc.nc differences are shown below

/gpfs/dell2/ptmp/emc.glopara/tmp766> compare_ncfile.py gfsda.v16.x_gdas.2021101600/siginc.nc kristen_gdas.2021101600/siginc.nc
lon min/max 1=0.0,359.766 min/max 2=0.0,359.766 max abs diff=0.0000000000
lat min/max 1=-89.8207,89.8207 min/max 2=-89.8207,89.8207 max abs diff=0.0000000000
lev min/max 1=1.0,127.0 min/max 2=1.0,127.0 max abs diff=0.0000000000
pfull min/max 1=1.0,127.0 min/max 2=1.0,127.0 max abs diff=0.0000000000
ilev min/max 1=1.0,128.0 min/max 2=1.0,128.0 max abs diff=0.0000000000
hyai min/max 1=1.0,128.0 min/max 2=1.0,128.0 max abs diff=0.0000000000
hybi min/max 1=1.0,128.0 min/max 2=1.0,128.0 max abs diff=0.0000000000
u_inc min/max 1=-39.3891,22.8589 min/max 2=-39.3889,22.8589 max abs diff=0.1614894867
v_inc min/max 1=-33.1067,20.7986 min/max 2=-33.1075,20.7986 max abs diff=0.1975569576
delp_inc min/max 1=-13.2386,15.8491 min/max 2=-13.2379,15.8491 max abs diff=0.0029287338
delz_inc min/max 1=-158.188,147.854 min/max 2=-158.216,147.854 max abs diff=0.6293458939
T_inc min/max 1=-14.6978,13.852 min/max 2=-14.6975,13.852 max abs diff=0.0871682167
sphum_inc min/max 1=-0.0090915,0.00732399 min/max 2=-0.00909151,0.00732399 max abs diff=0.0001566859
liq_wat_inc min/max 1=0.0,0.0 min/max 2=0.0,0.0 max abs diff=0.0000000000
o3mr_inc min/max 1=-2.29021e-06,2.19063e-06 min/max 2=-2.2902e-06,2.19063e-06 max abs diff=0.0000000016
icmr_inc min/max 1=0.0,0.0 min/max 2=0.0,0.0 max abs diff=0.0000000000

Taken as a whole is the behavior we observe expected and acceptable?

@ADCollard
Copy link
Contributor

@RussTreadon-NOAA, it is not immediately apparent to me why we would see a difference in the step size from this change (possibly related to different treatment of passive obs?).

I'd be OK with proceeding, but I'm going to take a couple of hours digging into the code if that is OK?

@RussTreadon-NOAA
Copy link
Contributor

Thank you, @ADCollard for looking at the code. I agree. I did not expect a difference for the gdas case. My guess is that the treatment of passive obs alters the order of information in the vectors which pcgsoi and the routines it calls process. This change in order leads to small roundoff differences in minimization calculations. Everything may be OK but it would be nice to have confirmation that everything is OK.

@ADCollard
Copy link
Contributor

ADCollard commented Oct 18, 2021

@RussTreadon-NOAA There's one difference not from Kristen. John added an extra line to genqsat.f90 that is not in Kristen's branch. And I'm guessing that might be more likely to produce the change we've seen.

Should I just go ahead and add it and then you can rerun?

@RussTreadon-NOAA
Copy link
Contributor

Excellent find, @ADCollard. You solved the mystery! Details matter. I missed this critical detail.

I added John's genqsat.f90 change to my working copy of Kristen's branch. global_gsi.x was rebuilt and the 2021101600 gdas case is being rerun. The job has not finished but it's into the minimization on the second outer loop. The minimization stats from the rerun are identical to the authoritative gfsda.v16.x run for all printed digits.

We are merging Kristen's branch into the authoritative branch. John's change to genqsat.f90 is already in the authoritative branch. I think it is sufficient that you explained the reason for the observed differences. We don't need to commit the John's genqsat.f90 to Kristen's branch.

We need to get Kristen's updated Rcov files into fix (DA_GFSv16.x_global_only). The local copy of Kristen's updated Rcov files are on Venus in /gpfs/dell2/emc/modeling/noscrub/Kristen.Bathmann/GSI/fix/. After we update DA_GFSv16.x_global_only with the new Rcov files, we have two choices. We can associate the updated DA_GFSv16.x_global_only with Kristen's branch and merge to the authoritative repo. We could also directly associate the updated DA_GFSv16.x_global_only with the authoritative gfsda.v16.x.

@RussTreadon-NOAA
Copy link
Contributor

Just to confirm that with the genqsat.f90 change @ADCollard noted, Kristen's release/gfsda.v16.x at 9a41e68 reproduces analysis increments from the authoritative release/gfsda.v16.x at 23abb36 for 2021101600 gdas. For example, here is a compare_ncfile.py difference for siginc.nc:

/gpfs/dell2/ptmp/emc.glopara/tmp766> compare_ncfile.py kristen_gdas.2021101600.update_genqsat/siginc.nc gfsda.v16.x_gdas.2021101600/siginc.nc
lon min/max 1=0.0,359.766 min/max 2=0.0,359.766 max abs diff=0.0000000000
lat min/max 1=-89.8207,89.8207 min/max 2=-89.8207,89.8207 max abs diff=0.0000000000
lev min/max 1=1.0,127.0 min/max 2=1.0,127.0 max abs diff=0.0000000000
pfull min/max 1=1.0,127.0 min/max 2=1.0,127.0 max abs diff=0.0000000000
ilev min/max 1=1.0,128.0 min/max 2=1.0,128.0 max abs diff=0.0000000000
hyai min/max 1=1.0,128.0 min/max 2=1.0,128.0 max abs diff=0.0000000000
hybi min/max 1=1.0,128.0 min/max 2=1.0,128.0 max abs diff=0.0000000000
u_inc min/max 1=-39.3891,22.8589 min/max 2=-39.3891,22.8589 max abs diff=0.0000000000
v_inc min/max 1=-33.1067,20.7991 min/max 2=-33.1067,20.7991 max abs diff=0.0000000000
delp_inc min/max 1=-13.2386,15.8484 min/max 2=-13.2386,15.8484 max abs diff=0.0000000000
delz_inc min/max 1=-158.188,147.85 min/max 2=-158.188,147.85 max abs diff=0.0000000000
T_inc min/max 1=-14.6978,13.852 min/max 2=-14.6978,13.852 max abs diff=0.0000000000
sphum_inc min/max 1=-0.0090915,0.00732401 min/max 2=-0.0090915,0.00732401 max abs diff=0.0000000000
liq_wat_inc min/max 1=0.0,0.0 min/max 2=0.0,0.0 max abs diff=0.0000000000
o3mr_inc min/max 1=-2.29021e-06,2.19059e-06 min/max 2=-2.29021e-06,2.19059e-06 max abs diff=0.0000000000
icmr_inc min/max 1=0.0,0.0 min/max 2=0.0,0.0 max abs diff=0.0000000000

@ADCollard
Copy link
Contributor

So I have added the 8 Rcov files to DA_GFSv16.x_global_only (land and sea files for the three IASIs and one each for the two CrISs). Thanks to @RussTreadon-NOAA and @CatherineThomas-NOAA for their patience as I remembered how to do it again!

@RussTreadon-NOAA
Copy link
Contributor

@ADCollard committed Rcov files with revised channel number information to DA_GFSv16.x_global_only at hash 709a1139.

Update NOAA-EMC/GSI release/gfsda.v16.x fix at 2c8b219 to point at 709a1139 DA_GFSv16.x_global_only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants