-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: inflation creating out-of-bounds values qceff #749
Comments
there are some early returns in probit_transform_mod DART/assimilation_code/modules/assimilation/probit_transform_mod.f90 Lines 350 to 353 in 464aa57
|
whole ensemble: before line 559 "i","value" after inflation "i","value" |
from_probit_bounded_normal_rh state_ens gets set to probit_ens not sure why the more_params(7) tail_sd_eft is negative? get_bnrh_sd(p) DART/assimilation_code/modules/assimilation/bnrh_distribution_mod.f90 Lines 91 to 100 in 464aa57
|
The sd == 0 so you never transform into (or back out of) probit space. Inflation pushes the
|
note there is issue #748 which is a different inflation QCEFF bug which looks like it may create out-of-bounds values also. |
This bug impacted the application of inflation, but it was potentially much more insidious. As noted by Helen, the problem occured when the bnrh transform was unable to complete because the ensemble standard deviation was not positive. This same problem can arise for all of the other places in assim_tools_mod.f90 where transforms are made. The most straightforward solution begins by having an error return from the probit transform. Only the bnrh can fail so all other types of distributions always complete and do not need to be considered. The original code design assumed that if the standard deviation was not positive, that all ensemble members must have the same value. However, as noted in Helen's examples, if the differences between ensemble members are very small, computational precision limits can result in a computed standard deviation that is not positive. The assumption for the inflation and the other parts of assim_tools was that any distribution with all members identical could not be changed so it didn't matter if the failure of the transform was just ignored. Note that the numerical analysis is very complex for the case where the computed SD for the bnrh transform is very small but non-zero. A number of tests suggest that this does not have inappropriate results, but I would not be shocked if something bad could happen in unusual cases. |
Moha and Jeff did a code review of the proposed bug fix which led to one additional clarification to the code. The Lorenz_06_tracer tests seem to be okay. This version has been pushed and is now ready for testing with the CAM reanalysis code by Kevin. edit: change is 1f2b7e1 |
note for performance later, DART/assimilation_code/modules/assimilation/probit_transform_mod.f90 Lines 382 to 389 in 1f2b7e1
check on ierr, and set again here: not needed? DART/assimilation_code/modules/assimilation/probit_transform_mod.f90 Lines 91 to 100 in 1f2b7e1
|
I've tidied my local branch, in which I set up the previous BNRH tests. Then we need to define the test. Are there other things to check? |
In the long-term, we need a case where we can see that things fail without
the fix, and don't fail with it. I don't know if you were literally failing
every time with the original code or not, so not sure if it is sufficient
to run a single test from your existing restart.
In other words, identify a case where original code fails when
fix_bound_violations is off.
Then run with the new code and see if it still fails.
…On Mon, Dec 30, 2024 at 3:22 PM kdraeder ***@***.***> wrote:
I've tidied my local branch, in which I set up the previous BNRH tests.
Then I set it to track the qceff_inflation_fix branch by fetching the _fix
branch and rebasing to it.
Lots of assimilation_code/modules/assimilation files were updated,
so I'm assuming that they are the new ones. I'll check, if anyone thinks
that's necessary.
Then we need to define the test.
I could continue the previous test because all of the restarts, etc.
(2020-02-01-00000)
seem to be in the RUNDIR, but with 2 changes; rebuilding filter and
turning off the fix_bound_violations.
The alternative is to set up a new case and start it from the beginning,
where we first saw the negative values.
This would also have fix_bound_violations turned off.
I haven't checked the ease of access to those restart files.
Are there other things to check?
Diagnostics to turn on ?
—
Reply to this email directly, view it on GitHub
<#749 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ANDHUIXVIIMJ5IYFFA6RZHT2IHBQ3AVCNFSM6AAAAABPJ7LLQ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNRVHE3DOOBTHE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
My notes say that it failed during the first assimilation for ~26,000 variables while processing I could run one cycle of the existing case with the old filter and the namelist fix turned off. |
Using the old filter with the namelist negative fixer turned off led to a failure somewhere in the first 10,000 obs: Replacing filter with one compiled from the latest qceff_inflation_fix branch resulted in a successful assimilation, which used the same set of state input files as the failed assimilation. |
I set up a new experiment to start from the same date as the assimilation which successfully used |
@hkershaw-brown wrote in email: So I'm on the same page here, this test is comparing 2 versions of a given code (Kevin's ~DART/Manhattan_git) KDM = Kevin's ~DART/Manhattan_git code pre Jeff's qceff_inflation_fix commits Where qceff_inflation_fix commits are some code that records true/false whether a transform to probit is successful. If false, the state elements/fwd ops are not part of the assimilation. The tests run are: A. KDM fix_bound_violations = .true. Pass: Fail for this test would be From this we learn:
The test is using a different version of cam model_mod to what we have in main. This is not relevant for running tests? ie. we can use main cam-fv model_mod to test with when we change qceff_inflation_fix to bring it into main. /glade/~raeder/DART/Manhattan_git(qceff_cam-fv)$ gitdiff --name-only \
@kdraeder responded: KDM has code for ignoring obs near the surface, but it was turned off via namelist for all of these experiments. |
🐛 Your bug may already be reported!
Please search on the issue tracker before creating a new issue.
Describe the bug
List the steps someone needs to take to reproduce the bug.
Run CAM-FV reanalysis single run of filter example /glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run
inf_flavor 2
What was the expected outcome?
Run to completion
What actually happened?
filter errors out:
Error Message
Please provide any error messages.
Which model(s) are you working with?
CAM-FV
Screenshots
Here are the lines of code:
DART/assimilation_code/modules/assimilation/adaptive_inflate_mod.f90
Lines 559 to 560 in 464aa57
Version of DART
Which version of DART are you using?
v11.8.1
Have you modified the DART code?
No
Here is a small reproducer
Or you can run CAM-FV filter
/glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run
The numbers for the test_maths.f90 program are from proc3 (out of 128) j==61138
Build information
Please describe:
also mac gfortran GNU Fortran (MacPorts gcc13 13.2.0_4+stdlib_flag) 13.2.0
The text was updated successfully, but these errors were encountered: