Results should not be sensitive to choice of budget decomposition #18
Comments
@BrodiePearson this is very odd indeed. Can you post your namelist and the configuration parameters for the init step? It seems highly unlikely this is truncation error. If you can share the setup I'll try to run this. |
@vanroekel I've attached the folder that I'm using. It should be identical to the one that you originally sent me, with modifications to work for the new set of constants. I hadn't created any other examples yet, so I would be interested to know if these differences are (a) an error on my part, or (b) only specific to this case. |
@BrodiePearson and @vanroekel ...at least not for the default namelist parameters I am using. I am now going to test using the namelist parameters Brodie is using and let you know what I get... |
Thanks @katsmith133, a few questions |
In my testing I see the same as @BrodiePearson even with |
@vanroekel @katsmith133 Thanks to both of you for taking a look. I followed Kat's lead and found that her setup gave me very small differences in the final w2 profiles (~1e-21; we might expect this from small truncation differences between combining the terms or calculating them separately, right?). It looks to me like On Luke's point, certain combinations of these parameters seem to cause crashes (I think Kat ran into this issue previously). I tried different combinations and found that I could disable Cmom/Ctherm as long as I switched splatting on. In this case, with all other parameters using Kat's settings, the differences are ~1e-9 |
Hey all, just an update. If I do |
Hmmm. I just looked at the w2tend6 (splat) term and noticed there is a typo in the allocation code:
and it is not deallocated:
Could that be the problem? |
Hah, just found that too! I'm testing if this is the problem. The deallocate won't matter, but the other might. I'm a bit doubtful though given it only influences splat, but I'll let you know. |
Unfortunately that wasn't it. It did not change my results. Important to fix though. Thanks for pointing that out. |
@BrodiePearson and @vanroekel I get a substantial difference between keeping them separated and combining them (this is after 19 hours): |
I have been looking at this some more, and did find something interesting. The nonBFB nature seems to stem initially from the wumd calculation I'm not sure what the solution is, but perhaps we should do some rounding to the 8th or 10th decimal? It makes me wonder if there is a different bug hanging around. |
@vanroekel when you say that you remove the sqrt calculation, do you mean you've replaced the sqrt function with ()**0.5 or just removed the square-rooting altogether? I found that if I replaced the sqrt function with ()**0.5, I still get the same issue... meaning it's not a rounding issue inherent to the sqrt function. |
Yes, exactly right, remove the sqrt function. The diffs all go away. But it shouldn't work for long as the units are incorrect. I only ran for 1 minute, and I'm not surprised it crashes quickly. |
Okay, I "may" have a good solution for this issue. I'm still working on a few more tests. Once I do those and clean up I'll write more about what I did and post a fix. But in quick summary it is related to the loop around w2t and w2s assignment. My reproducer is fixed with my changes, just trying to understand what change fixed things. |
Well I found the critical switch to "fix" this, but I don't like it... If you change the sign of this line The problem is fixed. This makes no sense to me. Could one of you try changing that sign too and see if the problem disappears? I'd appreciate any thoughts you have on possible causes! |
@vanroekel I changed the sign of that term and the issue did not disappear for me. Not sure if that is because of different parameters or not, but I still see a difference. Not sure if this helps, but I also tried just setting that term to zero, as the LES for the cooling case implied w2t should be zero at the surface, and the problem still occurs. |
Nice find! I won't be able to test this until I'm back home this afternoon. I think this change makes sense physically (but I don't understand why it caused this issue). If we assume a basic down-gradient formulation for this term: The code defines this third-order term as
while wt(k,1,iCell) = wtsfc(iCell) and wstar>0 . This means that for a positive surface wT term, you need a negative sign here to make |
@katsmith133 interesting. I wonder what else I changed. It is definitely gone for me. But maybe I don't run long enough? How long until you check? I also have a ton of other changes I made to try to fix this, so perhaps it is the sum of all of them. @BrodiePearson even if it makes physical sense to change the sign, I'm very concerned that, in my code/test at least, having w2t less than some critical value introduces weird sensitivities. It doesn't seem this should happen even if the sign is not technically right. |
@vanroekel I checked at 12 hours. The differences at 1 hour are pretty small (1e-24), but by 12 hours it's on the order of 1e-7 for u2, v2, and w2 and 5e-3 for temperature. I'm going to check again, just to make sure... |
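The kind of comparison quoted above (maximum difference between two runs, and whether they match bit for bit) could be computed along these lines. This is only a minimal sketch: the function names and the profile values are hypothetical and are not taken from the MPAS-O code or from the actual model output.

```python
def max_abs_diff(profile_a, profile_b):
    """Largest absolute pointwise difference between two vertical profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of levels")
    # default=0.0 makes two empty profiles compare as identical
    return max((abs(a - b) for a, b in zip(profile_a, profile_b)), default=0.0)


def is_bfb(profile_a, profile_b):
    """Bit-for-bit match: every level must be exactly equal."""
    return max_abs_diff(profile_a, profile_b) == 0.0


# Made-up w2 profiles from a "split" run and a "combined" run (illustrative only).
w2_split = [1.0e-4, 2.5e-4, 3.0e-4]
w2_combined = [1.0e-4, 2.5e-4 + 1.0e-7, 3.0e-4]

print(max_abs_diff(w2_split, w2_combined))  # roughly 1e-7
print(is_bfb(w2_split, w2_combined))
```

Tracking this number at several output times (1 hour, 12 hours, 24 hours) would show whether a difference stays near round-off or grows.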
Okay, that could be it. But I will say with the sign as is the diff in w2 was that small you show (at 1 hour), but gone (hard zero) if I flip the sign. I'm going to keep on with these tests and hope that if I can get rid of that diff the longer term results will be the same too, we'll see! |
@vanroekel Sounds good. FYI, I have the changes in #19 implemented in my code. I don't think that should influence this, as my tests showed those changes had no effect on this issue, but just so you know. Also, I ran Valgrind on the code and there are lots of memory leaks throughout MPAS-O, mostly to do with not deallocating pools I think, but nothing specifically in the ADC code was throwing errors (besides not deallocating the pools associated with the ADC code, but I don't know how to do this in the code). Note: this is after I implemented the changes from #19 (as that adds deallocation of any variables that were not deallocated and fixes an out-of-bounds indexing issue). |
@katsmith133 thanks for the update and thanks for checking valgrind, that's very good to know. I've been continuing to play with this and unfortunately haven't made progress. I need to put this aside for a bit, but have pushed my test branch to a new branch -- One other, slightly contradictory bit of evidence, when I moved w2 dissipation from implicit to explicit it changed the behavior of the model with regards to adding the two terms in w2tend or not. |
@vanroekel Yup, I'll keep playing with it |
@katsmith133 and @BrodiePearson an update: it turns out the closure is sensitive to combining terms in w3, w2, or wt, the pattern of difference is interestingly the same for w3 and wt. I've also tried setting things constant in the vertical to find the issue. The only thing that fixes it is setting w2 to a constant in time and space. Not clear what any of this means yet, but still poking at it |
@vanroekel and @BrodiePearson I am attempting to go back to different commits to see when this issue began. I'm thinking that might give us an idea of what's causing it...? |
@katsmith133 If you can figure out a commit that would be helpful! Although I am worried this issue may exist from the beginning/very early on (comparing different formulations of the budgets is not a standard test for the code) |
@katsmith133 I fully agree with @BrodiePearson it would be wonderful to pin this to a commit, but worry greatly this issue has been around forever. I never would have thought to test this to even add a test, it is so weird... |
@vanroekel and @BrodiePearson Updates:
Not super helpful updates, but just crossing off things that have been tried so we don't overlap efforts. |
Thanks for the update @katsmith133 it is indeed good to cross these off the list. I agree it is not beneficial to keep going backward in commits. I'm still working on this too, don't have anything definitive yet, but think I am making progress. |
@katsmith133 and @BrodiePearson I've added the rounding and it fixes the issue in my very short tests. Can you pick up the ocean/adcTestBranch and try it in your longer tests? A few notes
Hopefully this works. Thinking on this more, I actually think the idea from amrapalli is potentially the best long-term solution: by converting all velocity units to cm/s we could push past the issue without having to guess where to truncate. Either way, let me know how it goes, and especially if you find an issue. |
@vanroekel and @BrodiePearson Ok, after some testing, I've found that this solution works so long as you set the rounding to the correct decimal. Using 1e16 saves too many digits and the garbage digits end up amplifying, making the problem worse than before. Using 1e12 instead produces very similar profiles to before the rounding, but also produces BFB matching when tendency terms are kept separate or combined. This has only been tested for the 100W cooling case with
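For readers following along, rounding with a scale factor such as 1e12 can be sketched as below. This is not the code on ocean/adcTestBranch (which is not shown in this thread); the helper name `round_to_scale` and the test values are assumptions made purely for illustration.

```python
def round_to_scale(value, scale):
    """Keep roughly log10(scale) decimal places: scale up, round to the
    nearest integer, and scale back down."""
    return round(value * scale) / scale


# Two values that should be mathematically equal but differ in the last bits.
a = 0.1 + 0.2   # 0.30000000000000004
b = 0.3

print(a == b)                                              # False
print(round_to_scale(a, 1e12) == round_to_scale(b, 1e12))  # True
```

The scale sets how many digits survive. A scale close to the limit of double precision (about 16 significant digits) keeps most of the round-off noise, which is consistent with the observation above that 1e16 did not help.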
I think the results of the ADC closure are quite sensitive to how we decompose the tendency equations! (possibly due to rounding errors?)
I was playing around with the w2 budget, and found that simply combining two tendency terms led to significant differences in final simulated profiles. In this case I ran two simulations where I either left the return to isotropy and buoyancy terms separate:
MPAS-Model/src/core_ocean/shared/mpas_ocn_adcReconstruct.F
Lines 571 to 573 in 103e68e
or I combined them into one term (which theoretically should be an identical system of equations):
w2tend3(k,iCell) = tauvVel(k,iCell)*(u2(i1,k,iCell) + v2(i1,k,iCell))/3.0_RKIND + &
(2.0_RKIND - 4.0_RKIND/3.0_RKIND*C_b)*Mc(k,iCell)* &
(grav*alphaT(k,iCell)*tumd(k,iCell) - grav*betaS(k,iCell)*sumd(k,iCell))
w2tend4(k,iCell) = 0.0_RKIND*(2.0_RKIND - 4.0_RKIND/3.0_RKIND*C_b)*Mc(k,iCell)* &
(grav*alphaT(k,iCell)*tumd(k,iCell) - grav*betaS(k,iCell)*sumd(k,iCell))
This figure shows the final profiles of w2 (left) after 24 hours of the conv2 setup using either split (standard) terms or the combined tendency terms. It also shows the sum w2tend3 + w2tend4 (middle) and the individual terms (right). This apparent sensitivity of w2 profiles to equation formulation is not good, so it would be great if someone could do a similar test and see if these differences are real/reproducible.
The maximum difference of total tendency terms (middle panel) between the two simulations grows from 1e-23 on the first time step to 4e-8 after 24 hours. Perhaps this is a truncation error that grows - although it is surprising that it produces such a different w2 profile given that the model doesn't have the chaos of a truly turbulent flow.
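As a minimal illustration of why a "theoretically identical" regrouping can still change results at round-off level, the sketch below adds three numbers in two different orders. The values are hypothetical stand-ins for tendency terms, and the assumption that the model sums its tendency terms one after another is mine, not something confirmed by the code excerpt above.

```python
# Hypothetical stand-ins for tendency terms (illustrative values only).
other_terms = 0.1          # everything else in the w2 budget
return_to_isotropy = 0.2   # plays the role of w2tend3
buoyancy = 0.3             # plays the role of w2tend4

# "Split" formulation: each term is added to the running total in turn.
split_total = (other_terms + return_to_isotropy) + buoyancy

# "Combined" formulation: the two terms are summed first, then added.
combined_total = other_terms + (return_to_isotropy + buoyancy)

difference = split_total - combined_total
print(split_total == combined_total)  # False
print(difference)                     # on the order of 1e-16
```

Floating-point addition is not associative, so a tiny initial mismatch like this is expected. Whether it should grow to 4e-8 over 24 hours is a separate question about how the closure amplifies perturbations.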