
Distribute unablatedVolumeDynCell among neighboring cells in li_apply_front_ablation_velocity #27

Conversation

@trhille commented Apr 11, 2022

If a cell is left with unablated volume after step 2c in li_apply_front_ablation_velocity, distribute that volume among its dynamic neighbors to prevent large errors in the ablated volume. This can be turned on or off with the config_distribute_unablatedVolumeDynCell namelist option. Testing with von Mises calving on the MISMIP+ domain indicates that this greatly reduces inaccuracies in the calving routine, but very large calving events can still trigger the >10% calving inaccuracy error. However, this should only be a secondary means of getting accurate calving results; the primary means should be an appropriately small timestep, determined from the ablation velocity through a CFL-like condition.
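In outline, the new step does something like the following (a minimal sketch, not the actual MALI code: the volume-weighted split, the ablationVolumeCell accumulator, and the loop bounds are assumptions; the mask and geometry names follow the code quoted in the review below):

if (config_distribute_unablatedVolumeDynCell) then
   do iCell = 1, nCellsSolve
      if (unablatedVolumeDynCell(iCell) > 0.0_RKIND) then
         ! Sum volume of eligible neighbors: dynamic ice, marine-based, containing ice
         neighborVolume = 0.0_RKIND
         do iEdge = 1, nEdgesOnCell(iCell)
            iNeighbor = cellsOnCell(iEdge, iCell)
            if (li_mask_is_dynamic_ice(cellMask(iNeighbor)) .and. &
                (bedTopography(iNeighbor) <= config_sea_level) .and. &
                (cellVolume(iNeighbor) > 0.0_RKIND)) then
               neighborVolume = neighborVolume + cellVolume(iNeighbor)
            end if
         end do
         ! Hand the leftover volume to those neighbors in proportion to their volume
         if (neighborVolume > 0.0_RKIND) then
            do iEdge = 1, nEdgesOnCell(iCell)
               iNeighbor = cellsOnCell(iEdge, iCell)
               if (li_mask_is_dynamic_ice(cellMask(iNeighbor)) .and. &
                   (bedTopography(iNeighbor) <= config_sea_level) .and. &
                   (cellVolume(iNeighbor) > 0.0_RKIND)) then
                  ablationVolumeCell(iNeighbor) = ablationVolumeCell(iNeighbor) &
                     + unablatedVolumeDynCell(iCell) * cellVolume(iNeighbor) / neighborVolume
               end if
            end do
            unablatedVolumeDynCell(iCell) = 0.0_RKIND
         end if
      end if
   end do
end if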

@trhille trhille requested a review from matthewhoffman April 11, 2022 21:34
@trhille trhille changed the title Distribute unablatedVolumeDynCell among neighboring cells Distribute unablatedVolumeDynCell among neighboring cells in li_apply_front_ablation_velocity Apr 11, 2022
@trhille (Author) commented Apr 11, 2022

Test with config_velocity_solver = 'none' and von Mises threshold stresses of 35 kPa on the spun-up MISMIP+ domain, with specified uReconstructY = 0.0 and uReconstructX = xCell / 1.0e10 (which gives a velocity similar to the diagnostic solve at the calving front):
[screenshots omitted]
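For reference, the imposed field amounts to the following (a sketch assuming direct assignment over the vertical dimension; nCellsSolve and the array shapes are assumptions):

do iCell = 1, nCellsSolve
   uReconstructX(:, iCell) = xCell(iCell) / 1.0e10_RKIND   ! speed increases linearly toward the front
   uReconstructY(:, iCell) = 0.0_RKIND
end do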

@trhille (Author) commented Apr 11, 2022

Testing with config_velocity_solver = 'FO' on the same domain (without specified velocities), with von Mises thresholds of 35 kPa. Using this branch, the model ran 138 timesteps (60 years) and printed 50 "WARNING: Failed to ablate" messages before the error exceeded 10%. Using an executable compiled from the head of MALI-Dev/develop, it ran for 58 timesteps (30 years), with 55 "WARNING: Failed to ablate" messages before breaking the 10% threshold.

In the screenshots below, the top panel is from the run using this branch and the bottom is from MALI-Dev/develop:
[screenshots omitted]

Continuing the run on this branch after the MALI-Dev/develop run failed:
[screenshots omitted]

From globalStats.nc, blue is the control run with no calving, orange is MALI-Dev/develop, and green is this branch:
[plot omitted]

@trhille (Author) commented Apr 11, 2022

Testing the effect of config_max_adaptive_timestep. I ran the same cases (35 kPa von Mises calving threshold, active velocity solver, MISMIP+ domain), setting config_max_adaptive_timestep to 20, 60, 90, 135, 180, 225, and 270 days on both this branch and MALI-Dev/develop.

270 day max timestep:

  • develop: 58 timesteps, 30.5 years, 55 calving warning messages before encountering >10% error
  • this branch: 78 timesteps, 40.5 years, 13 calving warning messages (all ≤1%) before timing out after 20 minutes without encountering error

225 day max timestep (same as above because the CFL condition limits timesteps to ~230–240 days anyway):

  • develop: 58 timesteps, 30.5 years, 55 calving warning messages before encountering >10% error
  • this branch: 78 timesteps, 40.5 years, 13 calving warning messages (all ≤1%) before timing out after 20 minutes without encountering error

180 day max timestep:

  • develop: 72 timesteps, 34 years, 68 calving warning messages before encountering >10% error
  • this branch: 147 timesteps, 61 years, 16 calving warning messages in first run (37 years), 41 in restart run before encountering >10% error

135 day max timestep:

  • develop: 93 timesteps, 32 years, 29 calving warning messages, timed out without encountering >10% error
  • this branch: 96 timesteps, 34 years, 0 calving warning messages!

Maximum timesteps ≤90 days encountered no "Failed to ablate" warnings on either branch.
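This timestep dependence is what the CFL-like condition mentioned in the PR description would guard against. A hypothetical sketch of such a limit (ablationVelocityCell, cflFraction, and using sqrt(areaCell) as a cell-width proxy are all assumptions, not part of this PR):

dtMaxAblation = huge(0.0_RKIND)
do iCell = 1, nCellsSolve
   if (ablationVelocityCell(iCell) > 0.0_RKIND) then
      ! Limit front retreat to a fraction of one cell width per timestep
      dtMaxAblation = min(dtMaxAblation, &
                          cflFraction * sqrt(areaCell(iCell)) / ablationVelocityCell(iCell))
   end if
end do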

The inaccuracies reported in "Failed to ablate" warnings on this branch are in general much smaller than on MALI-Dev/develop. Here is an example from the 225 day maximum timestep runs:
develop:

WARNING: Failed to ablate 1132203544.43164 m^3. (4.96332500482132% of total ablated)
WARNING: Failed to ablate 679729107.805054 m^3. (3.15101675364751% of total ablated)
WARNING: Failed to ablate 811104868.653835 m^3. (3.31285856306982% of total ablated)
WARNING: Failed to ablate 593089307.984791 m^3. (2.57389924800550% of total ablated)
WARNING: Failed to ablate 673804384.607821 m^3. (3.12230639039168% of total ablated)
WARNING: Failed to ablate 693584162.405356 m^3. (3.24238217610743% of total ablated)
WARNING: Failed to ablate 1189331102.35431 m^3. (4.99972340399423% of total ablated)
WARNING: Failed to ablate 1138332931.34200 m^3. (5.54177611788499% of total ablated)
WARNING: Failed to ablate 670046841.174305 m^3. (2.93710958928141% of total ablated)
WARNING: Failed to ablate 562882706.343968 m^3. (2.92782523156518% of total ablated)
WARNING: Failed to ablate 498847016.773593 m^3. (2.19613816307795% of total ablated)
...

(there are many more after that)

this branch:

WARNING: Failed to ablate 164034753.000446 m^3. (0.618725788900477% of total ablated)
WARNING: Failed to ablate 30356933.4819427 m^3. (0.130161725498177% of total ablated)
WARNING: Failed to ablate 222803197.534090 m^3. (0.948566809693993% of total ablated)
WARNING: Failed to ablate 69359409.6867872 m^3. (0.276013507176517% of total ablated)
WARNING: Failed to ablate 36886614.6635770 m^3. (0.180693019528107% of total ablated)
WARNING: Failed to ablate 239918072.169899 m^3. (0.969607933074530% of total ablated)
WARNING: Failed to ablate 150152352.052352 m^3. (0.604933657467073% of total ablated)
WARNING: Failed to ablate 182693958.985495 m^3. (0.744171728301439% of total ablated)
WARNING: Failed to ablate 242267572.676722 m^3. (1.00341422402982% of total ablated)
WARNING: Failed to ablate 53920015.0482887 m^3. (0.199726660789424% of total ablated)
WARNING: Failed to ablate 101644624.971015 m^3. (0.362588836149430% of total ablated)
WARNING: Failed to ablate 117852747.367989 m^3. (0.433885293070342% of total ablated)
WARNING: Failed to ablate 272720231.707401 m^3. (1.00508944740144% of total ablated)

(These are all the warnings after 40.5 years in 79 timesteps)

As an example, here are global stats plots for 225-day, 135-day, and 90-day timestep cases using both branches.
  • Blue: 225-day, develop
  • Orange: 225-day, this branch
  • Green: 135-day, develop
  • Red: 135-day, this branch
  • Purple: 90-day, develop
  • Brown: 90-day, this branch

The curves for the two 90-day runs are indistinguishable upon zooming in.
[plot omitted]

@trhille (Author) commented Apr 12, 2022

I tested the Humboldt_1to10km_r04_20210615.nc domain on Badger on 5 nodes (180 procs) and 2 nodes (72 procs) and compared with the same run from my Humboldt paper. This uses q = 1/5, a von Mises threshold of 160 kPa, and the 20 m/yr subshelf melt tuning.
Below, green is the run on Cori that went into the Humboldt paper; orange is the run using 2 nodes and this branch on Badger; blue is using 5 nodes and this branch on Badger. The 5-node run did encounter the >10% calving error in 2093, while the 2-node run did not. The previous run from the Humboldt paper first encountered this error in 2078, judging from config_max_adaptive_timestep settings in restart files, so this is still an improvement.

[plot omitted]

The results on 2 and 5 nodes are not bit-for-bit, but this run uses config_velocity_solver = 'FO', so the difference could result from Albany not being BFB across different numbers of processors. I will test this further with imposed velocity fields on the MISMIP domain.

@matthewhoffman commented
@trhille, recap of our discussion today about this PR:

  • We could consider adding an iteration loop for this new functionality, namelist-configurable to allow 0 iterations (old behavior), 1 iteration (current behavior), or many iterations; see the sketch after this list.
  • Ideally we would make this behavior work BFB on different decompositions and/or global indexing order.
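A sketch of what that iteration loop could look like (config_distribute_unablated_iterations, the subroutine name, and volumeTol are all hypothetical):

! 0 iterations reproduces the old behavior; 1 matches this PR; more iterations
! spread leftover volume progressively further into the dynamic interior.
do iter = 1, config_distribute_unablated_iterations
   call distribute_unablated_volume(unablatedVolumeDynCell, cellVolume, cellMask)
   ! A decomposition-independent exit test would need a global reduction here
   if (maxval(unablatedVolumeDynCell(1:nCellsSolve)) <= volumeTol) exit
end do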

@matthewhoffman left a comment

I made some minor suggestions. I don't think they will significantly affect the behavior.

do iEdge = 1, nEdgesOnCell(iCell)
   iNeighbor = cellsOnCell(iEdge, iCell)
   if ((li_mask_is_dynamic_ice(cellMask(iNeighbor))) .and. (bedTopography(iNeighbor) <= config_sea_level) &
       .and. (cellVolume(iNeighbor) > 0.0_RKIND) ) then

@matthewhoffman:

Better to have a tolerance than using 0.0 here.

@trhille (Author):

It's not clear to me what that tolerance should be. Should it be something like 1 m thickness * areaCell(iNeighbor), some fraction of unablatedVolumeDynCell(iCell), or something else?

@matthewhoffman:

I was imagining an eps or tol variable with a value of 1.0e-6 or something; the goal is just to catch round-off-level errors.
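For example (a sketch of the suggestion; commit 9202029 settled on a round-off-level ice thickness, but the exact value and form here are assumptions):

! Treat a neighbor as ice-free if its mean thickness is at round-off level.
! Scaling by cell area keeps the check resolution independent.
volTol = 1.0e-6_RKIND * areaCell(iNeighbor)
if ((li_mask_is_dynamic_ice(cellMask(iNeighbor))) .and. (bedTopography(iNeighbor) <= config_sea_level) &
    .and. (cellVolume(iNeighbor) > volTol) ) then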

@trhille (Author):

This was addressed in commit 9202029.

@trhille (Author) commented Apr 12, 2022

Still testing on commit b92225d before addressing @matthewhoffman's comments. I tested the MISMIP case with the imposed velocity field as before, using 36 and 72 procs. These are close, but not bit-for-bit, indicating that more work is needed. There is also an unsettling feature in the calving flux:
[calving flux plots omitted, including a zoomed-in view]

Both the restart file output interval and config_adaptive_timestep_force_interval are set to 10 years, which is the interval at which those negative excursions occur regularly in the calvingFlux plot. But neither of those should affect this. Why is this happening? I will test further by changing config_adaptive_timestep_force_interval to see if the spikes track it, then likewise with the restart interval.

@trhille (Author) commented Apr 12, 2022

Reducing config_adaptive_timestep_force_interval to 5 years instead of 10 also causes those negative spikes to occur every five years (here using 36 procs):
[plot omitted]

However, this also happens when using MALI-Dev/develop, so it is not due to this PR. I will open an issue, and this can be addressed in another PR. It is probably just a symptom of the calving being timestep dependent, as we've seen before.

@trhille (Author) commented Apr 13, 2022

These same runs using MALI-Dev/develop with velocity solver turned off on 36 and 72 procs are also not bit-for-bit, so that issue does not seem to stem from this PR:
[plot omitted]

@trhille (Author) commented Apr 13, 2022

I tested this using @matthewhoffman's calving test suite in MPAS-Dev/compass#318, rebased onto the branch in PR #28. The changes here give bit-for-bit results across different numbers of processors when the changes in PR #28 are included:

Test Runtimes:
01:18 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-none
01:05 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-floating
01:04 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-eigencalving
01:04 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-specified_calving_velocity
01:03 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-von_mises_stress
01:04 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-damagecalving
01:05 FAIL landice_humboldt_mesh-default_decomposition_test_velo-none_calving-ismip6_retreat
00:23 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-none
00:23 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-floating
00:21 FAIL landice_humboldt_mesh-default_restart_test_velo-none_calving-eigencalving
00:24 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-specified_calving_velocity
00:23 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-von_mises_stress
00:23 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-damagecalving
00:24 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-ismip6_retreat
Total runtime 10:27
FAIL: 2 tests failed, see above.

For reference, here are the test results using MALI-Dev/develop:

Test Runtimes:
01:01 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-none
01:02 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-floating
01:02 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-eigencalving
01:02 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-specified_calving_velocity
01:02 FAIL landice_humboldt_mesh-default_decomposition_test_velo-none_calving-von_mises_stress
01:02 PASS landice_humboldt_mesh-default_decomposition_test_velo-none_calving-damagecalving
01:03 FAIL landice_humboldt_mesh-default_decomposition_test_velo-none_calving-ismip6_retreat
00:22 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-none
00:22 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-floating
00:20 FAIL landice_humboldt_mesh-default_restart_test_velo-none_calving-eigencalving
00:23 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-specified_calving_velocity
00:22 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-von_mises_stress
00:22 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-damagecalving
00:23 PASS landice_humboldt_mesh-default_restart_test_velo-none_calving-ismip6_retreat
Total runtime 09:48
FAIL: 3 tests failed, see above.

Note that I had to change config_calving_eigencalving_parameter_scalar_value from 3.14e16 to 3.14e14 to avoid the >10% failed ablation error, but that error still occurs in the restart test. This is strange, but not within the scope of this PR.

trhille added 2 commits April 18, 2022 14:45
If a cell is left with unablated volume after step 2c in li_apply_front_ablation_velocity,
distribute that volume among its dynamic neighbors. Testing with von Mises calving on the MISMIP+
domain indicates that this greatly reduces inaccuracies in the calving routine, but that very large
calving events can still trigger the >10% calving inaccuracy error. This could potentially
be alleviated by adding a do-while loop over the new section 3 in li_apply_front_ablation_velocity,
but we may want to control for extremely large calving events anyway.

Require unablatedVolumeDynCell(iCell) > 0.0_RKIND when applying ablation in step 3
in li_apply_front_ablation_velocity to avoid doing extra calculations where they
do not apply.
@trhille trhille force-pushed the distribute_unablatedVolume_to_neighboring_cells branch from d8d4de3 to a4cfdad Compare April 18, 2022 20:45
@trhille (Author) commented Apr 18, 2022

Force-push from d8d4de3 to a4cfdad is a rebase onto MALI-Dev/develop at commit 8df83d9

Use a tolerance consistent with round-off level ice thickness in
li_apply_front_ablation_velocity instead of requiring cellVolume > 0.0.
@trhille trhille force-pushed the distribute_unablatedVolume_to_neighboring_cells branch from 4bb2fda to 9202029 Compare April 20, 2022 19:57
Add a namelist option to turn distribution of unablatedVolumeDynCell
to neighboring dynamic cells on or off. Default is to have this feature
enabled, but it may be desirable to have it disabled to ensure that results
are not strongly dependent on this ad hoc cleanup, and that the primary
control on the accuracy of calving results comes from the length of
the timestep (i.e., that the code is not violating a CFL-like
condition based on ablation velocity).
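With that option in place, disabling the new behavior is a one-line namelist change (a sketch; the &calving group name is an assumption):

&calving
    config_distribute_unablatedVolumeDynCell = .false.
/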
@matthewhoffman commented
I ran the full_integration suite against the current baseline, and all tests pass:

Test Runtimes:
00:09 PASS landice_dome_2000m_sia_restart_test
00:06 PASS landice_dome_2000m_sia_decomposition_test
00:07 PASS landice_dome_variable_resolution_sia_restart_test
00:04 PASS landice_dome_variable_resolution_sia_decomposition_test
00:31 PASS landice_enthalpy_benchmark_A
00:12 PASS landice_eismint2_decomposition_test
00:12 PASS landice_eismint2_enthalpy_decomposition_test
00:13 PASS landice_eismint2_restart_test
00:13 PASS landice_eismint2_enthalpy_restart_test
00:16 PASS landice_greenland_sia_restart_test
00:09 PASS landice_greenland_sia_decomposition_test
00:08 PASS landice_hydro_radial_restart_test
00:06 PASS landice_hydro_radial_decomposition_test
00:17 PASS landice_dome_2000m_fo_decomposition_test
00:15 PASS landice_dome_2000m_fo_restart_test
00:10 PASS landice_dome_variable_resolution_fo_decomposition_test
00:11 PASS landice_dome_variable_resolution_fo_restart_test
00:13 PASS landice_circular_shelf_decomposition_test
00:40 PASS landice_greenland_fo_decomposition_test
00:41 PASS landice_greenland_fo_restart_test
00:32 PASS landice_thwaites_decomposition_test
00:43 PASS landice_thwaites_restart_test
00:50 PASS landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting
Total runtime 07:01
PASS: All passed successfully!

I also tested the humboldt_calving_tests suite (without a baseline), and all tests pass except the one that was already failing:

Test Runtimes:
00:11 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-none
00:12 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-floating
00:11 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-eigencalving
00:11 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-specified_calving_velocity
00:12 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-von_mises_stress
00:11 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-damagecalving
00:12 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-ismip6_retreat
00:12 FAIL landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-von_mises_stress_damage-threshold_faceMelting
00:17 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-none
00:17 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-floating
00:17 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-eigencalving
00:18 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-specified_calving_velocity
00:17 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-von_mises_stress
00:17 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-damagecalving
00:18 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-ismip6_retreat
00:18 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-von_mises_stress_damage-threshold_faceMelting
Total runtime 03:52
FAIL: 1 test failed, see above.

It is likely that none of these tests exercise the new code, because if they did, they probably would have been generating runtime errors before now. That said, I can confirm this code does no harm.

@trhille's initial testing above on custom cases where the calving inaccuracy occurs looks solid. More careful testing will be done in conjunction with testing for #33. Any issues that come up can be revisited then (including whether we want this on by default).

@matthewhoffman matthewhoffman merged commit 6c06533 into MALI-Dev:develop Apr 29, 2022