memory leak in MOM6 #764

Closed · junwang-noaa opened this issue Aug 23, 2021 · 9 comments
Labels: bug (Something isn't working)

@junwang-noaa (Collaborator)

Description

The high-resolution C384 coupled test failed due to a memory leak. The log files show that MOM6 has a memory leak:

20210819 210631.993 INFO PET312 Leaving MOM update_ocean_model: - MemInfo: VmPeak: 2112852 kB
...
20210819 211322.476 INFO PET312 Leaving MOM update_ocean_model: - MemInfo: VmPeak: 2112852 kB
20210819 211405.081 INFO PET312 Leaving MOM update_ocean_model: - MemInfo: VmPeak: 2611544 kB
...

To Reproduce:


1. Check out the ufs-weather-model develop branch and turn on the memory profile check in tests/parm/nems.configure.cpld.IN by applying:
@@ -29,7 +29,7 @@ OCN_petlist_bounds:             @[ocn_petlist_bounds]
 OCN_attributes::
   Verbosity = 0
   DumpFields = false
-  ProfileMemory = false
+  ProfileMemory = true
 2. Run the coupled test cpld_bmark_wave_v16_p7b.
 3. Check the MOM PET files for memory info (see the sketch below).
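
For step 3, here is a minimal Python sketch (not part of the model or the regression-test scripts; the PET log name is a placeholder) that scans a single ESMF PET log for the "Leaving MOM update_ocean_model ... VmPeak" lines shown above and flags any growth:

```python
# Minimal sketch: flag VmPeak growth in one ESMF PET log.
# The file name is a placeholder; the line format follows the log excerpt above.
import re

pattern = re.compile(r"Leaving MOM update_ocean_model: - MemInfo: VmPeak:\s*(\d+) kB")

prev = None
with open("PET312.ESMF_LogFile") as log:      # hypothetical PET log file name
    for line in log:
        m = pattern.search(line)
        if not m:
            continue
        vmpeak = int(m.group(1))              # VmPeak in kB
        if prev is not None and vmpeak > prev:
            print(f"VmPeak grew: {prev} kB -> {vmpeak} kB")
        prev = vmpeak
```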

Additional context

Some related discussion in issue #746:

> I checked Jessica's run directory just to confirm that the memory increase is reduced: it shows a ~2% memory increase just after 14 days, then memory stays unchanged, just like the previous run without CA. MOM6 memory increases from 3660532 kB to 4217332 kB; the increases only happen when time steps are multiples of 12 (12, 24, 36, 60, 228...).

@junwang-noaa I tested the latest 3 commits of MOM6 in ufs; all of them have the memory leak issue. Below is from Marshall Ward:

> I have started doing more aggressive memory checking and recently fixed many of the leaks, but we know of a few that are not yet fixed.
>
> Nearly all of the leaks are because we do not properly call the MOM_end_*() functions during finalization, so they do not normally affect the model during the run.
>
> We are planning to enable valgrind testing once we've fixed all the known leaks, but this is on hold until we finish up some other projects.
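
As a loose illustration of the distinction drawn above (state that is only released by an explicit *_end() call leaks at shutdown, but does not grow during time stepping), here is a hypothetical Python sketch; the names are invented and this is not MOM6 code:

```python
# Hypothetical analogy only; these names are invented and this is not MOM6 code.
_module_state = {}                            # stands in for allocated module-level arrays

def example_init(n):
    _module_state["workspace"] = [0.0] * n    # set up once at initialization

def example_step():
    _module_state["workspace"][0] += 1.0      # reuses the same storage; no growth per step

def example_end():
    _module_state.clear()                     # the MOM_end_*-style teardown; if it is never
                                              # called, the workspace is simply held until
                                              # the process exits (a finalization-time leak)
```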

@DeniseWorthen (Collaborator) commented Aug 30, 2021

See the associated Discussion #779.

@arunchawla-NOAA

@junwang-noaa @DeniseWorthen @JessicaMeixner-NOAA @jiandewang is there an update on this? Is there development happening on this at the GFDL end?

@jiandewang (Collaborator)

Marshall is aware of this and it's on his to-do list. In discussion #799 it can be seen clearly that the minor memory leak is directly related to using the FMS module to write model output, which Marshall believes is the main cause.

@DeniseWorthen (Collaborator)

@jiandewang Have you heard anything recently about this issue? Earlier, Marshall mentioned that it is mostly an issue w/ the MOM_end_*() functions. Later, you mentioned FMS and model output as the main culprit. I'm a little confused about which of these is suspected as the cause of the memory leak.

@marshallward

It's been a long time since I looked at this. I did not find any significant loss of memory in MOM6, other than memory which was not deallocated at cleanup (i.e. MOM_end_*). Realistically, this is not going to have much impact on any simulations.

There were some memory holes coming from FMS, but these were in the FMS1 I/O and may have been fixed in the FMS2 I/O. Even then, I think we're talking something like O(10M) per rank, which is not huge.

However, this was from our benchmark tests, and may not reflect production UFS runs. There could be some untested components in MOM6 which have poor memory usage.

@DeniseWorthen (Collaborator)

@marshallward Thanks for the update. We're trying to clean up/close old issues, which is what led me to ask about the status.

@jiandewang Looking at discussion #779, maybe you could repeat the test you did previously and report the current status? At that point we can decide whether to close the issue.

@jiandewang (Collaborator)

> @marshallward Thanks for the update. We're trying to clean up/close old issues, which is what led me to ask about the status.
>
> @jiandewang Looking at discussion #779, maybe you could repeat the test you did previously and report the current status? At that point we can decide whether to close the issue.

@DeniseWorthen Sure, I will repeat our previous test (sorry for the delayed response; I was out of town last week).

@jiandewang (Collaborator)

@DeniseWorthen I checked out the ufs-weather-model code (Jan 23, 2023 commit, hash 70de7ef) and repeated what I did before by running C96 coupled with 1x1 ocean and ice for 10 days. Overall, I don't see a memory leak issue. This figure shows VmPeak values from two randomly selected PEs:

[Figure: base-percent-line — VmPeak for two randomly selected PEs]

This one is for all PEs (each PE's VmPeak as a percentage of its 2nd-step value; see the sketch below):

[Figure: base-percent — VmPeak as a percentage of each PE's 2nd-step value]
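
A minimal sketch (assuming the default PET*.ESMF_LogFile naming and the "Leaving MOM update_ocean_model ... VmPeak" line format quoted earlier in this issue) of how such a per-PE percentage could be computed:

```python
# Sketch: per-PET VmPeak as a percentage of its 2nd-step value.
# File pattern and log-line format are assumptions based on the excerpt above.
import glob
import re

PAT = re.compile(r"Leaving MOM update_ocean_model: - MemInfo: VmPeak:\s*(\d+) kB")

def vmpeak_series(path):
    with open(path) as f:
        return [int(m.group(1)) for line in f if (m := PAT.search(line))]

for log in sorted(glob.glob("PET*.ESMF_LogFile")):   # assumed ESMF default log naming
    series = vmpeak_series(log)
    if len(series) < 2:
        continue
    base = series[1]                                  # 2nd-step value as the baseline
    pct = [100.0 * v / base for v in series]
    print(f"{log}: final VmPeak = {pct[-1]:.1f}% of 2nd-step value")
```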

I think we can close this issue.

@junwang-noaa (Collaborator, Author)

@jiandewang Thanks for the results. I will close the issue.
