
memory problem loading long list of daily model output #703

Closed
sjvg opened this issue Dec 18, 2019 · 4 comments

sjvg commented Dec 18, 2019

Hello,

I'm fairly new to parcels, so apologies if such an error has been dealt with before, but I could not see it amongst the previous issues raised.
I have a problem with parcels when loading a large number of daily outputs of an ORCA12 run (arrays of size (4032, 3059, 50)) for a simple 2D advection experiment (7000 particles, 20 timesteps).
The experiment works fine when loading 30 daily u,v fields, but crashes when I choose to load 100 daily u,v fields instead. I thought that the option deferred_load=True in:
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions, deferred_load=True)
would avoid this memory problem.
Do you know why this happens?
For info, I use the normal parcels Python environment (no MPI) on a single node with 64 GB of memory.
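For context, a back-of-envelope memory estimate (an illustrative sketch added here, not part of the original post) shows that fields of this size cannot be held in memory eagerly on a 64 GB node, which is why deferred loading matters:

```python
# Rough memory estimate for one ORCA12 field snapshot of shape
# (4032, 3059, 50), assuming 32-bit floats (an illustrative assumption).
nx, ny, nz = 4032, 3059, 50
bytes_per_snapshot = nx * ny * nz * 4          # float32 = 4 bytes per value

print(f"one snapshot: {bytes_per_snapshot / 2**30:.2f} GiB")  # ~2.30 GiB

# Two fields (u and v) for 100 daily snapshots, if loaded eagerly:
total = 2 * 100 * bytes_per_snapshot
print(f"100 days, u+v: {total / 2**30:.0f} GiB")  # far beyond a 64 GB node
```

Even the 30-file run (~138 GiB if fully loaded) only fits because of deferred loading, so the crash at 100 files suggests field data is being retained rather than released.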

Many thanks in advance for your help.

@erikvansebille
Member

Hi @sjvg, thanks for logging this issue. I'm not sure what is happening here, nor do I have any idea how to fix it. For us to tackle this, we would need to be able to at least reproduce the issue on our systems. We do use ORCA12 fields here too, so perhaps we could test whether your code also breaks on our system?

Could you share a minimal example of a breaking code?


sjvg commented Dec 20, 2019

Hi Erik, thanks for your reply; please see the code attached. All I change is the number of files fed to the fieldset in: 'data': data_path + 'U00??.nc'.
When I load 30 files there is no problem, but when I load 100, I get:
INFO: Compiled JITParticleAdvectionRK4periodicBC ==> /tmp/parcels-985/bd35fb50f54210ceb82df350dbb549fb_0.so
INFO: Temporary output files are stored in /home/PARCELS/WORK/out-IPPOUKYX.
INFO: You can use "parcels_convert_npydir_to_netcdf /home/PARCELS/WORK/out-IPPOUKYX" to convert these to a NetCDF file during the run.
N/A% (0 of 1728000.0) | | Elapsed Time: 0:00:00 ETA: --:--:--
10% (172800.0 of 1728000.0) |############## | Elapsed Time: 0:01:27 ETA: 0:13:11
15% (259200.0 of 1728000.0) |##################### | Elapsed Time: 0:02:55 ETA: 0:24:47
20% (345600.0 of 1728000.0) |############################ | Elapsed Time: 0:04:20 ETA: 0:22:36
/var/spool/slurmd/job1108890/slurm_script: line 34: 175423 Killed ${ENV_PATH}/bin/python ./Example_parcels.py
slurmstepd: error: Step 1108890.4294967294 hit memory+swap limit at least once during execution. This may or may not result in some failure.
slurmstepd: error: Job 1108890 hit memory+swap limit at least once during execution. This may or may not result in some failure.

Thanks for your help.

Simon

Example_parcels.txt

@erikvansebille
Member

Thanks for the extra details and the code, @sjvg. I have had some time to take a more careful look today. First of all, I'm not able to reproduce your bug on either my local iMac or on our Faculty's HPC system; everything works fine here. Note, though, that we have NEMO output at five-day time frequency...

I did have to change the FieldSet declaration a bit to the one below, but I don't think this causes the issue:

from glob import glob

data_path = '/Volumes/oceanparcels/input_data/NEMO-MEDUSA/ORCA0083-N006/means/ORCA0083-N06_2000*'
ufiles = sorted(glob(data_path+'U.nc'))
vfiles = sorted(glob(data_path+'V.nc'))
wfiles = sorted(glob(data_path+'W.nc'))

# meshmask: path to the NEMO coordinates/mesh file, defined elsewhere in the script

filenames = {'U': {'lon': meshmask, 'lat': meshmask, 'depth': wfiles[0], 'data': ufiles},
             'V': {'lon': meshmask, 'lat': meshmask, 'depth': wfiles[0], 'data': vfiles},
             'W': {'lon': meshmask, 'lat': meshmask, 'depth': wfiles[0], 'data': wfiles}}
variables = {'U': 'uo', 'V': 'vo', 'W': 'wo'}
dimensions = {'U': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'},
              'V': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'},
              'W': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'}}

What I find particularly strange is that your issue appears after only four days, well before the difference between declaring 20 files and 100 files should start to matter. With deferred load, the difference in memory overhead between the two runs should only be the list of the 80 extra filenames included in the 100-day run and not in the 20-day run; that can't be the difference!
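One generic way to check whether field data is being retained across time steps (a stdlib diagnostic sketch added here, not from the original thread) is to track peak allocation with tracemalloc, which also traces NumPy array allocations in recent NumPy versions:

```python
import tracemalloc

# Generic diagnostic: track current and peak traced memory while work runs.
# If "current" stays close to "peak" as steps accumulate, old data is being
# kept alive instead of released.
tracemalloc.start()

chunks = []
for step in range(5):
    # Stand-in for loading one time chunk of field data (~10 MB each).
    chunks.append(bytearray(10_000_000))

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.0f} MB, peak: {peak / 1e6:.0f} MB")
# Both are ~50 MB here, since all chunks are still referenced.
tracemalloc.stop()
```

Dropping references to chunks that are no longer needed (e.g. `chunks.pop(0)`) would make `current` fall well below `peak`, which is the behaviour one would expect from a working deferred-load scheme.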


CKehl commented Apr 21, 2020

Dear @sjvg ,
we have recently updated parcels to the new version 2.1.5, which is up to date on conda-forge and GitHub. You may want to give the new version a try, as it makes some further improvements to the memory behaviour. Have a look at the NEMO examples in parcels/examples/example_dask_chunk_OCMs.py to see how to set this up. We're looking forward to your feedback on the new version.
Cheers,
Christian
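With that version, the chunked loading would be enabled roughly as below (a sketch based on the v2.1.x field_chunksize argument; the filenames/variables/dimensions dicts are those from earlier in the thread, and 'auto' simply lets dask choose the chunk sizes):

```python
# Sketch, assuming Parcels v2.1.x: field_chunksize controls dask-based
# chunking of the NetCDF input ('auto' lets dask pick chunk sizes;
# False disables chunking entirely).
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions,
                                field_chunksize='auto')
```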

@CKehl closed this as completed May 11, 2020