
memory problem loading long list of daily model output #703

Closed
sjvg opened this issue Dec 18, 2019 · 4 comments

sjvg commented Dec 18, 2019

Hello,

I'm fairly new to parcels, so apologies if such an error has been dealt with before, but I could not see it amongst the previous issues raised.
I have a problem with parcels when loading a large number of daily outputs of an ORCA12 run (arrays of size (4032, 3059, 50)) for a simple 2D advection experiment (7000 particles, 20 timesteps).
The experiment works fine when loading 30 daily u,v fields, but crashes when I choose to load 100 daily u,v fields instead. I thought that the option deferred_load=True in:
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions, deferred_load=True)
would avoid this memory problem.
Do you know why this happens?
For info, I use the normal parcels Python environment (no MPI) on a single node with 64 GB of memory.
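For context, a back-of-envelope memory estimate (an illustrative sketch added here, not part of the original post) shows that fields of this size cannot be held in memory eagerly on a 64 GB node, which is why deferred loading matters:

```python
# Rough memory estimate for one ORCA12 field snapshot of shape
# (4032, 3059, 50), assuming 32-bit floats (an illustrative assumption).
nx, ny, nz = 4032, 3059, 50
bytes_per_snapshot = nx * ny * nz * 4          # float32 = 4 bytes per value

print(f"one snapshot: {bytes_per_snapshot / 2**30:.2f} GiB")  # ~2.30 GiB

# Two fields (u and v) for 100 daily snapshots, if loaded eagerly:
total = 2 * 100 * bytes_per_snapshot
print(f"100 days, u+v: {total / 2**30:.0f} GiB")  # far beyond a 64 GB node
```

Even the 30-file run (~138 GiB if fully loaded) only fits because of deferred loading, so the crash at 100 files suggests field data is being retained rather than released.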

Many thanks in advance for your help.

@erikvansebille
Member

Hi @sjvg, thanks for logging this issue. I'm not sure what is happening here, nor do I have any idea how to fix it. For us to tackle this, we would need to be able to at least reproduce the issue on our systems. We do use ORCA12 fields here too, so perhaps we could test whether your code also breaks on our system?

Could you share a minimal example of a breaking code?


sjvg commented Dec 20, 2019

Hi Erik, thanks for your reply; please see the code attached. All I change is the number of files fed to the fieldset in: 'data': data_path + 'U00??.nc'.
When I load 30 files there is no problem, but when I load 100, I get:
INFO: Compiled JITParticleAdvectionRK4periodicBC ==> /tmp/parcels-985/bd35fb50f54210ceb82df350dbb549fb_0.so
INFO: Temporary output files are stored in /home/PARCELS/WORK/out-IPPOUKYX.
INFO: You can use "parcels_convert_npydir_to_netcdf /home/PARCELS/WORK/out-IPPOUKYX" to convert these to a NetCDF file during the run.
N/A% (0 of 1728000.0) | | Elapsed Time: 0:00:00 ETA: --:--:--
10% (172800.0 of 1728000.0) |############## | Elapsed Time: 0:01:27 ETA: 0:13:11
15% (259200.0 of 1728000.0) |##################### | Elapsed Time: 0:02:55 ETA: 0:24:47
20% (345600.0 of 1728000.0) |############################ | Elapsed Time: 0:04:20 ETA: 0:22:36
/var/spool/slurmd/job1108890/slurm_script: line 34: 175423 Killed ${ENV_PATH}/bin/python ./Example_parcels.py
slurmstepd: error: Step 1108890.4294967294 hit memory+swap limit at least once during execution. This may or may not result in some failure.
slurmstepd: error: Job 1108890 hit memory+swap limit at least once during execution. This may or may not result in some failure.

Thanks for your help.

Simon

Example_parcels.txt

@erikvansebille
Member

Thanks for the extra details and the code, @sjvg. I have had some time to take a more careful look today. First of all, I'm not able to reproduce your bug on either my local iMac or on our Faculty's HPC system; everything works fine here. Note, though, that we have NEMO output at five-day time frequency...

I did have to change the FieldSet declaration a bit to the one below, but I don't think this causes the issue:

from glob import glob

data_path = '/Volumes/oceanparcels/input_data/NEMO-MEDUSA/ORCA0083-N006/means/ORCA0083-N06_2000*'
ufiles = sorted(glob(data_path+'U.nc'))
vfiles = sorted(glob(data_path+'V.nc'))
wfiles = sorted(glob(data_path+'W.nc'))

# meshmask: path to the NEMO coordinates/mesh file, defined elsewhere in the script

filenames = {'U': {'lon': meshmask, 'lat': meshmask, 'depth': wfiles[0], 'data': ufiles},
             'V': {'lon': meshmask, 'lat': meshmask, 'depth': wfiles[0], 'data': vfiles},
             'W': {'lon': meshmask, 'lat': meshmask, 'depth': wfiles[0], 'data': wfiles}}
variables = {'U': 'uo', 'V': 'vo', 'W': 'wo'}
dimensions = {'U': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'},
              'V': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'},
              'W': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'}}

What I find particularly strange is that your issue appears after only four days, well before the difference between declaring 20 files and 100 files should start to matter. With deferred load, the difference in memory overhead between the two runs should only be the list of the 80 extra filenames included in the 100-day run and not in the 20-day run; that can't be the difference!
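One generic way to check whether field data is being retained across time steps (a stdlib diagnostic sketch added here, not from the original thread) is to track peak allocation with tracemalloc, which also traces NumPy array allocations in recent NumPy versions:

```python
import tracemalloc

# Generic diagnostic: track current and peak traced memory while work runs.
# If "current" stays close to "peak" as steps accumulate, old data is being
# kept alive instead of released.
tracemalloc.start()

chunks = []
for step in range(5):
    # Stand-in for loading one time chunk of field data (~10 MB each).
    chunks.append(bytearray(10_000_000))

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.0f} MB, peak: {peak / 1e6:.0f} MB")
# Both are ~50 MB here, since all chunks are still referenced.
tracemalloc.stop()
```

Dropping references to chunks that are no longer needed (e.g. `chunks.pop(0)`) would make `current` fall well below `peak`, which is the behaviour one would expect from a working deferred-load scheme.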


CKehl commented Apr 21, 2020

Dear @sjvg ,
we have recently updated parcels to the new version 2.1.5, which is up to date on conda-forge and GitHub. You may want to give the new version a try, as it makes some further improvements to the memory behaviour. Have a look at the NEMO examples in parcels/examples/example_dask_chunk_OCMs.py to see how to set this up. We're looking forward to your feedback on the new version.
Cheers,
Christian
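With that version, the chunked loading would be enabled roughly as below (a sketch based on the v2.1.x field_chunksize argument; the filenames/variables/dimensions dicts are those from earlier in the thread, and 'auto' simply lets dask choose the chunk sizes):

```python
# Sketch, assuming Parcels v2.1.x: field_chunksize controls dask-based
# chunking of the NetCDF input ('auto' lets dask pick chunk sizes;
# False disables chunking entirely).
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions,
                                field_chunksize='auto')
```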

@CKehl closed this as completed May 11, 2020