
Memory issue due to flow field data #668

Closed
VictorOnink opened this issue Oct 9, 2019 · 16 comments

@VictorOnink

VictorOnink commented Oct 9, 2019

I've just updated Python to version 3.7 and therefore reinstalled parcels with the most up-to-date version (v2.1, which was just released). However, with this new version of parcels my runs crash due to a memory error. Specifically, the error message reads:

MemoryError: Unable to allocate array with shape (1, 3251, 4500) and data type float32

The shape of the array matches that of one time step of the HYCOM surface circulation data I have been using, which makes me suspect that the issue is related to files not being released from memory once the simulation has passed them. While running a simulation I have also monitored the working memory used by the script, which keeps increasing until the program crashes. Furthermore, when I run the script with only one flow field data file, it runs without issue.
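(For scale, a quick back-of-the-envelope check of how much memory one such time slice takes:)

```python
import numpy as np

# Size of one time slice with the shape from the error message above
shape = (1, 3251, 4500)
nbytes = np.prod(shape) * np.dtype(np.float32).itemsize
print(nbytes / 1024**2)  # ~55.8 MiB per slice, so undeleted slices add up quickly
```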

I've come across the same issue with parcels installed on both a Linux and a Windows machine, while it didn't come up when I installed parcels v2.0, so it likely lies somewhere in the more recent additions to the parcels code. It already came up last week, so Philippe has the code and data files that led to the issue on my machines.

@delandmeterp
Contributor

Hi Victor, could you provide the exact file you are running?
We ran your file last week with totalKernal=AdvectionRK4+diffusion+Beach, and it was going fine on macOS. I've run it again now, and on gemini as well, with similarly fine results (see below).

linux_memory
macos_memory

@VictorOnink
Author

The file I used is exactly the same as last week's, with the only difference being that the file paths were changed because it was run on a different computer. It is odd, then, that the error doesn't come up on your servers. I've now been using v2.0 and that works fine without any errors, but I'm not sure what the issue is with v2.1.

Since posting the issue I've kept working on my code, so quite a few of the kernels have changed somewhat. I'll try reinstalling v2.1 at some later point and see if a similar error comes up again. It might just be an issue specific to my computer or to some part of a kernel.

@CKehl
Contributor

CKehl commented Jan 17, 2020

I have expanded @delandmeterp's tests on the field chunking (see https://nbviewer.jupyter.org/github/OceanParcels/parcels/blob/master/parcels/examples/documentation_MPI.ipynb) by tracking memory consumption and running them with MPI. Currently, this is a toy example looking at particle number and time steps, but it already shows a trend. The simulations are run with 48 particles, a runtime of 7 days and a dt of 1 h.

testing the chunking without MPI-mode:
mpiChunking_plot_np4_48p_7days_woGC

testing the chunking with MPI-mode:
mpiChunking_plot_MPI_np4_48p_7days_woGC

Now, inspired by @delandmeterp, I tried out what difference garbage collection makes.

testing the chunking with Garbage Collection without MPI-mode:
mpiChunking_plot_np4_48p_7days_withGC

testing the chunking with Garbage collection with MPI-mode:
mpiChunking_plot_MPI_np4_48p_7days_withGC

Clearly, we can see that memory consumption builds up over the run, which should not be the case, and that garbage collection actually ensures near-constant memory consumption for this example.
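For reference, a minimal sketch of one way to force garbage collection between execution chunks, which is roughly what the 'with Garbage Collection' runs above do (the ParticleSet and kernel are stand-ins for those in the linked notebook, not the exact test script):

```python
import gc
from datetime import timedelta

def execute_with_gc(pset, kernel, days=7):
    """Run the advection in daily chunks and force a garbage-collection pass
    after each chunk; pset is a parcels ParticleSet, kernel e.g. AdvectionRK4."""
    for _ in range(days):
        pset.execute(kernel, runtime=timedelta(days=1), dt=timedelta(hours=1))
        gc.collect()  # collect unreferenced field data between chunks
```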

Still, aside from using garbage collection, I don't know in detail where this problematic memory behaviour has its roots.

PS: the green line tracks the number of open files, which I also log to investigate the errors with respect to job queueing systems. For the memory, you can ignore that line; the yellow one is the important one.

@CKehl
Contributor

CKehl commented Jan 21, 2020

I went further into the profiling, and it turns out that the old flow fields don't get deleted. To show this, we can look at a simple example that advects 96 particles with plain RK4 (plus deleting out-of-bounds particles) on the CMEMS data, over a time period of 7 days with dt=1h. To study the memory behaviour, we first exclude garbage collection (the default) and don't use MPI over more than 1 processor.
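A minimal sketch of this kind of set-up (the file pattern, variable/dimension names and release line are assumptions, not the exact test script); the resulting memory profile follows below:

```python
from datetime import timedelta
from parcels import FieldSet, ParticleSet, JITParticle, AdvectionRK4, ErrorCode

# Hypothetical CMEMS file pattern, variable names and dimension names
filenames = {"U": "CMEMS_data_*.nc", "V": "CMEMS_data_*.nc"}
variables = {"U": "uo", "V": "vo"}
dimensions = {"lon": "longitude", "lat": "latitude", "time": "time"}
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions)

# 96 particles released on a line (release locations are illustrative)
pset = ParticleSet.from_line(fieldset, pclass=JITParticle,
                             start=(-20.0, 30.0), finish=(-15.0, 35.0), size=96)

def DeleteParticle(particle, fieldset, time):
    particle.delete()  # remove out-of-bounds particles instead of raising an error

pset.execute(AdvectionRK4, runtime=timedelta(days=7), dt=timedelta(hours=1),
             recovery={ErrorCode.ErrorOutOfBounds: DeleteParticle})
```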

testMem_noMPI_woGC

If we assess the yellow curve, we see a steady, minute increase of a few MB each hour, which comes from the particles. This is not a major issue in this case and could also come from the "repeatdt". The major memory leak is seen at the end of each day (which is when new field data are loaded), where memory accumulates. Basically, this directly shows that new fields are loaded from disk, but the old data in memory are not deleted. This can quickly cause memory overflows in long-running simulations over many days and months. From what Python's memory_profiler tells me, this leakage occurs in the field's "computeTimeChunk" method. Still, the error probably needs to be found somewhere else, where the last two time steps are actually shifted back to make space for the new data (hypothesis).
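The kind of memory_profiler usage behind that observation, as a generic sketch (the decorated function is an illustrative stand-in, not parcels' actual computeTimeChunk):

```python
import numpy as np
from memory_profiler import profile

@profile  # prints per-line memory increments each time the function runs
def load_next_time_chunk(shape=(2041, 4320)):
    # Stand-in for reading a new field time slice from NetCDF into memory
    new_slice = np.zeros(shape, dtype=np.float32)
    return new_slice

if __name__ == "__main__":
    # Keeping the returned slices around mimics time slices that are never freed
    slices = [load_next_time_chunk() for _ in range(3)]
```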

Actually, calling the garbage collector after each advection does NOT solve the problem, as the curves with and without GC look exactly the same:

testMem_noMPI_wGC

If we compare the same run with 1 processor and then in MPI with 2 processors, we see that the problem compounds (obviously), as the leakage occurs for each processor individually.

testMem_MPI_woGC

@CKehl
Contributor

CKehl commented Jan 21, 2020

A small addition: if we look at 31 days of simulation, we see that the memory sort of 'resets' when wrapping around the 30-day period covered by the CMEMS model.

testMem_MPI_wGC_31days

@VictorOnink
Author

That matches what I remember of the original error. The error I got indicated that the memory error arose when the next time step of the data was to be loaded, since it gave the array dimensions of the variable that caused the error, and these matched exactly one time slice of flow field data. The simulations also did not use more particles than in previous runs, so the memory error is consistent with the old fields not being deleted.

As for your addition: there is a sort of reset of the memory, but right after it the memory jumps up again to a higher level than before, so is it really a reset, or more of a temporary blip?

@CKehl
Contributor

CKehl commented Jan 21, 2020

No, Victor, you are absolutely right: the memory bumps back to the previous high level (I'm having a deeper look at that right now too). This is what explains the high blue bars at that point: it unloads the data and then somehow reloads them in each later iteration, probably due to data interpolation between timesteps for particles in the various field chunks. But why it then loads ALL the timesteps again is beyond me right now ... looking into it.

Thanks for your confirmation - that helps in tracking the error.

I still need to run all of this with SciPy and with the previous version to see what's what.

@CKehl
Contributor

CKehl commented Jan 21, 2020

33 days - I can't run more locally because my memory taps out beyond that. The spikes are weird, but what is even weirder is that the bar for open files (the green one) drops to 2 and stays there. Basically, it loads field after field into memory, file by file, after 'wrapping around' the time domain.

testMem_MPI_wGC_33days

The good thing is that memory consumption is the same for any number of cores. Basically, splitting a grid into N equally-sized subgrids for MPI works, but the actual data loading from file does not.

testMem_noMPI_wGC_33days

Comment CK: this is run with 'allow_time_extrapolation', which causes the weird spike after 30 days of simulation.

@CKehl
Contributor

CKehl commented Jan 22, 2020

It seems that the core of the issue is the NetcdfFileBuffer itself.

If we run the current version and track what is happening there, the log looks like this (for NetcdfFileBuffer.data):

NetCDF engine: netcdf4
NetCDF dataset[uo] as <class 'xarray.core.dataarray.DataArray'>: <xarray.DataArray 'uo' (time: 1, depth: 50, latitude: 2041, longitude: 4320)>
[440856000 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667
  * latitude   (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0
  * depth      (depth) float32 0.494025 1.541375 2.645669 ... 5274.784 5727.917
  * time       (time) datetime64[ns] 2016-07-01T12:00:00
Attributes:
    long_name:      Eastward velocity
    standard_name:  eastward_sea_water_velocity
    units:          m s-1
    unit_long:      Meters per second
    valid_min:      -3454
    valid_max:      4455
    cell_methods:   area: mean
Type of actual data: <class 'numpy.ndarray'>
NetCDF indiced: {'lon': range(0, 4320), 'lat': range(0, 2041), 'depth': [0]}
dask-xarray shape: (1, 1, 2041, 4320)

The important bit is the line Type of actual data: <class 'numpy.ndarray'>, meaning that it makes no difference whether the rest of the code uses Dask, because the data from NetCDF are already loaded in full at that stage.

Going through the documentation of xarray and Dask, the following turns out:
the xarray.open_dataset() method ONLY uses Dask and lazy allocation if chunking information is provided in the open call, as in xarray.open_dataset(..., chunks=...). If that is not the case, this NetCDF call WILL allocate the data as numpy.ndarray, and thus any later chunking has nearly no effect because the data are already fully in memory.
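A minimal illustration of the difference (the filename is a placeholder, not one of the actual CMEMS files):

```python
import xarray as xr

# Without 'chunks', accessing the variable yields an eagerly loaded numpy array
ds_eager = xr.open_dataset("cmems_slice.nc")              # placeholder filename
print(type(ds_eager["uo"].data))                          # <class 'numpy.ndarray'>

# With a 'chunks' argument, xarray wraps the variable in a lazy dask array,
# so the data are only read from disk chunk by chunk when actually computed
ds_lazy = xr.open_dataset("cmems_slice.nc",
                          chunks={"time": 1, "latitude": 512, "longitude": 512})
print(type(ds_lazy["uo"].data))                           # <class 'dask.array.core.Array'>
```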

This is verified: when we just add a "blueprint" chunks argument to the xr.open_dataset(...) call, the log looks like this:

NetCDF engine: netcdf4
NetCDF dataset[uo] as <class 'xarray.core.dataarray.DataArray'>: <xarray.DataArray 'uo' (time: 1, depth: 50, latitude: 2041, longitude: 4320)>
dask.array<open_dataset-9e81bb2c309698efdb108cdc66af94eduo, shape=(1, 50, 2041, 4320), dtype=float32, chunksize=(1, 50, 2041, 4320), chunktype=numpy.ndarray>
Coordinates:
  * longitude  (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667
  * latitude   (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0
  * depth      (depth) float32 0.494025 1.541375 2.645669 ... 5274.784 5727.917
  * time       (time) datetime64[ns] 2016-07-01T12:00:00
Attributes:
    long_name:      Eastward velocity
    standard_name:  eastward_sea_water_velocity
    units:          m s-1
    unit_long:      Meters per second
    valid_min:      -3454
    valid_max:      4455
    cell_methods:   area: mean
Type of actual data: <class 'dask.array.core.Array'>
NetCDF indiced: {'lon': range(0, 4320), 'lat': range(0, 2041), 'depth': [0]}
dask-xarray shape: (1, 1, 2041, 4320)

As we can see from Type of actual data: <class 'dask.array.core.Array'>, the xarray NetCDF loader now uses lazily allocated Dask arrays for the data.

I'm memory-profiling the change to see if that makes the impact I expect it to make.

@CKehl
Contributor

CKehl commented Jan 22, 2020

update for a 7-day run:

before bugfix:
fix_pretest

after bugfix:
fix

There is still a growing trend, but that can be explained by the particles becoming more and more spread out over time, requiring more and more chunks.

Now running the month-long tests to verify stability.

@CKehl
Contributor

CKehl commented Jan 22, 2020

One drawback of this whole process: if one does NOT want chunking and does want all 3 field datasets in memory, then that won't work that easily. The problem is: if you concatenate a fixed-allocated array (e.g. via numpy) with dask-concatenate and treat it from then on as Dask, then shifting and concatenation operations will NOT automatically free unused numpy arrays (which is why this error was there in the first place). In other words: as soon as an array becomes a dask array, unused data in memory are not freed, because dask-array indexing operations don't do anything to the memory.

Hence, if one defines a FieldSet by initialising it with "chunking=False", all array calls that are now fixed on Dask need to be replaced with xarray or numpy, and probably the whole chunking (i.e. all function calls related to it) needs to be skipped. So, making that work will require a larger overhaul.
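A small self-contained sketch of the dask-wrapping behaviour described above (shapes are illustrative):

```python
import numpy as np
import dask.array as da

# Two eagerly allocated numpy blocks, standing in for two field time slices
old_slice = np.zeros((1, 2041, 4320), dtype=np.float32)
new_slice = np.ones((1, 2041, 4320), dtype=np.float32)

# Wrapping them as dask arrays and concatenating keeps references to both numpy blocks
stacked = da.concatenate([da.from_array(old_slice), da.from_array(new_slice)], axis=0)

# Slicing only builds a new task graph; it does NOT release the memory behind
# old_slice as long as the graph (or our own variable) still references it
shifted = stacked[1:, ...]
print(shifted.shape)  # (1, 2041, 4320) -- but old_slice is still resident in memory
```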

@CKehl
Contributor

CKehl commented Jan 22, 2020

The fix also seems to work on the longer scale, though wrapping fieldsets in time (either by periodic wrapping or time extrapolation) still has some flaws. Here again, for comparison, the 33-day runs. Watch the yellow bar for memory. Keep in mind that the plot without the fix is measured in gigabytes [= 1000 MB], while the plot with the fix is measured in tenths of a gigabyte [= 100 MB]. Thus, though the bars look similar, there is actually a difference of a whole order of magnitude in memory consumption.

before bugfix:
testMem_noMPI_wGC_33days

after bugfix:
testMem_noMPI_woGO_33days

Comment CK: here, we have a periodic_time of 30 days - allow_time_extrapolation breaks completely for some reason.

@CKehl
Contributor

CKehl commented Jan 22, 2020

I'm fixing some MPI-related problems, but in MPI, too, it starts working for 33 days:
testMem_MPI_woGC_33days

@CKehl
Contributor

CKehl commented Jan 27, 2020

Here are some results from recent runs. Mind that I check the graphs for backward optimization mode, MP and submission systems too; they look the same.

Run forward with extrapolation (fieldsize=2048):
testMem_noMPI_extrapolation_fwd_33days_fs2048_fix

Run forward without deferred arrays with extrapolation (fieldsize=2048):
testMem_noMPI_extrapolation_noDefer_fwd_33days_fs2048_fix

Run forward with periodic wrapping (fieldsize=2048):
testMem_noMPI_periodic_fwd_33days_fs2048_fix

Run forward with extrapolation (fieldsize=256):
testMem_noMPI_extrapolation_fwd_33days_fs256_fix

Run forward with periodic wrapping (fieldsize='auto' while having a valid dask config yaml file):
testMem_noMPI_periodic_fwd_33days_fsAUTO_fix

@CKehl
Contributor

CKehl commented Jan 28, 2020

Here the normal forward simulation with extrapolation and without repeatdt, meaning without continuous particle addition:
testMem_noMPI_extrapolation_fwd_33days_fs2048_fix_noParticleAdd

@CKehl
Contributor

CKehl commented Jan 28, 2020

Please continue testing and further discussion at #719.
