Problem of RAM memory exhaustion for datasets with unlimited axis? #1168
Comments
Please post the data file somewhere and put the link in this issue.
@jerabaul29 we really can't make any progress on diagnosing the problem without having access to the data file. Is there any problem with providing access to the dataset?
Hi @jswhit2! I am the one who initially encountered the excessive memory usage. The example dataset can be found here: https://www.dropbox.com/s/zk6js1cmt6p2tj9/wave_data_bad.nc?dl=0
Many thanks for uploading your example file @BrazhnikovDmitry :) . @jswhit2, sorry for the lack of response on my side; I was traveling and had a backlog. I can confirm that I get the error with the exact file @BrazhnikovDmitry uploaded :) .
OK, I've got the file now, thanks. Just curious why you decided to make the unlimited 'time' dimension the rightmost dimension (last in the list of dimensions for that variable). Typically the unlimited dimension is defined as the leftmost (slowest varying) dimension. I bet that if you had done it that way, accessing the data along the unlimited dimension would be much faster.
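For illustration, a minimal sketch of creating a file with the unlimited 'time' dimension defined as the leftmost dimension, as suggested above. The file name, dimension sizes, and the 'instrument' dimension name are assumptions; 'timeIMU' is the variable name from this thread.

```python
from netCDF4 import Dataset
import numpy as np

# Define the unlimited 'time' dimension first, so it is the leftmost
# (slowest varying) dimension of the variable.
nc = Dataset('wave_data_leftmost.nc', 'w')
nc.createDimension('time', None)        # unlimited
nc.createDimension('instrument', 5)     # fixed; size is an assumption
var = nc.createVariable('timeIMU', 'f8', ('time', 'instrument'))
var[0:100, :] = np.random.rand(100, 5)  # write 100 records along 'time'
nc.close()
```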
On MacOS with the latest github master for both netcdf4-python and netcdf-c I don't see this problem. Here's my simple test script:

```python
from netCDF4 import Dataset
import tracemalloc, time

def read_data():
    nc = Dataset('wave_data_bad.nc')
    data = nc["timeIMU"][0, :]
    nc.close()

tracemalloc.start()
# function call
t1 = time.perf_counter()
read_data()
t2 = time.perf_counter()
print('time = %s secs' % str(t2-t1))
# displaying the memory
print('peak memory = %s bytes' % tracemalloc.get_traced_memory()[1])
# stopping the library
tracemalloc.stop()
```

```
time = 110.724687782 secs
peak memory = 51784442 bytes
```

I'm pretty sure nothing has changed in the python module that would impact this, so perhaps it's something that could be remedied by updating the netcdf and hdf5 C libs?
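As a side note, one quick way to check which C library versions a given netCDF4-python install is linked against (these version attributes are exposed by the module itself):

```python
import netCDF4

# Versions of the Python module and of the C libraries it is linked against.
print('netCDF4 module :', netCDF4.__version__)
print('netcdf-c       :', netCDF4.__netcdf4libversion__)
print('HDF5           :', netCDF4.__hdf5libversion__)
```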
OK, interesting. As previously mentioned, I saw it on Linux Ubuntu 20.04, fully up to date. Just curious, @BrazhnikovDmitry, which OS and version are you using? :)
It was my thought as well. I did not have time to update to the latest netcdf library and check. The file was created with 4.7.4. According to Unidata/netcdf-c#1913, they fixed some memory leaks in 4.8.0.
I've also encountered this issue with unlimited dimensions, but I solved it similarly to #859, by increasing the chunk size of the unlimited dimension.
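A minimal sketch of that workaround, assuming the file layout discussed in this thread; the dimension names, sizes, and the chunk length of 4096 are illustrative assumptions.

```python
from netCDF4 import Dataset

# Create the variable with larger chunks along the unlimited 'time' dimension,
# so that reads along 'time' touch far fewer chunks.
nc = Dataset('wave_data_chunked.nc', 'w')
nc.createDimension('instrument', 5)      # size is an assumption
nc.createDimension('time', None)         # unlimited
var = nc.createVariable('timeIMU', 'f8', ('instrument', 'time'),
                        chunksizes=(1, 4096))
nc.close()
```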
First, credit to @BrazhnikovDmitry for finding this; I am only writing up the issue, but he should get the credit for pointing this out :) .
It seems like opening a dataset with an unlimited dimension can cause RAM exhaustion and a crash. For example, I have a file with an unlimited dimension of size:
The data file is relatively big for use on my local machine (a laptop), but not huge: 1.6 GB in total. My local machine has 16 GB of RAM, of which more than 8 GB are completely free.
When trying to open a small slice of a field of the dataset (the first index is an "instrument ID", the second index is the unlimited time dimension):
all goes well.
But when trying to open the whole field:
all the RAM gets exhausted (I had over 8 GB of RAM available when starting the command; RAM use seems to increase almost linearly over the course of a few seconds until it is exhausted), and the process gets killed automatically (which is actually great: as you can imagine, my whole system freezes when all the RAM gets used, so it is nice that the process gets killed and my system's responsiveness is restored :) ).
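For reference, a minimal sketch of the two access patterns described above (the variable name is taken from the test script earlier in the thread; the small-slice bounds are an assumption, since the original snippets are not preserved here):

```python
from netCDF4 import Dataset

nc = Dataset('wave_data_bad.nc')

# Small slice along the unlimited time dimension: works fine.
small = nc["timeIMU"][0, :1000]

# Whole time series for one instrument: RAM use grows until the process is killed.
full = nc["timeIMU"][0, :]

nc.close()
```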
The interesting thing is that, packaging the exact same data but with a fixed dimension size, the whole field can be opened with the same [0, :] slice without encountering any issue, using just a few hundred MB of RAM.
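For completeness, a rough sketch of repacking the same data with a fixed-size time dimension; the dimension and file names are assumptions, and nccopy -u from the netcdf-c tools performs the same unlimited-to-fixed conversion from the command line.

```python
from netCDF4 import Dataset

src = Dataset('wave_data_bad.nc')
dst = Dataset('wave_data_fixed.nc', 'w')

# Re-create 'time' with a fixed size instead of unlimited.
dst.createDimension('instrument', len(src.dimensions['instrument']))
dst.createDimension('time', len(src.dimensions['time']))
out = dst.createVariable('timeIMU', 'f8', ('instrument', 'time'))

# Copy one instrument at a time; reading the original file is still the slow part.
for i in range(len(src.dimensions['instrument'])):
    out[i, :] = src['timeIMU'][i, :]

dst.close()
src.close()
```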
version and system information
OS: Ubuntu 20.04, fully updated
ipython: