When I try to load a large (~30 GB, 2120x2600 grid, 90 levels) UM format file in Iris, I get out-of-memory errors in the iris.load() call.
I would expect Iris to be able to load() a UM file of any size, with an out-of-memory error occurring only if you try to realise lazy data that is too large for memory.
It appears that the array data is being read at https://github.com/SciTools/iris/blob/master/lib/iris/fileformats/pp.py#L597 when the file is first loaded. I don't entirely understand Iris's lazy_data module, but perhaps the open() here could be wrapped in dask.delayed so that the file is only read when the data is actually needed, something like:
import os

import dask
import dask.array

def __getitem__(self, keys):
    @dask.delayed
    def load_data():
        # Defer the raw read: nothing is fetched from disk until
        # the delayed value is actually computed.
        with open(self.path, "rb") as pp_file:
            pp_file.seek(self.offset, os.SEEK_SET)
            data_bytes = pp_file.read(self.data_len)
        data = _data_bytes_to_shaped_array(
            data_bytes,
            self.lbpack,
            self.boundary_packing,
            self.shape,
            self.src_dtype,
            self.mdi,
        )
        return data

    # Wrap the delayed read as a lazy dask array with the known
    # shape and dtype, then apply the requested indexing to it.
    data = dask.array.from_delayed(load_data(), shape=self.shape, dtype=self.dtype)
    return data[keys]
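For context, here is a minimal self-contained sketch of the same dask.delayed / dask.array.from_delayed pattern, outside of Iris. The file name, shape, and read_level function are all hypothetical, standing in for one level of the grid described above; the point is just that no I/O happens until the array is computed.

import dask
import dask.array
import numpy as np

PATH = "example.dat"   # hypothetical raw binary file of float32 values
SHAPE = (2600, 2120)   # hypothetical: one level of the grid above
DTYPE = np.float32

@dask.delayed
def read_level():
    # The file is only opened when this delayed value is computed.
    with open(PATH, "rb") as f:
        data = np.frombuffer(f.read(), dtype=DTYPE)
    return data.reshape(SHAPE)

lazy = dask.array.from_delayed(read_level(), shape=SHAPE, dtype=DTYPE)
print(lazy)                          # a lazy dask array; file not yet read
subset = lazy[:10, :10].compute()    # only now is the file opened and read

Note that from_delayed produces a single-chunk array, so any indexing still triggers a read of the whole field; that matches the per-field granularity of the pp.py proposal above, where each PP field would become one lazy chunk.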