
Data not loaded lazily in a large UM format file #3586

@ScottWales

Description

When I try to load a large (~30 GB, 2120x2600 grid, 90 levels) UM format file in Iris, I get out-of-memory errors in the iris.load() call.

I expect that Iris should be able to load() a UM file of any size, and that an out-of-memory error should only occur when realising lazy data that is too large for memory.
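For example, I'd expect a workflow like this to stay within memory (the file name here is just illustrative):

    import iris

    # Loading should only read the field headers; the ~30 GB of
    # data should stay on disk behind a lazy dask array.
    cube = iris.load("umglaa_pa000")[0]  # illustrative file name
    print(cube.has_lazy_data())  # expect: True

    # Memory should only become a concern when data is realised,
    # e.g. by accessing .data on a slice:
    level0 = cube[0].data  # reads just this level from disk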

It appears that the array data is read at https://github.com/SciTools/iris/blob/master/lib/iris/fileformats/pp.py#L597 when the file is first loaded. I don't entirely understand Iris's lazy_data module, but perhaps the open() here could be wrapped in dask.delayed so that the file is only read once the data is actually needed, something like:

    import os

    import dask
    import dask.array

    def __getitem__(self, keys):
        @dask.delayed
        def load_data():
            # Defer the file read: nothing is read from disk until
            # the dask graph backing this array is computed.
            with open(self.path, "rb") as pp_file:
                pp_file.seek(self.offset, os.SEEK_SET)
                data_bytes = pp_file.read(self.data_len)
            # Decode the packed bytes into a shaped numpy array.
            return _data_bytes_to_shaped_array(
                data_bytes,
                self.lbpack,
                self.boundary_packing,
                self.shape,
                self.src_dtype,
                self.mdi,
            )

        lazy = dask.array.from_delayed(load_data(), shape=self.shape, dtype=self.dtype)
        return lazy[keys]
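As a standalone sketch of the same pattern (the file name, offset and dtype here are made up for illustration, not pp.py internals), dask.delayed defers the read until compute time, and each field becomes a single chunk in the dask graph:

    import dask
    import dask.array as da
    import numpy as np

    def read_field(path, offset, shape, dtype):
        # Runs only when the dask graph is computed.
        n = int(np.prod(shape))
        with open(path, "rb") as f:
            f.seek(offset)
            raw = f.read(n * dtype.itemsize)
        return np.frombuffer(raw, dtype=dtype).reshape(shape)

    dtype = np.dtype(">f4")  # example dtype, assumed for illustration
    shape = (2120, 2600)     # one level of the grid from this issue
    field = da.from_delayed(
        dask.delayed(read_field)("fields.bin", 0, shape, dtype),
        shape=shape,
        dtype=dtype,
    )
    # No I/O has happened yet; only computing a slice triggers a read.
    print(field[:10, :10].compute())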
