
Load hazard raster data lazily #578

Merged: 14 commits, Jan 17, 2023
Conversation

@peanutfun (Member) commented Nov 7, 2022

Changes proposed in this PR:

  • Use dask array chunks to load hazard raster data lazily
  • Add sparse as a new dependency for that task

This PR fixes #544, an issue raised during the review of #507.
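
For context, here is a minimal sketch of the lazy path from the user's side. The file name, hazard type, and unit are hypothetical; chunks="auto" mirrors the default this PR applies internally:

import xarray as xr
from climada.hazard import Hazard

# chunks="auto" makes xarray back the variables with dask arrays,
# so raster values are read from disk chunk by chunk, on demand.
ds = xr.open_dataset("hazard.nc", chunks="auto")
hazard = Hazard.from_raster_xarray(ds, hazard_type="FL", intensity_unit="m")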


* Always load hazard raster data into chunked dask arrays.
* Support both dask and numpy DataArrays being passed as hazard data.
@timschmi95 (Collaborator)

This change does enable using the .from_raster_xarray method instead of my custom method for my hazard file without running into memory issues 👍
My hazard is an xarray dataset created from multiple netcdf files, with the following dimensions: time: 3660, chy: 225, chx: 352.
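(For reference, a dataset like this can be assembled lazily with xr.open_mfdataset; the file pattern below is made up for illustration:)

import xarray as xr

# Opens all matching files, concatenates them lazily along their
# common dimension, and returns dask-backed variables.
ds = xr.open_mfdataset("hazard_*.nc", combine="by_coords")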

@timschmi95 (Collaborator)


Actually, I realize that depending on the previous memory allocation I still run into a memory error. I'm not sure if the chunking could be made more efficient, or if it's just too much for my laptop anyway; in that case I'll either pre-slice the data to a smaller extent (as in my custom function) or run it on euler if I need the full extent.

@peanutfun (Member, Author)

Actually, I realize that depending on the previous memory allocation I still run into a memory error.

Xarray does not check how much of your system's memory is already in use. I currently set chunks="auto", which only ensures that a single chunk fits into total system RAM, not into the memory that happens to be free. However, from_raster_xarray explicitly supports passing Datasets that are already opened, so you can open yours with custom chunking. The following would be a somewhat conservative choice:

import xarray as xr
from climada.hazard import Hazard

ds = xr.open_dataset("file.nc", chunks=dict(time=100, chy=-1, chx=-1))  # -1: keep the whole dimension in one chunk
hazard = Hazard.from_raster_xarray(ds, **kwargs)

@peanutfun marked this pull request as ready for review on Jan 13, 2023, 10:16
@peanutfun (Member, Author)

@emanuel-schmid The sparse package still seems to be missing from the testing environment

@emanuel-schmid (Collaborator)

@emanuel-schmid The sparse package still seems to be missing from the testing environment

Rather "again" than "still" ;)
But it's been re-installed just now.

@emanuel-schmid (Collaborator)

Tests are running, and I'm gonna include sparse in the develop branch.

@emanuel-schmid (Collaborator)

Let me quickly come back to a topic from a previous PR; it is not really about this PR, but it concerns the same method:
Why did we choose 'time' as the default 'event' coordinate name, and not 'event'? 🧐

@peanutfun (Member, Author)

Why did we choose 'time' as the default 'event' coordinate name, and not 'event'?

@emanuel-schmid In the Hazard class, we distinguish individual events, which can occur at the same or at different times. Most hazard datasets contain a time variable, but not an event variable, so we chose the most sensible default value. We still call the key event, not time, to make clear that this variable relates to distinct events in the Hazard class context. Does that make sense? 😵
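
For illustration, the default can be overridden through the coordinate_vars argument of from_raster_xarray. A sketch, assuming a dataset ds whose event coordinate is named "step" (the dataset, hazard type, and unit are made up):

from climada.hazard import Hazard

# The key stays "event" even though the data calls the coordinate "step".
hazard = Hazard.from_raster_xarray(
    ds,
    hazard_type="FL",
    intensity_unit="m",
    coordinate_vars=dict(event="step"),
)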

@peanutfun linked an issue on Jan 16, 2023 that may be closed by this pull request
@emanuel-schmid (Collaborator)

Yes, that makes sense. 👍 I've added a note to that effect in the docstring, in case somebody else is wondering. Feel free to revert if it isn't appropriate.

@peanutfun (Member, Author)

Very good! 🤩

@peanutfun (Member, Author)

@emanuel-schmid Is this ready to go from your perspective?

@emanuel-schmid (Collaborator)

🎉 Ready to go, from my perspective.

@emanuel-schmid (Collaborator)

🙇‍♂️ many thanks!

Merging this pull request may close the following issue:

Follow-up to #507: Lazy loading of chunked xarray Datasets