Clipping & Reprojecting larger Datasets with rioxarray #222

lwasser · 2021-01-27T18:53:26Z

In this discussion it was mentioned that a user was running into errors when clipping data. I've encountered the same thing even when clipping data that don't seem all that large, like NAIP 4-band scenes. One thing I noticed, and @joemcglinchy can confirm this, is that NAIP scenes come in as floats, which makes clipping much slower.

Here is my question:

I tried using reproject_match to "clip" an array to the spatial extent of another array. It works wonderfully but seems to use a huge amount of memory. Note that it works more efficiently if the data are coerced to an int format.

Given the issues encountered above, I started to play with chunking the data by band, x, and y upon import. While this looked like it worked, when I tried to reproject again, it still seemed to be consuming RAM. My (very basic) understanding of Dask and chunking is that it somehow allows operations to run via CPU power rather than consume as much RAM. In practice, I wasn't sure if that was happening.

What are the best practices for reprojecting and matching large raster data? Should we use chunks, and do chunks really send the data to CPU power rather than storing it in memory?

Many thanks for any guidance here. This has been a learning experience for me in efficient computing.

snowman2 · 2021-01-27T18:56:36Z

Maintainer

> somehow allows operations to run via CPU power rather than consume as much ram.

One thing to note is that dask is not going to reduce the total amount of RAM used by your machine. It can distribute the memory used to different processors, but the total amount of RAM used will be the same.

> I tried using reproject_match to "clip" an array to the spatial extent of another array. It works wonderfully but seems to use a huge amount of memory. Note that it works more efficiently if the data are coerced to an int format.

This method loads all of the data from disk and resamples it to the grid to match to. It is not an efficient method for clipping. Its main use is for aligning grids.
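As a rough illustration of why loading everything hurts, and why the dtype matters, here is a back-of-the-envelope sketch. The scene dimensions are made up for illustration (they are not measured from a real NAIP file), and `raster_nbytes` is just a hypothetical helper, not part of rioxarray.

```python
def raster_nbytes(bands, rows, cols, itemsize):
    """Bytes needed to hold a dense bands x rows x cols array in memory."""
    return bands * rows * cols * itemsize


# Hypothetical 4-band scene; the dimensions are illustrative only.
BANDS, ROWS, COLS = 4, 12_000, 10_000

as_uint8 = raster_nbytes(BANDS, ROWS, COLS, itemsize=1)    # 1 byte per value
as_float64 = raster_nbytes(BANDS, ROWS, COLS, itemsize=8)  # 8 bytes per value

print(f"uint8:   {as_uint8 / 1e9:.2f} GB")
print(f"float64: {as_float64 / 1e9:.2f} GB")
```

With these assumed dimensions, the same scene needs eight times more memory as float64 than as uint8, before counting any temporary copies made during resampling.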

> What are the best practices for reprojecting and matching large raster data? Should we use chunks, and do chunks really send the data to CPU power rather than storing it in memory?

Loading from disk is the best method. You can reproject and clip from disk: #222 (comment).

1 reply

snowman2 Jan 29, 2021
Maintainer

#226

rmg55 · 2021-01-27T19:16:05Z

@lwasser - I think the current reproject method pulls dask arrays into memory (back into numpy arrays; let me know if I have this wrong, @snowman2!). See this thread: #119. My current workaround is to use VRTs to reproject before reading into an xarray object. You can use rasterio's WarpedVRT class (rasterio.vrt.WarpedVRT). Here is an example gist. Hope that helps.

1 reply

snowman2 Jan 28, 2021
Maintainer

That was a good point to bring up. When the datasets are large, using operations from disk is definitely the way to go.
For clipping, this should help (ref #115). But, rio.clip_box is a workable alternative (ref #207).
