One thing to note is that dask is not going to reduce the total amount of RAM used on your machine. It can distribute memory use across different processors, but the total amount of RAM used will be the same.
This method (reproject_match) loads all of the data from disk and resamples it onto the grid you are matching to. It is not an efficient method for clipping; its main use is aligning grids.
Working from disk is the better approach. You can reproject and clip from disk: #222 (comment).
@lwasser I think the current reproject method pulls dask arrays back into memory as numpy arrays (let me know if I have this wrong, @snowman2!). See this thread: #119. My current workaround is to use VRTs to reproject before reading into an xarray object; you can use rasterio's WarpedVRT class for this. Here is an example gist. Hope that helps!
In this discussion it was mentioned that a user was running into errors when clipping data. I've encountered the same thing even when clipping data that don't seem all that large, such as 4-band NAIP scenes. One thing I noticed, and @joemcglinchy can confirm this, is that NAIP scenes come in as floats, which makes clipping much slower.
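To put the float-versus-integer difference in numbers, here is a small worked calculation of how much memory one array takes for each data type. The scene size (4 bands x 10,000 x 10,000 pixels) is a hypothetical round number, not an actual NAIP specification, and `footprint_gib` is an illustrative helper.

```python
import numpy as np

# Hypothetical scene size, chosen only to make the arithmetic easy to read.
BANDS, HEIGHT, WIDTH = 4, 10_000, 10_000


def footprint_gib(dtype, bands=BANDS, height=HEIGHT, width=WIDTH):
    """In-memory size, in GiB, of a (band, y, x) array with the given dtype."""
    n_values = bands * height * width
    return n_values * np.dtype(dtype).itemsize / 2**30


for dt in ("uint8", "uint16", "float32", "float64"):
    print(f"{dt:>8}: {footprint_gib(dt):6.2f} GiB")
```

The ratio is just the item size: a float64 copy of the same pixels is 8 times larger than uint8, and every intermediate copy made during clipping or resampling scales the same way.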
Here is my question:
I tried using reproject_match to "clip" an array to the spatial extent of another array. It works wonderfully but seems to use a huge amount of memory. Note that it works more efficiently if the data are coerced to an integer type. Given the issues encountered above, I started to experiment with chunking the data by band, x and y on import. While this appeared to work, when I tried to reproject again, it still seemed to consume a lot of RAM. My (very basic) understanding of Dask and chunking is that it somehow allows operations to run via CPU power rather than consuming as much RAM. In practice I wasn't sure whether that was happening.
What are the best practices for reprojecting and matching large raster data? Should we use chunks, and do chunks really shift the work to the CPU rather than keeping the data in memory?
Many thanks for any guidance here. This has been a learning experience for me in efficient computing.