
Optimize NuthKaab RAM usage #198

Closed
wants to merge 6 commits

Conversation

rhugonnet
Member

Slightly related to #110.

Issue

The apply_matrix function is using a lot of RAM, which rapidly limits application on personal computers.

I tested with two DEMs of ~500MB and ~100MB on disk, which take about 1GB and 200MB in memory, respectively.

  1. Running NuthKaab.fit() (using the 100MB DEM as reference to reduce usage) uses ~2GB of RAM, which is OK but still a lot.
    We should aim to cap it at 3-4 times the DEM size, enough for: 1x elevation differences, 1x slope, 1x aspect.

  2. Running NuthKaab.apply() on the 500MB DEM (the DEM to align) results in a memory error after using more than 8GB of memory.
    We should aim to cap it at 1 times the DEM size, enough for: 1x DEM correction on the same grid, possibly derived from a 4x4 matrix.
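To illustrate why a 4x4 matrix can allow a cheap correction: when the matrix is a pure translation (as for Nuth & Kaab), the shift can be read off directly and applied without warping the full array. A minimal numpy sketch, with the function name purely illustrative (not xdem's actual API):

```python
import numpy as np

def translation_from_matrix(matrix: np.ndarray, atol: float = 1e-9):
    """Extract (dx, dy, dz) from a 4x4 matrix, or raise if it is not a pure translation.

    A pure translation has an identity 3x3 rotation/scale block, so applying it
    only needs a geotransform shift plus a scalar vertical offset, i.e. roughly
    1x the DEM size in RAM rather than several intermediate arrays.
    """
    matrix = np.asarray(matrix, dtype=float)
    if matrix.shape != (4, 4):
        raise ValueError("Expected a 4x4 transformation matrix")
    if not np.allclose(matrix[:3, :3], np.eye(3), atol=atol):
        raise ValueError("Matrix is not a pure translation; a full warp is needed")
    dx, dy, dz = matrix[:3, 3]
    return float(dx), float(dy), float(dz)

# Example: a shift of 2.5 m east, 1.0 m south, 0.3 m up
shift = np.eye(4)
shift[:3, 3] = [2.5, -1.0, 0.3]
print(translation_from_matrix(shift))  # (2.5, -1.0, 0.3)
```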

Started a fix

I tried to re-write the old array-based apply_func of Nuth and Kaab as a special case handled separately inside apply_matrix, but I'm not sure it is correct, as I'm only familiar with the method of shifting the geotransform and then warping (input welcome, @erikmannerfelt, @adehecq?).
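For reference, the "shifting the geotransform" step mentioned above never touches the raster array itself. A sketch using a GDAL-style 6-element geotransform tuple (illustrative only; a final warp back onto the original grid is still needed if the output must stay aligned with it):

```python
def shift_geotransform(gt, dx, dy):
    """Apply a horizontal shift to a GDAL-style geotransform.

    gt = (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height).
    Only the two origin terms change, so no pixel values are resampled and the
    extra memory cost is O(1) instead of O(DEM size).
    """
    x0, pw, rrot, y0, crot, ph = gt
    return (x0 + dx, pw, rrot, y0 + dy, crot, ph)

# 30 m pixels, origin at (500000, 4100000), shifted 2.5 m east and 1.0 m south
gt = (500000.0, 30.0, 0.0, 4100000.0, 0.0, -30.0)
print(shift_geotransform(gt, 2.5, -1.0))
# (500002.5, 30.0, 0.0, 4099999.0, 0.0, -30.0)
```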

If we find a magic solution to speed up and limit RAM usage for the generic apply_matrix, great!
Otherwise, we should probably subdivide into use cases to make each as efficient as possible. I guess it makes sense to keep these cases outside the NuthKaab and other Coreg subclasses, in case the apply_func needs to be applied for a full CoregPipeline.

Thoughts

These issues made me think that we could add a "default" behaviour for coregistration that optimizes efficiency:

  1. The extent considered: right now it's the reference DEM's extent, but this is not always optimal. If the reference DEM has a larger extent than the slave DEM, loading and duplicating the array will take a lot of memory.
  2. The resolution considered: right now it's the reference DEM's resolution. If this resolution is higher than that of the slave DEM, the reprojected datasets will take a lot of additional space for little added value. By default, we should use the lower resolution of the two DEMs.
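The two defaults above could be combined into a small helper that picks the target grid before coregistration. A hypothetical sketch, not existing xdem behaviour; choosing the intersection of the two bounding boxes for the extent is an assumption of this example:

```python
def choose_target_grid(res_ref, res_tba, bounds_ref, bounds_tba):
    """Pick a memory-friendly common grid for coregistration.

    - resolution: the coarser (numerically larger) of the two, to avoid
      inflating arrays for little added value;
    - extent: the intersection of the two bounding boxes (left, bottom,
      right, top), since pixels outside the overlap cannot constrain the fit.
    """
    res = max(res_ref, res_tba)
    left = max(bounds_ref[0], bounds_tba[0])
    bottom = max(bounds_ref[1], bounds_tba[1])
    right = min(bounds_ref[2], bounds_tba[2])
    top = min(bounds_ref[3], bounds_tba[3])
    if left >= right or bottom >= top:
        raise ValueError("The two DEMs do not overlap")
    return res, (left, bottom, right, top)

# 10 m reference vs 30 m slave, partially overlapping extents
print(choose_target_grid(10.0, 30.0, (0, 0, 100, 100), (50, 50, 200, 200)))
# (30.0, (50, 50, 100, 100))
```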

What do you think?

@rhugonnet rhugonnet changed the title Optimize NuthKaab apply RAM usage Optimize NuthKaab RAM usage Aug 9, 2021
@erikmannerfelt
Contributor

Huh, I get a 500 HTTP response when trying to look at the changed files!

Making a new apply function (or just reviving the old one) is one solution to the problem, indeed! It's unfortunate that the fix is required (I'm not particularly proud of apply_matrix(), even though it technically does the job...), but if this increases performance, then so be it!

Your "Thoughts" parts are most easily handled by the user-end. The Coreg class takes two rasters in the same grid and estimates the shifts. If the reference DEM has a higher resolution, why not reproject the reference DEM? If it is larger than the TBA DEM, it could be cropped beforehand.

Making a custom apply_func() is supposed to be as easy as just implementing it (the apply() function checks whether apply_func() raises a NotImplementedError, and if so, falls back to apply_matrix()). You can see e.g. Deramp for the structure of that function. Again, I haven't seen what you've written, since I can't access the changed files page!
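The fallback mechanism described above can be sketched as follows. All class and method names here are illustrative stand-ins, not the actual Coreg API:

```python
import numpy as np

class CoregSketch:
    """Base class: apply() prefers a subclass's _apply_func, else falls back."""

    def _apply_func(self, dem: np.ndarray) -> np.ndarray:
        # Subclasses may override this with an efficient, case-specific version.
        raise NotImplementedError

    def _apply_matrix(self, dem: np.ndarray) -> np.ndarray:
        # Stand-in for the generic (and expensive) matrix-based application.
        return dem.copy()

    def apply(self, dem: np.ndarray) -> np.ndarray:
        try:
            return self._apply_func(dem)
        except NotImplementedError:
            return self._apply_matrix(dem)

class VerticalShiftSketch(CoregSketch):
    """Example subclass with a cheap custom _apply_func (a vertical offset)."""

    def __init__(self, dz: float):
        self.dz = dz

    def _apply_func(self, dem: np.ndarray) -> np.ndarray:
        return dem + self.dz  # O(1) extra memory beyond the output array

dem = np.zeros((2, 2))
print(VerticalShiftSketch(0.5).apply(dem))  # custom path: all values 0.5
print(CoregSketch().apply(dem))             # fallback path: all values 0.0
```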

@adehecq
Member

adehecq commented Aug 25, 2021

I fully agree that the current generic implementation is overly complex, especially in the case of Nuth & Kaab, where we only need a shift. I also agree that we should try to optimize memory usage so that it can run on a desktop computer, as that will cover most use cases.
As pointed out by @erikmannerfelt, an easy fix is to have a custom apply_func for Nuth & Kaab, so your suggested change could just be moved there (@erikmannerfelt, I can see Romain's changes perfectly fine).
The issue with that is that we won't optimize coregistration when using a pipeline, as each step will be applied twice (once for fitting, then when applied)... The only real solution is to come up with a better generic apply_matrix.

I also agree with @erikmannerfelt that the issue of DEM resolution and extent can easily be handled beforehand by the user with all of the nice geoutils tools 😉.
Alternatively, we could accept two input DEMs that are on different grids and add a step so that one is resampled onto the other's grid. One can then decide which grid to use. That is what I was doing in my older Nuth & Kaab implementation: https://github.com/GeoUtils/geoutils/blob/5c41b2a343d1f0fc161eed4426154656c06841a4/geoutils/dem_coregistration.py#L241
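The "resample one DEM onto the other's grid" step could look like this minimal nearest-neighbour sketch. Pure numpy and purely illustrative (it assumes a shared origin and square pixels); a real implementation would use a proper reprojection library:

```python
import numpy as np

def resample_nearest(src: np.ndarray, src_res: float, dst_shape, dst_res: float):
    """Resample `src` onto a grid of dst_shape at dst_res by nearest-neighbour
    lookup: destination index i maps to source index round-down of i*dst_res/src_res.
    """
    rows = np.minimum((np.arange(dst_shape[0]) * dst_res / src_res).astype(int),
                      src.shape[0] - 1)
    cols = np.minimum((np.arange(dst_shape[1]) * dst_res / src_res).astype(int),
                      src.shape[1] - 1)
    return src[np.ix_(rows, cols)]

src = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 grid at 10 m
out = resample_nearest(src, 10.0, (2, 2), 20.0)  # downsampled to 2x2 at 20 m
print(out)  # [[ 0.  2.] [ 8. 10.]]
```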

Should we talk about all this together when @rhugonnet gets back to work?

@rhugonnet
Member Author

rhugonnet commented Sep 3, 2021

> I also agree with @erikmannerfelt that the issue of DEM resolution and extent can easily be handled beforehand by the user with all of the nice geoutils tools.

As these resolution/extent steps would have no influence on the N&K results but have high potential to improve its efficiency, I think the optimization should live within the xdem class (or similar), rather than letting every user figure out why their N&K runs slowly or uses a lot of RAM in specific cases!
A good subject for the meeting next Wednesday :)

@rhugonnet
Member Author

Closing this, as we'll be working on the Coreg classes again soon, and we probably need to revisit the apply_matrix function more generically before addressing the issue of optimizing each type of Coreg.

We still have the ~20 lines of changes proposed here for the record and the discussion.

@rhugonnet rhugonnet closed this Sep 20, 2022