Way forward for `spatialstats` functions to be more consistent and easier to use #588

rhugonnet · 2024-09-12T00:33:20Z

@adehecq @MarinKneib Opening this based on your feedback (thanks a lot!), I agree we really need to adapt the uncertainty section of xDEM 🙂. It's getting old (more than 2 years!), and we didn't have the same perspective on the package back then.

Before diving into the way forward, two reminders:

- The new documentation should clarify some more things (in Update documentation and warnings before 0.1 release #502), see a preview of it here: https://xdem-rhugonnet.readthedocs.io/en/towards_0.1/.
- We have already planned to re-structure spatialstats into an uncertainty module (see Re-structure spatialstats.py #378), move out certain functionalities that I have implemented directly in SciKit-GStat in the past years, and move other functions (binning, nmad, etc) into a stats module that could live directly in GeoUtils.

Despite all of this, it would be good to further adapt the uncertainty functions, inputs/outputs, and arguments for ease of use and modularity... I've also been thinking about this recently to combine more easily with Coreg methods. The new DEM.estimate_uncertainty() function (see new documentation) is a good starting point for this, but the outputs are not ideal...

Here's a list of points based on your comments + other things I'm thinking of:

1. Variogram estimation/fitting/manipulation:

Note: In the new documentation, I chose to summarize the different options of 1+2 below as "Traditional" (single-range vario, homoscedastic), "Rolstad" (multi-range vario, homoscedastic) and "Hugonnet" (multi-range vario, heteroscedastic).

Instead of returning a volatile empirical/modelled variogram as a pd.DataFrame and list[str], we should use the skgstat.Variogram class, which encompasses both, now supports all of our specific routines directly (sum of variogram models, custom binning, etc), and can be used for quick plotting of the variogram!
In short:

- Remove sample_empirical_variogram, fit_sum_variogram_models, get_variogram_model_func and plot_variogram from spatialstats, and replace them with a _variogram function in geoutils/stats/spatial that can be called from the objects directly as Raster.variogram() or PointCloud.variogram(), with arguments that take care of all the details and robust defaults (subsampling, sum of models, estimator).
- Remove estimate_model_spatial_correlation and replace it with a new subclass of Variogram in xdem/uncertainty: ErrorVariogram. This would let us move correlation_from_variogram, covariance_from_variogram, etc. onto it as class methods, for everything more specific to uncertainty estimation with a variogram?
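To make the ErrorVariogram idea a bit more concrete, here is a minimal pure-Python sketch of what such an object could expose. It is not the skgstat.Variogram API and not existing xDEM code: the class name ErrorVariogramSketch, its fields, and the two model functions are hypothetical, and the exponential model uses one common "effective range" convention (factor 3). It only illustrates that, once a sum of models with a finite sill is stored in one object, covariance and correlation follow directly as C(h) = sill - γ(h) and ρ(h) = 1 - γ(h)/sill.

```python
import math
from dataclasses import dataclass


def spherical(h: float, range_: float, sill: float) -> float:
    """Spherical variogram model: reaches its sill exactly at the range."""
    if h >= range_:
        return sill
    r = h / range_
    return sill * (1.5 * r - 0.5 * r**3)


def exponential(h: float, range_: float, sill: float) -> float:
    """Exponential variogram model (assumed convention: ~95% of sill at the range)."""
    return sill * (1.0 - math.exp(-3.0 * h / range_))


@dataclass(frozen=True)
class ErrorVariogramSketch:
    """Hypothetical stand-in for the proposed ErrorVariogram (not xDEM/skgstat API)."""

    # Each component is (model function, range, partial sill).
    components: tuple

    @property
    def sill(self) -> float:
        # Total sill of a sum of models is the sum of the partial sills.
        return sum(s for _, _, s in self.components)

    def variogram(self, h: float) -> float:
        return sum(func(h, r, s) for func, r, s in self.components)

    def covariance(self, h: float) -> float:
        # Valid under second-order stationarity (finite sill).
        return self.sill - self.variogram(h)

    def correlation(self, h: float) -> float:
        return 1.0 - self.variogram(h) / self.sill


# Example: short-range spherical + long-range exponential (made-up parameters).
vario = ErrorVariogramSketch(
    components=((spherical, 100.0, 4.0), (exponential, 5000.0, 1.0))
)
print(vario.sill, vario.correlation(0.0), vario.correlation(100.0))
```

In the actual proposal, the fitted models and the empirical bins would come from the Variogram parent class; only the two conversion methods would be added on top.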

2. Heteroscedasticity estimation:

Here, potentially fewer modifications: we could simply move all the binning functions into geoutils/stats/binning and add a function Raster.binning()?

But it is unclear in my mind what would be the best output type:

- A new Heteroscedasticity class object containing the pd.DataFrame from the binning, a string for the method used to apply the binning (linear interpolation, per bin?, with what min_count?), and a class method to_error_map() or similar, which would derive the final Raster of random errors by applying that binning?
- Or simply leave those inputs/outputs as volatile?

The first option keeps everything in one place, but introduces a new object for the user to get familiar with.
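As a rough illustration of the first option, here is a small pure-Python sketch. Everything in it is hypothetical (the class name HeteroscedasticitySketch, its fields, the "linear"/"per_bin" method strings, and the numbers); it uses plain lists instead of a pd.DataFrame and a Raster so that it stays self-contained. The point is only to show the binning result, the application method and min_count living together, with to_error_map() applying them.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class HeteroscedasticitySketch:
    """Hypothetical container (not xDEM API): binned dispersion + how to apply it."""

    bin_centers: list[float]  # sorted centers of the explanatory-variable bins
    dispersion: list[float]   # error dispersion per bin (e.g. NMAD)
    counts: list[int]         # number of samples per bin
    method: str = "linear"    # "linear" interpolation or nearest "per_bin" value
    min_count: int = 30       # bins with fewer samples are ignored

    def _valid_bins(self) -> tuple[list[float], list[float]]:
        kept = [
            (c, d)
            for c, d, n in zip(self.bin_centers, self.dispersion, self.counts)
            if n >= self.min_count
        ]
        if not kept:
            raise ValueError("No bin has at least min_count samples.")
        return [c for c, _ in kept], [d for _, d in kept]

    def predict(self, x: float) -> float:
        xs, ys = self._valid_bins()
        if self.method == "per_bin":
            # Value of the nearest valid bin center.
            i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
            return ys[i]
        if self.method != "linear":
            raise ValueError(f"Unknown method: {self.method}")
        # Clamp outside the range of valid bins, interpolate linearly inside.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        j = bisect_right(xs, x)
        t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return ys[j - 1] + t * (ys[j] - ys[j - 1])

    def to_error_map(self, explanatory: list[list[float]]) -> list[list[float]]:
        """Apply the binning to a 2D grid of the explanatory variable."""
        return [[self.predict(v) for v in row] for row in explanatory]


# Example with made-up slope bins (degrees); the last bin is below min_count.
hetero = HeteroscedasticitySketch(
    bin_centers=[5.0, 15.0, 25.0, 35.0],
    dispersion=[1.0, 2.0, 4.0, 9.0],
    counts=[500, 300, 100, 5],
)
slope = [[0.0, 10.0], [20.0, 40.0]]
error_map = hetero.to_error_map(slope)
print(error_map)
```

A single explanatory variable is assumed here for brevity; with several variables, the same object would hold an N-dimensional binning instead.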

3. Error propagation:

Note: In the new documentation, for spatial propagation, I also summarize the options of 3 below as "Exact" (full covariance sum), "Rolstad" (circular approximation) and "Hugonnet" (fast approximation of the covariance sum based on subsampling).

With 1. and 2. sorted, we'd have two consistent inputs for 3!

Here, in the short term, we could rely on a slightly modified function spatial_error_propagation close to the current one (maybe as a class method too, DEM.propagate_uncertainty()), which simply chooses one approximation method from a string (neff_xxx functions), and applies the error propagation to any area/shapefile/mask (with more flexibility on the input than currently).
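To fix ideas on what "chooses one approximation method from a string" could look like, here is a minimal sketch of the "Exact" option only, i.e. the full covariance double sum for the uncertainty of a spatial mean: σ_mean² = (1/N²) Σᵢ Σⱼ σᵢ σⱼ ρ(dᵢⱼ). The function names, the dispatch dictionary and the spherical correlation are assumptions for illustration, not the current spatial_error_propagation or neff_xxx signatures; the circular and subsampling approximations are deliberately left out.

```python
import math
from typing import Callable, Sequence


def spherical_correlation(h: float, range_: float) -> float:
    """Correlation implied by a spherical variogram (zero beyond the range)."""
    if h >= range_:
        return 0.0
    r = h / range_
    return 1.0 - (1.5 * r - 0.5 * r**3)


def propagate_exact(
    coords: Sequence[tuple[float, float]],
    sigmas: Sequence[float],
    correlation: Callable[[float], float],
) -> float:
    """Standard error of the mean from the full pairwise covariance sum, O(N^2)."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            total += sigmas[i] * sigmas[j] * correlation(d)
    return math.sqrt(total) / n


# Hypothetical dispatch by method name; approximations would be added here.
_METHODS = {"exact": propagate_exact}


def propagate_uncertainty(coords, sigmas, correlation, method: str = "exact") -> float:
    try:
        func = _METHODS[method]
    except KeyError:
        raise ValueError(f"Unknown method '{method}', expected one of {list(_METHODS)}")
    return func(coords, sigmas, correlation)


# Example: 2x2 pixels spaced by 10 m, each with a 2 m random error (made-up values).
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
sigmas = [2.0, 2.0, 2.0, 2.0]

uncorrelated = propagate_uncertainty(coords, sigmas, lambda h: spherical_correlation(h, 5.0))
partly = propagate_uncertainty(coords, sigmas, lambda h: spherical_correlation(h, 30.0))
print(uncorrelated, partly)
```

The per-pixel sigmas would come from the heteroscedasticity step and the correlation function from the variogram step; with a short correlation range the result falls back to σ/√N, and it grows towards σ as the range increases.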

In the longer term, it could be nice to use https://obsarray.readthedocs.io/en/latest/index.html. We'd need to coordinate with them to add spatial error propagation (I'm not sure it exists yet), and it would then work implicitly on any operation we perform on a DEM or dDEM (spatial mean, sum, including with other data, etc).

Conclusion

I think these changes would help address most of your points, in particular for having a full object (ErrorVariogram or Heteroscedasticity) that we could .plot() and manipulate more easily, and pin good default values to it.
What do you think?

rhugonnet added the enhancement (Feature improvement or request) and architecture (Need to re-organize or re-structure something) labels on Sep 12, 2024