Way forward for spatialstats functions to be more consistent and easier to use #588

Open
rhugonnet opened this issue Sep 12, 2024 · 0 comments
Labels: architecture (Need to re-organize or re-structure something), enhancement (Feature improvement or request)

rhugonnet commented Sep 12, 2024

@adehecq @MarinKneib Opening this based on your feedback (thanks a lot!), I agree we really need to adapt the uncertainty section of xDEM 🙂. It's getting old (more than 2 years!), and we didn't have the same perspective on the package back then.

Before diving into the way forward, two reminders:

Despite all of this, it would be good to further adapt the uncertainty functions, inputs/outputs, and arguments for ease of use and modularity... I've also been thinking about this recently to combine more easily with Coreg methods. The new DEM.estimate_uncertainty() function (see new documentation) is a good starting point for this, but the outputs are not ideal...

Here's a list of points based on your comments + other things I'm thinking of:

1. Variogram estimation/fitting/manipulation:

Note: In the new documentation, I chose to summarize the different options of 1+2 below as "Traditional" (single-range vario, homoscedastic), "Rolstad" (multi-range vario, homoscedastic) and "Hugonnet" (multi-range vario, heteroscedastic).

Instead of returning a volatile empirical/modelled variogram as a pd.DataFrame and a list[str], we should use the skgstat.Variogram class, which encompasses both, now directly supports all of our specific routines (sum of variogram models, custom binning, etc.), and can be used for quick plotting of the variogram!
In short:

  • Remove sample_empirical_variogram, fit_sum_variogram_models, get_variogram_model_func and plot_variogram from spatialstats, and replace them with a _variogram function in geoutils/stats/spatial that can be called directly from the objects as Raster.variogram() or PointCloud.variogram(), and that exposes arguments for all the details (subsampling, sum of models, estimator) with robust defaults.
  • Remove estimate_model_spatial_correlation, and replace it with a new subclass of Variogram in xdem/uncertainty, ErrorVariogram. This would let us move correlation_from_variogram, covariance_from_variogram, etc. into class methods added on top, for the methods more specific to uncertainty estimation with a variogram?
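To make the ErrorVariogram idea more concrete, here is a minimal, standard-library-only sketch of what "correlation/covariance as class methods on top of a variogram" could look like. Everything in it is an assumption for illustration: the class name ErrorVariogramSketch, its components attribute, and the restriction to a nugget-free sum of spherical models are hypothetical, and it does not subclass skgstat.Variogram as the proposal would.

```python
from dataclasses import dataclass
from typing import List, Tuple


def spherical(h: float, range_: float, psill: float) -> float:
    """Spherical variogram model (no nugget) evaluated at lag h."""
    if h >= range_:
        return psill
    ratio = h / range_
    return psill * (1.5 * ratio - 0.5 * ratio**3)


@dataclass
class ErrorVariogramSketch:
    """Hypothetical container: a sum of spherical models, one (range, partial sill) pair each."""

    components: List[Tuple[float, float]]

    @property
    def sill(self) -> float:
        # Total sill of a sum of models is the sum of the partial sills.
        return sum(psill for _, psill in self.components)

    def variogram(self, h: float) -> float:
        return sum(spherical(h, r, ps) for r, ps in self.components)

    def covariance(self, h: float) -> float:
        # For a second-order stationary field: C(h) = sill - gamma(h).
        return self.sill - self.variogram(h)

    def correlation(self, h: float) -> float:
        # Correlation is the covariance normalized by the total sill.
        return self.covariance(h) / self.sill


# Illustrative values only: a short-range and a long-range component.
vario = ErrorVariogramSketch(components=[(100.0, 4.0), (5000.0, 1.0)])
print(vario.correlation(0.0), vario.correlation(100.0), vario.correlation(5000.0))
```

The point of the design is that the user holds a single object and asks it for the variogram, covariance or correlation, instead of passing a DataFrame and a list of model names between free functions.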

2. Heteroscedasticity estimation:

Potentially fewer modifications here: we could simply move all the binning functions into geoutils/stats/binning and add a Raster.binning() function?

But it is unclear in my mind what would be the best output type:

  • A new Heteroscedasticity class object containing the pd.DataFrame from the binning, a string for the method used to apply the binning (linear interpolation, per bin?, with what min_count?), and a class method to_error_map() or similar to derive the final Raster of random errors from applying that binning?
  • Or simply leave those inputs/outputs as volatile?
    The first option keeps everything in one place, but introduces a new object for the user to get familiar with.
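As a rough illustration of the first option, here is a standard-library-only sketch of such an object for a single predictor. All names (HeteroscedasticitySketch, error_at, the attribute names, the default min_count of 30) are hypothetical, the binning is 1D only, and plain nested lists stand in for the pd.DataFrame and Raster that a real implementation would use.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HeteroscedasticitySketch:
    """Hypothetical container for a 1D binning of error against one predictor."""

    bin_centers: List[float]  # e.g. slope in degrees, sorted ascending
    bin_errors: List[float]   # dispersion estimate per bin (e.g. NMAD)
    bin_counts: List[int]     # number of samples per bin
    min_count: int = 30       # bins with fewer samples are ignored

    def _valid_bins(self):
        return [
            (c, e)
            for c, e, n in zip(self.bin_centers, self.bin_errors, self.bin_counts)
            if n >= self.min_count
        ]

    def error_at(self, value: float) -> float:
        """Linear interpolation between valid bins, constant outside their extent."""
        bins = self._valid_bins()
        if not bins:
            raise ValueError("No bin has at least min_count samples.")
        centers = [c for c, _ in bins]
        errors = [e for _, e in bins]
        if value <= centers[0]:
            return errors[0]
        if value >= centers[-1]:
            return errors[-1]
        i = bisect_left(centers, value)
        x0, x1 = centers[i - 1], centers[i]
        weight = (value - x0) / (x1 - x0)
        return errors[i - 1] + weight * (errors[i] - errors[i - 1])

    def to_error_map(self, predictor: Sequence[Sequence[float]]) -> List[List[float]]:
        """Apply the binning to a 2D grid of predictor values."""
        return [[self.error_at(v) for v in row] for row in predictor]


# Illustrative values only; the last bin has too few samples and is dropped.
hetero = HeteroscedasticitySketch(
    bin_centers=[5.0, 15.0, 25.0, 35.0],
    bin_errors=[1.0, 2.0, 4.0, 9.0],
    bin_counts=[500, 300, 100, 5],
)
error_map = hetero.to_error_map([[0.0, 10.0], [20.0, 40.0]])
print(error_map)
```

Pinning min_count and the interpolation rule to the object is what would let good defaults travel with the binning result, which is the main argument for the class over volatile outputs.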

3. Error propagation

Note: In the new documentation, for spatial propagation, I also summarize the options of 3 below as "Exact" (full covariance sum), "Rolstad" (circular approximation) and "Hugonnet" (fast approximation of the covariance sum based on subsampling).

With 1. and 2. sorted, we'd have two consistent inputs for 3!

Here, in the short term, we could rely on a slightly modified function spatial_error_propagation close to the current one (maybe as a class method too, DEM.propagate_uncertainty()), which simply chooses one approximation method from a string (neff_xxx functions), and applies the error propagation to any area/shapefile/mask (with more flexibility on the input than currently).
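A minimal sketch of the "choose the approximation from a string" idea, using only the standard library. It is not the current spatial_error_propagation or any neff_xxx function: the function names, the method strings, the single spherical correlation model and the homoscedastic error (one sigma for all pixels) are assumptions made to keep the example short. The "circular" branch integrates the spherical correlation over a disk of equal area, measured from its centre, which is my reading of the circular approximation.

```python
import math
from typing import List, Tuple


def spherical_correlation(h: float, range_: float) -> float:
    """Correlation of a nugget-free spherical model at lag h."""
    if h >= range_:
        return 0.0
    ratio = h / range_
    return 1.0 - 1.5 * ratio + 0.5 * ratio**3


def variance_of_mean_exact(coords: List[Tuple[float, float]], sigma: float, range_: float) -> float:
    """Full covariance sum: (1 / N^2) * sum over all pairs of sigma^2 * rho(d_ij)."""
    n = len(coords)
    total = 0.0
    for xi, yi in coords:
        for xj, yj in coords:
            total += spherical_correlation(math.hypot(xi - xj, yi - yj), range_)
    return sigma**2 * total / n**2


def variance_of_mean_circular(area: float, sigma: float, range_: float) -> float:
    """Disk of equal area; correlation integrated radially from the disk centre."""
    radius = math.sqrt(area / math.pi)
    if radius >= range_:
        return sigma**2 * 0.2 * (range_ / radius) ** 2
    ratio = radius / range_
    return sigma**2 * (1.0 - ratio + 0.2 * ratio**3)


def propagate_uncertainty_sketch(coords, pixel_area, sigma, range_, method="exact") -> float:
    """Standard error of the spatial mean over the given pixel centres (hypothetical API)."""
    if method == "exact":
        var = variance_of_mean_exact(coords, sigma, range_)
    elif method == "circular":
        var = variance_of_mean_circular(len(coords) * pixel_area, sigma, range_)
    else:
        raise ValueError(f"Unknown method: {method!r}")
    return math.sqrt(var)


# Illustrative 5 x 5 grid of pixel centres with 10 m spacing.
coords = [(10.0 * i, 10.0 * j) for i in range(5) for j in range(5)]
for name in ("exact", "circular"):
    print(name, propagate_uncertainty_sketch(coords, 100.0, sigma=2.0, range_=50.0, method=name))
```

A mask or shapefile would only change how coords and the area are derived, so the dispatch itself can stay this small.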

In the longer term, it could be nice to use https://obsarray.readthedocs.io/en/latest/index.html. We'd need to coordinate with them to add spatial error propagation (I'm not sure it exists yet), and it would then work implicitly on any operation we perform on a DEM or dDEM (spatial mean, sum, including with other data, etc).

Conclusion

I think these changes would help address most of your points, in particular for having a full object (ErrorVariogram or Heteroscedasticity) that we could .plot() and manipulate more easily, and pin good default values to it.
What do you think?
