Way forward for `spatialstats` functions to be more consistent and easier to use #588
Labels: architecture (Need to re-organize or re-structure something), enhancement (Feature improvement or request)
@adehecq @MarinKneib I'm opening this based on your feedback (thanks a lot!). I agree we really need to adapt the uncertainty section of xDEM 🙂: it's getting old (more than 2 years!), and we didn't have the same perspective on the package back then.
Before diving into the way forward, two reminders:
1. … `0.1` release (#502); see a preview of it here: https://xdem-rhugonnet.readthedocs.io/en/towards_0.1/
2. … `spatialstats` into an `uncertainty` module (see Re-structure `spatialstats.py` #378), move out certain functionalities that I have implemented directly in SciKit-GStat over the past years, and move other functions (`binning`, `nmad`, etc.) into a `stats` module that could live directly in GeoUtils.

Despite all of this, it would be good to further adapt the uncertainty functions, their inputs/outputs and their arguments for ease of use and modularity. I've also been thinking about this recently, to combine them more easily with `Coreg` methods. The new `DEM.estimate_uncertainty()` function (see the new documentation) is a good starting point for this, but its outputs are not ideal. Here's a list of points based on your comments, plus other things I'm thinking of:
1. Variogram estimation/fitting/manipulation:
Note: In the new documentation, I chose to summarize the different combinations of points 1 and 2 below as "Traditional" (single-range variogram, homoscedastic), "Rolstad" (multi-range variogram, homoscedastic) and "Hugonnet" (multi-range variogram, heteroscedastic).
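To make the single-range vs. multi-range distinction concrete, here is a minimal sketch of an error variogram modeled as a sum of spherical models, with correlation and covariance derived from it as methods on the same object. `ErrorVariogramSketch` and `spherical` are hypothetical names for illustration only, not xDEM or SciKit-GStat API; a real implementation would build on `skgstat.Variogram`. It assumes a second-order stationary error field without nugget, so that C(h) = sill − γ(h) and ρ(h) = 1 − γ(h)/sill.

```python
import numpy as np


def spherical(h, psill, corr_range):
    """Spherical variogram model: reaches `psill` at distance `corr_range`."""
    r = np.minimum(np.asarray(h, dtype=float) / corr_range, 1.0)
    return psill * (1.5 * r - 0.5 * r**3)


class ErrorVariogramSketch:
    """Hypothetical stand-in for an error variogram object.

    Models the error variogram as a sum of spherical models (one per
    correlation range), without nugget.
    """

    def __init__(self, psills, ranges):
        if len(psills) != len(ranges):
            raise ValueError("Need one partial sill per range.")
        self.psills = [float(p) for p in psills]
        self.ranges = [float(r) for r in ranges]

    @property
    def sill(self):
        # Total sill = variance of the errors = sum of the partial sills
        return sum(self.psills)

    def gamma(self, h):
        # Multi-range variogram: sum of the individual spherical models
        return sum(spherical(h, p, r) for p, r in zip(self.psills, self.ranges))

    def covariance(self, h):
        # Stationary field without nugget: C(h) = sill - gamma(h)
        return self.sill - self.gamma(h)

    def correlation(self, h):
        # rho(h) = C(h) / C(0) = 1 - gamma(h) / sill
        return self.covariance(h) / self.sill


# "Traditional": a single range; "Rolstad"/"Hugonnet": several ranges summed
single = ErrorVariogramSketch(psills=[1.0], ranges=[100.0])
multi = ErrorVariogramSketch(psills=[0.5, 0.5], ranges=[100.0, 5000.0])
```

At 200 m, `single.correlation(200.0)` is already 0 while `multi.correlation(200.0)` is about 0.47: the long-range component keeps errors correlated well beyond the short range, which matters a lot once errors are propagated over large areas.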
Instead of returning volatile empirical/modelled variograms as a `pd.DataFrame` and a `list[str]`, we should use the `skgstat.Variogram` class, which encompasses both, now directly supports all of our specific routines (sum of variogram models, custom binning, etc.), and can be used for quick plotting of the variogram! In short:
- `sample_empirical_variogram`, `fit_sum_variogram_models`, `get_variogram_model_func` and `plot_variogram` from `spatialstats`: replace them with a `_variogram` function in `geoutils/stats/spatial` that can be called directly from the objects as `Raster.variogram()` or `PointCloud.variogram()`, and that exposes arguments for all the details, with robust defaults (subsampling, sum of models, estimator).
- `estimate_model_spatial_correlation`: replace it with a new subclass of `Variogram` in `xdem/uncertainty`, `ErrorVariogram`, which would allow moving `correlation_from_variogram`, `covariance_from_variogram`, etc. onto it as class methods, for the methods more specific to uncertainty estimation with a variogram?

2. Heteroscedasticity estimation:
Here, potentially fewer modifications are needed: we could simply move all the binning functions into `geoutils/stats/binning`, plus a `Raster.binning()` function? But it is unclear to me what the best output type would be:
- A `Heteroscedasticity` class object containing the `pd.DataFrame` from the binning, a string naming the method used to apply the binning (linear interpolation? per bin? with what `min_count`?), and a class method `to_error_map()` or similar, which would derive the final `Raster` of random errors by applying that binning?

The first option allows having everything in one place, but introduces a new object for the user to get familiar with.
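A minimal sketch of what such a `Heteroscedasticity` object could hold and do, assuming binning on a single explanatory variable (e.g. slope), the NMAD as the dispersion estimator, a `min_count` below which a bin is discarded, and two ways of applying the binning: linear interpolation between bin centers, or a per-bin lookup. `HeteroscedasticitySketch`, `from_binning` and the `"linear"`/`"per_bin"` strings are hypothetical, and the binned statistics are kept as plain NumPy arrays rather than a `pd.DataFrame` to keep the example dependency-light.

```python
import numpy as np


def nmad(x):
    # Normalized median absolute deviation (matches the std for normal data)
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))


class HeteroscedasticitySketch:
    """Hypothetical container: binned dispersion + how to apply it."""

    def __init__(self, bin_edges, bin_nmad, method="linear"):
        self.bin_edges = np.asarray(bin_edges, dtype=float)
        self.bin_nmad = np.asarray(bin_nmad, dtype=float)
        self.method = method

    @classmethod
    def from_binning(cls, dh, explanatory, bin_edges, min_count=30, method="linear"):
        dh = np.asarray(dh, dtype=float).ravel()
        explanatory = np.asarray(explanatory, dtype=float).ravel()
        # np.digitize returns i for bin_edges[i-1] <= x < bin_edges[i]
        idx = np.digitize(explanatory, bin_edges)
        n_bins = len(bin_edges) - 1
        stats = np.full(n_bins, np.nan)
        for i in range(n_bins):
            values = dh[idx == i + 1]
            values = values[np.isfinite(values)]
            # Bins with fewer than min_count valid samples stay NaN
            if values.size >= min_count:
                stats[i] = nmad(values)
        return cls(bin_edges, stats, method)

    def to_error_map(self, explanatory):
        explanatory = np.asarray(explanatory, dtype=float)
        centers = 0.5 * (self.bin_edges[:-1] + self.bin_edges[1:])
        if self.method == "linear":
            valid = np.isfinite(self.bin_nmad)
            # Linear between valid bin centers, constant beyond the outermost ones
            return np.interp(explanatory, centers[valid], self.bin_nmad[valid])
        if self.method == "per_bin":
            idx = np.clip(np.digitize(explanatory, self.bin_edges) - 1, 0, len(centers) - 1)
            return self.bin_nmad[idx]
        raise ValueError(f"Unknown method: {self.method!r}")
```

In practice, `explanatory` would be a slope (or curvature, etc.) array derived from the DEM, and the output of `to_error_map()` would be wrapped back into a `Raster` with the same georeferencing.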
3. Error propagation
Note: In the new documentation, for spatial propagation, I also summarize the options of point 3 below as "Exact" (full covariance sum), "Rolstad" (circular approximation) and "Hugonnet" (fast approximation of the covariance sum based on subsampling).
With points 1 and 2 sorted, we'd have two consistent inputs for point 3!
Here, in the short term, we could rely on a slightly modified `spatial_error_propagation` function close to the current one (maybe also as a class method, `DEM.propagate_uncertainty()`), which simply selects an approximation method from a string (the `neff_xxx` functions) and applies the error propagation to any area/shapefile/mask (with more flexibility on the input than currently).

In the longer term, it could be nice to use obsarray (https://obsarray.readthedocs.io/en/latest/index.html). We'd need to coordinate with its developers to add spatial error propagation (I'm not sure it exists yet); it would then work implicitly on any operation we perform on a DEM or dDEM (spatial mean, sum, including with other data, etc.).
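A minimal sketch of the string-based choice of propagation method, for the error of the mean over a set of pixels. It uses the full covariance sum, Var(mean) = (1/N²) Σᵢ Σⱼ σᵢ σⱼ ρ(dᵢⱼ), which accepts per-pixel errors σᵢ from a heteroscedasticity map and reduces to σ/√N_eff with N_eff = N²/Σᵢⱼ ρ(dᵢⱼ) when σ is constant. All function names and the `"exact"`/`"subsample"` strings are hypothetical. The subsampled variant is a plain Monte Carlo estimate of the same double sum over random pixel pairs, an assumption for illustration and not necessarily how the existing `neff_xxx` approximations work. A spherical correlation function is assumed.

```python
import numpy as np


def spherical_correlation(d, corr_range):
    # Correlation implied by a spherical variogram without nugget: 1 - gamma(d) / sill
    r = np.minimum(np.asarray(d, dtype=float) / corr_range, 1.0)
    return 1.0 - (1.5 * r - 0.5 * r**3)


def _pairwise_distances(coords):
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def mean_error_exact(coords, sigma, corr_func):
    # Full covariance sum: Var(mean) = 1/N^2 * sum_i sum_j sigma_i * sigma_j * rho(d_ij)
    sigma = np.asarray(sigma, dtype=float)
    cov = np.outer(sigma, sigma) * corr_func(_pairwise_distances(coords))
    return np.sqrt(cov.sum()) / sigma.size


def mean_error_subsampled(coords, sigma, corr_func, n_pairs=100_000, seed=0):
    # Monte Carlo estimate of the same double sum, drawing pairs (i, j) uniformly with replacement
    coords = np.asarray(coords, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    rng = np.random.default_rng(seed)
    i = rng.integers(0, sigma.size, n_pairs)
    j = rng.integers(0, sigma.size, n_pairs)
    d = np.linalg.norm(coords[i] - coords[j], axis=-1)
    # Sum over all N^2 pairs ~ N^2 * (mean over sampled pairs), so Var(mean) ~ that mean
    return np.sqrt(np.mean(sigma[i] * sigma[j] * corr_func(d)))


def effective_samples(coords, corr_func):
    # Homoscedastic case: N_eff = N^2 / sum_ij rho(d_ij), so sigma_mean = sigma / sqrt(N_eff)
    n = len(coords)
    return n**2 / corr_func(_pairwise_distances(coords)).sum()


def propagate_mean_error(coords, sigma, corr_func, method="exact", **kwargs):
    # Hypothetical string-based choice of the propagation method
    methods = {"exact": mean_error_exact, "subsample": mean_error_subsampled}
    if method not in methods:
        raise ValueError(f"Unknown method {method!r}, expected one of {sorted(methods)}")
    return methods[method](coords, sigma, corr_func, **kwargs)
```

The exact variant builds an N×N matrix, so it is only practical for small areas or subsampled pixels; this memory and compute cost is precisely why approximate methods are needed for full DEMs.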
Conclusion
I think these changes would help address most of your points, in particular by having a full object (`ErrorVariogram` or `Heteroscedasticity`) that we could `.plot()` and manipulate more easily, and to which we could pin good default values.

What do you think?