[Refactoring] N-dimensional Kriging #31
Comments
@rth This is a good idea and I did in fact think about it as this is essentially the The question is who is going to bell the cat? |
I am also very interested in this generalisation. |
This is a supremely excellent suggestion, especially as it would mean that future enhancements wouldn't have to be redundantly added to the various kriging classes. As long as everything we do is well-documented with plenty of examples, I think the existing user base would probably be OK with this change as well. (The documentation is one of the really useful parts, according to users I've talked to.) Maybe we can keep the existing classes as-is just for backwards compatibility. The first big question I'd have about this is how to generalize the array input and handling to an arbitrary number of dimensions. Maybe we can just have one input array of shape M x (N+1), with M as the number of measurements and N as the dimension of the space in which to krige. Then the last column could be the actual measurement values to krige. @rth, what do you think? Then it would probably be simple to use the scipy distance routines to calculate Euclidean (or even some other metric?) distances between measurement points, and populating the kriging matrix would then be simple, I think... |
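To make the M x (N+1) layout concrete, here is a minimal sketch, not PyKrige code. The helper names `split_measurements` and `pairwise_distances` are made up for illustration, and the distances are computed with plain NumPy broadcasting; `scipy.spatial.distance.cdist(coords, coords)` should give the same Euclidean matrix and also accepts other metrics.

```python
import numpy as np

def split_measurements(data):
    """Split an (M, N+1) array into (M, N) coordinates and (M,) values."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] < 2:
        raise ValueError("expected a 2D array with at least two columns")
    return data[:, :-1], data[:, -1]

def pairwise_distances(a, b):
    """Euclidean distances between rows of a (P, N) and b (Q, N) -> (P, Q)."""
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three measurements in a 2D space; the last column holds the measured values.
data = np.array([[0.0, 0.0, 1.5],
                 [3.0, 4.0, 2.5],
                 [0.0, 4.0, 0.5]])
coords, values = split_measurements(data)
d = pairwise_distances(coords, coords)  # (3, 3) symmetric, zero diagonal
```

Nothing in the two helpers depends on N, so the same code would handle 1D, 2D or 3D inputs.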
Thanks for the feedback @bsmurphy @basaks @kvanlombeek and sorry for the late response.
So now PyKrige takes as input some
Yes, definitely, that would also simplify the calls to ND distance routines from scipy.
Well, we could keep the interface of the existing classes as it is and progressively migrate the internals, moving from "separate krige methods + single scikit-learn compatible wrapper" to "single krige implementation + multiple krige wrappers that support the old interface". This would also have the advantage that the new method passes the old unit tests (so that we are confident of not breaking anything during refactoring). In practice this refactoring could take the steps described in the updated first post above. What do you think? |
@rth Thank you for putting this list together. |
This looks great @rth, thanks for getting going on this. I'll try to work on merging the statistics and variogram estimation business all together, as there were some other changes in those portions of the code that I wanted to make anyway... (As I said in another thread, the statistics calculations are wrong and need to be fixed.) BTW, @rth (and anyone else who thinks about matrices; admittedly I don't think about them that much), I've been wondering if we could use LU decomposition on the kriging matrix to speed up the solution. Any thoughts? I haven't thought about this much, but since the LHS matrix is the same (it's just the RHS vector that changes), we might be able to leverage this for the looping backend... |
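A rough sketch of the "same LHS, many RHS" idea, under stated assumptions: the linear variogram and the helper names are placeholders, not PyKrige's implementation. It passes all right-hand sides to `np.linalg.solve` as columns, so the LAPACK-backed solver factorizes the kriging matrix once. To keep the factorization around between calls, `scipy.linalg.lu_factor` and `scipy.linalg.lu_solve` would be the natural tools.

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(6, 2))   # 6 data points in 2D
targets = rng.uniform(0.0, 10.0, size=(4, 2))  # 4 prediction points

def variogram(h):
    # Hypothetical linear variogram, chosen only to keep the sketch short.
    return h

def dist(a, b):
    # Pairwise Euclidean distances between the rows of a and b.
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

n = len(coords)
# LHS: ordinary kriging matrix, identical for every prediction point.
A = np.ones((n + 1, n + 1))
A[:n, :n] = variogram(dist(coords, coords))
A[n, n] = 0.0

# RHS: one column per prediction point; the last row is the unbiasedness constraint.
B = np.ones((n + 1, len(targets)))
B[:n, :] = variogram(dist(coords, targets))

# One call: A is factorized once and reused for all columns of B.
W = np.linalg.solve(A, B)

# Reference: solve each RHS separately, as a naive loop would.
W_loop = np.column_stack([np.linalg.solve(A, B[:, j]) for j in range(B.shape[1])])
```

Both approaches give the same weights; the difference is only in how often the matrix is factorized.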
@rth |
OK, so I'm finally getting the chance to set aside some solid time to work on this. I ultimately want to implement complex kriging, which would allow for kriging of vector field data. (So you could combine X and Y components of the vector field into a single complex number, then krige the complex number field. Since I work with geophysical EM field data, this would be really useful to me.) In terms of the refactor, the main differences with complex kriging come in the variogram estimation routine, so it would be beneficial to make things much more modular, as proposed in #56. All this is to say, here's what I'm thinking...
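As a small illustration of the packing step only, and not of complex kriging itself: the sketch below combines two components into one complex array and computes pairwise semivariances. It assumes the semivariance of a complex field is defined with the squared modulus, 0.5 * |z_i - z_j|**2; under that assumption it equals the sum of the two component semivariances. The function name is hypothetical.

```python
import numpy as np

u = np.array([1.0, 0.5, -0.2, 0.8])   # X component of the vector field
v = np.array([0.0, 1.0, 0.7, -0.4])   # Y component of the vector field
z = u + 1j * v                        # one complex value per measurement

def pair_semivariance(values):
    """0.5 * |value_i - value_j|**2 for every pair (i, j)."""
    diff = values[:, None] - values[None, :]
    return 0.5 * np.abs(diff) ** 2

gamma_z = pair_semivariance(z)
# With the modulus-based definition, the complex semivariance splits
# into the sum of the real-part and imaginary-part semivariances.
gamma_uv = pair_semivariance(u) + pair_semivariance(v)
```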
I can start working right away on refactoring the variogram stuff into a separate module and putting together a core |
Ok, here are the various building blocks that I'd envision.
(A) A generic coordinates module object that would take the data coordinates and scale/rotate/project them into the appropriate coordinate system to calculate the distance between points. So this would include all the anisotropy stuff as well as the geographic coordinates capabilities. The raw data coordinates would go in, and a list of inter-point distances would come out. (In reality, that "list" might be a callable object, at its core something like a wrapper around scipy's KDTree...)
(B) A generic variogram model module that would take as input the inter-point distances and evaluate the specified variogram function on those distances. If users want more control over the variogram estimation, they can iterate between A and B to adjust anisotropy parameters, etc. So, distances (probably in the form of object A) would go in, and the evaluated variogram function would come out.
(C) A variogram plotting module that would sit on top of B, as proposed in #72 (and would also solve #63).
(D) A model parameter estimation module that would also sit on top of B and provide the automatic variogram estimation stuff. Keeping this stuff in a separate object would make it easier to implement different variogram search routines and features (e.g., #54, #57, #29).
(E) A model validation module that would also sit on top of B to calculate, e.g., the statistics (would address #52).
(F) A drift terms object, which would be a user-specified collection of drift terms to add into the estimation process.
(G) A
In principle only object A would need to know about exact coordinate locations, and everything else would just need to know inter-point distances. So all objects besides A would be intrinsically ND (which is what this issue was originally started to discuss...). Although for the local variogram stuff (#41) we might need to keep memory of point locations throughout the entire pipeline.
Although, maybe in object A there could be some way of grouping points into multiple categories that would somehow each get their own object B, and then object G would be smart enough to use different object Bs in different parts of the kriging domain... The existing API would be refactored to be wrappers around these various objects. Thoughts? I'm trying to think of how to best arrange things to allow further extensions and improvements (e.g., #59 and the complex kriging stuff I mentioned...) |
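A minimal sketch of how blocks A and B could fit together, assuming hypothetical class names (`Coordinates`, `SphericalVariogram`) and using a per-axis scaling as a stand-in for the full anisotropy and geographic handling. Only the first class sees point locations; the second works on distances alone, so it does not care about the number of dimensions.

```python
import numpy as np

class Coordinates:
    """Block A (hypothetical): holds points, hands out inter-point distances."""

    def __init__(self, X, scaling=None):
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("X must have shape (n_samples, n_dim)")
        # Per-axis scaling stands in for the real anisotropy handling.
        self.scaling = (np.ones(X.shape[1]) if scaling is None
                        else np.asarray(scaling, dtype=float))
        self.X = X * self.scaling

    def distances(self):
        diff = self.X[:, None, :] - self.X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

class SphericalVariogram:
    """Block B (hypothetical): evaluates a variogram on distances only."""

    def __init__(self, sill, range_, nugget=0.0):
        self.sill, self.range_, self.nugget = sill, range_, nugget

    def __call__(self, h):
        h = np.asarray(h, dtype=float)
        r = h / self.range_
        rising = self.nugget + (self.sill - self.nugget) * (1.5 * r - 0.5 * r ** 3)
        # Below the range use the rising branch, beyond it the sill.
        return np.where(h < self.range_, rising, self.sill)

model = SphericalVariogram(sill=1.0, range_=2.0)
# The same variogram object works for 1D and 3D coordinates.
gamma_1d = model(Coordinates([[0.0], [1.0], [4.0]]).distances())
points_3d = np.random.default_rng(0).random((5, 3))
gamma_3d = model(Coordinates(points_3d, scaling=[1.0, 1.0, 0.5]).distances())
```

Iterating between A and B, as described above, would then amount to rebuilding `Coordinates` with different scaling parameters and re-evaluating the model.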
Yes, if we implement a compatible transform for the different components, I think it might be worth linking them together with a scikit-learn pipeline. We would save time by not re-inventing something similar.
+1
I'm just a bit concerned about the amount of code to maintain and the overall code complexity in this project. If we manage to express these classes as a function of the new processing pipeline, why not, but if we keep both old and new implementations side by side, I'm worried that it would be hard to manage. Will respond to your other message later today. |
Related : #133 |
Just a few thoughts about possible code refactoring in PyKrige.
Currently PyKrige has separate code that implements 2D and 3D kriging (e.g. in `ok.py` and `ok3d.py`). This results in code duplication and makes code maintenance and adding new features more difficult (they need to be added to every single file). Besides, 1D kriging is not implemented, and it would be nice to have some 1D examples, as they are easier to visualize.
In addition, PR #24 adds a scikit-learn API to PyKrige on top of the existing `UniversalKriging` and `OrdinaryKriging` methods.
A possible solution to remove the current code duplication would be to refactor `UniversalKriging` and `OrdinaryKriging` to work in N dimensions. The simplest way of doing it would be to use something along the lines of the scikit-learn API, which would also remove the need for an additional wrapper. The general API could be something like,
The case of `execute(style="masked", ...)` could be handled by supporting masked arrays for `predict(X)`, while the case of `execute(style="grid", ...)` can be handled with a helper function, which is mostly what is done internally by `execute` at present.
This would break backward compatibility though, so a major version change would be needed (PyKrige v2).
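One possible shape for such an interface, as a sketch only: `NDKrigingSketch` and `grid_to_points` are hypothetical names, and the linear variogram is an assumption made to keep the example short. It follows the scikit-learn `fit`/`predict` convention with `X` of shape `(n_samples, n_dim)` and reduces the grid case to a helper that flattens per-axis coordinates into points.

```python
import numpy as np

def _dist(a, b):
    # Pairwise Euclidean distances between the rows of a and b.
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

class NDKrigingSketch:
    """Hypothetical ND ordinary kriging with a scikit-learn-like interface."""

    def __init__(self, slope=1.0):
        self.slope = slope  # linear variogram gamma(h) = slope * h (assumption)

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)   # (n_samples, n_dim)
        self.y_ = np.asarray(y, dtype=float)   # (n_samples,)
        n = len(self.y_)
        A = np.ones((n + 1, n + 1))            # ordinary kriging matrix
        A[:n, :n] = self.slope * _dist(self.X_, self.X_)
        A[n, n] = 0.0
        self.A_ = A
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)         # (n_points, n_dim)
        n = len(self.y_)
        B = np.ones((n + 1, len(X)))
        B[:n, :] = self.slope * _dist(self.X_, X)
        W = np.linalg.solve(self.A_, B)        # weights plus Lagrange multiplier
        values = W[:n].T @ self.y_
        variance = (W * B).sum(axis=0)         # sum(w_i * gamma_i0) + mu
        return values, variance

def grid_to_points(*axes):
    """Flatten per-axis grid coordinates into an (n_points, n_dim) array."""
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([m.ravel() for m in mesh])

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
model = NDKrigingSketch().fit(X, y)
grid = grid_to_points(np.linspace(0, 1, 3), np.linspace(0, 1, 5))  # (15, 2)
z, ss = model.predict(grid)
z_grid = z.reshape(3, 5)  # back to the grid layout
```

A masked variant could drop the masked rows of `grid` before calling `predict` and scatter the results back afterwards.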
Update: the refactoring could follow the steps below,
- `core.py` merge `adjust_for_anisotropy` and `adjust_for_anisotropy_3d` into a single private function `_adjust_for_anisotropy(X, center, scaling, angle)`, where X is a `[n_samples, n_dim]` array and all the others are lists of floats; it returns the `X_modified` array. (PR ND refactoring of adjust_for_anisotropy #33)
- `core.py` merge `initialize_variogram_model` and `initialize_variogram_model_3d` into a single private function (PR Variogram Refactorization and Improvements #47)
- `core.py` merge `krige` and `krige_3d` into a single private function `_krige(X, y, coords, variogram_function, variogram_model_parameters)` (PR Refactor statistics calculations #51)
- `core.py` similarly merge `find_statistics*` (PR Refactor statistics calculations #51)
- `OrdinaryKriging` would be great but it's already taken). Maybe `OrdinaryNDKriging`?
- `OrdinaryKriging` and `OrdinaryKriging3D` to use `OrdinaryNDKriging` internally.
- `OrdinaryNDKriging` and add deprecation warnings on `OrdinaryKriging`, `OrdinaryKriging3D`
What do you think?
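The last steps could look roughly like the sketch below. Both classes are stand-ins written for illustration: the `OrdinaryNDKriging` body only stores its inputs, and the `(x, y, z, val)` arguments of the wrapper are an assumption about the old-style interface, not a copy of the existing constructor.

```python
import warnings
import numpy as np

class OrdinaryNDKriging:
    """Stand-in for the proposed ND class; it only stores its inputs here."""

    def __init__(self, X, y):
        self.X = np.asarray(X, dtype=float)   # (n_samples, n_dim)
        self.y = np.asarray(y, dtype=float)   # (n_samples,)
        if self.X.ndim != 2 or len(self.X) != len(self.y):
            raise ValueError("X must be (n_samples, n_dim) and match y")

class OrdinaryKriging3D(OrdinaryNDKriging):
    """Old-style 3D interface kept as a thin, deprecated wrapper (sketch)."""

    def __init__(self, x, y, z, val):
        warnings.warn(
            "OrdinaryKriging3D is deprecated; use OrdinaryNDKriging instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Stack the separate coordinate arrays into one (n_samples, 3) array.
        X = np.column_stack([np.ravel(x), np.ravel(y), np.ravel(z)])
        super().__init__(X, np.ravel(val))
```

Because the wrapper only reshapes its arguments and forwards them, the existing unit tests for the old interface could keep running against the new internals.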