-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Data processing pipelines #56
Comments
I think this is a really good idea, especially as it can cut down a lot on redundancies in the code. You're right, all of the kriging classes rely on a few similar building blocks. Abstracting and consolidating the various building blocks would make code maintenance and future enhancements much easier. In line with what you're suggesting, I was actually thinking that, regarding issue #52, maybe it would be best to define a separate method/set of methods to calculate the statistics if the user feels the urge to do so. And now that I think about it, having an abstract kriging base class would be good, too (for example, OK is just a special case of UK, they share a lot of things, so breaking things into distinct building blocks would be good from this point of view). So I like this idea from that point of view as well. I'm not very familiar with the Also, +1 for cleaning up the interface... all the kwargs are starting to seriously clutter things up... |
Seems like a good idea! Then it should be possible without problems to seperate coordinate transformations. Also, geographic coordinates would be implemented in UK. |
Currently, the statistics do rely purely on the variogram/distances, as in the case of OK. That's not technically correct tho, so it's an area for improvement... |
Here are some more thoughts... I do like the As I've said here and elsewhere, there are a lot of improvements that could be made to the code. I think isolating individual building blocks in this manner would make implementing those improvements much, much easier (and probably make contributing easier?). So maybe here's what I'd suggest, or envision as a start....
What do you guys think? I can start on an abstract |
@bsmurphy I agree with this. We could keep a global wrapper methods (the real issue is the long and repeated docstrings there, but if we can find a way to generate those from the individual blocks it would quite simple to do). In the ND refactorization, I was thinking that we might maybe able to mostly keep backward compatibility, but honestly it might be simpler to make a release of v1.4 now, and say that in v2.0 we are going to break backward compatibility with the new API (also because we don't have much development resources for all of it). We could always make a v1.4.1 bug fixing release later. So for instance the approach could be something as follows,
What do you think? |
In line with code reorganization in issue #33 , I was wondering what's your opinion on data pipelines. I could be wrong but, I looks like what the current OridnaryKrigging etc.. could be separated in several independent steps,
One could then potentially create, for instance,
AnisotropyTransformer
,CoordinateTransformer
,KrigingEstimator
classes (or some other names) and get the results by constructing a pipeline usingsklearn.pipeline.Pipeline
orsklearn.pipeline.make_pipeline
. The advantage of this being that the different transformersI'm not sure if that would be useful. Among other things this could depend on how much more options we might end up adding. For instance the Universal Krigging class currently has 16 input parameters which is already quite a bit. If we end up adding, say local variogram search radius, kriging search radius and a few others, splitting the processing into several steps might be a way to simplify the interface for the users.
Just a though... What do you think?
The text was updated successfully, but these errors were encountered: