API: move from object-based design to function-based #340
Replies: 4 comments
-
I'm always for progress and improvements, in general. I'm going to give this more thought, as it will have implications for the refactoring of |
Beta Was this translation helpful? Give feedback.
-
I am afraid that your evenings are not feasible for me here :D. Weekends should be fine. |
Beta Was this translation helpful? Give feedback.
-
Hello, |
Beta Was this translation helpful? Give feedback.
-
I have created a POC in #406 to discuss on top of the actual code. |
Beta Was this translation helpful? Give feedback.
-
I want to propose a big API overhaul, moving away from the current object-based API to a simpler function-based one.
Current situation
At the moment, the functionality can be split into several groups:
The majority of characters takes input variables and their main output is a single pandas Series. The usual pattern looks like this (note the
.series
):Some characters result in multiple Series'. Those were initially the main reason to implement object-based API.
Characters measured by the
graph
module use functions. They take a network, measure stuff and save the result as a node or edge attribute of a given network and return its copy with that attribute.Classes that are used to generate new geometries (like
Tessellation
) return a new GeoDataFrame but also Series' with IDs in some cases to link the resulting gdf to the original one.Proposal
I would like to simplify all that and turn existing classes into functions returning numpy arrays. There are a few reasons for that:
.series
attribute is annoying and there's almost no value in other attributes we currently providedask_geopandas
compatibility - while you can use existing classes withinmap_partitions
, it is not straightforward, especially for classes with multiple outputs.esda
So, the proposed changes to the groups listed above:
A simple function that returns a numpy array.
A function that returns a tuple of arrays (for example like
gdf.sindex.query_bulk
does)networkx.Graph
These do not require many changes. I would just introduce
inplace
keyword to avoid unnecessary copying. But due to the nature of the underlying data structure, they will always be a bit different in behaviour.In most cases, these can be turned into functions. In the case of tessellation, I would even split it into two -
morphological_tessellation
andenclosed_tessellation
. When we return IDs, we can again return a tuple. However, this change is not as needed and we can keep the existing API if we come to that conclusion.No change needed here.
This proposal has significant API implications as it is going to break everything. But I think that we should do that anyway and sooner it happens, the less trouble it will cause. We should make a transition period long enough to let everyone adapt to a new API before we remove it.
I want to hear your thoughts on that as it entails a pretty significant refactoring and changes.
cc @jGaboardi @ljwolf @amorfinv @pysal/devs
Beta Was this translation helpful? Give feedback.
All reactions