-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add from_wkt/to_wkt functions #50
ENH: add from_wkt/to_wkt functions #50
Conversation
src/io.cpp
Outdated
#else | ||
if (planar || oriented) { | ||
throw std::invalid_argument( | ||
"planar and oriented options are only available with s2geography >= 0.2"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"planar and oriented options are only available with s2geography >= 0.2"); | |
"planar and oriented options are only available with s2geography >= 1.2"); |
(per above if
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have it correct now.. In any case, the s2geography version is certainly 0.2 (https://github.com/paleolimbot/s2geography/blob/9ebd7b106ba2ae30046cb4f3e9561a8bbdb18b9b/CMakeLists.txt#L6).
I initially did the #if
version check wrong, see de19f0e for correcting it.
We want that both version of 0.2 or 1.1 result in True, with the current version being 0.1 returning False, so that means major >= 1 OR minor >= 2
, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments, although overall it looks already good!
I'm not sure about the cost in performance of constructing (and destroying) new s2geography::WKTReader
(or s2geography::WKTWriter
) objects within the vectorized inner loops, but perhaps it would be preferable to handle that out of the loop (e.g., using a Functor)? I don't think it would be very useful to allow array values for planar
and/or oriented
.
|
||
to_wkt: _VFunc_Nin1_Nout1[Literal["to_wkt"], str, object] | ||
|
||
def from_wkt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this is also a fully vectorized function, i.e., oriented
and planar
also accept array-like objects, right?
Or do we want to restrict oriented
and planar
to accepting each a single value? (and use a Functor in order to handle those arguments out of the vectorized call loop like for the predicate functions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used a lambda in the m.def
so to only vectorize the main input data argument, and not oriented/planar.
src/io.cpp
Outdated
If set to True, the edges linestrings and polygons are assumed to | ||
be planar. In that case, additional points will be added to the line | ||
while creating the geography objects, to ensure every point is | ||
within 100m of the original line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to make this distance threshold configurable? (Note: this can be addressed later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should (eg bigquery uses a default of 10m, I think), but was also thinking that this could be added later (although it is not actually difficult to add, because I already hardcode the 100m. I can maybe add this while refactoring to reuse the reader/writer object and to not vectorize those additional arguments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a tessellate_tol_m
keyword, mimicking R's s2 (https://r-spatial.github.io/s2/reference/s2_geog_point.html)
# if we force to not orient, we get an error | ||
with pytest.raises(RuntimeError, match="Inconsistent loop orientations detected"): | ||
spherely.from_wkt(polygon_with_bad_hole_wkt, oriented=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, is there a great performance gain when setting oriented=True
and when we know that all ring are consistently oriented? I haven't checked the implementation in s2geography (and not much in s2geometry), but I assume that the orientations are checked anyway if such case may trigger raising an error.
Maybe are there other reasons of setting oriented=True
? I.e., when do we want to avoid implicit re-ordering of the rings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also later uniformize across the library the type of error raised (in #51 a ValueError is raised when a polygon has invalid edges).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know (without trying out) if there is a performance difference, but I think that the oriented=True
does allow you to create different geographies? (if you have one that is larger than half of the globe, the default constructor would always normalize it to represent the smaller polygon)
And I can also imagine that just checking that the orientation of all loops are the same is still cheaper than a full normalization (but that is indeed something to check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also later uniformize across the library the type of error raised (in #51 a ValueError is raised when a polygon has invalid edges).
Yeah, in this case the error comes from s2geography (in theory I could catch the error and re-throw it with a different class, because those RuntimeErrors are a bit strange)
Updated this now #51 is merged, I think this should be ready for a final review / merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor suggestion but otherwise it is good to go!
Thanks @jorisvandenbossche !
src/io.cpp
Outdated
ensure every point is within 100m of the original line. | ||
By default (False), it is assumed that the edges are spherical | ||
(i.e. represent the shortest path on the sphere between two points). | ||
tessellate_tol_m : float, default 100.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tessellate_tol_m : float, default 100.0 | |
tessellate_tol : float, default 100.0 |
(nit suggestion: _m
makes the parameter name slightly less readable IMHO and its meaning may not be obvious at a first glance without checking the docs anyway... _meters
would be more meaningful but would also make the parameter name too long).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That keyword name was taken from R s2 (https://r-spatial.github.io/s2/reference/s2_geog_point.html), but I am certainly fine with dropping the _m
as well. I agree that from just seeing the keyword name (without reading the explanation for what it stands), it is not necessarily clear that this stands for "meters", making it an unnecessary suffix.
(to be honest, I am also not convinced that "tessellation" is the best name. BigQuery uses it, but if you google for it, it is typically about dividing a plane into multiple shapes that nicely cover it without overlap, e.g. https://en.wikipedia.org/wiki/Edge_tessellation and https://en.wikipedia.org/wiki/Tessellation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'm more familiar with the latter meaning (dividing a surface into multiple shapes), but I'm fine with the name here.
Another name could be densify_pts_tol
or densify_tol
(pyproj uses densify_pts for similar problems).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that in the pyproj case it is (I assume) a uniform/regular densification, while s2 does a non-uniform (only adding points where needed to preserve the tolerance)
I renamed the keyword to tessellate_tolerance
(making it a bit longer, but also more readable I think than the "tol", and then it is also consistent with the C++ argument)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to merge this, so I can fixup conflicts with the geoarrow PR. But happy to change the keyword name again in a quick follow-up.
This adds vectorized
from_wkt
/to_wkt
functions.In addition to the geoarrow IO in #49, this is still useful to have so users can convert to/from WKT with plain numpy arrays as well instead of having to go through Arrow (and use some arrow library).