Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Geography and add (scalar) creation functions #51

Merged
merged 57 commits into from
Oct 22, 2024

Conversation

benbovy
Copy link
Owner

@benbovy benbovy commented Oct 4, 2024

This PR refactors the Geography (sub)classes in spherely and adds geography creation scalar functions.

Supersedes #26.
Closes #41.
Closes #27.

  • Goal is to remove the Geography subclasses (Point, Polygon, etc.): only one Geography wrapper class remains
  • Geography type (and also whether the geography is empty or not) is inferred at object creation
  • Geography objects are created via the top level creation functions (only scalar for now except for points)

TODO:

  • fix segfault when creating a geography collection itself including collection(s)
  • add multipolygon creation function
  • add EMPTY geographies creation function overloads (+ tests errors when trying to build MULTI geographies from EMPTY geographies)
  • move geography creation tests into test_creation.py
  • raise when trying to create linestring with one point only

benbovy added 23 commits March 14, 2023 15:01
- seem to be also the case in Shapely
- remove last vertex if it is equal to the 1st one
- still raise error in other duplicate vertices cases
Also added a ``spherelt::Geography::clone()`` method, which makes easier
creating collections from copies of input objects (needed by the
``s2geography::GeographyCollection`` which owns its features).

TODO: implement clone for LinearRing and GeographyCollection.
Unfortunately, the repr fails for a GeographyCollection that includes a
LinearRing, since the latter is not supported in s2geography WKT writer.
A workaround could be to have our own version of GeographyCollection WKT
writer here but it doesn't make much sense to re-implement the whole
thing.
Refactor clone:

- won't need spherely::Geography::clone
- instead use a template function spherely::clone_s2geography with SFINAE

Not complete and still needs some clean-up
We'll see later if we really need it...

It was a bit of a special case in s2geometry, which was not supported in
s2geography.
Create points from an array-like of lng, lat coordinates.
Copy link
Collaborator

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not yet ready, was just already taking a quick look.

One thought regarding the scope of this PR, since it is a big one: maybe you could limit this PR to only cover what essentially already existed (to get the current tests (updated for new function names) passing). For example just provide scalar creation functions for linestring/polygon, and leave the vectorized creation functions (for everything other than point) for a follow-up.

src/creation.cpp Outdated Show resolved Hide resolved
@benbovy
Copy link
Owner Author

benbovy commented Oct 9, 2024

Yes I've been working further on this (last updates not uploaded yet) but that might be too much of a change indeed. I'll restrict this PR to the refactoring of Geography wrapper classes.

Also I'm not exactly sure whether we should closely reuse Shapely's API for creation functions. I find that API convoluted with the mixing of a unique coords argument vs. separate x, y (and z) arguments as well as the different behavior depending on indices. We should probably reuse the same API for compatibility although it makes things more cumbersome to implement.

@jorisvandenbossche
Copy link
Collaborator

Yeah, the vectorized creation functions in shapely are quite complex, and I am also not sure if they are always that useful (like creating polygons from a single 3D array, which I think in practice is only useful to generate some random triangles or to create bboxes), and agreed we should carefully think about which API we exactly want here.
That's also the reason I mentioned leaving out the vectorized creation functions for now, and only providing the scalars ones (that already exist and that are used in the tests). For points, I would keep the vectorized function you already have, since in this case it is quite simple, I think? (and you already have that)

@benbovy
Copy link
Owner Author

benbovy commented Oct 9, 2024

For points, I would keep the vectorized function you already have, since in this case it is quite simple, I think? (and you already have that)

Yes. For a single array this is currently limited to 2D (i.e., not a true ufunc like in shapely) but that's fine for now. I don't think that we can use pybind11.vectorize to extend support to a single nD input array (pybind/pybind11#1294).

I wouldn't be too hard I think to mimic the behavior of a ufunc by relying on pybind11's array view & indexing capabilities (+ some manual computation of nD indices), though. That would be even easier to let xtensor(-python) takes care of all of that (also for creating temporary arrays of geographies on the C++ side while releasing the GIL), but I'm not sure if adding such build dependencies is worth it.

@benbovy
Copy link
Owner Author

benbovy commented Oct 15, 2024

This is getting close. I'd just like to investigate the segfault when creating nested geography collections before making this PR ready (EDIT: fixed segfault).

A few other comments that we can address later:

  • Empty geographies: there is no overload function to create MULTI geographies without argument (or with None) ; also when empty or 1-length sequences are given the multi-types are demoted (see Singleton Multi-types automatically demoted paleolimbot/s2geography#17)
  • The collection type is named GeographyType::GeographyCollection but both the Geography repr and WKT have the GEOMETRYCOLLECTION name. Do we want to uniformize things and rename GeographyType::GeographyCollection to GeographyType::GeometryCollection?
  • Full polygon not supported (yet)
  • It is not currently possible to create a MULTIPOLYGON with duplicate rings, due to the fact that s2geography::PolygonGeography reuse the same underlying S2Polygon for multi-polygons. Maybe if we deactivate s2 normalization and/or validity check it will pass but will it work ?
  • No test added for vectorised point creation functions, but the API may not be final.
  • Scalar creation function naming: still no strong opinion, although I would lean towards either point(), linestring(), etc. or create_point(), create_linestring(), etc.

@benbovy benbovy marked this pull request as ready for review October 15, 2024 19:47
Copy link
Collaborator

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

src/creation.cpp Outdated Show resolved Hide resolved
src/creation.hpp Outdated Show resolved Hide resolved
Comment on lines 50 to 51
Geography(Geography&& geog) : m_s2geog_ptr(std::move(geog.m_s2geog_ptr)) {
// std::cout << "Geography move constructor called: " << this <<
// std::endl;
extract_geog_properties();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the performance overhead of extract_geog_properties would be, but should we allow to create Geography() with a given geography type (like for clone_s2geography you have a variant that infers vs receives the type, we could have the same for extract_geog_properties). For certain functions (like creation functions), you always know the type in advance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I should have commented that on the case below (creating from s2geography), because for the line above in theory we could propagate the attributes from the passed Geography object? (like is done in the move operator implementation. Or if you want to avoid this overhead, just use move instead?)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I should have commented that on the case below (creating from s2geography), because for the line above in theory we could propagate the attributes from the passed Geography object?

Yes indeed we should simply propagate the attributes in this case.

In the case of creating from an s2geography object when we know the geography type, we could indeed provide the type explicitly instead of inferring it (and therefore avoid the inference overhead). We certainly want to avoid passing inconsistent s2geography object type vs. geography type, though, in order to prevent odd behavior or segfaults happening later. Checking for this consistency at runtime would be pretty similar to extract_geog_properties in terms of performance overhead I think. If we really want to optimize this, perhaps we could use template constructors + SFINAE or if constexpr...

tests/test_creation.py Show resolved Hide resolved
tests/test_creation.py Show resolved Hide resolved
tests/test_creation.py Outdated Show resolved Hide resolved
src/geography.cpp Outdated Show resolved Hide resolved
tests/test_creation.py Show resolved Hide resolved
@jorisvandenbossche
Copy link
Collaborator

This is getting close.

A few other comments that we can address later:

Agreed! Let's indeed get this in (to have the basics working, and will also be easier for other PRs adding features), and the things below can be handled in follow-ups.

Personally I am fine with that limitation for now, especially given that s2geography essentially does not make the distinction between single and multi geometries anyway (as you linked), and that it is only for output (like WKT, or asking for the type) that the distinction is introduced, based on the number of elements stored in the object.

  • The collection type is named GeographyType::GeographyCollection but both the Geography repr and WKT have the GEOMETRYCOLLECTION name. Do we want to uniformize things and rename GeographyType::GeographyCollection to GeographyType::GeometryCollection?

I would maybe lean towards GeographyType::GeometryCollection, although that is certainly odd as well ..

Another case where this occurs is in the spherely.geography_collection(..) constructor. For that case, we could also shorten it to just collection() to avoid that issue? (or at least when we add a prefix like create_.., that might be sufficient)

  • It is not currently possible to create a MULTIPOLYGON with duplicate rings, due to the fact that s2geography::PolygonGeography reuse the same underlying S2Polygon for multi-polygons. Maybe if we deactivate s2 normalization and/or validity check it will pass but will it work ?

A multipolygon with duplicate rings also sounds like "wrong" .. (in shapely / GEOS, that would be an invalid geometry anyway, and thus give unexpected results), so I am personally fine with not allowing to create it.

Although I suppose at some point we should allow to create invalid geographies, end then have a method to check if they are valid / what the reason is for the invalidity (for example s2_make_polygon has a check keyword to disable validation when constructing a polygon)

This is also another case (together with the geometrycollection / geographycollection above) where it points the the mismatch between the "simple features" model we try to emulate and how s2 works.
For the constructor functions you implemented here, it is nice to make the distinction between all the types, I think (as it gives a more user friendly interface), but for accessors like the geography type, we could also stay more closely to s2geography and have a function that just returns point/line/polygon/collection

  • Scalar creation function naming: still no strong opinion, although I would lean towards either point(), linestring(), etc. or create_point(), create_linestring(), etc.

Let's open a separate issue for this and see if we can get some more opinions

No need to store outer shell loops ids in a vector (just count how many
loops there are) for inferring polygon vs. multipolygon.
TODO: we'll need to decide whether we stick with s2geography's behavior
or if we adapt it to match shapely's behavior.
That is a bit unfortunate but GeographyCollection is not defined in WKT.
No need to infer geography properties.
@benbovy
Copy link
Owner Author

benbovy commented Oct 21, 2024

I would maybe lean towards GeographyType::GeometryCollection, although that is certainly odd as well ..

Another case where this occurs is in the spherely.geography_collection(..) constructor. For that case, we could also shorten it to just collection() to avoid that issue?

I renamed GeographyType::GeographyCollection -> GeographyType::GeometryCollection and geography_collection() to collection(), I also think it looks nicer for now.

@jorisvandenbossche jorisvandenbossche merged commit a1f86f9 into main Oct 22, 2024
15 checks passed
@jorisvandenbossche jorisvandenbossche deleted the add-geometry-creation-functions branch October 22, 2024 08:06
@jorisvandenbossche
Copy link
Collaborator

OK, merged this, so we can move forward with other PRs and follow-ups. Thanks @benbovy!

@benbovy
Copy link
Owner Author

benbovy commented Oct 22, 2024

👍 thanks for the reviews @jorisvandenbossche !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Usage of lat/lng vs lng/lat in constructors Exposing various Geography subclasses?
2 participants