Replies: 6 comments
-
PR opened. #973. Reviewers please consider the above (@thomcom @isVoid @trxcllnt @jarmak-nv ). I will wait a few days to see if any other opinions exist before we merge. |
Beta Was this translation helpful? Give feedback.
-
Agree on Point 1 - when specs are defined I'd like to stay with them (unless we have a very good reason for breaking). If someone is using cuSpatial supported IO, would that prevent the user error you provided from happening? If yes, only expert users are likely to run into it which is not a big concern IMO. Point 2 - No opinion. Whatever is better for the cuSpatial engineering team is my preference. |
Beta Was this translation helpful? Give feedback.
-
My understanding is that even for a 0-length list column, the offset should still have a single element (0). In [16]: pa.ListArray.from_arrays([0], [])
Out[16]:
<pyarrow.lib.ListArray object at 0x7f757e739cc0>
[]
I'm slightly in favor of being explicit on closed polygon. I see the benefit of saving polygon storage data, but only in the case where the geometry dataset consists of mainly simple polygons. In mapping use case this might not be the most frequent use (e.g. state boundaries). In case it is a complicated polygon, we would have to pay extra overhead by loading two very remote vertex, breaking data locality and voids cache. In existing standards, SFA is explicit on closed polygons. Part of the reason is that
Shapely detects if user input is closed, and closes it if not: In [4]: list(Polygon([(0, 0), (1, 0), (1, 1)]).exterior.coords)
Out[4]: [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)] And throws if wkt is not closed: In [9]: shapely.from_wkt('POLYGON ((0 0, 1 0, 1 1))')
---------------------------------------------------------------------------
GEOSException Traceback (most recent call last)
Cell In[9], line 1
----> 1 shapely.from_wkt('POLYGON ((0 0, 1 0, 1 1))')
File ~/compose/etc/conda/cuda_11.8.0/envs/rapids/lib/python3.10/site-packages/shapely/io.py:286, in from_wkt(geometry, on_invalid, **kwargs)
282 raise TypeError("on_invalid only accepts scalar values")
284 invalid_handler = np.uint8(DecodingErrorOptions.get_value(on_invalid))
--> 286 return lib.from_wkt(geometry, invalid_handler, **kwargs)
GEOSException: IllegalArgumentException: Points of LinearRing do not form a closed linestring |
Beta Was this translation helpful? Give feedback.
-
Ahh, so this means we would only be able to enforce that any offsets array is >= 1 element, and a single element implies an empty list. Your data on closed-ness is interesting. I wonder if there are any counterpoints -- do any geospatial libraries / standards work fine with non-closed geometry? |
Beta Was this translation helpful? Give feedback.
-
More data:
geoJSON requires closed rings:
|
Beta Was this translation helpful? Give feedback.
-
Thanks for your positions everybody. As I work on At an abstract level I think that closed polygons and unclosed polygons are different objects, that is, that closed-ness actually specifies polygonality. Obviously an unclosed polygon is a LineString, like in the above |
Beta Was this translation helpful? Give feedback.
-
I would like to discuss requirements that cuSpatial should and should not enforce for polygon data. See #962 .
As I wrote in that issue:
There are two aspects.
N+1 offsets: Offsets arrays (polygons and rings) have one more element than the number of items in the array. The last offset is always the sum of the previous offset and the size of that element. For example the last value in the ring offsets array is the last ring offset plus the number of rings in the last polygon. This is actually mentioned in the Arrow Variable-Size Binary Layout, but not in the GeoArrow specification itself. I think it bears repeating or at least referencing in the GeoArrow spec.
Open/Closed rings: By this I mean whether the last vertex in any ring is the same as the first vertex. This means the minimum number of vertices in a ring is 4. This is not mentioned in the GeoArrow specification, but all of the concrete examples satisfy this point. After some discussion, I believe that GeoArrow is agnostic on this point.
My opinions on this are based on discussion with the cuSpatial team.
In my opinion we should comply with Arrow on point 1 and enforce it in our APIs to the extent that we can without introspecting data. Specifically, this means that all offset arrays should either have zero (empty) or at least two elements. We can't enforce the values of the offsets without introspecting data. Also, if because we get the number of items from the size of the offsets arrays, if the user intends$N$ polygons and passes $N$ offsets, we will assume they intended $N-1$ polygons. If the user intends 1 polygon ring with 5 vertices and they polygon ring offsets of
[0, 4]
then we will end up ignoring the last vertex of their polygon ring, possibly computing incorrect results (e.g. if they passed an unclosed ring). We will have to treat this as user error.In my opinion we should be agnostic on point 2. This means that we accept polygon rings that are either closed or open. Our current algorithms that take polygonal input data (various forms of point-in-polygon and polygon bounding box computation)are agnostic to open / closed polygons -- they work fine with either. Future algorithms may or may not be agnostic, but I think we can always add logic to detect open polygons and re-use the first vertex as the last vertex, or to detect closed polygons and ignore the last vertex, as needed. Being agnostic on this point allows users to save storage, bandwidth, and GPU memory when processing large polygonal datasets, especially those with small polygons (e.g. triangles). Requiring closed triangles increases vertex data by 33% (4/3).
Please share your thoughts.
Beta Was this translation helpful? Give feedback.
All reactions