-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store envelopes rather than S2 cells. #518
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add some CLI helpers for encoding/decoding envelopes for debugging/diagnostics:
# encoded index value to envelope
$ kart spatial-tree index --decode-envelope=<envelope-hex>
174.0,-36.9,174.1,-36.8
# envelope to encoded index value
$ kart spatial-tree index --encode-envelope=174.0,-36.9,174.1,-36.8
c0ffee5a4a0ff2348a17e4c1b5febe8035c0ffee
# blob oid to envelope
$ kart spatial-tree index --feature-envelope=<blob-oid>
Native envelope: 1000,100,2000,200
Dataset CRS:
EPSG:2193 transform: unknown id, Inverse of New Zealand Transverse Mercator 2000 + NZGD2000 to WGS 84 (1), 1 m
→ segmentized at 1%=100m, buffered 0.5%=50m: 950,50,2050,250
→ 174.0,-36.9,174.1,-36.8
EPSG:27200 transform: unknown id, Inverse of New Zealand Map Grid + NZGD49 to WGS 84 (3)
→ segmentized at 1%=100m, buffered 0.5%=50m: 950,50,2050,250
→ 178.0,-36,178.1,-35.9
Final envelope: 174.0,-36.9,178.1,-35.9
Encoded: c0ffee5a4a0ff2348a17e4c1b5febe8035c0ffee
I mostly addressed the comments but what I haven't yet done, and will do tomorrow, is check what the input envelopes we are getting from GPKG / OGR are, how they handle meridian issues, and convert them appropriately into the W,S,E,N envelopes that I'm using now. Right now it just assumes W = min-x and E = max-x, which seems to work as long as there are no meridian-crossing geometries. |
We should add a couple of antimeridian datasets to our unit test data. @craigds got anything small+handy? |
semi-OT: OGR has an |
Added some code to handle anti-meridian issues. Still haven't handled long lines that need segmenting |
1eb0d2c
to
8cc6330
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting there 😃
kart/spatial_tree.py
Outdated
|
||
def get_envelope_for_indexing(geom, transforms): | ||
|
||
def anticlockwise_ring_from_minmax_envelope(minmax_envelope): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will produce an invalid geometry for a single point, since LinearRings can't have duplicate points except for the first/last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oddly it seems to work anyway, but it might be a bit brittle - for instance the shoelace calculation will return 0 and I'm just lucky that my code is apparently okay with that - so I've now special cased points.
return normalised * (max_value - min_value) + min_value | ||
|
||
|
||
def get_envelope_for_indexing(geom, transforms, feature_oid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one needs tests... and I think you'll find we don't always get useful envelopes from GPKG geometries because the spec defines them as minx,maxx... these two would have the same envelope:
POLYGON((170 10, -170 10, -170 11, 170 11, 170 10)) → (-170, 10, 170, 11)
POLYGON((-170 10, 0 10, 170 10, 170 11, 0 11, -170 11, -170 10)) → (-170, 10, 170, 11)
So if it's defined in a geographic CRS and it's not a point we'll need to access the underlying geometry object:
More complications: if a ring is wound a different direction... we can't obviously differentiate between
POLYGON((170 10, -170 10, -170 11, 170 11, 170 10))
20° wide box spanning the antimeridian (counterclockwise)POLYGON((170 10, -170 10, -170 11, 170 11, 170 10))
160° wide box spanning the prime meridian (clockwise)
And lines & multi-geometries don't have a winding direction at all...
LINESTRING(170 10, -170 10)
— 20° "long" line or 160°?MULTIPOINT((170 10, -170 10))
— 20° spacing or 160°?MULTILINESTRING((170 10, 171 10), (-170 10, -171 10))
— 18° spacing or 160°?MULTIPOLYGON(((170 10, 171 10, 171 11, 170 11, 170 10)), (-170 10, -170 11, -171 11, -171 10, -170 10)))
— 18° spacing or 160°?
(I presume by now you have a piece of paper in front of you with lots of boxes and lines and arrows drawn on it? 👍 Or a globe)
Unfortunately ESRI & OGC differ on the winding direction of exterior rings, so both are common.
We can make useful assumptions though that generally hold up, like points are likely to be joined by the shortest distance (ie. +170°→-170° is more likely to go East than West). And obviously if points go <-180° or >+180° that helps too.
Worst case we have to define it as -180°→180° (or null) so the feature always gets included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went looking for actual GPKGs that cross the antimeridian and I found that they had values >180. I assumed that this was the proper way of encoding geometries in GPKG, since it is also the one the only one that makes sense with min-x and max-x envelopes that GPKG requires:
- if you have a polygon that has 4 different x values: [170, 175, 185, 190] then the min and max x values are 170 and 190, which is useful
- for the same geometry, if you store those x values as [170, 175, -175, -170] then the min and max x values are -170 and 170, which are useless - those values are not at the outside of the shape, and they don't tell us if the shape crosses the antimeridian or not, so there's nothing at all we can use them for.
However, you are right - just because it wouldn't make sense to encode a single geometry in GPKG which is split into two halves, one near 180 and the other near -180, because that breaks the min and max envelopes, doesn't mean that nobody is doing this.
I propose the following:
- I construct an anticlockwise envelope around the min and max values of the geometry in its native CRS.
(This can mean I sometimes get a useless envelope for geometries that are split by the antimeridian - eg spanning from -179 to 179 - and sometimes I will get a useful envelope eg 179 to 181, depending on how the geometry is encoded). - I transform the envelope into EPSG:4326
- I shift parts of the envelope by 360 degrees until the envelope has anticlockwise winding order.
(So far the algorithm is unchanged) - I test if the resulting envelope spans more than 180 degrees of longitude. If it does, I abandon the attempt to find a meaningful envelope and return None, so this geometry will match all spatial filters.
Why this should work:
The code I have is working for input geometries that don't cross the antimeridian, and that are contiguous across the antimeridian (eg spanning 179 to 181) - and the fact that these geometries are split in two halves post-transform I can handle fine - but my code won't work for input geometries that are split into two halves, since then (min-x, max-x) will be something like (-179, 179), which looks the same to me as a geometry that covers the whole globe except for a tiny gap near the antimeridian. However, since its width is 358 degrees, I just give up, so the end result will be fine.
The only case where this wouldn't work - where I won't detect that the shape has been split and simply give up - is if there is a single geometry that crosses the antimeridian with a single line segment which is > 180 degrees of longitude wide, but that doesn't seem like a real world geometry.
Why I want to do it this way:
This way keeps the efficiency of the current method - the rare case which is too hard to index using the current method, we just don't index. Inspecting underlying geometries to try and recalculate their true envelopes when they have been split across the antimeridian could require a) a deeper knowledge of the various CRSs than I currently have, and a lot more code for handling odd geometries and b) algorithms that are O(n^2) where n = num_points, running on the individual geometries with all of their complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking a little bit more about this: the one case I said I can't handle correctly above is actually not a real-case since it looks identical from our point of view to a smaller geometry with shorter line segments that doesn't cross the antimeridian. The only way we could distinguish between the two would be to check the winding order of the raw geometry, but since we don't know what the winding order is supposed to be, we just have to assume the smaller geometry with smaller line segments is the right interpretation. So, I've convinced myself that this approach is adequate...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about this the other day but I've only just read it written down. What you've written is a good approach I think. WRT supporting large sparse-vertexed boxes that cover more than a hemisphere: as you've mentioned, there's no way to handle those without knowing the intended winding order in advance. Luckily, no one ever does that. But if they do, tough :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed that this was the proper way of encoding geometries in GPKG, since it is also the one the only one that makes sense with min-x and max-x envelopes that GPKG requires:
It's kind of up to the input geometry whether it goes past >180° or <-180° longitude... but geometries that use "normalised" coordinates certainly aren't wrong/invalid. There's no requirement in GPKG/anywhere else for extending beyond those limits. Since geometries are projected vertex-by-vertex, a projected CRS geometry transformed into 4326 will always have vertexes in the range (-180° <= x <= 180°).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good to me.
The only case where this wouldn't work - where I won't detect that the shape has been split and simply give up - is if there is a single geometry that crosses the antimeridian with a single line segment which is > 180 degrees of longitude wide, but that doesn't seem like a real world geometry.
This is basically what MS SQL Server does, fwiw, and rejects any geometries with vertexes more than a hemisphere apart — your points need to be <=180° apart. I think what you have is a good approach.
Lots of tests to make sure that the behaviour works as documented :-)
lots of stuff from LINZ Hydro; e.g. [a small crop of] Depth contour polyline (Hydro, 1:1.5mil and smaller) ; which stretches from roughly +145°E to +215°E |
We can get this in a couple of different CRS? eg: 3994 & 3832 are projected CRS that span the antimeridian |
Also encodes envelopes using 20 bits per value. Together, these changes make the index 25x smaller than it currently is (or 3x smaller than the best tuned S2 index that I have seen). Doesn't update the git filter extension, so cloning won't work yet. This is therefore part 1 of a 2 or 3 part change.
Still doesn't handle possible long-line issues - assumes straight lines are still straight lines even when reprojected, so assumes the corners of bounding boxes are sufficient for reprojecting a bounding box.
Added more tests
Removes libs2 and s2_py from the vendor build. Also updates spatial-filter test data to have an envelope index instead of S2 index.
84b8205
to
e1a6fb5
Compare
@olsen232 did we end up with some proper anti-meridian data here? eg: the Hydro datasets in EPSG:3994. AFAICS the only test-data changes were swapping s2 for envelopes in the existing dataset? |
Yeah I kinda pushed forward with this and started doing some indexing... I've gone back and added tests using the dataset Craig linked too now, using two different CRS: 3994 and 3832 |
... when writing the index used for spatially-filtered cloning.
Also encodes envelopes using 20 bits per value.
Together, these changes make the index 25x smaller than it currently
is (or 3x smaller than the best tuned S2 index that I have seen).
Doesn't update the git filter extension, so cloning won't work yet.
This is therefore part one of a two or three part change.
Also doesn't improve reprojections - there are still theoretical (and probably one day practical issues) with precision loss due to reprojecting envelopes, rather than geometries. This will also be addressed in a future commit.
Related links:
#456
Checklist: