GEOS 3.7.1 to 3.7.2 tightening of validity needed for operations #1121
Comments
Changes to packages using rgeos and affected here should probably use:
|
Rather:
|
rgeos 0.5-1 submitted to CRAN (10:53 CEST), on CRAN 12:09 CEST - thanks to the CRAN team! |
Most likely an unintended consequence of https://trac.osgeo.org/geos/ticket/789 or https://trac.osgeo.org/geos/ticket/838. GEOS doesn't make any claims about what happens when its inputs are invalid, so I'm not sure if the project would consider it a regression or not. It's also not clear to me without spending more time understanding the R code which geometries are actually making it into a failing |
Thanks @dbaston! I felt that the uncertainty about changes in GEOS, and whether they were intended or not, should be protected against. I agree that it is not easy to know which kinds of invalidity may have been digestible before 3.7.2. I did look at the commits but was no wiser from doing so. I suspect that perhaps before 3.7.2, things may have been too permissive. There are no reports that I can see on the GEOS mailing list of regressions, so I'm viewing this as GEOS tightening, probably to match its upstream JTS, so downstream should adapt - leading to the rgeos release a day ago. |
FWIW, the two failing inputs are:
and
This fails on recent JTS as well. |
OK, thanks. So current GEOS is consistent with current JTS; is it worth seeing whether JTS has always found these failing, and GEOS has now come into line, or whether they've mostly been synchronized? Anyway, useful to know that this is unlikely to have been a regression since these inputs also fail on JTS. |
Again, hard to call it a "regression" since the input is invalid, yet I'm curious what @dr-jts thinks. |
I agree that "regression" is the wrong term, I used tightening, which maybe feels like moving closer to JTS in how operations treat invalidity. |
Are the invalid inputs to GEOS the result of prior computation in some other algorithm? The GEOS/JTS requirement for valid geometry as input to operations is quite essential to provide efficiency and algorithmic simplicity. So if an upstream algorithm is producing invalid geometry, there needs to be an intermediate step to fix this. There's a spectrum of geometry invalid situations, which are more or less easy to fix. (For instance, self-touching rings have an obvious geometric interpretation, and are relatively simple to fix. Arbitrary self-intersection is harder to interpret and fix). |
Yes, the affected R packages are generating geometries before getting to the calls to GEOS topological operations that fail in 3.7.2 but not in 3.7.1 or earlier. It feels as though the cut-off point in the spectrum of invalidity has tightened. It isn't a problem, and has been worked around in rgeos by enforcing validity checking where rgeos is built using GEOS >= 3.7.2, and offering zero-width buffering internally as a possible repair step. The difficulty was in not seeing a specific notice in the release notes that this might occur. Once CRAN had noticed the failures and notified me as rgeos maintainer, it was just a matter of bisecting back to a plausible cause, and adding mitigating steps.

By the way, it does show how effective CRAN's continuous testing is, contrasted with Travis-style CI, which tests on delta. CRAN tests everything against everything daily at the R and R package level, and has systems which update upstream packages (Debian, Fedora) like libcurl, etc., and including GEOS, GDAL, and PROJ. Then the CRAN team push notifications to package maintainers if they can identify the right person (here three failing packages all use rgeos built with the GEOS that had been updated).

For R geospatial packages, we do track PROJ, GDAL and GEOS masters, but do not check other packages using facilities provided by say rgeos or sf. Single-threaded on my desktop, reverse dependency checks for rgeos take over three hours, and then need manual collation. @edzer maybe we need to create something less manual to be up to CRAN speed? |
It seems to be the requirement that the ring direction of the outer ring needs to be counter clockwise (CCW):
library(sf)
# Linking to GEOS 3.7.0, GDAL 2.4.0, PROJ 5.2.0
p = "MULTIPOLYGON (((1 5, 2 5, 2 4, 2 3, 1 3, 1 2, 2 2, 2 3, 3 3, 3 2, 3 1, 2 1, 1 1, 0 1, 0 2, 0 3, 0 4, 0 5, 1 5)), ((5 2, 5 1, 5 0, 4 0, 4 1, 4 2, 5 2)))"
st_as_sfc(p)[[1]]
# MULTIPOLYGON (((1 5, 2 5, 2 4, 2 3, 1 3, 1 2, 2 2, 2 3, 3 3, 3 2, 3 1, 2 1, 1 1, 0 1, 0 2, 0 3, 0 4, 0 5, 1 5)), ((5 2, 5 1, 5 0, 4 0, 4 1, 4 2, 5 2)))
sf:::check_ring_dir(st_as_sfc(p))[[1]]
# MULTIPOLYGON (((1 5, 0 5, 0 4, 0 3, 0 2, 0 1, 1 1, 2 1, 3 1, 3 2, 3 3, 2 3, 2 2, 1 2, 1 3, 2 3, 2 4, 2 5, 1 5)), ((5 2, 4 2, 4 1, 4 0, 5 0, 5 1, 5 2)))
I believe I've looked for this several times in the simple feature access (part 1) standard document, but couldn't find it stated as a requirement, more as an implicit assumption, that outer rings are CCW; first ring being exterior seems sufficient for disambiguation. For rings on the sphere it is more important, as they divide the sphere in two parts rather than having a natural outside, as on the plane (though heuristics could assume the smaller part is inside). |
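The ring direction of the first exterior ring in the example above can also be checked without sf or GEOS, using the shoelace formula for the signed area. This is a minimal base R sketch; the helper name `ring_signed_area` is hypothetical and is not part of sf or rgeos.

```r
# Signed area of a closed ring (first vertex repeated as last vertex),
# computed with the shoelace formula.
# Positive: counter-clockwise (CCW); negative: clockwise (CW).
ring_signed_area <- function(xy) {
  n <- nrow(xy)
  x <- xy[, 1]
  y <- xy[, 2]
  sum(x[-n] * y[-1] - x[-1] * y[-n]) / 2
}

# Exterior ring of the first polygon in the WKT above, as given (before
# any ring-direction correction).
outer <- matrix(c(1,5, 2,5, 2,4, 2,3, 1,3, 1,2, 2,2, 2,3, 3,3, 3,2,
                  3,1, 2,1, 1,1, 0,1, 0,2, 0,3, 0,4, 0,5, 1,5),
                ncol = 2, byrow = TRUE)

ring_signed_area(outer)                   # -9: the ring as given is CW
ring_signed_area(outer[nrow(outer):1, ])  #  9: reversing it makes it CCW
```

The sign only reports orientation; it says nothing about validity, which is a separate question for this ring.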
The sp objects here are positive for
so as far as GEOS 3.7.2 is concerned, both are troubled. |
SFA specifies CCW orientation for exterior rings; from section 6.1.1.1:
GEOS actually uses the opposite convention for geometries that it produces, but is agnostic as to its inputs: http://www.tsusiatsoftware.net/jts/jts-faq/jts-faq.html#B6 All that to say that orientation isn't the issue here. The polygon is invalid because it has a self-touching exterior ring. This is the shapefile style for representing this geometry. GEOS expects the OGC representation, which would include a hole. That the operation succeeds in 3.7.1 but not 3.7.2 is not the result of a change in validity standards; it's just that it happened to work in 3.7.1 and no longer happens to work in 3.7.2. |
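The self-touch described here shows up in the coordinates as a vertex that the exterior ring visits twice. A minimal base R sketch of that check follows; the helper name `ring_repeated_vertices` is hypothetical, and the check only finds vertex-vertex touches (a vertex lying on another segment of the same ring would need a geometric test, which is not attempted here).

```r
# Return the vertices that a closed ring visits more than once,
# ignoring the closing vertex (which always repeats the first one).
# A vertex visited three times would be listed twice.
ring_repeated_vertices <- function(xy) {
  open <- xy[-nrow(xy), , drop = FALSE]
  open[duplicated(open), , drop = FALSE]
}

# Exterior rings of the two polygons in the MULTIPOLYGON quoted earlier
# in this thread.
outer1 <- matrix(c(1,5, 2,5, 2,4, 2,3, 1,3, 1,2, 2,2, 2,3, 3,3, 3,2,
                   3,1, 2,1, 1,1, 0,1, 0,2, 0,3, 0,4, 0,5, 1,5),
                 ncol = 2, byrow = TRUE)
outer2 <- matrix(c(5,2, 5,1, 5,0, 4,0, 4,1, 4,2, 5,2),
                 ncol = 2, byrow = TRUE)

ring_repeated_vertices(outer1)  # one row: the point (2, 3)
ring_repeated_vertices(outer2)  # zero rows: no repeated vertex
```

In the OGC representation, the loop closed at (2 3) would instead be written as a separate interior ring (a hole) that touches the exterior ring at that single point.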
Thanks, that clarifies! But what then was the problem with
the failing input you reported above? |
There is no problem with that one. |
Thanks - I now understand your post. |
The requirement that rings not self-touch is actually just an optimization that allows skipping a scan of the input geometry to find self-nodes. It would be relatively easy to relax this, at the cost of somewhat reduced performance. Alternatively, the In fact, the forthcoming improved overlay algorithm will accept self-touching rings, due to the use of snap-rounding. So this problem may disappear again at some point. |
@dr-jts I noticed that as of 2011, JTS/GEOS overlay worked on both ESRI-valid and OGC-valid polygons: https://sourceforge.net/p/jts-topo-suite/mailman/message/27048423/ Though you pointed out that the behavior isn't guaranteed, if the behavior held true up to 3.7.1, I wonder if we should attempt to preserve it in the 3.7.x line. |
BTW, I would be happy to write a |
Yes, there was an optimization introduced at some point that caused self-touching rings to no longer work.
That might be nice. Although depends on what change caused this "regression". If it was the optimization mentioned above that should be easy to revert. If it is a side-effect of something else that might not be so easy to fix. |
Pretty sure it must be a side-effect of libgeos/geos@609e764 or libgeos/geos@3528071, though I didn't do incremental builds to see which. |
I've reconstituted the history of the issue which led to this change (in JTS and GEOS). The original issue was GEOS-838, which presented two valid geometries whose union was invalid. JTS-107 has more discussion about this. JTS-257 is the fix implementation. The problem turned out to be a noding robustness issue, which caused the valid input linework to have a self-touch after noding. This caused the output to be invalid. The fix was to tighten up the internal overlay noding validation check to catch this situation. This has the side-effect of detecting (and failing) all self-touches in input geometry. Previously, vertex-vertex self-touches were not detected, and in many cases they would simply propagate through the overlay algorithm. (This made the output invalid as well, but since the inputs were already invalid this behaviour was considered acceptable). Some conclusions are:
|
Thanks very much for a comprehensive account. On the R side, we can take steps to convince package authors to check and correct validity before passing objects to rgeos and GEOS, and probably a blog linked from error messages in sf and rgeos. We can also advise use of the LWGEOM-based function Anyway, many thanks for this most helpful discussion! |
As rgeos maintainer (https://r-forge.r-project.org/projects/rgeos/), I was asked to look for causes of CRAN check failures for BayesX, birdring and inlmisc immediately following a system upgrade from GEOS 3.7.1 to 3.7.2 (the same problem is present in GEOS 3.8.0dev). I'm posting this issue to provide a reprex, and to document the resolution for rgeos 0.5-1 (rev. 603). The two WKT files (based on the error in inlmisc) are in this zipfile:
WKTs.zip
The initial script for CRAN releases of rgeos and sf is:
GEOS_3.7.2_3.7.1_test.zip
This gives the following output for the CRAN releases for GEOS 3.7.1 and GEOS 3.7.2:
script_output_3.7.1.txt
script_output_3.7.2.txt
As can be seen, GEOS 3.7.2 is stricter on topological operations than 3.7.1 was. This leads to failures which had not previously been seen for invalid geometries. On the hunch that a zero-width buffer might help, rgeos 0.5-1 (rev. 603):
install.packages("rgeos", repos="http://R-Forge.R-project.org")
and a modified script:
GEOS_3.7.2_3.7.1_test_2L.zip
now pass, informing that a geometry was invalid and that a zero-width buffer repair has been attempted; the issues in sf have not been addressed. I do not know whether other topology operations are affected, or whether predicates are affected (they do not seem to be so far).
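The check-then-repair pattern described in this post can be sketched on the user side with sf, as below. This is not the rgeos 0.5-1 implementation, only an illustration that assumes sf is installed and uses its `st_is_valid()` and `st_buffer()` functions; the WKT is the MULTIPOLYGON quoted in the comments of this thread.

```r
library(sf)

# MULTIPOLYGON quoted in the thread: the exterior ring of the first
# polygon touches itself at (2 3).
wkt <- "MULTIPOLYGON (((1 5, 2 5, 2 4, 2 3, 1 3, 1 2, 2 2, 2 3, 3 3, 3 2, 3 1, 2 1, 1 1, 0 1, 0 2, 0 3, 0 4, 0 5, 1 5)), ((5 2, 5 1, 5 0, 4 0, 4 1, 4 2, 5 2)))"
x <- st_as_sfc(wkt)

# Check validity before any topological operation, and attempt a
# zero-width buffer repair only when the check fails.
valid <- st_is_valid(x)
if (!all(valid)) {
  message("invalid geometry found; attempting zero-width buffer repair")
  x <- st_buffer(x, 0)
}

# Re-check rather than assuming the repair succeeded.
st_is_valid(x)
```

A zero-width buffer is a heuristic repair and can change geometries with more severe self-intersections, so the result should be inspected rather than trusted blindly.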