-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Geo] Add Well Known Text (WKT) Parsing Support to ShapeBuilders #27417
Conversation
@elasticmachine please test this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nknize thanks, I did a first round of review. Most changes in the builders look fine, however I think the WKT parser currently is a bit too lenient in what it accepts as valid WKT values (I left one example, but I think there are many more edge cases that are not currently tested and should be rejected).
It would also be great to test the parser not only with well formed values that the current builders output but add test for values that should be rejected for various reasons. Not sure how the current parser can be tweaked to do this, but I think it would be good to add this before this can be merged.
|
||
static { | ||
BBOX = "BBOX"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you move this assignment to L281 for better readability or is this unsafe to use it in the following initialization then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
for (GeoShapeType type : values()) { | ||
shapeTypeMap.put(type.shapename, type); | ||
// add 'BBOX' wkt name support for Envelope | ||
if (type.equals(ENVELOPE)) { | ||
shapeTypeMap.put(ENVELOPE.wktName().toLowerCase(Locale.ROOT), type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be done just after the values() loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -300,6 +311,10 @@ public abstract ShapeBuilder getBuilder(CoordinateNode coordinates, DistanceUnit | |||
ShapeBuilder.Orientation orientation, boolean coerce); | |||
abstract CoordinateNode validate(CoordinateNode coordinates, boolean coerce); | |||
|
|||
public String wktName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add some javadoc that the WKT identifier is the shapename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
public String toWKT() { | ||
StringBuilder sb = new StringBuilder(); | ||
sb.append(type().wktName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, this will be lowercase, no? The examples in the issue and PR are all uppercase, does this matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the WKT spec, casing doesn't matter
sb.append(bottomRight.x); | ||
sb.append(GeoWKTParser.COMMA); | ||
sb.append(GeoWKTParser.SPACE); | ||
// @todo support Z?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: doesn't really matter but mostly I see this as // TODO. Might be better to keep it similar to be able to grep similar things or for other tooling.
throws IOException, ElasticsearchParseException { | ||
CoordinatesBuilder coordinates = new CoordinatesBuilder(); | ||
coordinates.coordinate(parseCoordinate(stream)); | ||
while (nextCloserOrComma(stream).equals(COMMA)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close or Comma will also again allow for things like "( 1, 2, 3) 4 )" being accepted, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up coordinate list parsing to better handle coordinates with and without parens.
if (nextEmptyOrOpen(stream).equals(EMPTY)) { | ||
return null; | ||
} | ||
// todo orientation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What needs to be done here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, nothing. This is handled in the builder. Removing todo.
} | ||
// todo orientation? | ||
PolygonBuilder builder = new PolygonBuilder(parseLine(stream), ShapeBuilder.Orientation.RIGHT); | ||
if (nextWord(stream).equals(LPAREN)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if next word is something else than LPAREN?
// todo orientation? | ||
PolygonBuilder builder = new PolygonBuilder(parseLine(stream), ShapeBuilder.Orientation.RIGHT); | ||
if (nextWord(stream).equals(LPAREN)) { | ||
while (nextCloserOrComma(stream).equals(COMMA)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same problem as above I think, only that unmatched closing brackets will create holes here.
case StreamTokenizer.TT_EOL: return EOL; | ||
case StreamTokenizer.TT_NUMBER: return NUMBER; | ||
} | ||
return "'" + (char)stream.ttype + "'"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: whitespace
54b9036
to
bdd1380
Compare
Thanks for the review @cbuescher I went ahead and added more strict parsing (to include better paren parsing for coordinate lists). I also added tests for comments and illegal geometry types. I think this is close. If you wouldn't mind giving another review. I will try to add geometry collection testing before committing. Also, since this is a fresh feature, what we can do is set this as experimental for 6.1 and flush it out in bug fixes. Thoughts? |
97b22a0
to
fc6604d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nknize I took another look and left two comments. In general I would still like to have more (and maybe not randomized, but simple edge cases) malformed tests, maybe a few typical cases per shape. Also Polygons with holes don't seem to get parsed correctly and I think tests are also missing for those cases.
RandomShapeGenerator.createShape(random(), RandomShapeGenerator.ShapeType.POLYGON)); | ||
Coordinate[] coords = builder.coordinates()[0][0]; | ||
LinearRing shell = GEOMETRY_FACTORY.createLinearRing(coords); | ||
Polygon expected = GEOMETRY_FACTORY.createPolygon(shell, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also work with holes? I just tried parsing the second polygon example from the issue decription and it fails (POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))
) I suspect the second argument is a hole so I was trying to see of this was tested, seems like this needs to be added still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Good catch! I agree we should have tests for holes. This is a widely used use case. Will add that next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it difficult to add testing at elast one hole here? If no I'd prefer to do so. GeoJsonShapeParserTests#testParseMultiPolygon() seems to do this as well, maybe something similar can be added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an explicit test for polygons with holes below in testParsePolygonWithHole
, are you asking for a random polygon with random hole? That's a bit more tricky to generate a random polygon with a random inscribed polygon as a hole. Probably not really necessary as long as we've tested at least one polygon with a hole?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I saw the explicit test and I was hoping for a randomized polygon/hole thing, but I just realized that also GeoJsonShapeParserTests doesn't do this fully randomized. So if it is difficult I'm also fine if there is only the explicit test.
} | ||
|
||
private void assertExpected(Shape expected, ShapeBuilder builder) throws IOException { | ||
boolean isMalformed = randomBoolean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer at least one good and one malformed case per test run. Otherwise we only catch bigger errors on multiple repetions and not e.g. on PR tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add an explicit malformed test for common typos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test, I would still prefer to test both variants (a malformed one and a correct one) on each test run for each random shape. It should be relatively easy to add this to the test given that all the parts (random shape generation, adding malformed content) is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
b80c0a4
to
ae0a54d
Compare
@cbuescher just added explicit tests for the following:
Will update documentation but I think this is close to ready. Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nknize thanks, I left two more comments regarding tests. Also, since you mentioned docs, I think at least a short info about the possibility to use WKT for shapes should be part of this PR as well.
} | ||
|
||
private void assertExpected(Shape expected, ShapeBuilder builder) throws IOException { | ||
boolean isMalformed = randomBoolean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test, I would still prefer to test both variants (a malformed one and a correct one) on each test run for each random shape. It should be relatively easy to add this to the test given that all the parts (random shape generation, adding malformed content) is there.
RandomShapeGenerator.createShape(random(), RandomShapeGenerator.ShapeType.POLYGON)); | ||
Coordinate[] coords = builder.coordinates()[0][0]; | ||
LinearRing shell = GEOMETRY_FACTORY.createLinearRing(coords); | ||
Polygon expected = GEOMETRY_FACTORY.createPolygon(shell, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it difficult to add testing at elast one hole here? If no I'd prefer to do so. GeoJsonShapeParserTests#testParseMultiPolygon() seems to do this as well, maybe something similar can be added here?
@nknize thanks for updating the test, if the polygon + hole randomization is tricky, let's leave that out. I think this is ready once some documentation is added. |
} | ||
|
||
/** gets the field name */ | ||
public String fieldName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use get/setFieldName to make the method naming more consistent?
@@ -50,6 +53,16 @@ | |||
public static final String LATITUDE = "lat"; | |||
public static final String LONGITUDE = "lon"; | |||
public static final String GEOHASH = "geohash"; | |||
public static final ParseField FIELD_FIELD = new ParseField("field"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to move all the BBOX parsing (ParseFields and static method) to the new GeoBBox class and make that one a top level class? Stuffing all of this code into the a Utils class doesn't feel optimal to me.
@@ -549,6 +630,32 @@ public boolean advanceExact(int target) throws IOException { | |||
} | |||
} | |||
|
|||
/** wraps a {@link Rectangle} to include an optional field name for queries */ | |||
public static class GeoBBox extends Rectangle { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is only used by the GeoBoundingBoxQueryBuilder I'd probably make it an inner class there.
@@ -61,7 +61,13 @@ | |||
private GeoWKTParser() {} | |||
|
|||
public static ShapeBuilder parse(XContentParser parser) | |||
throws IOException, ElasticsearchParseException { | |||
throws IOException, ElasticsearchParseException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: weird indentation, could this throws clause move to the previous line
} | ||
|
||
/** parse geometry from the stream tokenizer */ | ||
private static ShapeBuilder parseGeometry(StreamTokenizer stream, GeoShapeType shapeType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this extra method needed? I would combine it with the previous one that seems to be its only caller atm.
/** parse geometry from the stream tokenizer */ | ||
private static ShapeBuilder parseGeometry(StreamTokenizer stream, GeoShapeType shapeType) | ||
throws IOException, ElasticsearchParseException { | ||
switch (shapeType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a warning for a missing GEOMETRYCOLLECTION case label here. Even if that shouldn't be possible, maybe add this to the switch (and/or a default case) that throws.
/** wraps a {@link Rectangle} to include an optional field name for queries */ | ||
public static class GeoBBox extends Rectangle { | ||
/** optional fieldname for queries */ | ||
private String fieldName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the fieldname as part of the parsed bbox structure. As far as I can see this gets only parsed when an inner field
parameter is seen, the use of which seems both undocumented and it also doesn't get rendered when using the GeoBoundingBoxQueryBuilder#doXContent method. I have a hope that we can remove this maybe?
currentFieldName = parser.currentName(); | ||
token = parser.nextToken(); | ||
if (FIELD_FIELD.match(currentFieldName)) { | ||
fieldName = parser.text(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this inner field
parameter really still supported? I cannot find it mentioned in the docs and it doesn't seem to get written when using one of our builders. As far as I understand the whole bbox json should look like this:
"some.field.foo" : {
... various formats defining the box ...
}
This is probably legacy and should be removed (probably in another PR?)
@nknize I just did another round of reviews on the last commit adding WKT parsing support to the GeoBoundingBoxQueryBuilder. Does this need to be part of this PR? I agree it should be added to complete the WKT support (don't know what needs to be done with the other query builders that can contain shapes), but since the existing PR is already big as is (and was close to merging when the documentation is added) I'd probably do this in a separate PR. |
0deb57c
to
3c44e57
Compare
@cbuescher I agree. After I made the push I figured it would be best to do it in a separate PR. So I'll go ahead and do that. Going to commit the updated docs here shortly and consider this PR done (after the doc review). |
00afb21
to
f27b2e2
Compare
@cbuescher added some WKT documentation to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nknize thanks for pulling out the query builder changes, left a short question for the latest docs commit and apparently some console example still needs fixing, but other than that it looks good to me. Will give the rest another quick look but I think once the docs test passes this is good to merge.
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
// TEST[skip:https://github.com/elastic/elasticsearch/issues/23836] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, does the same problem describe in the bug not apply to the polygon example above?
-------------------------------------------------- | ||
POST /example/doc | ||
{ | ||
"location" : "POLYGON ((100.0 0.0, 101.0 0.0, 101.0 1.0, 100.0 1.0, 100.0 0.0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gradle :docs:check
fails here, maybe because of the line break. Not sure how to fix though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the line break is the problem, it's ok to have long lines. The code blocks on the docs will scroll
/** parse geometry from the stream tokenizer */ | ||
private static ShapeBuilder parseGeometry(StreamTokenizer stream) throws IOException, ElasticsearchParseException { | ||
final GeoShapeType type = GeoShapeType.forName(nextWord(stream)); | ||
switch (type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a warning here that GEOMETRYCOLLECTION is missing, looking at the added documentation this should be supported I think. If it is missing here as a type, we also need a test for it to catch this.
case ENVELOPE: | ||
return parseBBox(stream); | ||
} | ||
throw new IllegalArgumentException("Unknown geometry type: " + type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: might throw in a DEFAULT case instead.
assertValidException(builder, IllegalArgumentException.class); | ||
} | ||
|
||
// todo: add geocollection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need that test.
f389b08
to
bde013c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nknize I took a brief look at the added geometry collection support and left a small suggstion on improving the test. Also documentation tests still need fixing, I think after that this PR is ready.
List<Coordinate> multiPointCoords = randomLineStringCoords(); | ||
MultiPointBuilder multiPointBuilder = new MultiPointBuilder(multiPointCoords); | ||
GeometryCollectionBuilder builder = new GeometryCollectionBuilder() | ||
.shape(polygonBuilder).shape(lineStringBuilder).shape(multiPointBuilder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be randomized so the GeometryCollectionBuilder has between 0 and e.g. 2 or 3 shapes?
Thanks for such a great review @cbuescher Looks like I've fixed the doc failures (at least on my test run, we'll see what elasticmachine has to say) and I added some randomization to GeometryCollection test. Pending everything passes green, I think this is good to go. Let me know what you think. |
c70f729
to
4f73780
Compare
@elasticmachine please test this |
// assert empty shape collection | ||
assertEquals(shapeCollection(expected).isEmpty(), builder.build().isEmpty()); | ||
} | ||
GeometryCollectionBuilder gcb = RandomShapeGenerator.createGeometryCollection(random()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to have this, but it doesn't seem to test the no-shapes case. Would be nice to add this as another case (maybe wrapped in if (rarely())
, we don't need to test that edge case very often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nknize thanks for adding the randomization to the test, I left a small note regarding the test simplification. The rest looks good, and since the doc tests seem to pass I think this is ready to be merged once you get a clear CI pass.
63ed67b
to
c6c063a
Compare
This commit adds WKT support to Geo ShapeBuilders. This supports the following format: POINT (30 10) LINESTRING (30 10, 10 30, 40 40) BBOX (-10, 10, 10, -10) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOINT (10 40, 40 30, 20 20, 30 10) MULTILINESTRING ((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20))) GEOMETRYCOLLECTION (POINT (30 10), MULTIPOINT ((10 40), (40 30), (20 20), (30 10))) closes elastic#9120
c6c063a
to
8bcf539
Compare
Yay! thanks! |
This PR adds WKT support to Geo ShapeBuilders.
This supports the following format:
closes #9120