Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Geo] Add Well Known Text (WKT) Parsing Support to ShapeBuilders #27417

Merged
merged 1 commit into from
Dec 5, 2017

Conversation

nknize
Copy link
Contributor

@nknize nknize commented Nov 16, 2017

This PR adds WKT support to Geo ShapeBuilders.

This supports the following format:

POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOINT (10 40, 40 30, 20 20, 30 10)
MULTILINESTRING ((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))

closes #9120

@nknize nknize added :Analytics/Geo Indexing, search aggregations of geo points and shapes :Query DSL v6.1.0 v7.0.0 labels Nov 16, 2017
@nknize nknize requested a review from cbuescher November 16, 2017 23:00
@cbuescher cbuescher self-assigned this Nov 17, 2017
@nknize
Copy link
Contributor Author

nknize commented Nov 18, 2017

@elasticmachine please test this.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize thanks, I did a first round of review. Most changes in the builders look fine, however I think the WKT parser currently is a bit too lenient in what it accepts as valid WKT values (I left one example, but I think there are many more edge cases that are not currently tested and should be rejected).
It would also be great to test the parser not only with well formed values that the current builders output but add test for values that should be rejected for various reasons. Not sure how the current parser can be tweaked to do this, but I think it would be good to add this before this can be merged.


static {
BBOX = "BBOX";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you move this assignment to L281 for better readability or is this unsafe to use it in the following initialization then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

for (GeoShapeType type : values()) {
shapeTypeMap.put(type.shapename, type);
// add 'BBOX' wkt name support for Envelope
if (type.equals(ENVELOPE)) {
shapeTypeMap.put(ENVELOPE.wktName().toLowerCase(Locale.ROOT), type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be done just after the values() loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -300,6 +311,10 @@ public abstract ShapeBuilder getBuilder(CoordinateNode coordinates, DistanceUnit
ShapeBuilder.Orientation orientation, boolean coerce);
abstract CoordinateNode validate(CoordinateNode coordinates, boolean coerce);

public String wktName() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add some javadoc that the WKT identifier is the shapename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


public String toWKT() {
StringBuilder sb = new StringBuilder();
sb.append(type().wktName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, this will be lowercase, no? The examples in the issue and PR are all uppercase, does this matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the WKT spec, casing doesn't matter

sb.append(bottomRight.x);
sb.append(GeoWKTParser.COMMA);
sb.append(GeoWKTParser.SPACE);
// @todo support Z??
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: doesn't really matter but mostly I see this as // TODO. Might be better to keep it similar to be able to grep similar things or for other tooling.

throws IOException, ElasticsearchParseException {
CoordinatesBuilder coordinates = new CoordinatesBuilder();
coordinates.coordinate(parseCoordinate(stream));
while (nextCloserOrComma(stream).equals(COMMA)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close or Comma will also again allow for things like "( 1, 2, 3) 4 )" being accepted, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up coordinate list parsing to better handle coordinates with and without parens.

if (nextEmptyOrOpen(stream).equals(EMPTY)) {
return null;
}
// todo orientation?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What needs to be done here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, nothing. This is handled in the builder. Removing todo.

}
// todo orientation?
PolygonBuilder builder = new PolygonBuilder(parseLine(stream), ShapeBuilder.Orientation.RIGHT);
if (nextWord(stream).equals(LPAREN)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if next word is something else than LPAREN?

// todo orientation?
PolygonBuilder builder = new PolygonBuilder(parseLine(stream), ShapeBuilder.Orientation.RIGHT);
if (nextWord(stream).equals(LPAREN)) {
while (nextCloserOrComma(stream).equals(COMMA)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same problem as above I think, only that unmatched closing brackets will create holes here.

case StreamTokenizer.TT_EOL: return EOL;
case StreamTokenizer.TT_NUMBER: return NUMBER;
}
return "'" + (char)stream.ttype + "'";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: whitespace

@nknize nknize force-pushed the feature/WKTParsing branch from 54b9036 to bdd1380 Compare November 22, 2017 20:35
@nknize
Copy link
Contributor Author

nknize commented Nov 22, 2017

Thanks for the review @cbuescher

I went ahead and added more strict parsing (to include better paren parsing for coordinate lists). I also added tests for comments and illegal geometry types. I think this is close. If you wouldn't mind giving another review. I will try to add geometry collection testing before committing.

Also, since this is a fresh feature, what we can do is set this as experimental for 6.1 and flush it out in bug fixes. Thoughts?

@nknize nknize force-pushed the feature/WKTParsing branch from 97b22a0 to fc6604d Compare November 23, 2017 04:47
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize I took another look and left two comments. In general I would still like to have more (and maybe not randomized, but simple edge cases) malformed tests, maybe a few typical cases per shape. Also Polygons with holes don't seem to get parsed correctly and I think tests are also missing for those cases.

RandomShapeGenerator.createShape(random(), RandomShapeGenerator.ShapeType.POLYGON));
Coordinate[] coords = builder.coordinates()[0][0];
LinearRing shell = GEOMETRY_FACTORY.createLinearRing(coords);
Polygon expected = GEOMETRY_FACTORY.createPolygon(shell, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also work with holes? I just tried parsing the second polygon example from the issue decription and it fails (POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))) I suspect the second argument is a hole so I was trying to see of this was tested, seems like this needs to be added still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Good catch! I agree we should have tests for holes. This is a widely used use case. Will add that next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it difficult to add testing at elast one hole here? If no I'd prefer to do so. GeoJsonShapeParserTests#testParseMultiPolygon() seems to do this as well, maybe something similar can be added here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an explicit test for polygons with holes below in testParsePolygonWithHole, are you asking for a random polygon with random hole? That's a bit more tricky to generate a random polygon with a random inscribed polygon as a hole. Probably not really necessary as long as we've tested at least one polygon with a hole?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw the explicit test and I was hoping for a randomized polygon/hole thing, but I just realized that also GeoJsonShapeParserTests doesn't do this fully randomized. So if it is difficult I'm also fine if there is only the explicit test.

}

private void assertExpected(Shape expected, ShapeBuilder builder) throws IOException {
boolean isMalformed = randomBoolean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer at least one good and one malformed case per test run. Otherwise we only catch bigger errors on multiple repetions and not e.g. on PR tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add an explicit malformed test for common typos.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test, I would still prefer to test both variants (a malformed one and a correct one) on each test run for each random shape. It should be relatively easy to add this to the test given that all the parts (random shape generation, adding malformed content) is there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@nknize nknize added v6.2.0 and removed v6.1.0 labels Nov 23, 2017
@nknize nknize force-pushed the feature/WKTParsing branch 3 times, most recently from b80c0a4 to ae0a54d Compare November 27, 2017 18:46
@nknize
Copy link
Contributor Author

nknize commented Nov 27, 2017

@cbuescher just added explicit tests for the following:

  • polygon with hole (can add one for multi polygon w/ and w/o holes)
  • a malformed poly caused by common typos (can expand this test to include other common typos if we'd like).
  • self crossing polygon

Will update documentation but I think this is close to ready. Let me know what you think.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize thanks, I left two more comments regarding tests. Also, since you mentioned docs, I think at least a short info about the possibility to use WKT for shapes should be part of this PR as well.

}

private void assertExpected(Shape expected, ShapeBuilder builder) throws IOException {
boolean isMalformed = randomBoolean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test, I would still prefer to test both variants (a malformed one and a correct one) on each test run for each random shape. It should be relatively easy to add this to the test given that all the parts (random shape generation, adding malformed content) is there.

RandomShapeGenerator.createShape(random(), RandomShapeGenerator.ShapeType.POLYGON));
Coordinate[] coords = builder.coordinates()[0][0];
LinearRing shell = GEOMETRY_FACTORY.createLinearRing(coords);
Polygon expected = GEOMETRY_FACTORY.createPolygon(shell, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it difficult to add testing at elast one hole here? If no I'd prefer to do so. GeoJsonShapeParserTests#testParseMultiPolygon() seems to do this as well, maybe something similar can be added here?

@cbuescher
Copy link
Member

@nknize thanks for updating the test, if the polygon + hole randomization is tricky, let's leave that out. I think this is ready once some documentation is added.

}

/** gets the field name */
public String fieldName() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use get/setFieldName to make the method naming more consistent?

@@ -50,6 +53,16 @@
public static final String LATITUDE = "lat";
public static final String LONGITUDE = "lon";
public static final String GEOHASH = "geohash";
public static final ParseField FIELD_FIELD = new ParseField("field");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move all the BBOX parsing (ParseFields and static method) to the new GeoBBox class and make that one a top level class? Stuffing all of this code into the a Utils class doesn't feel optimal to me.

@@ -549,6 +630,32 @@ public boolean advanceExact(int target) throws IOException {
}
}

/** wraps a {@link Rectangle} to include an optional field name for queries */
public static class GeoBBox extends Rectangle {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is only used by the GeoBoundingBoxQueryBuilder I'd probably make it an inner class there.

@@ -61,7 +61,13 @@
private GeoWKTParser() {}

public static ShapeBuilder parse(XContentParser parser)
throws IOException, ElasticsearchParseException {
throws IOException, ElasticsearchParseException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: weird indentation, could this throws clause move to the previous line

}

/** parse geometry from the stream tokenizer */
private static ShapeBuilder parseGeometry(StreamTokenizer stream, GeoShapeType shapeType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this extra method needed? I would combine it with the previous one that seems to be its only caller atm.

/** parse geometry from the stream tokenizer */
private static ShapeBuilder parseGeometry(StreamTokenizer stream, GeoShapeType shapeType)
throws IOException, ElasticsearchParseException {
switch (shapeType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a warning for a missing GEOMETRYCOLLECTION case label here. Even if that shouldn't be possible, maybe add this to the switch (and/or a default case) that throws.

/** wraps a {@link Rectangle} to include an optional field name for queries */
public static class GeoBBox extends Rectangle {
/** optional fieldname for queries */
private String fieldName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the fieldname as part of the parsed bbox structure. As far as I can see this gets only parsed when an inner field parameter is seen, the use of which seems both undocumented and it also doesn't get rendered when using the GeoBoundingBoxQueryBuilder#doXContent method. I have a hope that we can remove this maybe?

currentFieldName = parser.currentName();
token = parser.nextToken();
if (FIELD_FIELD.match(currentFieldName)) {
fieldName = parser.text();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this inner field parameter really still supported? I cannot find it mentioned in the docs and it doesn't seem to get written when using one of our builders. As far as I understand the whole bbox json should look like this:

"some.field.foo" : {
                        ... various formats defining the box ...
}

This is probably legacy and should be removed (probably in another PR?)

@cbuescher
Copy link
Member

@nknize I just did another round of reviews on the last commit adding WKT parsing support to the GeoBoundingBoxQueryBuilder. Does this need to be part of this PR? I agree it should be added to complete the WKT support (don't know what needs to be done with the other query builders that can contain shapes), but since the existing PR is already big as is (and was close to merging when the documentation is added) I'd probably do this in a separate PR.

@nknize nknize force-pushed the feature/WKTParsing branch from 0deb57c to 3c44e57 Compare November 30, 2017 23:45
@nknize
Copy link
Contributor Author

nknize commented Nov 30, 2017

@cbuescher I agree. After I made the push I figured it would be best to do it in a separate PR. So I'll go ahead and do that. Going to commit the updated docs here shortly and consider this PR done (after the doc review).

@nknize nknize force-pushed the feature/WKTParsing branch from 00afb21 to f27b2e2 Compare December 1, 2017 00:36
@nknize
Copy link
Contributor Author

nknize commented Dec 1, 2017

@cbuescher added some WKT documentation to the geo-shape.asciidoc. If you're good with this I'll commit to 6.2 and 7.0 and open a separate PR for adding WKT support to GeoBoundingBox queries. Then move back to Z value support
/cc @clintongormley

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize thanks for pulling out the query builder changes, left a short question for the latest docs commit and apparently some console example still needs fixing, but other than that it looks good to me. Will give the rest another quick look but I think once the docs test passes this is good to merge.

}
--------------------------------------------------
// CONSOLE
// TEST[skip:https://github.com/elastic/elasticsearch/issues/23836]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, does the same problem describe in the bug not apply to the polygon example above?

--------------------------------------------------
POST /example/doc
{
"location" : "POLYGON ((100.0 0.0, 101.0 0.0, 101.0 1.0, 100.0 1.0, 100.0 0.0),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gradle :docs:check fails here, maybe because of the line break. Not sure how to fix though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the line break is the problem, it's ok to have long lines. The code blocks on the docs will scroll

/** parse geometry from the stream tokenizer */
private static ShapeBuilder parseGeometry(StreamTokenizer stream) throws IOException, ElasticsearchParseException {
final GeoShapeType type = GeoShapeType.forName(nextWord(stream));
switch (type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a warning here that GEOMETRYCOLLECTION is missing, looking at the added documentation this should be supported I think. If it is missing here as a type, we also need a test for it to catch this.

case ENVELOPE:
return parseBBox(stream);
}
throw new IllegalArgumentException("Unknown geometry type: " + type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might throw in a DEFAULT case instead.

assertValidException(builder, IllegalArgumentException.class);
}

// todo: add geocollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need that test.

@nknize nknize force-pushed the feature/WKTParsing branch from f389b08 to bde013c Compare December 3, 2017 03:38
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize I took a brief look at the added geometry collection support and left a small suggstion on improving the test. Also documentation tests still need fixing, I think after that this PR is ready.

List<Coordinate> multiPointCoords = randomLineStringCoords();
MultiPointBuilder multiPointBuilder = new MultiPointBuilder(multiPointCoords);
GeometryCollectionBuilder builder = new GeometryCollectionBuilder()
.shape(polygonBuilder).shape(lineStringBuilder).shape(multiPointBuilder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be randomized so the GeometryCollectionBuilder has between 0 and e.g. 2 or 3 shapes?

@nknize
Copy link
Contributor Author

nknize commented Dec 4, 2017

Thanks for such a great review @cbuescher Looks like I've fixed the doc failures (at least on my test run, we'll see what elasticmachine has to say) and I added some randomization to GeometryCollection test. Pending everything passes green, I think this is good to go. Let me know what you think.

@nknize nknize force-pushed the feature/WKTParsing branch from c70f729 to 4f73780 Compare December 4, 2017 17:59
@nknize
Copy link
Contributor Author

nknize commented Dec 5, 2017

@elasticmachine please test this

// assert empty shape collection
assertEquals(shapeCollection(expected).isEmpty(), builder.build().isEmpty());
}
GeometryCollectionBuilder gcb = RandomShapeGenerator.createGeometryCollection(random());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to have this, but it doesn't seem to test the no-shapes case. Would be nice to add this as another case (maybe wrapped in if (rarely()), we don't need to test that edge case very often.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize thanks for adding the randomization to the test, I left a small note regarding the test simplification. The rest looks good, and since the doc tests seem to pass I think this is ready to be merged once you get a clear CI pass.

@nknize nknize force-pushed the feature/WKTParsing branch from 63ed67b to c6c063a Compare December 5, 2017 15:31
This commit adds WKT support to Geo ShapeBuilders.

This supports the following format:

POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
BBOX (-10, 10, 10, -10)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOINT (10 40, 40 30, 20 20, 30 10)
MULTILINESTRING ((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))
GEOMETRYCOLLECTION (POINT (30 10), MULTIPOINT ((10 40), (40 30), (20 20), (30 10)))

closes elastic#9120
@nknize nknize force-pushed the feature/WKTParsing branch from c6c063a to 8bcf539 Compare December 5, 2017 16:59
@nknize nknize merged commit 8bcf539 into elastic:master Dec 5, 2017
@synhershko
Copy link
Contributor

Yay! thanks!

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
@jpountz jpountz removed the :Search/Search Search-related issues that do not fall into other categories label Jan 29, 2019
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement v6.2.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support WKT syntax in geo_point and geo_shape
7 participants