Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support spatial fields in field retrieval API. #59821

Merged
merged 9 commits into from
Jul 22, 2020

Conversation

jtibshirani
Copy link
Contributor

Although we accept a variety of formats during indexing, spatial data is
returned in a single consistent format. This is GeoJSON by default, but
well-known text is also supported by passing the option format: wkt.

Note that points (in addition to shapes) are returned in GeoJSON by default. The
reasoning is that this gives better consistency, and is the most convenient
format for most REST API users.

@@ -36,18 +37,18 @@
private static final ParseField Y_FIELD = new ParseField("y");
private static final ParseField Z_FIELD = new ParseField("z");

protected float x;
protected float y;
protected double x;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this change, coordinates could change unexpectedly in a roundtrip like WKT -> CartesianPoint -> WKT. It seems common to represent points using doubles even though they're indexed as floats -- for example the Point geometry uses doubles.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good Point. I do not see a problem with this. maybe @nknize has an opinion here

Formatter<Parsed> geometryFormatter = mappedFieldType.geometryFormatter();

Parsed geometry;
try (XContentParser parser = new MapXContentParser(NamedXContentRegistry.EMPTY, LoggingDeprecationHandler.INSTANCE,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed like the easiest way to integrate with the existing logic but is far from ideal:

  • We perform parsing even when it's not necessary: even when the geometry is already in the right format, we still parse it to an object re-serialize it.
  • We also awkwardly translate between maps and xContent.

Let me know if you have suggestions around a better approach. Overall I was hoping to keep this PR reasonably scoped, since it is part of a large 'field retrieval' change that already has a few moving parts. But we could plan a larger refactor in a separate/ follow-up change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty yikes, but I do like the idea of keeping the work contained on the branch and doing a larger refactoring after merging. Maybe it'd be cleaner to push the ability to parse these fields into SourceLookup. That way we don't need to do the xcontent -> map -> xcontent dance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to specialize between points and shapes? At least for Shapes, we should be able to tell whether value is an object or a string, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it'd be cleaner to push the ability to parse these fields into SourceLookup

I think this would be challenging given the current design of SourceLookup, since it is operates only on the level of xContent and is completely agnostic to the mappings. It would be nice though if SourceLookup could provide direct access to the xContent representation instead of always parsing + returning generic objects. This feels like a bigger change that's better suited for a follow-up though?

At least for Shapes, we should be able to tell whether value is an object or a string, right?

This would work, I'll look into it further! I'm inclined to not worry about it though if it makes the design more complicated (specifically splitting points vs. shapes handling).

@jtibshirani jtibshirani added :Analytics/Geo Indexing, search aggregations of geo points and shapes :Search Foundations/Mapping Index mappings, including merging and defining field types >feature labels Jul 19, 2020
@jtibshirani jtibshirani marked this pull request as ready for review July 19, 2020 21:47
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 19, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jul 19, 2020
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right way to go for the branch but it'll be important to get a refactor in soon after to clear out the yikes method. I'll leave more review for @talevy and @imotov .

docs/reference/mapping/types.asciidoc Outdated Show resolved Hide resolved
Formatter<Parsed> geometryFormatter = mappedFieldType.geometryFormatter();

Parsed geometry;
try (XContentParser parser = new MapXContentParser(NamedXContentRegistry.EMPTY, LoggingDeprecationHandler.INSTANCE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty yikes, but I do like the idea of keeping the work contained on the branch and doing a larger refactoring after merging. Maybe it'd be cleaner to push the ability to parse these fields into SourceLookup. That way we don't need to do the xcontent -> map -> xcontent dance.

@@ -610,4 +617,14 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
}
}

public static Map<?, ?> toXContentMap(Geometry geometry) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth javadoc. I was confused by the method name because I usually think of map and xcontent as mutually exclusive representations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, XContent here is just an implementation detail. it is a normal Map, right?

Copy link
Contributor Author

@jtibshirani jtibshirani Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will clean this up. Here I was using toXContentMap to mean 'return the JSON representation of the geometry as a Java map'. But I see how this is confusing given there's already a toXContent method ...

@jtibshirani jtibshirani requested review from talevy and imotov July 20, 2020 20:50
Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, would like to make a plan / gain resolution on the "yikes" method Nik.

@@ -82,6 +87,11 @@
Parsed parse(XContentParser parser, AbstractGeometryFieldMapper mapper) throws IOException, ParseException;
}

public interface Formatter<Parsed> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add some javadoc for this?

@@ -36,18 +37,18 @@
private static final ParseField Y_FIELD = new ParseField("y");
private static final ParseField Z_FIELD = new ParseField("z");

protected float x;
protected float y;
protected double x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good Point. I do not see a problem with this. maybe @nknize has an opinion here

@@ -610,4 +617,14 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
}
}

public static Map<?, ?> toXContentMap(Geometry geometry) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, XContent here is just an implementation detail. it is a normal Map, right?

Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share the concern about mapping as well, I think we can easily add a separate visitor to generate a map directly. I can help with this part. I also wonder if we can make it a bit more flexible in terms of formats and instead of introducing GeoShapeFormatter interface we could extend GeometryFormat interface with toGenericObject() method, which will produce String for WKT and Map for Json and possibly support other formats in the future. We will have to add an alternative GeometryParser,geometryFormat() method that will return the format not only for XContentParser but also for a string.

@jtibshirani
Copy link
Contributor Author

Thanks everyone for the reviews. Here are my planned next steps:

  • I'll try to generalize the notion of a 'format', building off the GeometryFormat concept.
  • I'll look into avoiding parsing + serializing shapes if they're already in the right format.
  • We'll eliminate the redundancy when returning values, where we do geometry -> xContent -> map. @imotov and I will work together in a follow-up PR to clean this up.

@jtibshirani
Copy link
Contributor Author

A heads up that I will rebase field-retrieval onto master to pull in some refactors, then rebase and force-push this branch.

Although we accept a variety of formats during indexing, spatial data is
returned in a single consistent format. This is GeoJSON by default, but
well-known text is also supported by passing the option 'format: wkt'.

Note that points (in addition to shapes) are returned in GeoJSON by default. The
reasoning is that this gives better consistency, and is the most convenient
format for most REST API users.
This gives more predictable values when parsing and formatting a point. It
matches the behavior for GeoPoint and the Point geometry.
@jtibshirani
Copy link
Contributor Author

@talevy @imotov this is now ready for another look. It may be easiest to review each new commit separately.

Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of really minor comments, otherwise LGTM.

}

@Override
public Map<?, ?> toObject(Geometry geometry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe return Object here as well for consistency. I don't think the fact that it returns Map is significant here.

Copy link
Contributor Author

@jtibshirani jtibshirani Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sometimes like using covariance like this to clarify that a subclass always returns a specific type. I don't feel strongly about it though, happy to change it. We can always change it back later if we find it helpful for unit testing, etc. to have the exact type.

*
* For example, the GeoJson format returns the geometry as a map, while WKT returns a string.
*/
Object toObject(ParsedFormat geometry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if somebody can come up with a better name here :) maybe to toXContentObject() or toXConentValue() or something like this. I feel like toObject is misleadingly generic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I previously had a method called toXContentMap that @talevy and @nik9000 found confusing, because we often refer to xContent and maps as distinct representations. Perhaps a name like toXContentAsObject could work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, oi. since there are javadocs explaining it now, I drop my naming argument. I'm good with whatever sounds good to you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot say I love toXContentAsObject but I like it the best comparing to all other versions.

Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@jtibshirani
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/default-distro

@jtibshirani
Copy link
Contributor Author

I checked with @imotov and he is good with merging.

@jtibshirani jtibshirani merged this pull request into elastic:field-retrieval Jul 22, 2020
@jtibshirani jtibshirani deleted the fetch-geo-fields branch July 22, 2020 22:07
jtibshirani added a commit that referenced this pull request Jul 23, 2020
Although we accept a variety of formats during indexing, spatial data is
returned in a single consistent format. This is GeoJSON by default, but
well-known text is also supported by passing the option 'format: wkt'.

Note that points (in addition to shapes) are returned in GeoJSON by default. The
reasoning is that this gives better consistency, and is the most convenient
format for most REST API users.
@jtibshirani
Copy link
Contributor Author

One last update: @imotov plans to raise a PR next week to avoid the geometry -> xContent -> map translation. I've already raised the PR #55363 to merge the feature branch, so that refactor will be done as a follow-up against master. I'll ensure that the feature doesn't ship in 7.10 without this improvement (filing a new 'blocking' issue if needed).

jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Jul 27, 2020
Although we accept a variety of formats during indexing, spatial data is
returned in a single consistent format. This is GeoJSON by default, but
well-known text is also supported by passing the option 'format: wkt'.

Note that points (in addition to shapes) are returned in GeoJSON by default. The
reasoning is that this gives better consistency, and is the most convenient
format for most REST API users.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >feature :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants