Support spatial fields in field retrieval API. #59821

jtibshirani · 2020-07-18T02:24:20Z

Although we accept a variety of formats during indexing, spatial data is
returned in a single consistent format. This is GeoJSON by default, but
well-known text is also supported by passing the option format: wkt.

Note that points (in addition to shapes) are returned in GeoJSON by default. The
reasoning is that this gives better consistency, and is the most convenient
format for most REST API users.
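To make the two output shapes concrete, here is a minimal sketch of returning one point either as a GeoJSON-style map or as a WKT string. `PointFormatSketch` and its `format` method are hypothetical and are not the code in this PR; the name `geojson` for the default format is also an assumption, since the description only names `wkt`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class PointFormatSketch {

    // Returns a Map for GeoJSON (the default) and a String for WKT.
    static Object format(double lon, double lat, String format) {
        if (format == null || format.equals("geojson")) {
            Map<String, Object> geoJson = new LinkedHashMap<>();
            geoJson.put("type", "Point");
            // GeoJSON orders coordinates as [longitude, latitude].
            geoJson.put("coordinates", List.of(lon, lat));
            return geoJson;
        }
        if (format.equals("wkt")) {
            // WKT also lists x (longitude) before y (latitude).
            return "POINT (" + lon + " " + lat + ")";
        }
        throw new IllegalArgumentException("Unknown format [" + format + "]");
    }

    public static void main(String[] args) {
        System.out.println(format(-71.34, 41.12, null));
        System.out.println(format(-71.34, 41.12, "wkt"));
    }
}
```

Whatever format the point was indexed in, the caller sees one of these two shapes.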

jtibshirani · 2020-07-18T02:31:16Z

x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/common/CartesianPoint.java

Without this change, coordinates could change unexpectedly in a roundtrip like WKT -> CartesianPoint -> WKT. It seems common to represent points using doubles even though they're indexed as floats -- for example the Point geometry uses doubles.
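One way such a roundtrip can drift, as a sketch: a decimal coordinate parsed into a `float` and later widened to a `double` for formatting no longer prints as the original text. `FloatRoundTripSketch` is a hypothetical class, and how the previous `CartesianPoint` code actually formatted its values is an assumption here.

```java
// Hypothetical illustration of float vs. double storage; not the CartesianPoint code.
final class FloatRoundTripSketch {

    // Store the coordinate as a float, then widen it to a double before formatting.
    static String viaFloat(String text) {
        float stored = Float.parseFloat(text);
        double widened = stored; // carries the float's rounding error into the double
        return Double.toString(widened);
    }

    // Store the coordinate as a double throughout.
    static String viaDouble(String text) {
        return Double.toString(Double.parseDouble(text));
    }

    public static void main(String[] args) {
        System.out.println("via float:  " + viaFloat("1.1"));
        System.out.println("via double: " + viaDouble("1.1"));
    }
}
```

Keeping the value as a double end to end avoids the extra digits in this case.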

good Point. I do not see a problem with this. maybe @nknize has an opinion here

jtibshirani · 2020-07-18T02:42:22Z

server/src/main/java/org/elasticsearch/index/mapper/AbstractGeometryFieldMapper.java

This seemed like the easiest way to integrate with the existing logic but is far from ideal:

We perform parsing even when it's not necessary: even when the geometry is already in the right format, we still parse it to an object and re-serialize it.

We also awkwardly translate between maps and xContent.

Let me know if you have suggestions around a better approach. Overall I was hoping to keep this PR reasonably scoped, since it is part of a large 'field retrieval' change that already has a few moving parts. But we could plan a larger refactor in a separate follow-up change.

This is pretty yikes, but I do like the idea of keeping the work contained on the branch and doing a larger refactoring after merging. Maybe it'd be cleaner to push the ability to parse these fields into SourceLookup. That way we don't need to do the xcontent -> map -> xcontent dance.

would it make sense to specialize between points and shapes? At least for Shapes, we should be able to tell whether value is an object or a string, right?

Maybe it'd be cleaner to push the ability to parse these fields into SourceLookup

I think this would be challenging given the current design of SourceLookup, since it operates only on the level of xContent and is completely agnostic to the mappings. It would be nice though if SourceLookup could provide direct access to the xContent representation instead of always parsing + returning generic objects. This feels like a bigger change that's better suited for a follow-up though?

At least for Shapes, we should be able to tell whether value is an object or a string, right?

This would work, I'll look into it further! I'm inclined to not worry about it though if it makes the design more complicated (specifically splitting points vs. shapes handling).

elasticmachine · 2020-07-19T21:47:42Z

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

elasticmachine · 2020-07-19T21:47:44Z

Pinging @elastic/es-search (:Search/Mapping)

nik9000

I think this is the right way to go for the branch but it'll be important to get a refactor in soon after to clear out the yikes method. I'll leave more review for @talevy and @imotov .

docs/reference/mapping/types.asciidoc

nik9000 · 2020-07-20T13:15:25Z

server/src/main/java/org/elasticsearch/index/mapper/AbstractGeometryFieldMapper.java

This is pretty yikes, but I do like the idea of keeping the work contained on the branch and doing a larger refactoring after merging. Maybe it'd be cleaner to push the ability to parse these fields into SourceLookup. That way we don't need to do the xcontent -> map -> xcontent dance.

nik9000 · 2020-07-20T13:18:10Z

server/src/main/java/org/elasticsearch/common/geo/GeoJson.java

Probably worth javadoc. I was confused by the method name because I usually think of map and xcontent as mutually exclusive representations.

also, XContent here is just an implementation detail. it is a normal Map, right?

I will clean this up. Here I was using toXContentMap to mean 'return the JSON representation of the geometry as a Java map'. But I see how this is confusing given there's already a toXContent method ...

talevy

Overall LGTM, would like to make a plan / gain resolution on the "yikes" method Nik mentioned.

talevy · 2020-07-20T22:26:59Z

server/src/main/java/org/elasticsearch/index/mapper/AbstractGeometryFieldMapper.java

should we add some javadoc for this?

talevy · 2020-07-20T22:29:10Z

x-pack/plugin/spatial/src/main/java/org/elasticsearch/xpack/spatial/common/CartesianPoint.java

good Point. I do not see a problem with this. maybe @nknize has an opinion here

talevy · 2020-07-20T22:38:15Z

server/src/main/java/org/elasticsearch/common/geo/GeoJson.java

also, XContent here is just an implementation detail. it is a normal Map, right?

imotov

I share the concern about mapping as well, I think we can easily add a separate visitor to generate a map directly. I can help with this part. I also wonder if we can make it a bit more flexible in terms of formats and instead of introducing GeoShapeFormatter interface we could extend GeometryFormat interface with toGenericObject() method, which will produce String for WKT and Map for Json and possibly support other formats in the future. We will have to add an alternative GeometryParser.geometryFormat() method that will return the format not only for XContentParser but also for a string.
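A rough sketch of that suggestion, under loud assumptions: `GenericFormatSketch`, `WktSketch`, `GeoJsonSketch` and `GenericFormatDemo` are hypothetical stand-ins, reduced to a single point, and do not mirror the real `GeometryFormat` interface. Only the idea is taken from the comment: one format abstraction whose `toGenericObject()` yields a String for WKT and a Map for JSON, plus a lookup by format name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo entry point; all types here are illustrative stand-ins.
final class GenericFormatDemo {
    private static final List<GenericFormatSketch> FORMATS =
        List.of(new GeoJsonSketch(), new WktSketch());

    // Resolve a format from its name, e.g. the value of a 'format' option.
    static GenericFormatSketch forName(String name) {
        for (GenericFormatSketch format : FORMATS) {
            if (format.name().equals(name)) {
                return format;
            }
        }
        throw new IllegalArgumentException("Unknown format [" + name + "]");
    }

    public static void main(String[] args) {
        System.out.println(forName("wkt").toGenericObject(1.5, 2.5));
        System.out.println(forName("geojson").toGenericObject(1.5, 2.5));
    }
}

interface GenericFormatSketch {
    String name();

    // Returns a String for WKT and a Map for GeoJSON.
    Object toGenericObject(double x, double y);
}

final class WktSketch implements GenericFormatSketch {
    @Override
    public String name() {
        return "wkt";
    }

    @Override
    public Object toGenericObject(double x, double y) {
        return "POINT (" + x + " " + y + ")";
    }
}

final class GeoJsonSketch implements GenericFormatSketch {
    @Override
    public String name() {
        return "geojson";
    }

    @Override
    public Object toGenericObject(double x, double y) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("type", "Point");
        map.put("coordinates", List.of(x, y));
        return map;
    }
}
```

Adding another format in this sketch means adding one more implementation to the list, which is the flexibility the comment is after.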

jtibshirani · 2020-07-21T17:35:08Z

Thanks everyone for the reviews. Here are my planned next steps:

I'll try to generalize the notion of a 'format', building off the GeometryFormat concept.
I'll look into avoiding parsing + serializing shapes if they're already in the right format.
We'll eliminate the redundancy when returning values, where we do geometry -> xContent -> map. @imotov and I will work together in a follow-up PR to clean this up.
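The second step could look roughly like the sketch below, assuming (as suggested earlier in the thread, for shapes only) that a string source value is WKT and an object source value is GeoJSON. `ShapePassThroughSketch` and the `parseAndFormat` callback are hypothetical; the actual change may detect formats differently.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch for shape values only; not the code from this PR.
final class ShapePassThroughSketch {

    // Assumption: for shapes, a String in _source is WKT and a Map is GeoJSON.
    static boolean alreadyInFormat(Object sourceValue, String format) {
        if ("wkt".equals(format)) {
            return sourceValue instanceof String;
        }
        if ("geojson".equals(format)) {
            return sourceValue instanceof Map;
        }
        return false;
    }

    // Return the source value untouched when possible; otherwise parse and re-serialize.
    static Object formatShape(Object sourceValue, String format, UnaryOperator<Object> parseAndFormat) {
        return alreadyInFormat(sourceValue, format) ? sourceValue : parseAndFormat.apply(sourceValue);
    }

    public static void main(String[] args) {
        Object wkt = "LINESTRING (0.0 0.0, 1.0 1.0)";
        // The callback stands in for the expensive parse + serialize path.
        Object result = formatShape(wkt, "wkt", value -> {
            throw new IllegalStateException("should not re-parse");
        });
        System.out.println(result == wkt);
    }
}
```

One trade-off of passing values through: the returned text is whatever the user indexed, so it is not normalized the way a parsed and re-serialized value would be.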

jtibshirani · 2020-07-21T17:37:49Z

A heads up that I will rebase field-retrieval onto master to pull in some refactors, then rebase and force-push this branch.

jtibshirani · 2020-07-21T21:58:03Z

@talevy @imotov this is now ready for another look. It may be easiest to review each new commit separately.

imotov

Left a couple of really minor comments, otherwise LGTM.

imotov · 2020-07-22T18:31:28Z

server/src/main/java/org/elasticsearch/common/geo/GeoJsonGeometryFormat.java

+    }
+
+    @Override
+    public Map<?, ?> toObject(Geometry geometry) {


Maybe return Object here as well for consistency. I don't think the fact that it returns Map is significant here.

I sometimes like using covariance like this to clarify that a subclass always returns a specific type. I don't feel strongly about it though, happy to change it. We can always change it back later if we find it helpful for unit testing, etc. to have the exact type.
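For readers unfamiliar with the pattern under discussion, a small sketch of a covariant return type in Java: the interface declares `Object`, while one implementation narrows the return type to `Map`. `ObjectFormatSketch`, `MapFormatSketch` and `CovarianceDemo` are hypothetical names and much simpler than the classes in the diff.

```java
import java.util.Map;

// Hypothetical types illustrating covariant return types; not the PR's classes.
final class CovarianceDemo {
    public static void main(String[] args) {
        MapFormatSketch concrete = new MapFormatSketch();
        // Through the concrete type, callers get a Map without casting.
        Map<String, Object> asMap = concrete.toObject("Point");

        // Through the interface, callers only see Object.
        ObjectFormatSketch general = concrete;
        Object asObject = general.toObject("Point");

        System.out.println(asMap.get("type"));
        System.out.println(asObject instanceof Map);
    }
}

interface ObjectFormatSketch {
    Object toObject(String geometryType);
}

final class MapFormatSketch implements ObjectFormatSketch {
    // Narrowing Object to Map is a legal override in Java.
    @Override
    public Map<String, Object> toObject(String geometryType) {
        return Map.of("type", geometryType);
    }
}
```

The narrowed type mostly helps code and tests that hold the concrete class; callers going through the interface are unaffected either way.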

imotov · 2020-07-22T18:35:43Z

server/src/main/java/org/elasticsearch/common/geo/GeometryFormat.java

+     *
+     * For example, the GeoJson format returns the geometry as a map, while WKT returns a string.
+     */
+    Object toObject(ParsedFormat geometry);


I wonder if somebody can come up with a better name here :) maybe toXContentObject() or toXContentValue() or something like this. I feel like toObject is misleadingly generic here.

I previously had a method called toXContentMap that @talevy and @nik9000 found confusing, because we often refer to xContent and maps as distinct representations. Perhaps a name like toXContentAsObject could work?

haha, oi. since there are javadocs explaining it now, I drop my naming argument. I'm good with whatever sounds good to you!

I cannot say I love toXContentAsObject but I like it the best compared to all the other versions.

talevy

thanks!

jtibshirani · 2020-07-22T21:07:21Z

@elasticmachine run elasticsearch-ci/default-distro

jtibshirani · 2020-07-22T22:06:54Z

I checked with @imotov and he is good with merging.

jtibshirani · 2020-07-23T19:47:03Z

One last update: @imotov plans to raise a PR next week to avoid the geometry -> xContent -> map translation. I've already raised the PR #55363 to merge the feature branch, so that refactor will be done as a follow-up against master. I'll ensure that the feature doesn't ship in 7.10 without this improvement (filing a new 'blocking' issue if needed).

jtibshirani mentioned this pull request Jul 18, 2020

Search 'fields' option design + implementation #55363

Closed

10 tasks

jtibshirani force-pushed the field-retrieval branch from 5b91623 to 64143b8 Compare July 18, 2020 04:50

jtibshirani force-pushed the fetch-geo-fields branch from 28ddcc3 to 58c5a26 Compare July 18, 2020 04:54

jtibshirani added the :Analytics/Geo, :Search Foundations/Mapping, and >feature labels Jul 19, 2020

jtibshirani marked this pull request as ready for review July 19, 2020 21:47

elasticmachine added the Team:Analytics label Jul 19, 2020

elasticmachine added the Team:Search label Jul 19, 2020

nik9000 reviewed Jul 20, 2020

jtibshirani requested review from imotov and talevy July 20, 2020 20:50

talevy reviewed Jul 20, 2020

imotov reviewed Jul 21, 2020

jtibshirani force-pushed the field-retrieval branch from 64143b8 to 1de5a51 Compare July 21, 2020 18:07

jtibshirani added 3 commits July 21, 2020 11:08

In CartesianPoint, store coordinates as doubles.

d50fa6f

This gives more predictable values when parsing and formatting a point. It matches the behavior for GeoPoint and the Point geometry.

Fix naming of 'spatial datatypes' anchor.

804a295

jtibshirani force-pushed the fetch-geo-fields branch from 3bb216a to 804a295 Compare July 21, 2020 19:49

jtibshirani added 3 commits July 21, 2020 14:53

Better formalize the types of geometry formats.

6f9e82a

Rely on GeometryFormat for all value formatting.

926993d

Avoid parsing shapes if they're already in the right format.

4bf9bd1

Fix checkstyle violation.

d3d6282

imotov reviewed Jul 22, 2020

talevy approved these changes Jul 22, 2020

jtibshirani added 2 commits July 22, 2020 13:41

Rename GeometryFormat#toObject -> toXContentAsObject.

1b7dad3

Always return Object from toXContentAsObject.

9c155cf

jtibshirani merged this pull request into elastic:field-retrieval Jul 22, 2020

jtibshirani deleted the fetch-geo-fields branch July 22, 2020 22:07

jtibshirani mentioned this pull request Jul 28, 2020

Streamline serialization when retrieving spatial data through fields. #60259

Closed
