
Refactored geogrid to support multiple hash types #30320

Closed

wants to merge 18 commits

Conversation

nyurik
Contributor

@nyurik nyurik commented May 2, 2018

This patch should not introduce any changes in the existing ES behavior. Its only goal is to allow subsequent addition of various hashing algorithms, such as quadkey, pluscode, hex, .... The only user-visible change is the addition of the new type parameter to the geohash_grid aggregation. The only allowed value is geohash (default).

This PR is required for #30231 and #30240

Open Questions

  • Builder parses each parameter independently. innerBuild() seems to run after parameter parsing, but before the values are serialized by innerWriteTo(). Is it a good approach to do the cross-parameter validation there and update the builder's state (set the integer precision), so that serialization would only use the int, not the parameter string? (Sketched below.)
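A toy model of that flow, for illustration only (none of this is the PR's code; the names and precision bounds are made up): each setter records raw input, and a single build step resolves and validates the precision once the hash type is known.

// Toy sketch, not ES code: setters only record raw input; build() is the one
// place that does cross-parameter work, after all parameters are known.
final class GridBuilder {
    private String type = "geohash"; // hash type, known only after parsing
    private String rawPrecision;     // kept as a string until build()
    private int precision;           // resolved int; the only thing serialized

    GridBuilder type(String type) { this.type = type; return this; }
    GridBuilder precision(String raw) { this.rawPrecision = raw; return this; }

    GridBuilder build() {
        // the string form of precision can only be interpreted once the
        // hash type is known, so the range check has to happen here
        int max = type.equals("geohash") ? 12 : 29; // illustrative bounds
        precision = Integer.parseInt(rawPrecision);
        if (precision < 1 || precision > max) {
            throw new IllegalArgumentException("Invalid precision " + precision
                + " for type [" + type + "]; must be between 1 and " + max);
        }
        return this;
    }
}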

TODO

  • Test for invalid precision: testParseErrorOnPrecisionOutOfRange and similar tests currently fail because the code no longer throws an error immediately when parsing the precision input parameter. Instead, the error is thrown in GeoGridAggregationBuilder.innerBuild() at a slightly later stage, once all parameters have been parsed and the hashing type is known.
  • Rename geohash_grid into geo_grid. Keep geohash_grid as a deprecated alias to geo_grid. (decided to postpone this for a follow up refactoring PR to reduce reviewing complexity)
  • Modify ParsedGeoHashGrid to support the new type. For some reason, other hashing algorithms worked fine without any modifications to that class and no unit tests have complained about this (bug?) cc: @cbuescher @javanna
    public class ParsedGeoHashGrid extends ParsedMultiBucketAggregation<ParsedGeoHashGrid.ParsedBucket> implements GeoHashGrid {

CC: @colings86

@nyurik nyurik requested review from imotov and nknize May 2, 2018 02:13
@colings86 colings86 added >feature discuss :Analytics/Geo Indexing, search aggregations of geo points and shapes labels May 2, 2018
@colings86
Contributor

Marked discuss because we are due to talk about this in the search and aggs meeting on Monday 7th May

@hub-cap hub-cap added :Analytics/Geo Indexing, search aggregations of geo points and shapes and removed :Analytics/Geo Indexing, search aggregations of geo points and shapes labels May 3, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@nyurik nyurik force-pushed the refactor_geohashagg branch 4 times, most recently from cc1c90c to 8e47f36 on May 4, 2018 04:32
@nyurik
Contributor Author

nyurik commented May 4, 2018

jenkins test this

@nyurik nyurik force-pushed the refactor_geohashagg branch 2 times, most recently from a89f698 to 213487f on May 4, 2018 23:32
@nyurik nyurik force-pushed the refactor_geohashagg branch from ae7d615 to a44514e on May 21, 2018 23:45
Contributor

@colings86 colings86 left a comment


@nyurik I left some comments but I think the general direction is good.

- '{"index": {"_index": "geo_agg_index", "_type": "doc"}}'
- '{"location": "48.861111,2.336389", "name": "Musée du Louvre"}'
- '{"index": {"_index": "geo_agg_index", "_type": "doc"}}'
- '{"location": "48.860000,2.327000", "name": "Musée Orsay"}'
Contributor

Can we check here that a search returns the expected response, so that if it fails in the mixed-cluster test we know the problem is to do with the upgrade and not that something went wrong on the old cluster before the upgrade?

Contributor Author

done

try {
// we want to treat simple integer strings as precision levels, not distances
return checkPrecisionRange(Integer.parseInt(precision));
// Do not catch IllegalArgumentException here
Contributor

Can you explain why in the code comment please?

Contributor Author

IllegalArgumentException indicates range validation failure - which we don't want to handle here, but want to return to the user as before. Also this way all previous unit tests continue working as if nothing changed.
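For readers skimming this thread, a self-contained sketch of the control flow under discussion (simplified; precisionFromDistance is a hypothetical stand-in for the distance-parsing path, and 1..12 are the classic geohash bounds):

public static int parsePrecisionString(String precision) {
    try {
        // we want to treat simple integer strings as precision levels, not distances
        return checkPrecisionRange(Integer.parseInt(precision));
        // Do not catch IllegalArgumentException here: it signals a range
        // validation failure, which must propagate to the user unchanged,
        // so all pre-existing unit tests keep passing as before.
    } catch (NumberFormatException e) {
        // not a plain integer, so interpret the string as a distance instead
        return checkPrecisionRange(precisionFromDistance(precision));
    }
}

private static int checkPrecisionRange(int precision) {
    if (precision < 1 || precision > 12) {
        throw new IllegalArgumentException("Invalid geohash aggregation precision of "
            + precision + ". Must be between 1 and 12.");
    }
    return precision;
}

private static int precisionFromDistance(String distance) {
    // hypothetical: map a distance string such as "10km" to a precision level
    throw new UnsupportedOperationException("distance parsing elided in this sketch");
}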

Contributor

Can you add this to the code comment please so it's clear for people coming across this in the future?

Contributor Author

done

public static int parsePrecisionString(String precision) {
try {
// we want to treat simple integer strings as precision levels, not distances
return checkPrecisionRange(Integer.parseInt(precision));
Contributor

This is a breaking change here. We can only do this in 7.0.0 but I suspect you are going to want to backport this PR to the 6.x branch. I think we should change this here to warn if the precision is out of range using the deprecation logger and then we can have a follow up PR for 7.0 only that makes this strictly within the range.

Contributor Author

Are you sure there are changes from the previous algorithm? I simply broke up the parsePrecision(XContentParser) into two functions, and inlined it to remove impossible code paths. In the previous version, checkPrecisionRange() was called at https://github.com/elastic/elasticsearch/pull/30320/files?utf8=%E2%9C%93&diff=split&w=1#diff-e98438cd3baeeca821694343df88218dL69 -- PARSER.declareField((parser, builder, context) -> builder.precision(parsePrecision(parser)), ... -- which means that if you supply a "99999" value, it would fail in the builder.precision() call.

Contributor

It's a breaking change because if the string in the JSON was a simple integer value, before we would parse it to an int using parser.intValue() and not perform a range check. Now we do perform a range check, so JSON that used to be parsed without error will now throw an exception, hence it's a breaking change. I do agree that throwing an error is better, but we need to be careful because this can make upgrades tricky for the user if requests that worked fine in their app before suddenly start failing.

The question is, what used to happen if you gave a precision value outside of the range? Did we fail later in execution (in which case the breaking change here might be ok)? Did we accept the precision value and process it at the precision specified even though it's outside the range? Or maybe we accepted the precision value and, if it was outside the range, used it as if it was at one of the bounds (i.e. if the value is above MAX_PRECISION we just evaluated it as if it was MAX_PRECISION)?

Contributor Author

@colings86 I think you misunderstood my comment - the existing code already throws an error if precision is out of range --

Contributor

Ok, in which case I agree that this is not a breaking change

@@ -26,6 +26,7 @@
final class GeoHashGridParams {

/* recognized field names in JSON */
static final ParseField FIELD_TYPE = new ParseField("type");
Contributor

I think we should be more descriptive with the name here. Maybe we should use hash_type?

Contributor Author

ah, naming... :) Another idea was to rename geohash_grid to geo_grid or some other name because geohash is technically a specific algorithm - https://en.wikipedia.org/wiki/Geohash ... Is there an easy way to keep original geohash_grid that uses default hashing type, and also a new agg that allows different types without any code duplications? Assuming we keep the same default type=geohash. If we introduce geo_grid, we might as well keep type as it will be generic enough... what do you think?

Contributor

We could rename GeoHashGridAggregationBuilder to GeoGridAggregationBuilder and then create a new GeoHashGridAggregationBuilder which is just a wrapper around GeoGridAggregationBuilder but hardcodes the type to provide backward compatibility. We would want to immediately deprecate the new GeoHashGridAggregationBuilder so we can remove it in 7.0.0. @jpountz do you think this is a good path to go down?

Contributor

👍
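A rough shape of that proposal, as a hedged sketch (class bodies are elided and the real builders carry much more state; this only shows the wrapper idea, not the PR's actual code):

// Sketch of the wrapper idea: the generic builder does all the work, and a
// thin, immediately-deprecated subclass pins the type for bwc.
class GeoGridAggregationBuilder {
    protected String type = "geohash";
    GeoGridAggregationBuilder(String name) { /* generic grid agg */ }
    GeoGridAggregationBuilder type(String type) { this.type = type; return this; }
}

/** @deprecated use {@link GeoGridAggregationBuilder} with an explicit type */
@Deprecated
class GeoHashGridAggregationBuilder extends GeoGridAggregationBuilder {
    GeoHashGridAggregationBuilder(String name) {
        super(name);
        type("geohash"); // hardcoded: behaves exactly like the old aggregation
    }
}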

PARSER.declareField((parser, builder, context) -> builder.precision(parsePrecision(parser)), GeoHashGridParams.FIELD_PRECISION,
org.elasticsearch.common.xcontent.ObjectParser.ValueType.INT);
PARSER.declareString(GeoGridAggregationBuilder::type, GeoHashGridParams.FIELD_TYPE);
PARSER.declareField(GeoGridAggregationBuilder::parsePrecision, GeoHashGridParams.FIELD_PRECISION, ObjectParser.ValueType.INT);
Contributor

Given what we are doing below I think we should declare the ValueType as ValueType.VALUE here, because otherwise it's confusing when reading the code to see INT here and then the fact that we might expect a String that isn't just an int value below. We should then add an else to the method below to throw an exception if anything other than an int or String is supplied.

Contributor Author

If I make the changes you propose with ConstructingObjectParser, wouldn't the code be very similar to what it was before? The ValueType.INT already allows just string and int, and the different handling of string was already part of it. I will simply expand it a bit to pass string parsing via a different route depending on the type.
We could change it to VALUE and add an additional check, but it seems redundant to what parser already does. Are you sure about this one? I'm ok to add it of course, not a biggy.

Contributor

I just think it's weird to declare that the field expects the value to be an int when actually we are also expecting string values that are not the exact string representation of an int (i.e. cannot be parsed using Integer.valueOf(String)). It took me a little while to work out how this worked when I reviewed it, so personally I think it's worth making the change for code readability.

Contributor Author

done. Some error messages had to change.
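Roughly what the agreed change looks like (a sketch based on the declaration shown above; the handler's name, its signature, and the precisionString bookkeeping are assumptions, not the PR's exact code):

PARSER.declareField(GeoGridAggregationBuilder::parsePrecision,
    GeoHashGridParams.FIELD_PRECISION, ObjectParser.ValueType.VALUE);

// hypothetical shape of the handler: accept ints and strings, reject the rest
static void parsePrecision(XContentParser parser, GeoGridAggregationBuilder builder)
        throws IOException {
    XContentParser.Token token = parser.currentToken();
    if (token == XContentParser.Token.VALUE_NUMBER) {
        builder.precision(parser.intValue());       // plain integer precision
    } else if (token == XContentParser.Token.VALUE_STRING) {
        builder.precisionString(parser.text());     // resolved once the type is known
    } else {
        throw new XContentParseException(parser.getTokenLocation(),
            "[precision] must be an integer or a string");
    }
}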

* @param hash as generated by the {@link #calculateHash}
* @return center of the grid cell
*/
GeoPoint hashAsObject(long hash);
Contributor

Can we call this hashAsGeoPoint?

Contributor Author

see public GeoPoint getKey() below.

@@ -63,26 +65,29 @@
* Read from a stream.
*/
private Bucket(StreamInput in) throws IOException {
type = GeoHashType.readFromStream(in);
Contributor

We need to check the version here

Contributor

We should do the version check here IMO. The way to serialise the type is not version dependent (since the type is new), but the fact that the type is serialised is version dependent, since previous versions do not know about the concept of a type here.

Contributor Author

@colings86, the GeoHashType.readFromStream and .write() are not modifying the stream at all for the older versions, but instead just assert that type is equal to default (on write), and return default (read). We have four places (2 reads and 2 writes) where this code is needed - GeoGridAggregationBuilder and InternalGeoHashGrid.Bucket. Duplicating identical code seems to be against all common sense here :) - that's why I placed it inside the GeoHashType. Perhaps they should be called something else, e.g. writeIfNeeded, readIfNewerVersion? Or at least I should add a comment to all 4 places.

Contributor

Again I think this is a readability thing for me. If I come across the code in GeohashGridAggregationBuilder I think that we have a bug because the type is always serialised. I then have to dig into this class to find out that sometimes, depending on the version, it doesn't actually serialise itself at all. To me this seems weird, and for the sake of avoiding having 4 checks in favour of 2 I don't think it's worth the confusion potential.

Contributor Author

I agree about readability, but it is not a simple if statement. For example, the current code that would need to be duplicated is this (I already removed comments):

if (out.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
    out.writeEnum(this);
} else if (this != DEFAULT) {
    throw new UnsupportedOperationException("Geo aggregation type [" + toString() +
        "] is not supported by the node version " + out.getVersion().toString());
}

but as the system matures, there will be more types - e.g. one of the most requested ones is "hex grid", but there could be others. This means the code will have to turn into this:

final Version version = out.getVersion();
if (
    version.onOrAfter(Version.V_6_5_0) ||
    (this.compareTo(GeoHashType.HEX) < 0 && version.onOrAfter(Version.V_6_4_0))
) {
    out.writeEnum(this);
} else if (this != DEFAULT) {
    throw new UnsupportedOperationException("Geo aggregation type [" + toString() +
        "] is not supported by the node version " + version.toString());
}

Duplicating this kind of code is a sure no-no, so we clearly need a helper method that will decide 1) whether the type is compatible with the version, and 2) whether it even needs to be serialized at all.
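For symmetry, the read-side helper would look roughly like this (a sketch assuming the names used in this thread, such as readFromStream and DEFAULT; older peers simply never wrote a type):

public static GeoHashType readFromStream(StreamInput in) throws IOException {
    if (in.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
        return in.readEnum(GeoHashType.class);
    }
    // the peer predates the type concept: nothing is on the wire, so the
    // only possible value is the default
    return DEFAULT;
}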

geohashAsLong = in.readLong();
docCount = in.readVLong();
aggregations = InternalAggregations.readAggregations(in);
}

@Override
public void writeTo(StreamOutput out) throws IOException {
type.writeTo(out);
Contributor

We need to check the version here

Contributor

See above

}

@Override
public GeoPoint getKey() {
return GeoPoint.fromGeohash(geohashAsLong);
// TODO/FIXME: is it ok to change from GeoPoint to Object, and return different types?
Contributor

Can you explain what you mean here?

Contributor Author

It seems the getKey() is never used for anything GeoPoint-specific, or at least I couldn't find it, so I was wondering if it makes sense to just return an Object here, allowing different hashing algos to decide if they want to return a GeoPoint or some other object?

Contributor

I am on the fence here but leaning towards keeping GeoPoint. The reason for this is that I think its good to have a consistent Class for the key so users can change the type and always know what to expect back from this method. Having different Classes returned from different types would mean that a client would need to consult the type in order to know what to do with the object returned here which would make their code much more complex.

Contributor Author

ok, np

throw new UnsupportedOperationException("Geo aggregation type [" + this +
"] is not supported by the node version " + out.getVersion().toString());
}
}
Contributor

We should write a test for this Enum. Have a look at ShapeRelationTests to see the kind of tests I mean. This will ensure that if the ordering of the enum is changed or a new enum value is added, the test will catch any changes to serialisation. We should also add a test for reading and writing the enum to and from a version before this feature.

Contributor Author

thx, done - see last commit.
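For reference, the ShapeRelationTests pattern being asked for looks roughly like this (a sketch; the constant name GEOHASH and its expected ordinal 0 are assumptions about the enum's layout):

public void testGeoHashTypeSerialization() throws IOException {
    try (BytesStreamOutput out = new BytesStreamOutput()) {
        GeoHashType.GEOHASH.writeTo(out);
        try (StreamInput in = out.bytes().streamInput()) {
            // writeEnum writes the ordinal as a vInt, so any reordering of
            // the enum constants would make this assertion fail
            assertEquals(0, in.readVInt());
        }
    }
}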

@nyurik nyurik force-pushed the refactor_geohashagg branch from a44514e to 3663447 on June 5, 2018 05:37
Contributor Author

@nyurik nyurik left a comment


thx, answered in depth on several issues, let me know what you think. The last commit is a rebase + unit tests for the GeoHashType. Other changes are in progress.


} catch (Exception e) {
throw new XContentParseException(builder.precisionLocation,
"[geohash_grid] failed to parse field [precision]", e);
}
Contributor Author

thx, will do.

@@ -101,20 +146,43 @@ protected AggregationBuilder shallowCopy(Builder factoriesBuilder, Map<String, O
*/
public GeoGridAggregationBuilder(StreamInput in) throws IOException {
super(in, ValuesSourceType.GEOPOINT, ValueType.GEOPOINT);
type = GeoHashType.readFromStream(in);
Contributor Author

see my other comment - it is not actually being read for older versions.


Contributor

@colings86 colings86 left a comment


@nyurik I left some replies to your comments


@nyurik nyurik force-pushed the refactor_geohashagg branch from 3663447 to 2c07544 on June 6, 2018 23:19
Contributor Author

@nyurik nyurik left a comment


most concerns addressed in 2c07544
Still to do: rename all classes to geo_grid.
Commented inline.

- '{"index": {"_index": "geo_agg_index", "_type": "doc"}}'
- '{"location": "48.861111,2.336389", "name": "Musée du Louvre"}'
- '{"index": {"_index": "geo_agg_index", "_type": "doc"}}'
- '{"location": "48.860000,2.327000", "name": "Musée Orsay"}'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public static int parsePrecisionString(String precision) {
try {
// we want to treat simple integer strings as precision levels, not distances
return checkPrecisionRange(Integer.parseInt(precision));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colings86 i think you misunderstood my comment - the existing code already throws an error if precision is out of range --


} catch (Exception e) {
throw new XContentParseException(builder.precisionLocation,
"[geohash_grid] failed to parse field [precision]", e);
}
Contributor Author

done

// Note that tests must set precision after setting the type
this.precision = this.type.getHandler().getDefaultPrecision();
return this;
}
Contributor Author

done, type parsing still needs to be a separate static method, converting string into enum value


nyurik added 3 commits June 7, 2018 12:13
This patch should not introduce any changes in the existing
ES behavior. Its only goal is to allow subsequent addition
of various hashing algorithms, such as quadkey, pluscode, hex, ...
@colings86
Contributor

oh actually I just read the comment above that says this will only affect the transport client. Let me think a bit on this and chat with @cbuescher about it

@@ -49,11 +48,13 @@
GeoHashGrid {
static class Bucket extends InternalMultiBucketAggregation.InternalBucket implements GeoHashGrid.Bucket, Comparable<Bucket> {

protected GeoHashType type;
Member

Can the type be a field in the enclosing InternalGeoHashGrid/ParsedGeoHashGrid? The type should be the same for all buckets in the aggregation, it doesn't make much sense to serialize/deserialize it for every bucket. Also if we write the type id back on the REST response, it should probably be part of the aggregation object, not the bucket.

Member

This seems to be complicated by the fact that the buckets need access to the type in its getters, the buckets cannot see any non-static members of the enclosing aggregation class atm because of their static nature. Not sure what the best approach is here, but having the type encoded in each bucket seems not optimal.
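To illustrate the constraint (toy code, not from the PR): a static nested bucket class has no implicit reference to its enclosing aggregation, so the type either lives on each bucket or must be threaded through explicitly.

class InternalGrid {
    private final String type;   // where the type naturally belongs

    // static nested class: no InternalGrid.this available inside
    static class Bucket {
        private final String type; // has to be handed in explicitly
        Bucket(String type) { this.type = type; }
    }

    InternalGrid(String type) { this.type = type; }
    Bucket newBucket() { return new Bucket(type); }
}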

@cbuescher
Member

@nyurik after my discussion last week with @colings86 I took the time to check one of the options of how we can make this work with the high level rest client. We talked about using something similar to the RestSearchAction.TYPED_KEYS_PARAM flag that we already need for aggregations and suggestions to be parsable by the java rest client in general. I whipped something up along these lines in cbuescher@a11f0fa but while doing this ran into several related problems. The commit is a quick and dirty hack to illustrate what I mean, but there are a lot of wacky workarounds in there and it wouldn't work atm because some parts are missing. Some of the problems I encountered are:

  • the geoHashType really should be a property of the InternalGeoHashGrid/ParsedHashGrid aggregation. I left a comment about this in this PR. However, this doesn't seem to be a straight-forward change, since buckets currently cannot reference the aggregation they sit in. Not sure what the options are here
  • if we need to store the geoHash type in the buckets, this leads to several complications on the parsing side, e.g. we then need to set the type in the buckets but cannot be sure when we have parsed back that information from the rest layer. I added a dirty hack to the commit above but I don't think it's a good solution
  • the GeoHashType should have a stable id/name that we send across the rest layer. Using the enum name is too brittle IMHO (it can be the name, but it should be set at ctor time, and not break when enums are renamed) - see the sketch after this list
  • similarly, the GeoHashType needs a way to parse these ids back
  • finally: the GeoHashTypeProvider needs to convert back from String -> GeoPoint. I have a feeling that's doable but isn't implemented yet
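A minimal sketch of such a stable-id enum (illustrative names only; the id is fixed at construction time, so renaming a constant cannot change what goes over the wire), covering the parse-back point as well:

public enum GeoHashType {
    GEOHASH("geohash");

    private final String id;   // stable id sent across the rest layer

    GeoHashType(String id) {
        this.id = id;
    }

    public String id() {
        return id;
    }

    // parse an id back into the enum value, rejecting unknown ids
    public static GeoHashType fromId(String id) {
        for (GeoHashType type : values()) {
            if (type.id.equals(id)) {
                return type;
            }
        }
        throw new IllegalArgumentException("unknown geo hash type [" + id + "]");
    }
}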

@colings86
Contributor

After talking with @cbuescher I have raised #31579 which if agreed upon will help to alleviate some of the issues here. We will talk about this in the weekly meeting on Monday next week and we should await that decision before deciding how to proceed here

@nyurik
Contributor Author

nyurik commented Jul 2, 2018

Per #31579 (comment), should there be any changes to this PR to add any extra handling of non-geohash hash types to the rest api? It seems the intention is to use string-only getKey going forward, but for 6.x this code should simply throw an error if the hash string is unrecognized - which already happens now. cc: @colings86

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Jul 2, 2018
The GeoHashGrid bucket class currently returns a GeoPoint with its getKey()
method. The REST API returns geohash as a String in the key field of the bucket.
This changes the getKey() method to also return a String. Having the ability to
get a GeoPoint from a bucket that represents a geohash cell is misleading
because the cell does not represent a point but an area. Instead, it should be
the client's responsibility to make the decision on how to represent the bucket
key in its application. This also simplifies implementing the high level rest
client aspects of elastic#30320, as the client will not need to know about the
GeoHashType being used and will only care that there is some string key for the
buckets.

Relates to elastic#30320
@cbuescher
Member

@nyurik the idea was that with #31748 there is nothing to change in ParsedGeoHashGrid to support parsing back the GeoPoint, so we won't need to send the encoding type back with the Rest response as well. That solves a lot of issues on master (7.0). For backporting I see the following options:

  • accept it as a known issue that using one of the new hash types in conjunction with using the HL Java client and the getKey() method doesn't work. This should be a relatively rare case and doesn't really break any existing functionality since it wouldn't affect users only using the old default hash type. Still not nice...
  • revisit the idea of "guessing" the right hash type on the client side only on 6.x versions as a workaround. While I think this is a hack that should generally be avoided, it might be an okay tradeoff to make when we know this "hack" will only be needed until 7.0. Makes me feel a bit more willing to accept it, and might be better than throwing an error?
  • still backport #31748 (Change GeoHashGrid.Bucket#getKey() to return String), but we'd really like to avoid that since it would introduce a behavioural change in a minor. Not nice.

I might be missing something, I suggest we talk this over with @colings86.

@nyurik
Contributor Author

nyurik commented Jul 10, 2018

@colings86 any updates on this?

@colings86
Contributor

I've spoken to a few different people about this since getting back from holiday and I'd like to propose the following (thanks to @jpountz for this idea):

  • We retain the geohash_grid aggregation as it was with support for getting a GeoPoint but deprecate the aggregation in 6.x and remove it in 7.0. Any use of the aggregation should emit a deprecation warning.
  • The code in this PR be moved so it creates a new aggregation called geo_grid which supports multiple hash types including geohash, but returns a String of the hash from the getKey() method and does not support getting a GeoPoint from the bucket. Clients will need to convert the geohash into whatever form (point, bounding box, polygon) they need by themselves, using a library or their own code.

This means that we retain the current functionality and we do not break backwards compatibility on the REST, transport, or HL REST Client layers; moreover, we don't end up with a silent Java break which is not shown at compile time. It also means that we can get the multiple hash types support into 6.x in a clean way, which would not be possible with the other suggestions because they involve either hacks or silent breaks.

@nyurik
Contributor Author

nyurik commented Jul 11, 2018

@colings86 thanks! If possible, do you think we can get this into 6.4? Also, should I work on changing the patch, or does someone else want to handle the rename? Thanks, I like your idea!

@colings86
Contributor

@nyurik If you are able to change this PR to incorporate my proposal above, that would be great.

@nyurik
Contributor Author

nyurik commented Jul 12, 2018

@colings86 I could create a copy of the existing code, but I doubt I would be easily able to refactor it in such a way as to simulate both behaviors. If a (temporary) full copy is ok, then sure, I could try hacking on it.

@colings86
Contributor

@nyurik yes, we should create a full copy for now, so we keep the existing geohash_grid aggregation as it currently is in master (only deprecated), and the new geo_grid aggregation is a full copy of the geohash_grid classes into completely new classes which do not share code, but have the functionality that this PR adds and return a string from the getKey() method of the buckets, with no method to retrieve a GeoPoint from the bucket.

@thomasneirynck
Contributor

a gentle ping on this PR. Are there plans to move this forward and add it to ES, @elastic/es-search-aggs? thx

@colings86
Contributor

@thomasneirynck It's something we are willing to add to ES. Given the last comment from @nyurik I had understood that he was going to work on the changes to expose this functionality in a new geo_grid aggregation, rather than trying to extend the existing geohash_grid aggregation, which we found was not going to work due to bwc and client problems.

@Destroy666x

@nyurik are you still working on this and other aggregation PRs?

@nyurik
Contributor Author

nyurik commented Nov 9, 2018

@Destroy666x (very appropriate user name :) ) - yes and no. It is on my plate to work on this, but I need to finish my current project (probably another week+ or so). If someone wants to work on it, I will be happy to help them get started quickly.

@Destroy666x

Thanks for info!

I'm asking because point aggregations are painful in ES using only the geohash grid with very few precision levels + centroid, so I guess switching to PostGIS for now is more reasonable than waiting, since it doesn't look like any additional clustering methods will be available anytime soon, and I have no time to implement them myself either.

@nyurik
Contributor Author

nyurik commented Jan 10, 2019

I'm closing this PR in favor of #37277 -- I'm reworking this PR with the help of @talevy. The main changes from this approach:

  • Do not refactor existing geohash_grid, but instead add a new geo_grid (copies most of the existing code), and deprecate geohash_grid.
  • geo_grid will require a hash_type parameter. Setting it to "geohash" makes it behave the same as the geohash_grid.
  • hash_type is implemented as a string instead of an Enum. This allows for a more flexible addition of the new algorithms.
  • Each bucket no longer contains a type, making bucket transmission cost the same as for the geohash_grid

@nyurik nyurik closed this Jan 10, 2019
@nyurik nyurik deleted the refactor_geohashagg branch January 10, 2019 17:15
talevy added a commit to talevy/elasticsearch that referenced this pull request Jan 17, 2019
To make further refactoring of GeoGrid aggregations
easier (related: elastic#30320), splitting out these inner
class dependencies into their own files makes it
easier to map the relationship between classes
talevy added a commit that referenced this pull request Jan 18, 2019
To make further refactoring of GeoGrid aggregations
easier (related: #30320), splitting out these inner
class dependencies into their own files makes it
easier to map the relationship between classes