Refactored geogrid to support multiple hash types #30320
Conversation
Marked discuss because we are due to talk about this in the search and aggs meeting on Monday 7th May
Pinging @elastic/es-search-aggs
jenkins test this
@nyurik I left some comments but I think the general direction is good.
- '{"index": {"_index": "geo_agg_index", "_type": "doc"}}'
- '{"location": "48.861111,2.336389", "name": "Musée du Louvre"}'
- '{"index": {"_index": "geo_agg_index", "_type": "doc"}}'
- '{"location": "48.860000,2.327000", "name": "Musée Orsay"}'
Can we check here that a search returns the expected response? That way, if it fails in the mixed-cluster test, we know the problem is to do with the upgrade and not because something went wrong on the old cluster before the upgrade.
done
try {
    // we want to treat simple integer strings as precision levels, not distances
    return checkPrecisionRange(Integer.parseInt(precision));
    // Do not catch IllegalArgumentException here
Can you explain why in the code comment please?
IllegalArgumentException indicates range validation failure - which we don't want to handle here, but want to return to the user as before. Also this way all previous unit tests continue working as if nothing changed.
Can you add this to the code comment please so it's clear for people coming across this in the future?
done
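For readers following along, here is a sketch of how the final method might read once the expanded comment is in place, based on the snippets quoted in this thread; the distance-parsing fallback via GeoUtils.geoHashLevelsForPrecision is an assumption drawn from the surrounding geohash code, not a quote from the diff:

public static int parsePrecisionString(String precision) {
    try {
        // we want to treat simple integer strings as precision levels, not distances
        return checkPrecisionRange(Integer.parseInt(precision));
        // checkPrecisionRange() throws IllegalArgumentException for out-of-range values;
        // we deliberately do not catch it here so that range-validation failures reach
        // the user unchanged, exactly as they did before this refactoring
    } catch (NumberFormatException e) {
        // not a plain integer, so interpret the string as a distance such as "5km"
        return checkPrecisionRange(GeoUtils.geoHashLevelsForPrecision(precision));
    }
}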
public static int parsePrecisionString(String precision) {
    try {
        // we want to treat simple integer strings as precision levels, not distances
        return checkPrecisionRange(Integer.parseInt(precision));
This is a breaking change here. We can only do this in 7.0.0, but I suspect you are going to want to backport this PR to the 6.x branch. I think we should change this to warn via the deprecation logger if the precision is out of range, and then have a follow-up PR for 7.0 only that makes this strictly within the range.
are you sure there are changes from the previous algorithm? I simply broke up the parsePrecision(XContentParser) into two functions, and inlined it to remove impossible code paths. In the previous version, checkPrecisionRange() was called at https://github.com/elastic/elasticsearch/pull/30320/files?utf8=%E2%9C%93&diff=split&w=1#diff-e98438cd3baeeca821694343df88218dL69 -- PARSER.declareField((parser, builder, context) -> builder.precision(parsePrecision(parser)), ... -- which means that if you supply a "99999" value, it would fail in the builder.precision() call.
It's a breaking change because if the string in the JSON was a simple integer value, before we would parse it to an int using parser.intValue() and not perform a range check. Now we do perform a range check, so JSON that used to be parsed without error will now throw an exception; hence it's a breaking change. I do agree that throwing an error is better, but we need to be careful because this can make upgrades tricky for the user if requests that worked fine in their app before suddenly start failing.
The question is, what used to happen if you gave a precision value outside of the range? Did we fail later in execution (in which case the breaking change here might be ok)? Did we accept the precision value and process it at the precision specified, even though it's outside the range? Or maybe we accepted the precision value and, if it was outside the range, used it as if it was at one of the bounds (i.e. if the value is above MAX_PRECISION we just evaluated it as if it was MAX_PRECISION)?
@colings86 i think you misunderstood my comment - the existing code already throws an error if precision is out of range -- see line 116 in 2c20f7a:
public GeoGridAggregationBuilder precision(int precision) {
Ok, in which case I agree that this is not a breaking change
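For context, the pre-existing range check that this thread refers to looks roughly like the sketch below; the geohash-specific 1..12 bounds and the exact message wording are assumptions, not quotes from the diff:

public static int checkPrecisionRange(int precision) {
    // geohash precision levels run from 1 (coarsest) to 12 (finest)
    if ((precision < 1) || (precision > 12)) {
        throw new IllegalArgumentException("Invalid geohash aggregation precision of "
            + precision + ". Must be between 1 and 12.");
    }
    return precision;
}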
@@ -26,6 +26,7 @@
final class GeoHashGridParams {

    /* recognized field names in JSON */
    static final ParseField FIELD_TYPE = new ParseField("type");
I think we should be more descriptive with the name here. Maybe we should use hash_type?
ah, naming... :) Another idea was to rename geohash_grid to geo_grid or some other name, because geohash is technically a specific algorithm - https://en.wikipedia.org/wiki/Geohash ... Is there an easy way to keep the original geohash_grid that uses the default hashing type, and also a new agg that allows different types, without any code duplication? Assuming we keep the same default type=geohash. If we introduce geo_grid, we might as well keep type as it will be generic enough... what do you think?
We could rename GeoHashGridAggregationBuilder to GeoGridAggregationBuilder and then create a new GeoHashGridAggregationBuilder which is just a wrapper around GeoGridAggregationBuilder but hardcodes the type to provide backward compatibility. We would want to immediately deprecate the new GeoHashGridAggregationBuilder so we can remove it in 7.0.0. @jpountz do you think this is a good path to go down?
👍
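A minimal sketch of the wrapper idea just agreed on, assuming a String-based type(...) setter as declared in the parser quoted below; the constructor shape is hypothetical:

/**
 * Thin backward-compatibility wrapper around GeoGridAggregationBuilder that
 * hardcodes the legacy hash type. Deprecated immediately, to be removed in 7.0.0.
 */
@Deprecated
public class GeoHashGridAggregationBuilder extends GeoGridAggregationBuilder {
    public GeoHashGridAggregationBuilder(String name) {
        super(name);
        type("geohash"); // pin the hash type so callers get the old behavior unchanged
    }
}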
PARSER.declareField((parser, builder, context) -> builder.precision(parsePrecision(parser)), GeoHashGridParams.FIELD_PRECISION,
    org.elasticsearch.common.xcontent.ObjectParser.ValueType.INT);
PARSER.declareString(GeoGridAggregationBuilder::type, GeoHashGridParams.FIELD_TYPE);
PARSER.declareField(GeoGridAggregationBuilder::parsePrecision, GeoHashGridParams.FIELD_PRECISION, ObjectParser.ValueType.INT);
Given what we are doing below, I think we should declare the ValueType as ValueType.VALUE here, because otherwise it's confusing when reading the code to see INT here and then find below that we might expect a String that isn't just an int value. We should then add an else branch to the method below that throws an exception if anything other than an int or String is supplied.
If I make the changes you propose with ConstructingObjectParser, wouldn't the code be very similar to what it was before? ValueType.INT already allows just string and int, and the different handling of string was already part of it. I will simply expand it a bit to pass string parsing via a different route depending on the type.
We could change it to VALUE and add an additional check, but it seems redundant to what the parser already does. Are you sure about this one? I'm ok to add it of course, not a biggie.
I just think it's weird to declare that the field expects the value to be an int when actually we are also expecting string values that are not the exact string representation of an int (i.e. cannot be parsed using Integer.valueOf(String)). It took me a little while to work out how this worked when I reviewed it, so personally I think it's worth making the change for code readability.
done. Some error messages had to change.
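The agreed-upon shape is roughly the following sketch: declare the field as ValueType.VALUE and branch on the actual token, throwing for anything else. Method and setter names are taken from the diffs above, the declareField signature is abbreviated, and the string case is resolved inline here for brevity (in the PR itself it is deferred until the hash type is known):

PARSER.declareField(GeoGridAggregationBuilder::parsePrecision,
    GeoHashGridParams.FIELD_PRECISION, ObjectParser.ValueType.VALUE);

static void parsePrecision(XContentParser parser, GeoGridAggregationBuilder builder)
        throws IOException {
    XContentParser.Token token = parser.currentToken();
    if (token == XContentParser.Token.VALUE_NUMBER) {
        builder.precision(parser.intValue());
    } else if (token == XContentParser.Token.VALUE_STRING) {
        // may be "6" (a precision level) or "10km" (a distance)
        builder.precision(parsePrecisionString(parser.text()));
    } else {
        // anything other than an int or a string is now rejected explicitly
        throw new IllegalArgumentException(
            "[precision] must be an int or a string, but found [" + token + "]");
    }
}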
 * @param hash as generated by the {@link #calculateHash}
 * @return center of the grid cell
 */
GeoPoint hashAsObject(long hash);
Can we call this hashAsGeoPoint?
see public GeoPoint getKey() below.
@@ -63,26 +65,29 @@
 * Read from a stream.
 */
private Bucket(StreamInput in) throws IOException {
    type = GeoHashType.readFromStream(in);
We need to check the version here
The version is checked inside the type itself -- https://github.com/elastic/elasticsearch/pull/30320/files#diff-75ad5ae5ac78babf26799166a287cfe1R51
We should do the version check here IMO. The way the type is serialised is not version-dependent (since the type is new), but the fact that the type is serialised at all is version-dependent, since previous versions do not know about the concept of a type here.
@colings86, GeoHashType.readFromStream and .write() are not modifying the stream at all for the older versions; instead they just assert that the type is equal to the default (on write), and return the default (on read). We have four places (2 reads and 2 writes) where this code is needed - GeoGridAggregationBuilder and InternalGeoHashGrid.Bucket. Duplicating identical code seems to be against all common sense here :) - that's why i placed it inside GeoHashType. Perhaps they should be called something else, e.g. writeIfNeeded, readIfNewerVersion? Or at least I should add a comment to all 4 places.
Again, I think this is a readability thing for me. If I come across the code in GeohashGridAggregationBuilder, I think that we have a bug because the type looks like it is always serialised. I then have to dig into this class to find out that sometimes, depending on the version, it doesn't actually serialise itself at all. To me this seems weird, and for the sake of avoiding four checks in favour of two I don't think it's worth the confusion potential.
I agree about readability, but it is not a simple if statement. For example, the current code that would need to be duplicated is this (I already removed the comments):
if (out.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
    out.writeEnum(this);
} else if (this != DEFAULT) {
    throw new UnsupportedOperationException("Geo aggregation type [" + toString() +
        "] is not supported by the node version " + out.getVersion().toString());
}
but as the system matures there will be more types - e.g. one of the most requested is a "hex grid", but there could be others. This means the code will have to turn into this:
final Version version = out.getVersion();
if (version.onOrAfter(Version.V_6_5_0) ||
        (this.compareTo(GeoHashType.HEX) < 0 && version.onOrAfter(Version.V_6_4_0))) {
    out.writeEnum(this);
} else if (this != DEFAULT) {
    throw new UnsupportedOperationException("Geo aggregation type [" + toString() +
        "] is not supported by the node version " + version.toString());
}
Duplicating this kind of code is a sure no-no, so we clearly need a helper method that decides 1) whether the type is compatible with the version, and 2) whether it even needs to be serialized or not.
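In other words, the two checks stay, but live behind a pair of helpers on the enum, something like the sketch below (writeIfNeeded / readIfNewerVersion are the hypothetical names floated above, not code from the PR):

// inside enum GeoHashType
public void writeIfNeeded(StreamOutput out) throws IOException {
    if (out.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
        out.writeEnum(this);
    } else if (this != DEFAULT) {
        // an old node cannot represent a non-default type, so fail loudly
        throw new UnsupportedOperationException("Geo aggregation type [" + this
            + "] is not supported by the node version " + out.getVersion());
    }
    // old node + default type: nothing is written to the stream at all
}

public static GeoHashType readIfNewerVersion(StreamInput in) throws IOException {
    return in.getVersion().onOrAfter(Version.V_7_0_0_alpha1)
        ? in.readEnum(GeoHashType.class)
        : DEFAULT; // older streams carry no type byte
}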
    geohashAsLong = in.readLong();
    docCount = in.readVLong();
    aggregations = InternalAggregations.readAggregations(in);
}

@Override
public void writeTo(StreamOutput out) throws IOException {
    type.writeTo(out);
We need to check the version here
The version is checked inside the type itself -- https://github.com/elastic/elasticsearch/pull/30320/files#diff-75ad5ae5ac78babf26799166a287cfe1R51
See above
}

@Override
public GeoPoint getKey() {
    return GeoPoint.fromGeohash(geohashAsLong);
    // TODO/FIXME: is it ok to change from GeoPoint to Object, and return different types?
Can you explain what you mean here?
It seems getKey() is never used for anything GeoPoint-specific, or at least I couldn't find it, so I was wondering if it makes sense to just return an Object here, allowing different hashing algos to decide if they want to return a GeoPoint or some other object?
I am on the fence here, but leaning towards keeping GeoPoint. The reason is that I think it's good to have a consistent class for the key, so users can change the type and always know what to expect back from this method. Having different classes returned for different types would mean that a client would need to consult the type in order to know what to do with the object returned here, which would make their code much more complex.
ok, np
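So the bucket keeps returning GeoPoint and simply delegates decoding to the type's handler, roughly as in this sketch; the handler method name follows the hashAsGeoPoint rename suggested earlier in this review and is hypothetical:

@Override
public GeoPoint getKey() {
    // always a GeoPoint, regardless of hash type: each handler decodes its own hash
    return type.getHandler().hashAsGeoPoint(geohashAsLong);
}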
    throw new UnsupportedOperationException("Geo aggregation type [" + this +
        "] is not supported by the node version " + out.getVersion().toString());
    }
}
We should write a test for this Enum. Have a look at ShapeRelationTests to see the kind of tests I mean. This will ensure that if the ordering of the enum is changed, or if a new enum value is added, it will catch any changes to serialisation. We should also add a test for reading and writing the enum to and from a version before this feature.
thx, done - see last commit.
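For reference, the ShapeRelationTests pattern being followed pins each value's wire ordinal, roughly like the sketch below; the GEOHASH constant and its ordinal 0 are assumptions about this PR's enum:

import org.elasticsearch.common.io.stream.BytesStreamOutput;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.test.ESTestCase;

import java.io.IOException;

import static org.hamcrest.Matchers.equalTo;

public class GeoHashTypeTests extends ESTestCase {

    public void testWriteTo() throws IOException {
        // writeEnum() serialises the ordinal as a vint; pin it so reordering the enum fails the test
        try (BytesStreamOutput out = new BytesStreamOutput()) {
            GeoHashType.GEOHASH.writeTo(out);
            try (StreamInput in = out.bytes().streamInput()) {
                assertThat(in.readVInt(), equalTo(0));
            }
        }
    }

    public void testReadFrom() throws IOException {
        try (BytesStreamOutput out = new BytesStreamOutput()) {
            out.writeVInt(0);
            try (StreamInput in = out.bytes().streamInput()) {
                assertThat(GeoHashType.readFromStream(in), equalTo(GeoHashType.GEOHASH));
            }
        }
    }
}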
thx, answered in depth on several issues, let me know what you think. The last commit is a rebase + unit tests for the GeoHashType. Other changes are in progress.
} catch (Exception e) {
    throw new XContentParseException(builder.precisionLocation,
        "[geohash_grid] failed to parse field [precision]", e);
}
thx, will do.
@@ -101,20 +146,43 @@ protected AggregationBuilder shallowCopy(Builder factoriesBuilder, Map<String, O
 */
public GeoGridAggregationBuilder(StreamInput in) throws IOException {
    super(in, ValuesSourceType.GEOPOINT, ValueType.GEOPOINT);
    type = GeoHashType.readFromStream(in);
see my other comment - it is not actually being read for older versions.
@nyurik I left some replies to your comments
most concerns addressed in 2c07544. Still todo: rename all classes to geo_grid. Commented inline.
} catch (Exception e) {
    throw new XContentParseException(builder.precisionLocation,
        "[geohash_grid] failed to parse field [precision]", e);
}
done
// Note that tests must set precision after setting the type
this.precision = this.type.getHandler().getDefaultPrecision();
return this;
}
done; type parsing still needs to be a separate static method that converts a string into an enum value
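A sketch of that separate static parse step on the enum; the forString name and the error message are assumptions:

public static GeoHashType forString(String typeName) {
    // case-insensitive lookup against the enum's string form
    for (GeoHashType type : values()) {
        if (type.toString().equalsIgnoreCase(typeName)) {
            return type;
        }
    }
    throw new IllegalArgumentException("invalid geo_grid hash type [" + typeName + "]");
}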
This patch should not introduce any changes in the existing ES behavior. Its only goal is to allow subsequent addition of various hashing algorithms, such as quadkey, pluscode, hex, ...
oh actually I just read the comment above that says this will only affect the transport client. Let me think a bit on this and chat with @cbuescher about it
@@ -49,11 +48,13 @@
GeoHashGrid {
    static class Bucket extends InternalMultiBucketAggregation.InternalBucket implements GeoHashGrid.Bucket, Comparable<Bucket> {

        protected GeoHashType type;
Can the type be a field in the enclosing InternalGeoHashGrid/ParsedGeoHashGrid? The type should be the same for all buckets in the aggregation; it doesn't make much sense to serialize/deserialize it for every bucket. Also, if we write the type id back in the REST response, it should probably be part of the aggregation object, not the bucket.
This seems to be complicated by the fact that the buckets need access to the type in their getters, and the buckets cannot see any non-static members of the enclosing aggregation class atm because of their static nature. Not sure what the best approach is here, but having the type encoded in each bucket seems not optimal.
@nyurik after my discussion last week with @colings86 I took the time to check one of the options for how we can make this work with the high-level REST client. We talked about using something similar to the RestSearchAction.TYPED_KEYS_PARAM flag that we already need for aggregations and suggestions to be parsable by the java rest client in general. I whipped something up along these lines in cbuescher@a11f0fa, but while doing this ran into several related problems. The commit is a quick and dirty hack to illustrate what I mean; there are a lot of whacky workarounds in there and it wouldn't work atm because some parts are missing. Some of the problems I encountered are:
After talking with @cbuescher I have raised #31579 which, if agreed upon, will help to alleviate some of the issues here. We will talk about this in the weekly meeting on Monday next week, and we should await that decision before deciding how to proceed here.
Per #31579 (comment), should there be any changes to this PR to add any extra handling of non-geohash hash types in the REST API? It seems the intention is to use string-only
The GeoHashGrid bucket class currently returns a GeoPoint from its getKey() method. The REST API returns the geohash as a String in the key field of the bucket. This changes the getKey() method to also return a String. Having the ability to get a GeoPoint from a bucket that represents a geohash cell is misleading, because the cell does not represent a point but an area. Instead, it should be the client's responsibility to decide how to represent the bucket key in its application. This also simplifies implementing the high level rest client aspects of elastic#30320, as the client will not need to know about the GeoHashType being used and will only care that there is some string key for the buckets. Relates to elastic#30320
@nyurik the idea was that with #31748 there is nothing to change in ParsedGeoHashGrid to support parsing back the GeoPoint, so we won't need to send the encoding type back with the REST response either. That solves a lot of issues on master (7.0). For backporting I see the following options:
I might be missing sth, I suggest we talk this over with @colings86.
@colings86 any updates on this?
I've spoken to a few different people about this since getting back from holiday, and I'd like to propose the following (thanks to @jpountz for this idea):
This means that we retain the current functionality and we do not break backwards compatibility on the REST, transport, or HL REST Client layers; moreover, we don't end up with a silent Java break which is not shown at compile time. It also means that we can get the multiple hash types support into 6.x in a clean way, which will not be possible with the other suggestions because they involve either hacks or silent breaks.
@colings86 thanks! If possible, do you think we can get this into 6.4? Also, should I work on changing the patch, or does someone else want to handle the rename? Thanks, I like your idea!
@nyurik If you are able to change this PR to incorporate my proposal above, that would be great!
@colings86 I could create a copy of the existing code, but I doubt I would be easily able to refactor it in such a way as to simulate both behaviors. If a (temporary) full copy is ok, then sure, I could try hacking on it.
@nyurik yes we should create a full copy for now so we have the existing
a gentle ping on this PR. Are there plans to move this forward and add it to ES, @elastic/es-search-aggs? thx
@thomasneirynck It's something we are willing to add to ES. Given the last comment from @nyurik, I had understood that he was going to work on the changes to expose this functionality in a new
@nyurik are you still working on this and other aggregation PRs?
@Destroy666x (very appropriate user name :) ) - yes and no. It is on my plate to work on this, but I need to finish my current project (probably another week+ or so). If someone wants to work on it, I will be happy to help them get started quickly.
Thanks for the info! I'm asking because point aggregations are painful in ES, with only the geohash grid's very few precision levels + centroid, so I guess switching to PostGIS for now is more reasonable than waiting, since it doesn't look like any additional clustering methods will be available anytime soon, and I have no time to implement them myself either.
I'm closing this PR in favor of #37277 -- I'm reworking this PR with the help of @talevy. The main changes from this approach:
To make further refactoring of GeoGrid aggregations easier (related: #30320), splitting out these inner class dependencies into their own files makes it easier to map the relationship between classes
This patch should not introduce any changes in the existing ES behavior. Its only goal is to allow subsequent addition of various hashing algorithms, such as quadkey, pluscode, hex, .... The only user-visible change is the addition of the new type parameter to the geohash_grid aggregation. The only allowed value is geohash (default). This PR is required for #30231 and #30240
Open Questions
- innerBuild() seems to run after parameter parsing, but before the values are serialized by innerWriteTo(). Is it a good approach to do the cross-param validation and update the builder's state (set integer precision), so that serialization would only use int, not the parameter string?

TODO
- testParseErrorOnPrecisionOutOfRange and similar tests currently fail because the code no longer throws an error immediately when parsing the precision input parameter. Instead, the error is thrown in GeoGridAggregationBuilder.innerBuild() at a slightly later stage, once all parameters have been parsed and the hashing type is known.
- Rename geohash_grid into geo_grid. Keep geohash_grid as a deprecated alias to geo_grid. (decided to postpone this for a follow up refactoring PR to reduce reviewing complexity)
- ParsedGeoHashGrid to support the new type. For some reason, other hashing algorithms worked fine without any modifications to that class and no unit tests have complained about this (bug?) cc: @cbuescher @javanna
  (elasticsearch/server/src/main/java/org/elasticsearch/search/aggregations/bucket/geogrid/ParsedGeoHashGrid.java, line 31 in 99f88f1)

CC: @colings86