Add parsing methods for InternalDateHistogram and InternalHistogram #24213

tlrx · 2017-04-20T13:55:37Z

Note: this pull request is against a feature branch.

This pull request adds the logic to parse InternalDateHistogram and InternalHistogram aggregations. To do that, it introduces a ParsedMultiBucketAggregation that implements the MultiBucketsAggregation from core. This class provides a base ParsedBucket that can be extended by parsed implementations to fit their specific needs.

For now, the parsing logic for aggregations and buckets resides in the ParsedHistogram and ParsedDateHistogram implementations. Some code could be shared, but sharing it makes everything harder to read and understand (I'm still looking at how to improve this).

The ParsedHistogram.ParsedBucket and ParsedDateHistogram.ParsedBucket are able to parse sub aggregations. They also handle the parsing logic when aggregations and buckets are keyed/not keyed.
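To illustrate the keyed/not keyed distinction, here is a minimal sketch in plain Java, not the code from this pull request. SketchBucket and KeyedRenderingSketch are hypothetical names, and the JSON is built by hand without escaping. It assumes the usual response shape: keyed buckets are rendered as an object whose field names are the bucket keys, while non-keyed buckets are rendered as an array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a parsed histogram bucket (not the Elasticsearch class).
final class SketchBucket {
    final String keyAsString;
    final long docCount;

    SketchBucket(String keyAsString, long docCount) {
        this.keyAsString = keyAsString;
        this.docCount = docCount;
    }
}

final class KeyedRenderingSketch {
    // Renders buckets as a JSON object when keyed, or as a JSON array otherwise.
    // No JSON escaping is done; this only shows the two shapes.
    static String render(List<SketchBucket> buckets, boolean keyed) {
        List<String> parts = new ArrayList<>();
        for (SketchBucket b : buckets) {
            String body = "{\"key_as_string\":\"" + b.keyAsString + "\",\"doc_count\":" + b.docCount + "}";
            // A keyed bucket is named after its key; a non-keyed bucket is just an array element.
            parts.add(keyed ? "\"" + b.keyAsString + "\":" + body : body);
        }
        String joined = String.join(",", parts);
        return keyed ? "{" + joined + "}" : "[" + joined + "]";
    }
}
```

A parser has to accept both shapes, which is why each ParsedBucket needs to know whether it was read from a named field or from an array element.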

It also introduces an InternalMultiBucketAggregationTestCase that takes care of verifying the aggregations and their buckets. It has an assertMultiBucketsAggregation method that can check the buckets either in order or regardless of order.

Similarly to the existing InternalSingleBucketAggregationTestCase, the InternalMultiBucketAggregationTestCase randomly creates multi bucket aggregations that have sub aggregations of the same type (i.e., during tests, InternalDateHistogram can only have buckets with aggregations of type InternalDateHistogram). This makes things easier when checking the aggregations contained in a bucket - it uses a recursive call to assertMultiBucketsAggregation.
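The recursive check described above could look roughly like the following. This is a sketch under stated assumptions, not the test case from this pull request: TestBucket and MultiBucketAssertions are hypothetical names, nested buckets stand in for a sub aggregation of the same type, and bucket keys are assumed to be unique within one level.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical bucket model: a key, a doc count, and nested buckets that stand in
// for a sub aggregation of the same type.
final class TestBucket {
    final String key;
    final long docCount;
    final List<TestBucket> subBuckets;

    TestBucket(String key, long docCount, List<TestBucket> subBuckets) {
        this.key = key;
        this.docCount = docCount;
        this.subBuckets = subBuckets;
    }
}

final class MultiBucketAssertions {
    // Compares two bucket lists, optionally ignoring order, and recurses into nested buckets.
    static void assertBuckets(List<TestBucket> expected, List<TestBucket> actual, boolean checkOrder) {
        if (expected.size() != actual.size()) {
            throw new AssertionError("bucket count differs: " + expected.size() + " vs " + actual.size());
        }
        List<TestBucket> e = new ArrayList<>(expected);
        List<TestBucket> a = new ArrayList<>(actual);
        if (!checkOrder) {
            // Order is irrelevant here: sort copies of both sides by key before comparing.
            Comparator<TestBucket> byKey = Comparator.comparing(b -> b.key);
            e.sort(byKey);
            a.sort(byKey);
        }
        for (int i = 0; i < e.size(); i++) {
            TestBucket x = e.get(i);
            TestBucket y = a.get(i);
            if (!x.key.equals(y.key) || x.docCount != y.docCount) {
                throw new AssertionError("bucket mismatch at position " + i + ": " + x.key + " vs " + y.key);
            }
            // Same-typed sub aggregations make the recursive call straightforward.
            assertBuckets(x.subBuckets, y.subBuckets, checkOrder);
        }
    }
}
```

Because every level holds the same bucket type, one method covers the whole tree; mixed sub aggregation types would need a dispatch step that this sketch leaves out.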

javanna · 2017-04-20T21:07:12Z

core/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/ParsedHistogram.java

I would try to share this block of code that parses an inner aggregation. It is going to be needed in many places and it should be exactly the same everywhere.

We could do something like this ?

that could go to master and be called already in Suggest#fromXContent ?

Sure, I do it now.

Created #24240

cbuescher

@tlrx this is a great PR, I had a first look trying to understand the whole inheritance hierarchy and left a couple of questions and a few suggestions but I think this looks great and hope it will extend a long way to other multi bucket aggregations.

cbuescher · 2017-04-21T09:19:07Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

Do we need the generic type here? The interface just defines Object getKey() but I guess it saves us some casting somewhere else. Just asking.

This is not required but I find it helpful, no casting when parsing the specialized buckets. But I can remove this if you really find it unnecessary.

I just wanted to understand why it's there. At the moment it doesn't complicate things a lot, and if it helps avoid casts that's fine.

cbuescher · 2017-04-21T09:25:49Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

Trying to understand the difference between keyAsString/keyedString. When I look at InternalDateHistogram#toXContent they should be the same value, only that one isn't rendered for the DocValueFormat.RAW format? Is the distinction made here to ensure the same xContent rendering for the parsed aggregation?

I agree, this is not easily readable. The distinction is effectively here to ensure the same rendering, I use the non null keyedString to know if the bucket must be keyed.

When parsing, we need to know if the bucket is keyed and if the key_as_string field has been parsed. I could replace keyedString/keyAsString with a boolean keyed, a boolean hasKeyAsString and a String keyAsString. Would that be better?

To me it would make understanding the code easier, now it looks like the two strings could have different values. Also, wouldn't hasKeyAsString and keyAsString == null be the same?

Oh right, if the getKeyAsString() method uses the DocValueFormat.RAW.format() when no keyAsString is present then we can achieve what you suggest, thanks. We won't need to keep the "keyed" field value around since this is either provided by RAW or the key_as_string field will be parsable.

/me sets mode brain on
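The outcome of this exchange can be sketched as follows. This is an illustration only, not the class from this pull request: ParsedBucketSketch is a hypothetical name, and Double.toString is used as an assumed stand-in for DocValueFormat.RAW.format().

```java
// Hypothetical sketch, not the Elasticsearch class: a parsed bucket that stores
// key_as_string only when the response actually contained that field.
final class ParsedBucketSketch {
    private final double key;
    private final String keyAsString; // null when key_as_string was absent from the response

    ParsedBucketSketch(double key, String keyAsString) {
        this.key = key;
        this.keyAsString = keyAsString;
    }

    // Always returns a usable string, so a keyed bucket can be named even without key_as_string.
    String getKeyAsString() {
        if (keyAsString != null) {
            return keyAsString;
        }
        // Assumption: stand-in for DocValueFormat.RAW.format(key).
        return Double.toString(key);
    }

    // The key_as_string field is rendered back only if it was parsed in the first place.
    boolean hasKeyAsString() {
        return keyAsString != null;
    }
}
```

With this shape, a single nullable String replaces the keyedString/keyAsString pair, and no separate "keyed" flag has to be kept on the bucket.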

cbuescher · 2017-04-21T09:27:20Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

Why is this not supported? Looks like it would work similarly to getAsMap()?

This was just temporary; the whole Aggregation anonymous class can be replaced now that #24184 has been merged.

cbuescher · 2017-04-21T09:28:37Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

See my question about keyedString/keyAsString above, could this simply be a boolean flag (and then be renamed?)

cbuescher · 2017-04-21T09:46:50Z

...rc/main/java/org/elasticsearch/search/aggregations/bucket/histogram/ParsedDateHistogram.java

This seems to be exactly like doXContentBody in InternalDateHistogram, maybe this could be factored out into a static (interface?) method (adding the keyed field as argument)? Maybe leaving this as two separate versions will help with potential later decoupling though.

I think that many aggregations that can be "keyed" will share a similar doXContentBody(), but I wanted to parse more aggregations before factoring things out like this. I'm OK with leaving a //norelease with a TODO, just to be sure we revisit this suggestion before merging to master.

cbuescher · 2017-04-21T09:59:34Z

core/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/ParsedHistogram.java

This looks very similar to the static block in ParsedDateHistogram, maybe we can have a static helper method here that takes a parser and the bucket parser function as arguments?

Yes, this is a good suggestion
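A rough sketch of what such a shared helper might look like, in plain Java. BucketParsingSketch and parseBuckets are hypothetical names, and the raw buckets are modelled as already-decoded maps rather than being read from an XContentParser, so this shows only the shape of the idea and not the code that was eventually written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BucketParsingSketch {
    // Shared loop over the raw buckets: each caller supplies only the function
    // that turns one raw bucket into its own typed bucket.
    static <B> List<B> parseBuckets(List<Map<String, Object>> rawBuckets,
                                    Function<Map<String, Object>, B> bucketParser) {
        List<B> buckets = new ArrayList<>(rawBuckets.size());
        for (Map<String, Object> raw : rawBuckets) {
            buckets.add(bucketParser.apply(raw));
        }
        return buckets;
    }
}
```

ParsedHistogram and ParsedDateHistogram would then differ only in the bucket parser function they pass in.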

javanna

left a few comments, mostly minors, LGTM otherwise

javanna · 2017-05-02T13:46:11Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

nit: do we need to use the getter here? especially compared to below where we don't?

Yes, so that subclasses can potentially handle special logic so that it prints out the same XContent when a keyed bucket with RAW doc value format has been parsed back. I added a comment for this

javanna · 2017-05-02T13:47:14Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

shall this extend ParsedAggregation? then it implements ToXContent and we can drop a cast below I think.

or even better use Aggregations?

javanna · 2017-05-02T13:52:02Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

I think if we had Aggregations as a member we could call Aggregations#toXContentInternal instead

I agree - I changed this once #24442 was merged in the feature branch and then I saw your comment.

javanna · 2017-05-03T08:53:45Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

Aggregations is not abstract anymore, you can remove the curly brackets. also if we had Aggregations as a member we wouldn't have to create it on the fly here

javanna · 2017-05-03T08:57:08Z

core/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/ParsedHistogram.java

it seems like it could be done, unless I am missing some differences.

tlrx · 2017-05-03T09:55:02Z

@javanna Thanks for your review. Can you have another look please?

javanna · 2017-05-03T10:42:32Z

core/src/main/java/org/elasticsearch/search/aggregations/Aggregations.java

    }

-    protected Aggregations(List<? extends Aggregation> aggregations) {
+    public Aggregations(List<? extends Aggregation> aggregations) {


Thanks, I didn't see it

javanna

left a small comment, LGTM besides that one.

javanna · 2017-05-03T10:48:57Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedMultiBucketAggregation.java

+                // Subclasses can override the getKeyAsString method to handle specific cases like
+                // keyed bucket with RAW doc value format where the key_as_string field is not printed
+                // out but we still need to have a string version of the key to use as the bucket's name.
+                builder.startObject(getKeyAsString());


but then why don't we use the getter below too?

It can be aligned - both now use getKeyAsString()

tlrx · 2017-05-03T11:27:34Z

Thanks @javanna !

tlrx added :Java High Level REST Client >non-issue v6.0.0-alpha1 labels Apr 20, 2017

tlrx requested review from cbuescher and javanna April 20, 2017 13:55

tlrx changed the base branch from master to feature/client_aggs_parsing April 20, 2017 13:57

javanna reviewed Apr 20, 2017

cbuescher reviewed Apr 21, 2017

javanna removed the v6.0.0-alpha1 label Apr 25, 2017

tlrx force-pushed the add-parsing-for-date-histogram branch from 3b25595 to 47c6834 Compare May 2, 2017 10:44

javanna mentioned this pull request May 2, 2017

Java High Level REST Client plan for first release #23331

Closed

58 tasks

javanna removed the >non-issue label May 3, 2017

tlrx added 3 commits May 3, 2017 10:52

Add parsing methods for InternalDateHistogram and InternalHistogram

753067a

Update after Christoph review

a5e99c9

Fix violation

306431b

javanna approved these changes May 3, 2017

Rebase and update after Luca review

6b89a28

tlrx force-pushed the add-parsing-for-date-histogram branch from db3a5e6 to 6b89a28 Compare May 3, 2017 09:53

javanna reviewed May 3, 2017

javanna approved these changes May 3, 2017

Get it done

69d60c8

tlrx merged commit 2ac90b3 into elastic:feature/client_aggs_parsing May 3, 2017

tlrx deleted the add-parsing-for-date-histogram branch May 3, 2017 15:07

javanna mentioned this pull request May 22, 2017

Add aggs parsers for high level REST Client #24824

Merged

javanna pushed a commit to javanna/elasticsearch that referenced this pull request May 23, 2017

Add parsing methods for InternalDateHistogram and InternalHistogram (e…

0b2572e

…lastic#24213)

javanna mentioned this pull request May 23, 2017

Backport aggs parsers for high level REST Client #24844

Merged
