Skip to content

Conversation

@tlrx
Copy link
Member

@tlrx tlrx commented Apr 20, 2017

Note: this pull request is against a feature branch.

This pull request adds the logic to parse InternalDateHistogram and InternalHistogram aggregations. To do that, it introduces a ParsedMultiBucketAggregation that implements the MultiBucketsAggregation from core. This class provides a base ParsedBucket that can be extended by parsed implementations to fit their specific needs.

For now, the parsing logic for aggregations and buckets reside in the ParsedHistogram and ParsedDateHistogram implementations. Some code could be shared but it makes everything harder to read and understand (I'm still looking at how to improve this).

The ParsedHistogram.ParsedBucket and ParsedDateHistogram.ParsedBucket are able to parse sub aggregations. They also handle the parsing logic when aggregations and buckets are keyed/not keyed.

It also introduces a InternalMultiBucketAggregationTestCase that takes care of verifying the aggregations and multiple buckets. It has a assertMultiBucketsAggregation that can checks the buckets in order or not.

Similarly to the existing InternalSingleBucketAggregationTestCase, the InternalMultiBucketAggregationTestCase randomly creates multi bucket aggregations that have sub aggregations of the same type (ie during tests, InternalDateHistogram can only have buckets with aggregations of type InternalDateHistogram). This makes things easier when checking the aggregations contained in a bucket - it uses a recursive call to assertMultiBucketsAggregation.

@tlrx tlrx requested review from cbuescher and javanna April 20, 2017 13:55
@tlrx tlrx changed the base branch from master to feature/client_aggs_parsing April 20, 2017 13:57
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to share this block of code that parses an inner aggregation. It is going to be needed in many places and it should be exactly the same everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do something like this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that could go to master and be called already in Suggest#fromXContent ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I do it now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #24240

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlrx this is a great PR, I had a first look trying to understand the whole inheritance hierarchy and left a couple of questions and a few suggestions but I think this looks great and hope it will extend a long way to other multi bucket aggregations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the generic type here? The interface just defines Object getKey() but I guess it safes us some casting somewhere else. Just asking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required but I find it helpful, no casting when parsing the specialized buckets. But I can remove this if you really find it unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to understand why its there. At the moment it doesn't complicate things a lot, and if if helps avoiding casts thats fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand the difference between keyAsString/keyedString. When I look at InternalDateHistogram#toXContent they should be the same value, only that one isn't rendered for the DocValue.RAW format? Is the destinction here made to ensure the same xContent rendering on for the parsed aggregation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is not easily readable. The distinction is effectively here to ensure the same rendering, I use the non null keyedString to know if the bucket must be keyed.

When parsing, we need to know if the bucket is keyed and if the key_as_string field has been parsed. I could replace keyedString/keyAsString with a boolean keyed, a boolean hasKeyAsString and a String keyAsString. Would that be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it would make understanding the code easier, now it looks like the two strings could have different values. Also, wouldn't hasKeyAsString and keyAsString == null be the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, if the getKeyAsString() method uses the DocValueFormat.RAW.format() when no keyAsString is present then we can achieve what you suggest, thanks. We won't need to keep the "keyed" field value around since this is either provided by RAW or the key_as_string field will be parsable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/me sets mode brain on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not supported? Looks like if would work similar to getAsMap()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just temporary, the whole Aggregation anonymous class can be replaced now #24184 has been merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my question about keyedString/keyAsString above, could this simply be a boolean flag (and then be renamed?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be exactly like doXContentBody in InternalDateHistogram, maybe this could be factored out into a static (interface?) method (adding the keyed field as argument)? Maybe leaving this as two separate versions will help with potential later decoupling though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that many aggregations that can be "keyed" will share a similar doXContentBody(), but I wanted to parse more aggregations before factorizing things like this. I'm ok to let a //norelease with a TODO just to be sure to revisit this suggestion before merging to master.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very similar to the static block in ParsedDateHistogram, maybe we can have a static helper method here that takes a parser and the bucket parser function as arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good suggestion

@tlrx tlrx force-pushed the add-parsing-for-date-histogram branch from 3b25595 to 47c6834 Compare May 2, 2017 10:44
@javanna javanna removed the >non-issue label May 3, 2017
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments, mostly minors, LGTM otherwise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need to use the getter here? especially compared to below where we don't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so that subclasses can potentially handle special logic so that it prints out the same XContent when a keyed bucket with RAW doc value format has been parsed back. I added a comment for this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall this extend ParsedAggregation? then it implements ToXContent and we can drop a cast below I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even better use Aggregations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we had Aggregations as a member we could call Aggregations#toXContentInternal instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - I changed this once #24442 was merged in the feature branch and then I saw your comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aggregations is not abstract anymore, you can remove the curly brackets. also if we had Aggregations as a member we wouldn't have to create it on the fly here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like it could be done, unless I am missing some differences.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@tlrx tlrx force-pushed the add-parsing-for-date-histogram branch from db3a5e6 to 6b89a28 Compare May 3, 2017 09:53
@tlrx
Copy link
Member Author

tlrx commented May 3, 2017

@javanna Thanks for your review. Can you have another look please?

}

protected Aggregations(List<? extends Aggregation> aggregations) {
public Aggregations(List<? extends Aggregation> aggregations) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't see it

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a small comment, LGTM besides that one.

// Subclasses can override the getKeyAsString method to handle specific cases like
// keyed bucket with RAW doc value format where the key_as_string field is not printed
// out but we still need to have a string version of the key to use as the bucket's name.
builder.startObject(getKeyAsString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then why don't we use the getter below too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be aligned - both now use getKeyAsString()

@tlrx tlrx merged commit 2ac90b3 into elastic:feature/client_aggs_parsing May 3, 2017
@tlrx
Copy link
Member Author

tlrx commented May 3, 2017

Thanks @javanna !

@tlrx tlrx deleted the add-parsing-for-date-histogram branch May 3, 2017 15:07
javanna pushed a commit to javanna/elasticsearch that referenced this pull request May 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants