Add parsing methods for InternalDateHistogram and InternalHistogram #24213
I would try to share this block of code that parses an inner aggregation. It is going to be needed in many places and it should be exactly the same everywhere.
We could do something like this?
++
That could go to master and already be called in Suggest#fromXContent?
Sure, I'll do it now.
Created #24240
cbuescher left a comment
@tlrx this is a great PR. I had a first look trying to understand the whole inheritance hierarchy and left a couple of questions and a few suggestions, but I think this looks great and hope it will extend a long way to other multi bucket aggregations.
Do we need the generic type here? The interface just defines Object getKey() but I guess it saves us some casting somewhere else. Just asking.
This is not required but I find it helpful, no casting when parsing the specialized buckets. But I can remove this if you really find it unnecessary.
I just wanted to understand why it's there. At the moment it doesn't complicate things a lot, and if it helps avoid casts, that's fine.
Trying to understand the difference between keyAsString/keyedString. When I look at InternalDateHistogram#toXContent they should be the same value, only that one isn't rendered for the DocValueFormat.RAW format? Is the distinction here made to ensure the same xContent rendering for the parsed aggregation?
I agree, this is not easily readable. The distinction is effectively here to ensure the same rendering, I use the non null keyedString to know if the bucket must be keyed.
When parsing, we need to know if the bucket is keyed and if the key_as_string field has been parsed. I could replace keyedString/keyAsString with a boolean keyed, a boolean hasKeyAsString and a String keyAsString. Would that be better?
To me it would make understanding the code easier; right now it looks like the two strings could have different values. Also, wouldn't hasKeyAsString and keyAsString != null carry the same information?
Oh right, if the getKeyAsString() method uses the DocValueFormat.RAW.format() when no keyAsString is present then we can achieve what you suggest, thanks. We won't need to keep the "keyed" field value around since this is either provided by RAW or the key_as_string field will be parsable.
/me sets mode brain on
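A minimal sketch of the outcome discussed in this thread, assuming a single nullable keyAsString field with a raw fallback. The class name ParsedBucketSketch and the hasKeyAsString() helper are hypothetical, and String.valueOf stands in for DocValueFormat.RAW.format(); this is not the code from the PR.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a parsed histogram bucket (not the Elasticsearch class).
final class ParsedBucketSketch {
    private final Object key;          // value of the parsed "key" field
    private final String keyAsString;  // value of the parsed "key_as_string" field, or null if absent

    ParsedBucketSketch(Object key, String keyAsString) {
        this.key = Objects.requireNonNull(key, "key");
        this.keyAsString = keyAsString;
    }

    Object getKey() {
        return key;
    }

    // Falls back to a raw string form of the key when key_as_string was not parsed.
    // Assumption: String.valueOf approximates what a RAW doc value format would print.
    String getKeyAsString() {
        return keyAsString != null ? keyAsString : String.valueOf(key);
    }

    // True only when key_as_string was present, so rendering can decide whether to print it.
    boolean hasKeyAsString() {
        return keyAsString != null;
    }
}
```

With this shape a separate keyedString is not needed: a keyed bucket can always use getKeyAsString() as its object name, while the key_as_string field is only rendered when it was actually parsed.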
Why is this not supported? Looks like it would work similarly to getAsMap()?
This was just temporary; the whole Aggregation anonymous class can be replaced now that #24184 has been merged.
See my question about keyedString/keyAsString above, could this simply be a boolean flag (and then be renamed?)
This seems to be exactly like doXContentBody in InternalDateHistogram, maybe this could be factored out into a static (interface?) method (adding the keyed field as argument)? Maybe leaving this as two separate versions will help with potential later decoupling though.
I think that many aggregations that can be "keyed" will share a similar doXContentBody(), but I wanted to parse more aggregations before factoring things out like this. I'm OK with leaving a //norelease with a TODO just to be sure to revisit this suggestion before merging to master.
This looks very similar to the static block in ParsedDateHistogram, maybe we can have a static helper method here that takes a parser and the bucket parser function as arguments?
Yes, this is a good suggestion
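For illustration, the suggested helper could look roughly like the sketch below. It is not the PR's code: an Iterator stands in for the XContentParser, java.util.function.Function stands in for whatever bucket parser type the real code uses, and the names BucketParsingSketch and parseBuckets are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper shared by several parsed aggregations (names are assumptions).
final class BucketParsingSketch {
    private BucketParsingSketch() {}

    // Walks the raw bucket entries and delegates each one to the caller-supplied bucket parser.
    // R is the raw entry type (a stand-in for the parser positioned on one bucket),
    // B is the concrete parsed bucket type of the calling aggregation.
    static <R, B> List<B> parseBuckets(Iterator<R> rawBuckets, Function<R, B> bucketParser) {
        List<B> buckets = new ArrayList<>();
        while (rawBuckets.hasNext()) {
            buckets.add(bucketParser.apply(rawBuckets.next()));
        }
        return buckets;
    }
}
```

Each aggregation would then pass its own bucket parsing function, so only the per-bucket logic differs between the histogram and date histogram variants.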
Force-pushed from 3b25595 to 47c6834.
javanna left a comment
Left a few comments, mostly minor; LGTM otherwise.
nit: do we need to use the getter here? especially compared to below where we don't?
Yes, so that subclasses can potentially handle special logic so that it prints out the same XContent when a keyed bucket with RAW doc value format has been parsed back. I added a comment for this
shall this extend ParsedAggregation? then it implements ToXContent and we can drop a cast below I think.
or even better use Aggregations?
Done
I think if we had Aggregations as a member we could call Aggregations#toXContentInternal instead
I agree - I changed this once #24442 was merged in the feature branch and then I saw your comment.
Aggregations is not abstract anymore, you can remove the curly brackets. also if we had Aggregations as a member we wouldn't have to create it on the fly here
it seems like it could be done, unless I am missing some differences.
Done
Force-pushed from db3a5e6 to 6b89a28.
@javanna Thanks for your review. Can you have another look please?
      }

    - protected Aggregations(List<? extends Aggregation> aggregations) {
    + public Aggregations(List<? extends Aggregation> aggregations) {
oops :)
Thanks, I didn't see it
javanna left a comment
left a small comment, LGTM besides that one.
    // Subclasses can override the getKeyAsString method to handle specific cases like
    // keyed bucket with RAW doc value format where the key_as_string field is not printed
    // out but we still need to have a string version of the key to use as the bucket's name.
    builder.startObject(getKeyAsString());
but then why don't we use the getter below too?
It can be aligned - both now use getKeyAsString()
Thanks @javanna!
Note: this pull request is against a feature branch.
This pull request adds the logic to parse InternalDateHistogram and InternalHistogram aggregations. To do that, it introduces a ParsedMultiBucketAggregation that implements the MultiBucketsAggregation from core. This class provides a base ParsedBucket that can be extended by parsed implementations to fit their specific needs.

For now, the parsing logic for aggregations and buckets resides in the ParsedHistogram and ParsedDateHistogram implementations. Some code could be shared but it makes everything harder to read and understand (I'm still looking at how to improve this).

The ParsedHistogram.ParsedBucket and ParsedDateHistogram.ParsedBucket are able to parse sub aggregations. They also handle the parsing logic when aggregations and buckets are keyed/not keyed.

It also introduces an InternalMultiBucketAggregationTestCase that takes care of verifying the aggregations and multiple buckets. It has an assertMultiBucketsAggregation method that can check the buckets in order or not.

Similarly to the existing InternalSingleBucketAggregationTestCase, the InternalMultiBucketAggregationTestCase randomly creates multi bucket aggregations that have sub aggregations of the same type (i.e. during tests, InternalDateHistogram can only have buckets with aggregations of type InternalDateHistogram). This makes things easier when checking the aggregations contained in a bucket: it uses a recursive call to assertMultiBucketsAggregation.
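The recursive check described above can be sketched as follows. This is an illustration only: the Bucket class, the method name assertMultiBuckets, and the choice to sort by key for the unordered case (which assumes keys are unique within one aggregation) are all assumptions, not the test case from the PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MultiBucketAssertSketch {
    // Hypothetical minimal bucket: a key, a doc count, and nested buckets of the same kind.
    static final class Bucket {
        final String key;
        final long docCount;
        final List<Bucket> subBuckets;

        Bucket(String key, long docCount, List<Bucket> subBuckets) {
            this.key = key;
            this.docCount = docCount;
            this.subBuckets = subBuckets;
        }
    }

    // Compares two bucket lists, optionally ignoring order, and recurses into sub-buckets.
    static void assertMultiBuckets(List<Bucket> expected, List<Bucket> actual, boolean checkOrder) {
        if (expected.size() != actual.size()) {
            throw new AssertionError("bucket count differs");
        }
        List<Bucket> e = new ArrayList<>(expected);
        List<Bucket> a = new ArrayList<>(actual);
        if (!checkOrder) {
            // Assumption: keys are unique per aggregation, so sorting by key aligns the two lists.
            Comparator<Bucket> byKey = Comparator.comparing((Bucket b) -> b.key);
            e.sort(byKey);
            a.sort(byKey);
        }
        for (int i = 0; i < e.size(); i++) {
            Bucket eb = e.get(i);
            Bucket ab = a.get(i);
            if (!eb.key.equals(ab.key) || eb.docCount != ab.docCount) {
                throw new AssertionError("bucket mismatch at index " + i);
            }
            // Sub aggregations have the same type in these tests, so the same check applies.
            assertMultiBuckets(eb.subBuckets, ab.subBuckets, checkOrder);
        }
    }
}
```

Because every bucket only contains buckets of the same kind, one method covers the whole tree without any per-type dispatch.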