-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation #12484
Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation #12484
Conversation
@@ -100,7 +101,7 @@ public void testTimeseriesWithDistinctCountAgg() throws Exception | |||
.build(); | |||
|
|||
final Iterable<Result<TimeseriesResultValue>> results = | |||
engine.process(query, new IncrementalIndexStorageAdapter(index)).toList(); | |||
engine.process(query, new IncrementalIndexStorageAdapter(index), new DefaultTimeseriesQueryMetrics()).toList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you verify the metrics returned by calling engine.process(..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the verification is done in *QueryRunnerTests for all the query types. This is mostly to exercise the metrics code more
@@ -233,7 +235,8 @@ public Sequence<ResultRow> apply(Interval interval) | |||
outerQuery.withQuerySegmentSpec( | |||
new MultipleIntervalSegmentSpec(ImmutableList.of(interval)) | |||
), | |||
new IncrementalIndexStorageAdapter(innerQueryResultIndex) | |||
new IncrementalIndexStorageAdapter(innerQueryResultIndex), | |||
null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not pass in queryMetrics here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I passed null here since passing the query metrics needed changes in the interface - and since this execution is synthetic over the inner query results and would always be non-vectorized (due to the storage adapter used) I skipped it. If you think, I can try to make that change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing this object around would be useful in the future for adding other dimensions.
For example, let's say we wanted to know how many group by queries are executed using the v1 strategy vs the v2 strategy, then making the interface changes to pass in the queryMetrics object could be useful.
Since the scope of this PR is just to add better support for seeing whether or not queries are vectorized, I think it isn't needed to do this as part of this change, but it could be done in a future change
@@ -187,7 +188,10 @@ Sequence<ResultRow> processSubtotalsSpec( | |||
* | |||
* @return result sequence for the storage adapter | |||
*/ | |||
Sequence<ResultRow> process(GroupByQuery query, StorageAdapter storageAdapter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to wire in the GroupByQueryMetrics in the processSubqueryResult
, processSubtotalsSpec
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't do it in those methods since they take the subquery/subtotal result sequence in the method. I think the inlined subquery should execute using a separate query object and hence should emit their own metrics object. For pushed down subqueries, I'm not sure about their metrics framework as of now but my guess is that it should use the same metrics object as the outer one. will check and update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the inlined subquery should execute using a separate query object and hence should emit their own metrics object.
Is there a test we can implement easily to validate this? If so, that would be great!
processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java
Show resolved
Hide resolved
…uring cursor creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It's a step in the right direction and will increase visibility 🥳
Can you add this dimension to docs/operations/metrics.md
so users know about this new dimension that is being emitted.
+1 after CI - I've re-triggered the second phase of tests manually because it looks like |
Although we have now started emitting vectorized metric to the emitter, only topn query reports that metric. Currently scan, timeseries and groupby queries don't report that metric - this is change will support that reporting.
This PR has: