Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation #12484

Merged
merged 3 commits into from
May 9, 2022

Conversation

rohangarg
Copy link
Member

Although we have now started emitting vectorized metric to the emitter, only topn query reports that metric. Currently scan, timeseries and groupby queries don't report that metric - this is change will support that reporting.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -100,7 +101,7 @@ public void testTimeseriesWithDistinctCountAgg() throws Exception
.build();

final Iterable<Result<TimeseriesResultValue>> results =
engine.process(query, new IncrementalIndexStorageAdapter(index)).toList();
engine.process(query, new IncrementalIndexStorageAdapter(index), new DefaultTimeseriesQueryMetrics()).toList();
Copy link
Contributor

@suneet-s suneet-s May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you verify the metrics returned by calling engine.process(..)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the verification is done in *QueryRunnerTests for all the query types. This is mostly to exercise the metrics code more

@@ -233,7 +235,8 @@ public Sequence<ResultRow> apply(Interval interval)
outerQuery.withQuerySegmentSpec(
new MultipleIntervalSegmentSpec(ImmutableList.of(interval))
),
new IncrementalIndexStorageAdapter(innerQueryResultIndex)
new IncrementalIndexStorageAdapter(innerQueryResultIndex),
null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not pass in queryMetrics here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I passed null here since passing the query metrics needed changes in the interface - and since this execution is synthetic over the inner query results and would always be non-vectorized (due to the storage adapter used) I skipped it. If you think, I can try to make that change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think passing this object around would be useful in the future for adding other dimensions.

For example, let's say we wanted to know how many group by queries are executed using the v1 strategy vs the v2 strategy, then making the interface changes to pass in the queryMetrics object could be useful.

Since the scope of this PR is just to add better support for seeing whether or not queries are vectorized, I think it isn't needed to do this as part of this change, but it could be done in a future change

@@ -187,7 +188,10 @@ Sequence<ResultRow> processSubtotalsSpec(
*
* @return result sequence for the storage adapter
*/
Sequence<ResultRow> process(GroupByQuery query, StorageAdapter storageAdapter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to wire in the GroupByQueryMetrics in the processSubqueryResult, processSubtotalsSpec, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do it in those methods since they take the subquery/subtotal result sequence in the method. I think the inlined subquery should execute using a separate query object and hence should emit their own metrics object. For pushed down subqueries, I'm not sure about their metrics framework as of now but my guess is that it should use the same metrics object as the outer one. will check and update

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the inlined subquery should execute using a separate query object and hence should emit their own metrics object.

Is there a test we can implement easily to validate this? If so, that would be great!

@suneet-s
Copy link
Contributor

suneet-s commented May 2, 2022

Screen Shot 2022-05-02 at 3 43 10 PM

I was able to test this change in a cluster and build some visualizations on top of the metrics being emitted. I think this is a step in the right direction and most of my comments are about passing the metrics object into more places so we can get even more visibility. +1 from me after CI passes

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's a step in the right direction and will increase visibility 🥳

Can you add this dimension to docs/operations/metrics.md so users know about this new dimension that is being emitted.

@suneet-s
Copy link
Contributor

suneet-s commented May 9, 2022

+1 after CI - I've re-triggered the second phase of tests manually because it looks like BufferHashGrouperTest#testGrowingOverflowingInteger was failing in phase 1 due to an OutOfMemory error that looked unrelated to this change.

@suneet-s suneet-s merged commit 2dd073c into apache:master May 9, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants