API for exposing query analysis #3276

Merged
aleks-p merged 27 commits into main from feat/explain-query
May 6, 2024

Conversation

@aleks-p (Contributor) commented May 3, 2024

Exposes a new API that provides insights into the amount of data in the queried time range, as well as the number of unique series reached via the label selector.

Example output (note that numbers are serialized as strings):

{
  "queryScopes": [
    {
      "componentType": "Short term storage",
      "componentCount": "15",
      "numBlocks": "169",
      "numSeries": "1423610",
      "numProfiles": "31250250",
      "numSamples": "2691866373",
      "indexBytes": "429009665",
      "profileBytes": "13782371141",
      "symbolBytes": "10660962037"
    },
    {
      "componentType": "Long term storage",
      "componentCount": "2",
      "numBlocks": "20",
      "numSeries": "845298",
      "numProfiles": "115831090",
      "numSamples": "10271806846",
      "indexBytes": "246238688",
      "profileBytes": "60527297722",
      "symbolBytes": "9774455978"
    }
  ],
  "queryImpact": {
    "totalBytesInTimeRange": "95420335231",
    "totalQueriedSeries": "436",
    "deduplicationNeeded": true
  }
}
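Since the counters arrive as strings, a client has to convert them before doing arithmetic. The following is a minimal sketch of that, using a trimmed copy of the example output; the helper names (`scope_bytes`, `summarize`) are illustrative and not part of the API. In this particular sample, the per-scope index, profile, and symbol bytes add up to `totalBytesInTimeRange`; whether that holds in general is not stated in this PR.

```python
import json

# Trimmed copy of the example output; only the fields used below are kept.
RESPONSE = """
{
  "queryScopes": [
    {"componentType": "Short term storage", "indexBytes": "429009665",
     "profileBytes": "13782371141", "symbolBytes": "10660962037"},
    {"componentType": "Long term storage", "indexBytes": "246238688",
     "profileBytes": "60527297722", "symbolBytes": "9774455978"}
  ],
  "queryImpact": {
    "totalBytesInTimeRange": "95420335231",
    "totalQueriedSeries": "436",
    "deduplicationNeeded": true
  }
}
"""

BYTE_FIELDS = ("indexBytes", "profileBytes", "symbolBytes")


def scope_bytes(scope):
    """Sum the byte counters of one scope, converting each string to int."""
    return sum(int(scope[field]) for field in BYTE_FIELDS)


def summarize(raw):
    """Parse the response and return numeric values instead of strings."""
    doc = json.loads(raw)
    impact = doc["queryImpact"]
    return {
        "per_scope_bytes": {
            s["componentType"]: scope_bytes(s) for s in doc["queryScopes"]
        },
        "total_bytes": int(impact["totalBytesInTimeRange"]),
        "queried_series": int(impact["totalQueriedSeries"]),
        # Booleans are regular JSON booleans, so no conversion is needed.
        "deduplication_needed": impact["deduplicationNeeded"],
    }


summary = summarize(RESPONSE)
for component, size in summary["per_scope_bytes"].items():
    print(f"{component}: {size / 1e9:.1f} GB")
print(f"total in time range: {summary['total_bytes'] / 1e9:.1f} GB")
```

Python integers are arbitrary precision, so `int()` handles values of this size without loss; in languages with only double-precision numbers, a big-integer type would be the safer choice.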

To be extended in the future with an estimate of the query execution time and other statistics.

Closes #3001

@aleks-p aleks-p requested a review from a team as a code owner May 3, 2024 21:39
@kolesnikovae (Collaborator) left a comment

Great job! 🎉

I believe this is a very good starting point for making the read path more transparent. Ideally, we could also collect the actual execution statistics and report them along the way with the query results (like EXPLAIN ANALYZE in SQL).

Querying series might be fairly expensive in some cases (e.g., if there are high cardinality labels in the data set), therefore we should be careful calling the analysis. Also, Series API reports matching series in the block not accounting for the time range, which should not pose an issue in the vast majority of cases. Just a clarification: for example, you may query 15 minutes and get no data, and the analysis will still tell you that there are 5 matching series.
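The mismatch described here can be illustrated with a toy model. This is only a sketch under assumed data structures: `Block`, `Series`, and both counting functions are hypothetical and do not reflect how Pyroscope stores or queries blocks. The time range selects which blocks are considered, but within a selected block every series matching the label selector is counted, whether or not it has samples in the range.

```python
from dataclasses import dataclass


@dataclass
class Series:
    labels: dict
    sample_times: list  # timestamps (seconds) at which the series has profiles


@dataclass
class Block:
    min_time: int
    max_time: int
    series: list


def overlaps(block, start, end):
    return block.min_time <= end and block.max_time >= start


def matches(series, selector):
    return all(series.labels.get(k) == v for k, v in selector.items())


def matching_series_in_blocks(blocks, selector, start, end):
    """Block-level count: the time range only picks blocks, not series."""
    return sum(
        1
        for b in blocks
        if overlaps(b, start, end)
        for s in b.series
        if matches(s, selector)
    )


def series_with_data(blocks, selector, start, end):
    """Stricter count: a series needs at least one sample in [start, end]."""
    return sum(
        1
        for b in blocks
        if overlaps(b, start, end)
        for s in b.series
        if matches(s, selector)
        and any(start <= t <= end for t in s.sample_times)
    )


# One hour-long block with five matching series whose samples all fall
# within the first few minutes of the block.
block = Block(
    min_time=0,
    max_time=3600,
    series=[
        Series({"service_name": "api", "pod": f"pod-{i}"}, [60 * i])
        for i in range(5)
    ],
)
selector = {"service_name": "api"}
start, end = 2700, 3600  # the last 15 minutes of the block

reported = matching_series_in_blocks([block], selector, start, end)
with_data = series_with_data([block], selector, start, end)
print(reported, with_data)
```

With this data, the block-level count is 5 while no series has data in the queried 15 minutes, which mirrors the example in the comment.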

@aleks-p aleks-p force-pushed the feat/explain-query branch from 4983a17 to 0deea6e Compare May 6, 2024 14:20
@aleks-p aleks-p requested a review from a team as a code owner May 6, 2024 14:20
@aleks-p aleks-p requested a review from korniltsev as a code owner May 6, 2024 14:21
@aleks-p (Contributor, Author) commented May 6, 2024

Thanks @kolesnikovae.

> Ideally, we could also collect the actual execution statistics and report them along the way with the query results (like EXPLAIN ANALYZE in SQL).

Agreed. The purpose of this first iteration is to provide an efficient endpoint that could be used before the actual query, serving as a sanity check for the query itself.
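One way a client might use the endpoint as such a sanity check is sketched below. The `preflight` function and both thresholds are hypothetical; only the `queryImpact` field names come from the example output in the PR description. Note that `totalBytesInTimeRange` describes the data present in the time range, which is an upper-bound style signal and not a measurement of what the query would actually read.

```python
def preflight(analysis, max_bytes, max_series):
    """Return (ok, reasons) for an analysis response parsed from JSON.

    max_bytes and max_series are client-side budgets chosen by the caller;
    they are assumptions of this sketch, not server-side limits.
    """
    impact = analysis["queryImpact"]
    total_bytes = int(impact["totalBytesInTimeRange"])  # serialized as string
    total_series = int(impact["totalQueriedSeries"])

    reasons = []
    if total_bytes > max_bytes:
        reasons.append(
            f"time range holds {total_bytes} bytes, budget is {max_bytes}"
        )
    if total_series > max_series:
        reasons.append(
            f"selector reaches {total_series} series, budget is {max_series}"
        )
    return len(reasons) == 0, reasons


# Values taken from the example output in the PR description.
analysis = {
    "queryImpact": {
        "totalBytesInTimeRange": "95420335231",
        "totalQueriedSeries": "436",
        "deduplicationNeeded": True,
    }
}

ok, reasons = preflight(analysis, max_bytes=50 * 10**9, max_series=1000)
if not ok:
    print("consider narrowing the query:", "; ".join(reasons))
```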

> Series API reports matching series in the block not accounting for the time range

I didn't know Series doesn't respect the start/end (aside from validity checks), so indeed this can result in some inconsistencies. Thanks for the heads-up; we'll need to take the numbers with a grain of salt for now.

> Querying series might be fairly expensive in some cases (e.g., if there are high cardinality labels in the data set), therefore we should be careful calling the analysis. Also,

I added tenant-level overrides for now, one for the entire endpoint (query_analysis_enabled, defaults to true), and one for the series portion of it (query_analysis_series_enabled, defaults to false).
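The effect of the two overrides can be sketched as follows. Only the setting names and their defaults come from the comment above; the `TenantOverrides` class, the `analyze` function, and the choice to return `None` when the endpoint is disabled are placeholders for illustration, since the actual behavior of a disabled endpoint is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantOverrides:
    # Names and defaults as stated in the comment above.
    query_analysis_enabled: bool = True
    query_analysis_series_enabled: bool = False


def analyze(overrides, compute_scopes, count_series):
    """Run only the parts of the analysis that the tenant has enabled.

    compute_scopes and count_series are caller-supplied callables standing
    in for the cheap block statistics and the potentially expensive series
    lookup, respectively.
    """
    if not overrides.query_analysis_enabled:
        return None  # placeholder for "endpoint disabled"

    result = {"queryScopes": compute_scopes()}
    if overrides.query_analysis_series_enabled:
        # The series lookup is skipped unless explicitly enabled.
        result["totalQueriedSeries"] = count_series()
    return result


calls = []


def fake_scopes():
    calls.append("scopes")
    return []


def fake_series():
    calls.append("series")
    return 436


default_result = analyze(TenantOverrides(), fake_scopes, fake_series)
print(default_result, calls)
```

With the defaults, the scope statistics are computed and the series lookup is not invoked, which matches the intent of keeping the expensive part opt-in.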

@aleks-p aleks-p merged commit f4f2c43 into main May 6, 2024
16 checks passed
@aleks-p aleks-p deleted the feat/explain-query branch May 6, 2024 16:04
Successfully merging this pull request may close these issues.

Provide an API that explains the query execution plan