Skip to content
This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

Adds support for rollup query helper based on query range inputs #1780

Open
wants to merge 7 commits into
base: feature_metric_rollup
Choose a base branch
from

Conversation

Harkishen-Singh
Copy link
Member

Signed-off-by: Harkishen-Singh harkishensingh@hotmail.com

Description

Merge requirements

Please take into account the following non-code changes that you may need to make with your PR:

  • CHANGELOG entry for user-facing changes
  • Updated the relevant documentation

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

This commit adds support for creation or deletion of metric rollups by reading
the dataset-config file. The dataset config file contains a new field `downsample:`
which supports 2 options: `automatic` & `resolution`.

If `automatic` is false, then
connector updates in the database so that `_prom_catalog.scan_for_new_rollups` does not
create any new rollups (Caggs).

`resolution` is a comma separated list of resolutions for downsampling, in a way that
label:resolution:retention is the format that is needed.
Example: `short:5m:90d,long:1h:365d`
-> The resolution works in a strict manner. Its aim is to make sure that the database contains
exactly those resolutions that are mentioned in the dataset config. Like, if the database
already contains `short` and `long` downsampled data and the dataset is supplied with `short:5m:90d,very_long:1d:365d`
then `long` downsampled case is deleted and `very_long` is created.
-> `resolution` is applied if `automatic` is `true`. If automatic is not given in dataset config, it defaults to `true`
based on our plan to keep downsampling applied as a default setting.
…ollups.

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
@Harkishen-Singh Harkishen-Singh added the epic/auto-rollups Automatic metric rollups, downsampling label Dec 2, 2022
@Harkishen-Singh Harkishen-Singh self-assigned this Dec 2, 2022
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
@Harkishen-Singh Harkishen-Singh force-pushed the feat_rollup_resolution_decider branch 3 times, most recently from 30510f3 to d5b6cab Compare December 6, 2022 13:16
@Harkishen-Singh Harkishen-Singh changed the title Implement rollup schema decider based on querying range Adds support for rollup query helper based on query range inputs Dec 6, 2022
@Harkishen-Singh Harkishen-Singh marked this pull request as ready for review December 6, 2022 13:17
@Harkishen-Singh Harkishen-Singh requested review from a team as code owners December 6, 2022 13:17
@Harkishen-Singh Harkishen-Singh force-pushed the feat_rollup_resolution_decider branch 2 times, most recently from 916ff43 to e755d36 Compare December 6, 2022 13:22
Copy link
Contributor

@alejandrodnm alejandrodnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't finish my review, so I'll leave this as comments.

I'll give it another pass later

@@ -61,7 +61,7 @@ type Client struct {
}

// NewClient creates a new PostgreSQL client
func NewClient(r prometheus.Registerer, cfg *Config, mt tenancy.Authorizer, schemaLocker LockFunc, readOnly bool) (*Client, error) {
func NewClient(r prometheus.Registerer, cfg *Config, mt tenancy.Authorizer, schemaLocker LockFunc, readOnly, useRollups bool) (*Client, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we add useRollups to cfg *Config? It reads a little strange that we have a config struct, but it doesn't have all the config options and we need extra parameters in the signature.

I haven't checked, but maybe we can't because we tied our configs, cli flags and their namespaces are structure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and we can't because rollup comes from dataset

pkg/pgclient/client.go Outdated Show resolved Hide resolved
pkg/rollup/resolution.go Outdated Show resolved Hide resolved
}
// All rollups are above upper limit. Hence, send the schema of the lowest resolution
// as this is the best we can do.
lowestRollup := h.resolutionsASC[len(h.resolutionsASC)-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If order is ascending I would expect that the highest resolution is the last.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but highest resolution is the one which has the least interval, since that will give maximum granularity. The last will be 1 hour (in example of 5mins, 15mins, 1hour) and hence 1 hour is of lowest resolution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe as it is now is counter-intuitive. I think we should first align on what resolution is. In our case we can probably define it as number of data points per time interval. Let' say we take 1h as a time for calculating resolution. So if we pick 5m granularity, that will have maximum resolution since we can fit 12 x 5000 data points in one hour. If we take 1h granularity it will be 5000 data points in 1 hour. That's why I believe that 5m is our highest resolution (not lowest) since resolution is a density of data points.
One potential solution could be to rename all resolution terms with interval. That will be more inline with current logic. It would also avoid confusion with terms b/c we are actually storing rollup intervals and not resolution

originalSchema = "prom_data"
upperLimit = 5000 // Maximum samples allowed
assumedScrapeInterval = time.Second * 30
refreshRollupResolution = time.Minute * 30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got confused with this name and what it does. Maybe just refreshRollupInterval?
Btw naming of the file is resolution and the logic in it has mostly to deal with finding the right rollups to use.
I think this points that we need to agree on naming here. It does seem that we use rollup term interchangeably with resolution and I think it brings confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file mostly helps us decide which rollups to be used for the given query ranges, I am renaming the file and the struct to Decider{}.

@Harkishen-Singh Harkishen-Singh force-pushed the feat_rollup_resolution_decider branch 2 times, most recently from cbe2db7 to 45833de Compare December 12, 2022 18:30
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
@Harkishen-Singh
Copy link
Member Author

Update: We plan to not merge PRs related to querying of rollups ATM.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
epic/auto-rollups Automatic metric rollups, downsampling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants