Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

querier: Limit LabelNames; LabelValues to certain time period. #1811

Closed
bwplotka opened this issue Nov 28, 2019 · 26 comments · Fixed by #2964 or #3147
Closed

querier: Limit LabelNames; LabelValues to certain time period. #1811

bwplotka opened this issue Nov 28, 2019 · 26 comments · Fixed by #2964 or #3147

Comments

@bwplotka
Copy link
Member

Something that Cortex is doing is that LabelNames;LabelValues by default are limited to active series only on ingesters (so let's say in Thanos world to Thanos receiver or Prometheus for latest 2h). This is due to the fact that when you query you are usually interested in autocompletion for fresh metrics.

We are now exploring if we can apply the same logic for Thanos e.g limit LabelNames and LabelValues and Series (by default) on Querier QueryAPI to 2h.

IF we don't do it, StoreGateway will have to load portions of the blocks in memory for full-time retention, for all store APIs, which is something we wanted to avoid. It was briefly mentioned here: #39 Also currently this is limited to raw data only, so it's already scoped for certain setups with retention for older blocks.

@ppanyukov
Copy link
Contributor

Do I understand correctly this does not affect the query execution and any memory limits for that?

@bwplotka
Copy link
Member Author

Which memory limits you have in mind?

@ppanyukov
Copy link
Contributor

The limits on memory taken during query execution, as this PR attempts to address.

#1747

@bwplotka
Copy link
Member Author

bwplotka commented Nov 28, 2019

Yes, I still don't see how limits improve those methods. This issue is required to be able to have this: #1813

@pracucci
Copy link
Contributor

I would be glad to work on this. Few questions:

  • Should the period be configurable?
  • Should we introduce a breaking change enforcing a default of 2h, or should we keep an unlimited default just introducing an option to slim it down?

We are now exploring if we can apply the same logic for Thanos e.g limit LabelNames and LabelValues and Series (by default) on Querier QueryAPI to 2h.

FYI, Cortex doesn't enforce any short default for the Series. It currently uses Prometheus defaults, which is the entire history. Do we want to change this in Thanos, and enforce a default of 2h for Series as well?

@bwplotka
Copy link
Member Author

bwplotka commented Nov 29, 2019

Nice!

Should the period be configurable?

yes

Should we introduce a breaking change enforcing a default of 2h, or should we keep an unlimited default just introducing an option to slim it down?

We can introduce a breaking change if needed, but we need to be explicit in changelog. The question is, should we?

So what makes me think we should have unlimited default, for now: is the fact that if user goes to the dashboard with some templates using those APIs and asks for data month ago it will get label values for active series. Let's say he had eu4-production cluster month ago and now not, he won't be able to see it in his Grafana variables, right? (: I think it's should be a Thanos user choice.

I think simultaneously we should start a discussion about:

In terms of series those are already based on time, but we should in similar way define a default if not time range is specified, based on some flag we define: "just introducing an option to slim it down?"

@brancz
Copy link
Member

brancz commented Dec 5, 2019

I personally think this should actually even be a query parameter on the API endpoints. I believe there are even issues about this on Prometheus. A Prometheus with many blocks is just as affected by this as Thanos/Cortex are. I feel by default those endpoints should only respond with the active label names/values everywhere.

@davkal
Copy link

davkal commented Dec 6, 2019

This is due to the fact that when you query you are usually interested in autocompletion for fresh metrics.

I'd challenge this fact, and I've been bitten by this in post-mortems where I set the time picker in Grafana to the time of incident and then tried to get typeahead suggestions for instances that at the time of querying were long gone.
Currently Cortex's series API ignores start and end, so I don't get the behavior I expect.
The downside would be though that if those non-recent queries would take long (e.g. >2s) then it's kind of pointless anyway for tab completion.

@bwplotka
Copy link
Member Author

bwplotka commented Dec 6, 2019

I agree with the functionality and latency aspect. As I mentioned:

So what makes me think we should have unlimited default, for now: is the fact that if the user goes to the dashboard with some templates using those APIs and asks for data month ago it will get label values for active series. Let's say he had eu4-production cluster month ago and now not, he won't be able to see it in his Grafana variables, right?

In terms of Thanos there is a clear improvement to be done for series http://github.com/thanos-io/thanos/issues/1822

Overall I think we should be able to return those in a reasonable time from old storage.

@davkal so if Prometheus upstream would have labelNames and labelValues with parameters start and end time, I guess it's trivial to make sure Grafana passes those in e.g dashboards templates? Also, are those APis used in Explore UI as well?

@davkal
Copy link

davkal commented Dec 7, 2019

so if Prometheus upstream would have labelNames and labelValues with parameters start and end time, I guess it's trivial to make sure Grafana passes those in e.g dashboards templates? Also, are those APis used in Explore UI as well?

Theoretically yes. But so far it's been more nuanced:

  • We're using the label API for the initial request to populate the metrics selector and metrics suggestions in the query field. So far that's unbounded and in Prometheus would return all lifetime labels AFAIK.
  • For the label completion we're using the series API because it allows us facetting: returning subsets of possible label spaces as opposed to the whole label space. The series API supports time parameters and we make use of that.
  • We have not used the series API for the initial metrics selector requests because an empty series API request would in effect return all series and historically had poor performance (and returned much more data than needed).

When time parameters are supported by the API, it's trivial to plug them in for both dashboards and Explore.

@stale
Copy link

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@brancz
Copy link
Member

brancz commented Jan 13, 2020

I believe this is still something we want to work on.

@stale stale bot removed the stale label Jan 13, 2020
@stale
Copy link

stale bot commented Feb 12, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Feb 12, 2020
@bwplotka
Copy link
Member Author

Still valid.

@stale stale bot removed the stale label Feb 12, 2020
@bwplotka
Copy link
Member Author

bwplotka commented Feb 19, 2020

TODO: Propose a new API to Prometheus. cc @rnaveiras

@bwplotka
Copy link
Member Author

Proposed: prometheus/prometheus#6865

@stale
Copy link

stale bot commented Mar 25, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Mar 25, 2020
@stale stale bot closed this as completed Apr 1, 2020
@bwplotka
Copy link
Member Author

bwplotka commented Apr 1, 2020

Still needed (:

@bwplotka bwplotka reopened this Apr 1, 2020
@stale stale bot removed the stale label Apr 1, 2020
@stale
Copy link

stale bot commented May 1, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label May 1, 2020
@bwplotka
Copy link
Member Author

bwplotka commented May 2, 2020 via email

@stale stale bot removed the stale label May 2, 2020
@stale
Copy link

stale bot commented Jun 1, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 1, 2020
@bwplotka
Copy link
Member Author

bwplotka commented Jun 1, 2020 via email

@stale stale bot removed the stale label Jun 1, 2020
@stale
Copy link

stale bot commented Jul 1, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jul 1, 2020
@bwplotka
Copy link
Member Author

bwplotka commented Jul 1, 2020

This was done on Prometheus side, so we can start using it! Help wanted

@stale
Copy link

stale bot commented Jul 31, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@bwplotka
Copy link
Member Author

The params were added but limit not introduced yet @kakkoyun is working on it.

@bwplotka bwplotka reopened this Sep 11, 2020
@stale stale bot removed the stale label Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants