Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the
prometheus-longterm-metrics
andthanos
optional components #461base: master
Are you sure you want to change the base?
Add the
prometheus-longterm-metrics
andthanos
optional components #461Changes from 2 commits
d981ef3
06dc997
0d7178e
42f687d
a921527
0307996
2eab8b7
76e60f7
3f75d49
79a531d
59f6c68
2c6d5f5
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should those be configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which bits would be useful to configure? The endpoints, paths, images, other?
I agree that we could always add more configuration options, I'm just wondering which are a priority for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The endpoints would be the priority, to allow serving them from some other location, though still a low priority relative to the feature as a whole. Worst case, redirects can be defined in the nginx configuration, so don't block the PR just for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indefinitely ! Do we actually want this? Can we set an expiry after like 10 years?
Grafana will be able to display data from Thanos go to back to 10 years? With this kind of extreme long term stats, what is the UI to visualize it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can choose to change this if you wish. Thanos suggests keeping data indefinitely by default. If you do not need to keep data forever, I suggest just using the
prometheus-longterm-monitoring
component without thanos and setting thePROMETHEUS_LONGTERM_RETENTION_TIME
to whatever you'd like.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, is there a switch to disable Thanos?
Same question, in the case we want to use Thanos, how to visualize the data stored on Thanos? I assume if Thanos is enabled, the retention duration on the Prometheus side will be very short to avoid doubling the storage so without data being stored in Prometheus, how to visualize that data stored on Thanos.
Just a question. If another component is required, we can do it in a follow up Pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer my own question, Thanos is actually a separate component so it does not have to be enabled together with the Prometheus-long-term component? The Prometheus-long-term component can function standalone of Thanos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's right.
prometheus-longterm-metrics
collects and stores specific metrics that we want to keep for longer fromprometheus
. If you want to also enablethanos
, thenthanos
will store those same metrics in a much more compact/efficient way so that you can store more data over a longer time period.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with forever storage for long-term-metrics. The point is to keep an archive of key metrics. If those are daily or hourly, archiving a few dozen metrics won't be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see how much space that will take in practice and we will adjust. And eventually we need a way to visualize those older metrics. Otherwise, what's the point to keep them forever if we can not visualize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this dependency will make the new Prometheus long term not able to run standalone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it cannot run standalone, it collects metrics from the services that are enabled in the monitoring component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say I already have a bunch of PAVICS servers running with monitoring enabled and I just want to point this new Prometheus to aggregate all the data?
Can be in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this PR is to create a method for saving existing prometheus data over the long term for a single birdhouse/PAVICS deployment. If you want something that will collect prometheus data for multiple servers I would recommend creating a new repository to host that code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the architecture here is pluggable, we do not need a separate repo and duplicate the work. To deploy the 2nd Prometheus only, on a separate machine, I see we only enable the proxy and the prometheus-longterm-metrics and optionally thanos components on the new machine and that's it.
But agree we can make this "standalone" support in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a long term but it still scrape every 15 sec? Or I might misunderstood this config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It collects and stores metrics that are already collected by the other prometheus service. Depending on how often the other prometheus instance is calculating those metrics, you may not need such a small
scrape_interval
. What interval would you suggest?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it configurable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not fully understand federation yet but I had the impression the 2nd Prometheus would query the max/min/average value from the 1st Prometheus at a much longer interval. Ex: the 1st Prometheus scrap every minute. The 2nd Prometheus would scrap every hour, taking the max/min/average over the hour of the 1st Prometheus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need multiple prometheus instances for that. You just need to create recording rules that calculate the metrics that you want to store for a longer period and then use the second prometheus instance to grab those metrics for long-term storage.
The second prometheus instance is necessary only because the data retention time is configured for each instance, so if you want to keep data longer you need a separate prometheus instance.
If you're storing data for a long time with
thanos
instead, you still need a second prometheus instance so that you can select which datathanos
will store.thanos
stores all data from a prometheus instance, you cannot pick and choose, so we get around this problem by only collecting the data we are interested in the "longterm" prometheus instance and only usingthanos
to store data from that specific instance.The best way to think about it is:
prometheus
: collects all metrics we are interested in and stores them for a short period of timeprometheus-longterm-metrics
: selects some metrics collected byprometheus
that we are interested in storing for a long period of time.thanos
: selects all metrics fromprometheus-longterm-metrics
and stores them efficiently over an even longer period of timeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see clearer now, thanks !
Just to confirm, the current setup
prometheus-longterm-metrics
is collecting everything fromprometheus
so we are not yet "selecting some metrics" yet, right?It's alright we can perform the selection in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opps sorry, just read your other reply now and so to answer my own remark:
prometheus-longterm-metrics
is collecting only metrics fromgroup: longterm-metrics
fromprometheus
.