
[stable/prometheus] Allow multiple server replicas for HA setup #5115

Closed

serathius opened this issue Apr 18, 2018 · 11 comments · Fixed by #7116
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@serathius

serathius commented Apr 18, 2018

A Prometheus HA setup is a configuration where you run multiple Prometheus servers with the same configuration, scraping the same targets.
It is a really simple setup and usually would require just setting replicaCount to 2.

Unfortunately this is not possible with the current helm chart, because it deploys 2 servers trying to use the same persistent volume claim.
As a result one server is deployed, and the other is stuck waiting for a volume that is already in use.

Proposed solution:
Migrate the Prometheus server Deployment to a StatefulSet and replace the PVC with a volume claim template inside the server manifest.
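
Roughly, the server manifest would change along these lines (a minimal sketch, not the chart's actual manifest; names, labels, image tag, and storage size are illustrative):

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: prometheus-server
    spec:
      serviceName: prometheus-server
      replicas: 2
      selector:
        matchLabels:
          app: prometheus
          component: server
      template:
        metadata:
          labels:
            app: prometheus
            component: server
        spec:
          containers:
            - name: prometheus-server
              image: prom/prometheus:v2.2.1
              args:
                - --storage.tsdb.path=/data
              volumeMounts:
                - name: storage-volume
                  mountPath: /data
      # Each replica gets its own PVC (storage-volume-prometheus-server-0, -1, ...),
      # so the two servers no longer contend for a single claim.
      volumeClaimTemplates:
        - metadata:
            name: storage-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 8Gi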

How to reproduce it:
helm install stable/prometheus --set server.replicaCount=2

Anything else we need to know:
I can help with implementation.

cc @mgoodness

@empyrean987

I create the persistent volume claims before launching Prometheus, so the data survives tearing Prometheus down and bringing it back up again. I think that is the only thing you really need to do so data is not lost. But the issue is if we use a replica count greater than 1 for HA, which is the next step towards redundancy: how do we map the persistent volume claims to the replicas?

  persistentVolume:
    ## If true, Prometheus server will create/use a Persistent Volume Claim
    ## If false, use emptyDir
    ##
    enabled: true

    ## Prometheus server data Persistent Volume access modes
    ## Must match those of existing PV or dynamic provisioner
    ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
    ##
    accessModes:
      - ReadWriteOnce

    ## Prometheus server data Persistent Volume annotations
    ##
    annotations: {}

    ## Prometheus server data Persistent Volume existing claim name
    ## Requires server.persistentVolume.enabled: true
    ## If defined, PVC must be created manually before volume will be bound
    existingClaim: "prometheus-1"
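
For reference, a claim like the existingClaim above could be pre-created with a manifest along these lines (the size is a placeholder; only the name must match):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus-1
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi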

@jpds
Collaborator

jpds commented Jun 20, 2018

I took the approach of deploying the chart twice into two different namespaces and having the pipeline run over the values.yaml for both of them: one for AZ-A and the other for AZ-B. Another advantage of this is that one knows they're explicitly in different availability zones.
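
Hypothetically, that could look like one values file per AZ that pins the server to a zone, installed once per namespace (the file name, namespace, and zone label value here are all made up for illustration):

    # values-az-a.yaml, installed with something like:
    #   helm install stable/prometheus --namespace prometheus-az-a -f values-az-a.yaml
    server:
      nodeSelector:
        failure-domain.beta.kubernetes.io/zone: eu-west-1a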

@stale

stale bot commented Aug 19, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 19, 2018
@stale

stale bot commented Sep 2, 2018

This issue is being automatically closed due to inactivity.

@stale stale bot closed this as completed Sep 2, 2018
@marcuslindfeldt

Has this issue been resolved yet? We should probably remove the replicaCount value for the Prometheus server if it's not supported. And maybe document how to achieve HA with @jpds's approach if it can't be solved in another way?

@giacomoguiulfo
Collaborator

giacomoguiulfo commented Oct 12, 2018

@marcuslindfeldt This PR will address the issue by switching Prometheus from a Deployment to a StatefulSet: #7116

@prageethw

@giacomoguiulfo my understanding is that this still does not solve the problem of not being able to run more than 1 replica of the Prometheus server (I tried it, it crashes :)). So should we still keep the ability to specify multiple replicas in the helm chart?

@giacomoguiulfo
Collaborator

@prageethw Did you configure it properly? It is not as simple as changing replicas...

@prageethw

@giacomoguiulfo I just followed the instructions in the helm chart... Do you have a sample that I can have a look at, please?
I thought it was as simple as setting the number of replicas; I can't find any other info in the helm chart, to be honest :(

@giacomoguiulfo
Collaborator

giacomoguiulfo commented Jan 15, 2019

@prageethw You have to set x.statefulset.enabled to true, where x is the server and/or alertmanager. Documentation is in the values.yaml file. Feel free to create an issue (or PR) to add this to the README.md and I will fix (or review) it later.
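
For example, a values override along these lines should work (assuming the key is spelled statefulSet as in the chart's values.yaml; the replica count is illustrative):

    # Hypothetical values override -- check values.yaml for the exact key names.
    server:
      statefulSet:
        enabled: true
      replicaCount: 2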

@prageethw

@giacomoguiulfo thanks, I will update the docs with a PR.
