
[stable/prometheus] Allow multiple server replicas for HA setup #5115

Closed

serathius opened this issue Apr 18, 2018 · 11 comments · Fixed by #7116
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@serathius

serathius commented Apr 18, 2018

A Prometheus HA setup is a configuration where you run multiple Prometheus servers with the same configuration, scraping the same targets.
It is a really simple setup and usually would require just setting replicaCount to 2.

Unfortunately this is not possible with the current helm chart, because it deploys 2 servers trying to use the same persistent volume claim.
As a result one server is deployed, and the other is stuck waiting for a volume that is already in use.

Proposed solution:
Migrate the Prometheus server Deployment to a StatefulSet and replace the PVC with a volume claim template inside the server manifest.
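
Roughly, the server manifest would change along these lines (a minimal sketch, not the chart's actual manifest; names, labels, image tag, and storage size are illustrative):

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: prometheus-server
    spec:
      serviceName: prometheus-server
      replicas: 2
      selector:
        matchLabels:
          app: prometheus
          component: server
      template:
        metadata:
          labels:
            app: prometheus
            component: server
        spec:
          containers:
            - name: prometheus-server
              image: prom/prometheus:v2.2.1
              args:
                - --storage.tsdb.path=/data
              volumeMounts:
                - name: storage-volume
                  mountPath: /data
      # Each replica gets its own PVC (storage-volume-prometheus-server-0, -1, ...),
      # so the two servers no longer contend for a single claim.
      volumeClaimTemplates:
        - metadata:
            name: storage-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 8Gi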

How to reproduce it:
helm install stable/prometheus --set server.replicaCount=2

Anything else we need to know:
I can help with implementation.

cc @mgoodness

@empyrean987

I create the persistent volume claims before launching Prometheus, so the data survives tearing Prometheus down and bringing it back up again. I think that is the only thing you really need to do so data is not lost. But the issue is if we use a replica count greater than 1 for HA, which is the next step towards redundancy: how do we map the persistent volume claims to the replicas?

  persistentVolume:
    ## If true, Prometheus server will create/use a Persistent Volume Claim
    ## If false, use emptyDir
    ##
    enabled: true

    ## Prometheus server data Persistent Volume access modes
    ## Must match those of existing PV or dynamic provisioner
    ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
    ##
    accessModes:
      - ReadWriteOnce

    ## Prometheus server data Persistent Volume annotations
    ##
    annotations: {}

    ## Prometheus server data Persistent Volume existing claim name
    ## Requires server.persistentVolume.enabled: true
    ## If defined, PVC must be created manually before volume will be bound
    existingClaim: "prometheus-1"
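
For reference, a claim like the existingClaim above could be pre-created with a manifest along these lines (the size is a placeholder; only the name must match):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus-1
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi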

@jpds
Collaborator

jpds commented Jun 20, 2018

I took the approach of deploying the chart twice into two different namespaces and having the pipeline run over the values.yaml for both of them: one for AZ-A and the other for AZ-B. Another advantage of this is that one knows they're explicitly in different availability zones.
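
Hypothetically, that could look like one values file per AZ that pins the server to a zone, installed once per namespace (the file name, namespace, and zone label value here are all made up for illustration):

    # values-az-a.yaml, installed with something like:
    #   helm install stable/prometheus --namespace prometheus-az-a -f values-az-a.yaml
    server:
      nodeSelector:
        failure-domain.beta.kubernetes.io/zone: eu-west-1a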

@stale

stale bot commented Aug 19, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 19, 2018
@stale

stale bot commented Sep 2, 2018

This issue is being automatically closed due to inactivity.

@stale stale bot closed this as completed Sep 2, 2018
@marcuslindfeldt

Has this issue been resolved yet? We should probably remove the replicaCount value for the Prometheus server if it's not supported. And maybe document how to achieve HA with @jpds's approach if it can't be solved in another way?

@giacomoguiulfo
Collaborator

giacomoguiulfo commented Oct 12, 2018

@marcuslindfeldt This PR will address the issue by switching Prometheus from a Deployment to a StatefulSet: #7116

@prageethw

@giacomoguiulfo my understanding is that this still does not solve the problem of not being able to run more than 1 replica of the Prometheus server (I tried it, it crashes :)). So should we still keep the ability to specify multiple replicas in the helm chart?

@giacomoguiulfo
Collaborator

@prageethw Did you configure it properly? It is not as simple as changing replicas...

@prageethw

@giacomoguiulfo I just followed the instructions in the helm chart... Do you have a sample that I can have a look at, please?
I thought it was as simple as setting the number of replicas; I can't find any other info in the helm chart, to be honest :(

@giacomoguiulfo
Collaborator

giacomoguiulfo commented Jan 15, 2019

@prageethw You have to set x.statefulset.enabled to true, where x is the server and/or alertmanager. Documentation is in the values.yaml file. Feel free to create an issue (or PR) to add this to the README.md and I will fix (or review) it later.
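
For example, a values override along these lines should work (assuming the key is spelled statefulSet as in the chart's values.yaml; the replica count is illustrative):

    # Hypothetical values override -- check values.yaml for the exact key names.
    server:
      statefulSet:
        enabled: true
      replicaCount: 2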

@prageethw

@giacomoguiulfo thanks, I will update the docs with a PR.
