This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[stable/redis] Recovered master pod fails to rejoin the original replication group when sentinel is enabled #17244

Closed
carmenlau opened this issue Sep 19, 2019 · 10 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@carmenlau

Describe the bug
When both cluster and sentinel are enabled, the recovered master pod fails to rejoin the original cluster; it forms another replication group with a new master_replid instead.

Which chart:
stable/redis version 9.1.5

What happened:

  1. Install stable/redis with both cluster and sentinel enabled.
  2. When the master goes down, one of the slaves takes over the master role, as expected.
  3. When the master comes back, it forms a new replication group with a new master_replid. There are then two replication groups in the cluster: the original one, with the new master (a pod of the slave StatefulSet), and a new one with a single master (the pod of the master StatefulSet) and no slaves.

What you expected to happen:

  1. The master pod should rejoin the original replication group as a slave and synchronize with the new master.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy chart with cluster and sentinel enabled.
  2. Restart the master by deleting the master pod.
  3. Connect to each Redis node and check the output of info replication.
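Comparing several nodes in step 3 is easier if the info replication output is parsed into fields. The following is a minimal Python sketch, assuming you have already captured the raw text of each node (for example via redis-cli) with the prompt line removed; the function name parse_info_replication is hypothetical.

```python
from typing import Dict


def parse_info_replication(raw: str) -> Dict[str, str]:
    """Parse the text printed by INFO replication into a dict.

    Lines starting with '#' are section headers and are skipped; every other
    non-empty line is split on its first ':' into key and value. Values are
    kept as strings (e.g. slave0 -> "ip=...,port=...,state=...").
    """
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:  # ignore anything that is not a key:value pair
            fields[key] = value
    return fields


# Output of the restarted master pod, copied from later in this thread.
SAMPLE = """# Replication
role:master
connected_slaves:0
master_replid:6d0fb9cd02a4b2b38ca310e1839633db9db1a1f2
master_repl_offset:0
"""

info = parse_info_replication(SAMPLE)
print(info["role"], info["connected_slaves"])
```

Keeping values as strings avoids guessing types for fields like slave0, which pack several sub-fields into one value.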
@javsalgar
Collaborator

Hi,

I was unable to reproduce the issue. I scaled down the master:

❯ kubectl scale sts brazen-cricket-redis-master --replicas=0

Then one of the slaves was elected master:

I have no name!@brazen-cricket-redis-slave-1:/$ redis-cli -a $REDIS_PASSWORD
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2

Then I restored the master:

❯ kubectl scale sts brazen-cricket-redis-master --replicas=1

The old master came back as a slave:

I have no name!@brazen-cricket-redis-master-0:/$ redis-cli -a $REDIS_PASSWORD
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave

Could you provide more details about your environment?

@carmenlau
Author

I tried again and the problem still exists. Please find the details below and let me know if more information is needed. Thanks!

Steps to reproduce

Right after the master starts, connected to tiny-controller-session-redis-master-0:

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:3
slave0:ip=10.24.2.204,port=6379,state=online,offset=65645,lag=1
slave1:ip=10.24.2.205,port=6379,state=online,offset=65782,lag=1
slave2:ip=10.24.2.206,port=6379,state=online,offset=65919,lag=0
master_replid:04a7f6407778e64d1ff0ee0a6354261043e792e9
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:65919
second_repl_offset:-1

I scaled the master down to 0:

> kubectl scale sts tiny-controller-session-redis-master --replicas=0

One of the slaves becomes master, as expected. Connected to tiny-controller-session-redis-slave-0:

127.0.0.1:6371> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.24.2.206,port=6379,state=online,offset=89892,lag=0
slave1:ip=10.24.2.205,port=6379,state=online,offset=89755,lag=0
master_replid:f6d5172073a2337b73ae6f0f67342763f6715940
master_replid2:04a7f6407778e64d1ff0ee0a6354261043e792e9
master_repl_offset:89892
second_repl_offset:88226
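In this output the promoted slave reports master_replid2 equal to the old master's master_replid (04a7f640...), with second_repl_offset marking where the two histories diverge. The Redis replication docs describe this secondary ID as what lets other nodes partially resynchronize with a promoted replica after a failover. The following minimal Python sketch shows the replication-ID part of that check as done by Redis 4.0 and later; the function name is hypothetical, and the backlog-range check that Redis also performs is deliberately omitted.

```python
def accepts_partial_resync(master_info: dict, replica_replid: str,
                           requested_offset: int) -> bool:
    """Sketch of the replication-ID test a Redis (>= 4.0) master applies to
    a PSYNC request. A replica normally requests its last processed offset
    plus one. The backlog-range test Redis also applies is NOT modelled here.
    """
    # Same history as the master's current replication ID.
    if replica_replid == master_info["master_replid"]:
        return True
    # Old history (the previous master's ID), valid only up to the switch point.
    return (replica_replid == master_info["master_replid2"]
            and requested_offset <= int(master_info["second_repl_offset"]))


# Fields of tiny-controller-session-redis-slave-0 after promotion (from above).
promoted = {
    "master_replid": "f6d5172073a2337b73ae6f0f67342763f6715940",
    "master_replid2": "04a7f6407778e64d1ff0ee0a6354261043e792e9",
    "second_repl_offset": "88226",
}

# A node still carrying the old master's ID could continue the old history...
print(accepts_partial_resync(promoted, "04a7f6407778e64d1ff0ee0a6354261043e792e9", 65920))
# ...but the restarted master-0 below reports a brand-new ID, so this test fails.
print(accepts_partial_resync(promoted, "6d0fb9cd02a4b2b38ca310e1839633db9db1a1f2", 1))
```

The restarted master-0 in the next output reports a fresh master_replid and an offset of 0, so even if it were pointed at the new master, this sketch suggests it would need a full resynchronization rather than a partial one.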

I scaled the master back up to 1:

> kubectl scale sts tiny-controller-session-redis-master --replicas=1

The master does not rejoin the original replication group. Connected to tiny-controller-session-redis-master-0:

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:0
master_replid:6d0fb9cd02a4b2b38ca310e1839633db9db1a1f2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
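The split shown above (two pods both reporting role:master) can be detected automatically. The following is a minimal Python sketch, assuming you have already collected each pod's raw info replication text (for example with kubectl exec and redis-cli, not shown); the function name find_masters is hypothetical, and the pod names are the ones from this report.

```python
from typing import Dict, List


def find_masters(nodes: Dict[str, str]) -> List[str]:
    """Return, sorted, the pods whose INFO replication output says role:master.

    `nodes` maps a pod name to its raw INFO replication text (prompt line
    removed). With a single Sentinel master set, more than one entry in the
    result means the nodes have split into separate replication groups.
    """
    masters = []
    for pod, raw in nodes.items():
        lines = [line.strip() for line in raw.splitlines()]
        if "role:master" in lines:
            masters.append(pod)
    return sorted(masters)


# Trimmed outputs taken from the report above.
nodes = {
    "tiny-controller-session-redis-slave-0": "# Replication\nrole:master\nconnected_slaves:2\n",
    "tiny-controller-session-redis-master-0": "# Replication\nrole:master\nconnected_slaves:0\n",
}

masters = find_masters(nodes)
if len(masters) > 1:
    print("split replication groups:", masters)
```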

My deployment details

Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.8-gke.10", GitCommit:"f53039cc1e5295eed20969a4f10fb6ad99461e37", GitTreeState:"clean", BuildDate:"2019-06-19T20:48:40Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}
  • Values used: the values-production.yaml file from the chart repo, with sentinel.enabled set to true.
  ## Global Docker image parameters
  ## Please, note that this will override the image parameters, including dependencies, configured to use the global value
  ## Current available global Docker image parameters: imageRegistry and imagePullSecrets
  ##
  # global:
  #   imageRegistry: myRegistryName
  #   imagePullSecrets:
  #     - myRegistryKeySecretName
  #   storageClass: myStorageClass

  ## Bitnami Redis image version
  ## ref: https://hub.docker.com/r/bitnami/redis/tags/
  ##
  image:
    registry: docker.io
    repository: bitnami/redis
    ## Bitnami Redis image tag
    ## ref: https://github.com/bitnami/bitnami-docker-redis#supported-tags-and-respective-dockerfile-links
    ##
    tag: 5.0.5-debian-9-r138
    ## Specify a imagePullPolicy
    ## Defaults to 'Always' if image tag is 'latest', else set to 'IfNotPresent'
    ## ref: http://kubernetes.io/docs/user-guide/images/#pre-pulling-images
    ##
    pullPolicy: IfNotPresent
    ## Optionally specify an array of imagePullSecrets.
    ## Secrets must be manually created in the namespace.
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
    ##
    # pullSecrets:
    #   - myRegistryKeySecretName

  ## String to partially override redis.fullname template (will maintain the release name)
  ##
  # nameOverride:

  ## String to fully override redis.fullname template
  ##
  # fullnameOverride:

  ## Cluster settings
  cluster:
    enabled: true
    slaveCount: 3

  ## Use redis sentinel in the redis pod. This will disable the master and slave services and
  ## create one redis service with ports to the sentinel and the redis instances
  sentinel:
    enabled: true
    ## Bitnami Redis Sentinel image version
    ## ref: https://hub.docker.com/r/bitnami/redis-sentinel/tags/
    ##
    image:
      registry: docker.io
      repository: bitnami/redis-sentinel
      ## Bitnami Redis image tag
      ## ref: https://github.com/bitnami/bitnami-docker-redis-sentinel#supported-tags-and-respective-dockerfile-links
      ##
      tag: 5.0.5-debian-9-r131
      ## Specify a imagePullPolicy
      ## Defaults to 'Always' if image tag is 'latest', else set to 'IfNotPresent'
      ## ref: http://kubernetes.io/docs/user-guide/images/#pre-pulling-images
      ##
      pullPolicy: IfNotPresent
      ## Optionally specify an array of imagePullSecrets.
      ## Secrets must be manually created in the namespace.
      ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
      ##
      # pullSecrets:
      #   - myRegistryKeySecretName
    masterSet: mymaster
    initialCheckTimeout: 5
    quorum: 2
    downAfterMilliseconds: 60000
    failoverTimeout: 18000
    parallelSyncs: 1
    port: 26379
    ## Additional Redis configuration for the sentinel nodes
    ## ref: https://redis.io/topics/config
    ##
    configmap:
    ## Configure extra options for Redis Sentinel liveness and readiness probes
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes)
    ##
    livenessProbe:
      enabled: true
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 5
      successThreshold: 1
      failureThreshold: 5
    readinessProbe:
      enabled: true
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 5
    ## Redis Sentinel resource requests and limits
    ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
    # resources:
    #   requests:
    #     memory: 256Mi
    #     cpu: 100m
    ## Redis Sentinel Service properties
    service:
      ##  Redis Sentinel Service type
      type: ClusterIP
      sentinelPort: 26379
      redisPort: 6379

      ## Specify the nodePort value for the LoadBalancer and NodePort service types.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
      ##
      # sentinelNodePort:
      # redisNodePort:

      ## Provide any additional annotations which may be required. This can be used to
      ## set the LoadBalancer service type to internal only.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
      ##
      annotations: {}
      loadBalancerIP:

  ## Specifies the Kubernetes Cluster's Domain Name.
  ##
  clusterDomain: cluster.local

  networkPolicy:
    ## Specifies whether a NetworkPolicy should be created
    ##
    enabled: true

    ## The Policy model to apply. When set to false, only pods with the correct
    ## client label will have network access to the port Redis is listening
    ## on. When true, Redis will accept connections from any source
    ## (with the correct destination port).
    ##
    # allowExternal: true

  serviceAccount:
    ## Specifies whether a ServiceAccount should be created
    ##
    create: false
    ## The name of the ServiceAccount to use.
    ## If not set and create is true, a name is generated using the fullname template
    name:

  rbac:
    ## Specifies whether RBAC resources should be created
    ##
    create: false

    role:
      ## Rules to create. It follows the role specification
      # rules:
      #  - apiGroups:
      #    - extensions
      #    resources:
      #      - podsecuritypolicies
      #    verbs:
      #      - use
      #    resourceNames:
      #      - gce.unprivileged
      rules: []

  ## Redis pod Security Context
  securityContext:
    enabled: true
    fsGroup: 1001
    runAsUser: 1001

  ## Use password authentication
  usePassword: true
  ## Redis password (both master and slave)
  ## Defaults to a random 10-character alphanumeric string if not set and usePassword is true
  ## ref: https://github.com/bitnami/bitnami-docker-redis#setting-the-server-password-on-first-run
  ##
  password: REDIS_PASSWORD
  ## Use existing secret (ignores previous password)
  # existingSecret:

  ## Mount secrets as files instead of environment variables
  usePasswordFile: false

  ## Persist data to a persistent volume (Redis Master)
  persistence: {}
    ## A manually managed Persistent Volume and Claim
    ## Requires persistence.enabled: true
    ## If defined, PVC must be created manually before volume will be bound
    # existingClaim:

  # Redis port
  redisPort: 6379

  ##
  ## Redis Master parameters
  ##
  master:
    ## Redis command arguments
    ##
    ## Can be used to specify command line arguments, for example:
    ##
    command: "/run.sh"
    ## Additional Redis configuration for the master nodes
    ## ref: https://redis.io/topics/config
    ##
    configmap:
    ## Redis additional command line flags
    ##
    ## Can be used to specify command line flags, for example:
    ##
    ## extraFlags:
    ##  - "--maxmemory-policy volatile-ttl"
    ##  - "--repl-backlog-size 1024mb"
    extraFlags: []
    ## Comma-separated list of Redis commands to disable
    ##
    ## Can be used to disable Redis commands for security reasons.
    ## Commands will be completely disabled by renaming each to an empty string.
    ## ref: https://redis.io/topics/security#disabling-of-specific-commands
    ##
    disableCommands:
    - FLUSHDB
    - FLUSHALL

    ## Redis Master additional pod labels and annotations
    ## ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
    podLabels: {}
    podAnnotations: {}

    ## Redis Master resource requests and limits
    ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
    # resources:
    #   requests:
    #     memory: 256Mi
    #     cpu: 100m
    ## Use an alternate scheduler, e.g. "stork".
    ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
    ##
    # schedulerName:

    ## Configure extra options for Redis Master liveness and readiness probes
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes)
    ##
    livenessProbe:
      enabled: true
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 5
      successThreshold: 1
      failureThreshold: 5
    readinessProbe:
      enabled: true
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 5

    ## Redis Master Node selectors and tolerations for pod assignment
    ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
    ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#taints-and-tolerations-beta-feature
    ##
    # nodeSelector: {"beta.kubernetes.io/arch": "amd64"}
    # tolerations: []
    ## Redis Master pod/node affinity/anti-affinity
    ##
    affinity: {}

    ## Redis Master Service properties
    service:
      ##  Redis Master Service type
      type: ClusterIP
      port: 6379

      ## Specify the nodePort value for the LoadBalancer and NodePort service types.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
      ##
      # nodePort:

      ## Provide any additional annotations which may be required. This can be used to
      ## set the LoadBalancer service type to internal only.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
      ##
      annotations: {}
      loadBalancerIP:

    ## Enable persistence using Persistent Volume Claims
    ## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
    ##
    persistence:
      enabled: true
      ## The path the volume will be mounted at, useful when using different
      ## Redis images.
      path: /data
      ## The subdirectory of the volume to mount to, useful in dev environments
      ## and one PV for multiple services.
      subPath: ""
      ## redis data Persistent Volume Storage Class
      ## If defined, storageClassName: <storageClass>
      ## If set to "-", storageClassName: "", which disables dynamic provisioning
      ## If undefined (the default) or set to null, no storageClassName spec is
      ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
      ##   GKE, AWS & OpenStack)
      ##
      # storageClass: "-"
      accessModes:
      - ReadWriteOnce
      size: 8Gi

    ## Update strategy, can be set to RollingUpdate or onDelete by default.
    ## https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets
    statefulset:
      updateStrategy: RollingUpdate
      ## Partition update strategy
      ## https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#partitions
      # rollingUpdatePartition:

    ## Redis Master pod priorityClassName
    # priorityClassName: {}

  ##
  ## Redis Slave properties
  ## Note: service.type is a mandatory parameter
  ## The rest of the parameters are either optional or, if undefined, will inherit those declared in Redis Master
  ##
  slave:
    ## Slave Service properties
    service:
      ## Redis Slave Service type
      type: ClusterIP
      ## Redis port
      port: 6379
      ## Specify the nodePort value for the LoadBalancer and NodePort service types.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
      ##
      # nodePort:

      ## Provide any additional annotations which may be required. This can be used to
      ## set the LoadBalancer service type to internal only.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
      ##
      annotations: {}
      loadBalancerIP:

    ## Redis slave port
    port: 6379
    ## Can be used to specify command line arguments, for example:
    ##
    command: "/run.sh"
    ## Additional Redis configuration for the slave nodes
    ## ref: https://redis.io/topics/config
    ##
    configmap:
    ## Redis extra flags
    extraFlags: []
    ## List of Redis commands to disable
    disableCommands:
    - FLUSHDB
    - FLUSHALL

    ## Redis Slave pod/node affinity/anti-affinity
    ##
    affinity: {}

    ## Configure extra options for Redis Slave liveness and readiness probes
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes)
    ##
    livenessProbe:
      enabled: true
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      successThreshold: 1
      failureThreshold: 5
    readinessProbe:
      enabled: true
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 10
      successThreshold: 1
      failureThreshold: 5

    ## Redis slave Resource
    # resources:
    #   requests:
    #     memory: 256Mi
    #     cpu: 100m

    ## Redis slave selectors and tolerations for pod assignment
    # nodeSelector: {"beta.kubernetes.io/arch": "amd64"}
    # tolerations: []

    ## Use an alternate scheduler, e.g. "stork".
    ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
    ##
    # schedulerName:

    ## Redis slave pod Annotation and Labels
    podLabels: {}
    podAnnotations: {}

    ## Redis slave pod priorityClassName
    # priorityClassName: {}

    ## Enable persistence using Persistent Volume Claims
    ## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
    ##
    persistence:
      enabled: true
      ## The path the volume will be mounted at, useful when using different
      ## Redis images.
      path: /data
      ## The subdirectory of the volume to mount to, useful in dev environments
      ## and one PV for multiple services.
      subPath: ""
      ## redis data Persistent Volume Storage Class
      ## If defined, storageClassName: <storageClass>
      ## If set to "-", storageClassName: "", which disables dynamic provisioning
      ## If undefined (the default) or set to null, no storageClassName spec is
      ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
      ##   GKE, AWS & OpenStack)
      ##
      # storageClass: "-"
      accessModes:
      - ReadWriteOnce
      size: 8Gi

    ## Update strategy, can be set to RollingUpdate or onDelete by default.
    ## https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets
    statefulset:
      updateStrategy: RollingUpdate
      ## Partition update strategy
      ## https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#partitions
      # rollingUpdatePartition:

  ## Prometheus Exporter / Metrics
  ##
  metrics:
    enabled: true

    image:
      registry: docker.io
      repository: bitnami/redis-exporter
      tag: 1.1.1-debian-9-r10
      pullPolicy: IfNotPresent
      ## Optionally specify an array of imagePullSecrets.
      ## Secrets must be manually created in the namespace.
      ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
      ##
      # pullSecrets:
      #   - myRegistryKeySecretName

    ## Metrics exporter resource requests and limits
    ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
    ##
    # resources: {}

    ## Extra arguments for Metrics exporter, for example:
    ## extraArgs:
    ##   check-keys: myKey,myOtherKey
    # extraArgs: {}

    ## Metrics exporter pod priorityClassName
    # priorityClassName: {}
    service:
      type: ClusterIP
      ## Use serviceLoadBalancerIP to request a specific static IP,
      ## otherwise leave blank
      # loadBalancerIP:
      annotations: {}
    ## Metrics exporter pod Annotation and Labels
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "9121"
    # podLabels: {}

    # Enable this if you're using https://github.com/coreos/prometheus-operator
    serviceMonitor:
      enabled: false
      ## Specify a namespace if needed
      # namespace: monitoring
      # fallback to the prometheus default unless specified
      # interval: 10s
      ## Defaults to what's used if you follow CoreOS [Prometheus Install Instructions](https://github.com/helm/charts/tree/master/stable/prometheus-operator#tldr)
      ## [Prometheus Selector Label](https://github.com/helm/charts/tree/master/stable/prometheus-operator#prometheus-operator-1)
      ## [Kube Prometheus Selector Label](https://github.com/helm/charts/tree/master/stable/prometheus-operator#exporters)
      selector:
        prometheus: kube-prometheus

  ##
  ## Init containers parameters:
  ## volumePermissions: Change the owner of the persist volume mountpoint to RunAsUser:fsGroup
  ##
  volumePermissions:
    enabled: false
    image:
      registry: docker.io
      repository: bitnami/minideb
      tag: stretch
      pullPolicy: Always
      ## Optionally specify an array of imagePullSecrets.
      ## Secrets must be manually created in the namespace.
      ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
      ##
      # pullSecrets:
      #   - myRegistryKeySecretName
    resources: {}
    # resources:
    #   requests:
    #     memory: 128Mi
    #     cpu: 100m

  ## Redis config file
  ## ref: https://redis.io/topics/config
  ##
  configmap: |-
    # Enable AOF https://redis.io/topics/persistence#append-only-file
    appendonly yes
    # Disable RDB persistence, AOF persistence already enabled.
    save ""

  ## Sysctl InitContainer
  ## used to perform sysctl operation to modify Kernel settings (needed sometimes to avoid warnings)
  sysctlImage:
    enabled: false
    command: []
    registry: docker.io
    repository: bitnami/minideb
    tag: stretch
    pullPolicy: Always
    ## Optionally specify an array of imagePullSecrets.
    ## Secrets must be manually created in the namespace.
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
    ##
    # pullSecrets:
    #   - myRegistryKeySecretName
    mountHostSys: false
    resources: {}
    # resources:
    #   requests:
    #     memory: 128Mi
    #     cpu: 100m

@stale

stale bot commented Oct 27, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 27, 2019
@stale

stale bot commented Nov 10, 2019

This issue is being automatically closed due to inactivity.

@stale stale bot closed this as completed Nov 10, 2019
@ejpir

ejpir commented May 7, 2020

@carmenlau, have you been able to fix this? We face the same issue :(

@ronaldvanderrest

I'm glad to find that others are facing the same issue, especially since StatefulSets can't set restartPolicy to 'Never'. So every time you so much as scale the node group that the master is running on, you end up with a new replication group. I'll be following this one, as it has been bugging me for quite some time.

@carmenlau
Copy link
Author

I have not tried the new chart version yet, but may try it later, so I still have the problem. Which version are you using?

@ejpir

ejpir commented May 8, 2020

redis-10.6.13 @ AWS EKS. Latest Bitnami chart.

We fetch the chart and modify production-values.yml, setting sentinel.enabled and networkPolicy.enabled to true.

Rollout goes fine, master and slaves are connected properly.

Then:

  • scale down the master
  • re-election takes place and one of the slaves becomes master
  • scale up the master
  • the old master also comes back as master, with no connection to the slaves

Both master and slaves have this in their sentinel.conf:

$ cat mounted-etc/sentinel.conf
dir "/tmp"
bind 0.0.0.0
port 26379
sentinel monitor mymaster redis-fixredit2-master-0.redis-fixredit2-headless.core-e.svc.cluster.local 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 18000
sentinel parallel-syncs mymaster 1

However, they all appear to connect to each other by IP (no hostnames are mentioned in the logs), which I think is the root cause.
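One way to test the IP-versus-hostname theory is to check what each pod's sentinel.conf actually monitors before and after a failover. The following is a minimal Python sketch that reads the sentinel monitor directive and reports whether the master is given as an IP address or a hostname. The function name is hypothetical, and the default master set name mymaster is taken from the values above. As a hedged side note, Sentinel only gained native hostname support (the resolve-hostnames option) in Redis 6.2. That may help explain why only IPs appear in the logs of these 5.0.x images.

```python
import ipaddress
from typing import Optional, Tuple


def monitored_master(conf_text: str,
                     master_set: str = "mymaster") -> Optional[Tuple[str, int, bool]]:
    """Find `sentinel monitor <name> <host> <port> <quorum>` for master_set.

    Returns (host, port, host_is_ip), or None if no such directive exists.
    """
    for line in conf_text.splitlines():
        parts = line.split()
        if (len(parts) >= 5 and parts[:2] == ["sentinel", "monitor"]
                and parts[2] == master_set):
            host, port = parts[3], int(parts[4])
            try:
                ipaddress.ip_address(host)  # raises ValueError for hostnames
                is_ip = True
            except ValueError:
                is_ip = False
            return host, port, is_ip
    return None


# sentinel.conf as posted above.
CONF = """dir "/tmp"
bind 0.0.0.0
port 26379
sentinel monitor mymaster redis-fixredit2-master-0.redis-fixredit2-headless.core-e.svc.cluster.local 6379 2
sentinel down-after-milliseconds mymaster 60000
"""

print(monitored_master(CONF))
```

Running the same check on each pod's config after a failover would show whether Sentinel has rewritten the monitor line to the new master's IP.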

@RoyKimYYZ

I'm facing this issue too. When I delete the master Redis node/pod, it doesn't connect to the new master node after it comes back. Any new suggestions or workarounds?

@pzf-cpu

pzf-cpu commented Jan 12, 2021

Hey, sadly I'm still facing the same issue. Any input or suggestions would be greatly appreciated. FYI, I'm using v8.0.12.
