Cluster in fail state after creation #12

Closed
naude-r opened this issue Jun 30, 2020 · 7 comments

@naude-r

naude-r commented Jun 30, 2020

the created cluster is in a failed state after following the helm installation instructions:

cluster nodes
a7f489e84b6f4428ccd2deafaff5c17378152608 10.42.0.27:6379@16379 myself,master - 0 0 1 connected 0-5460
127.0.0.1:6379> cluster info
cluster_state:fail
cluster_slots_assigned:5461
cluster_slots_ok:5461
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:1
cluster_current_epoch:1
cluster_my_epoch:1
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
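
A note on reading the output above: only 5461 of the 16384 hash slots are assigned and only one node is known. With the default `cluster-require-full-coverage yes`, Redis reports `cluster_state:fail` until every slot is covered. A minimal sketch of that check, assuming the `cluster info` text is piped in on stdin (the function name `check_slot_coverage` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: reads `cluster info` output on stdin and reports
# whether all 16384 hash slots are assigned.
check_slot_coverage() {
  # Strip carriage returns (Redis replies use CRLF), then pick the field.
  assigned="$(tr -d '\r' | awk -F: '$1 == "cluster_slots_assigned" { print $2 }')"
  if [ -z "$assigned" ]; then
    echo "cluster_slots_assigned not found"
    return 2
  fi
  if [ "$assigned" -eq 16384 ]; then
    echo "all 16384 slots assigned"
  else
    echo "only $assigned of 16384 slots assigned"
    return 1
  fi
}
```

It could be used as `redis-cli cluster info | check_slot_coverage`; a non-zero exit status signals incomplete coverage.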

all pods are running:

./kubectl get pods -n xmf-infra -l app=opstree-redis-master
NAME                     READY   STATUS    RESTARTS   AGE
opstree-redis-master-0   2/2     Running   0          9m37s
opstree-redis-master-1   2/2     Running   0          9m16s
opstree-redis-master-2   2/2     Running   0          8m48s

./kubectl get pods -n xmf-infra -l app=opstree-redis-slave
NAME                    READY   STATUS    RESTARTS   AGE
opstree-redis-slave-0   2/2     Running   0          9m41s
opstree-redis-slave-1   2/2     Running   0          9m11s
opstree-redis-slave-2   2/2     Running   0          8m48s

logs are the same for every instance:

./kubectl logs -n xmf-infra opstree-redis-master-0 -c opstree-redis-master
Redis is running without password which is not recommended
Starting redis service.....
7:C 30 Jun 2020 08:13:43.409 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7:C 30 Jun 2020 08:13:43.409 # Redis version=5.0.8, bits=64, commit=00000000, modified=0, pid=7, just started
7:C 30 Jun 2020 08:13:43.409 # Configuration loaded
7:M 30 Jun 2020 08:13:43.410 * Node configuration loaded, I'm a7f489e84b6f4428ccd2deafaff5c17378152608
7:M 30 Jun 2020 08:13:43.410 * Running mode=cluster, port=6379.
7:M 30 Jun 2020 08:13:43.410 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
7:M 30 Jun 2020 08:13:43.410 # Server initialized
7:M 30 Jun 2020 08:13:43.410 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
7:M 30 Jun 2020 08:13:43.411 * Ready to accept connections

the operator log provides an indication of the issue:

{"level":"info","ts":1593504943.6889641,"logger":"controller_redis","msg":"Successfully executed the command","Request.Namespace":"xmf-infra","Request.Name":"opstree-redis","Command":["redis-cli","--cluster","add-node","10.42.0.49:6379","10.42.0.48:6379","--cluster-slave","-a",""],"Output":">>> Adding node 10.42.0.49:6379 to cluster 10.42.0.48:6379\nNode 10.42.0.48:6379 replied with error:\nERR Client sent AUTH, but no password is set\n"}
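
The log above shows `redis-cli` being invoked with `-a ""` against a server that has no password, which the server rejects with `ERR Client sent AUTH, but no password is set`. One way to avoid that, sketched here as a shell function and not as the operator's actual code, is to append `-a` only when a password is configured. `build_add_node_cmd` and its arguments are hypothetical names, and the function only prints the command line for inspection:

```shell
#!/bin/sh
# Hypothetical sketch: print a `redis-cli --cluster add-node` command line,
# adding `-a <password>` only when the password is non-empty.
# Arguments: new node address, existing node address, optional password.
build_add_node_cmd() {
  new_node="$1"
  existing_node="$2"
  password="${3:-}"
  cmd="redis-cli --cluster add-node $new_node $existing_node --cluster-slave"
  if [ -n "$password" ]; then
    cmd="$cmd -a $password"
  fi
  echo "$cmd"
}

# No password set: the command carries no -a flag at all.
build_add_node_cmd 10.42.0.49:6379 10.42.0.48:6379
# Password set: -a is appended.
build_add_node_cmd 10.42.0.49:6379 10.42.0.48:6379 test
```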

at this point the cluster was re-created with a password. this still did not work:

{"level":"info","ts":1593506080.7510774,"logger":"controller_redis","msg":"Successfully executed the command","Request.Namespace":"xmf-infra","Request.Name":"opstree-redis","Command":["redis-cli","--cluster","create","10.42.0.50:6379","10.42.0.53:6379","10.42.0.55:6379","--cluster-yes","-a","test"],"Output":"[ERR] Node 10.42.0.53:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.\n"}

cluster was re-created again after removing previous persistent storage.

operator is now "hanging":

{"level":"info","ts":1593507552.3035424,"logger":"controller_redis","msg":"Redis cluster creation command is","Request.Namespace":"xmf-infra","Request.Name":"dev-redis","Command":["redis-cli","--cluster","create","10.42.0.62:6379","10.42.0.66:6379","10.42.0.70:6379","--cluster-yes","-a","test"]}

performing the cluster creation manually results in:

redis-cli --cluster create 10.42.0.62:6379 10.42.0.66:6379 10.42.0.70:6379 --cluster-yes -a "test"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
[ERR] Node 10.42.0.66:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
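
The error text names two conditions: the node already knows other nodes, or it holds keys in database 0. A commonly used way to clear both before retrying `--cluster create` is `FLUSHALL` followed by `CLUSTER RESET HARD` on each node. Both discard data, so this is only a sketch for throwaway clusters, and whether it resolves this particular case is an assumption. The function name `print_reset_cmds` is hypothetical, and it only prints the commands so they can be reviewed before running:

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands that wipe each node's
# data and cluster state. WARNING: running them deletes all keys.
# Arguments: password, then one or more host:port addresses.
print_reset_cmds() {
  password="$1"
  shift
  for node in "$@"; do
    host="${node%:*}"   # part before the last colon
    port="${node##*:}"  # part after the last colon
    echo "redis-cli -h $host -p $port -a $password flushall"
    echo "redis-cli -h $host -p $port -a $password cluster reset hard"
  done
}

print_reset_cmds test 10.42.0.62:6379 10.42.0.66:6379 10.42.0.70:6379
```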

any idea how this can be resolved?

@wangyp0701

I had the same problem.
All pods were destroyed and rebuilt, but the configuration file was not updated to match.

@iamabhishek-dubey
Member

Thanks for reporting the issue, I will check this

@iamabhishek-dubey
Member

Hmm, this seems like a bug. We will try to fix it by the weekend.

@naude-r
Author

naude-r commented Jul 28, 2020

using a different installation we had to add "--cluster-announce-ip $(POD_IP)". suspect a similar change may be required.
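
As an illustration of the suggestion above: `cluster-announce-ip` is a Redis configuration directive that tells a node which IP to advertise to its peers, which may help when the address a node detects for itself is not the one other nodes should use. A minimal sketch of an entrypoint fragment, assuming the pod IP is exposed as a `POD_IP` environment variable (for example via the Kubernetes downward API). `redis_server_args` is a hypothetical name and this is not the operator's actual startup script:

```shell
#!/bin/sh
# Hypothetical sketch: assemble redis-server arguments, adding
# --cluster-announce-ip only when a pod IP is supplied.
# Argument: the pod IP (may be empty or omitted).
redis_server_args() {
  pod_ip="${1:-}"
  args="--cluster-enabled yes --port 6379"
  if [ -n "$pod_ip" ]; then
    args="$args --cluster-announce-ip $pod_ip"
  fi
  echo "$args"
}

# Print the arguments for the current environment; POD_IP is assumed
# to be injected by the pod spec.
redis_server_args "${POD_IP:-}"
```

In a real entrypoint the printed arguments would be passed to `redis-server` instead of being echoed.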

@iamabhishek-dubey
Member

We have tested this internally. In a few days there will be a new release, 0.2.

@iamabhishek-dubey
Member

This will be fixed in PR #14.

@iamabhishek-dubey
Member

This is fixed in #14.

devkmsg added a commit to devkmsg/redis-operator that referenced this issue Jan 30, 2024
…/internal patches

Merge in OSS/redis-operator from ~ATHOMPSON/redis-operator:sync-internal-cs-main-to-0.14 to cs-main

* commit '2ea8fcaf61b322186f8a0a2c4e7bcb310f55ea2d':
  Revert "Handle nil probe"
  Handle nil probe
  Bumps prometheus/client_golang to address vuln
  Adds CODEOWNERS for our internal branch
  [Feature] Add Redis Sentinel Support  (OT-CONTAINER-KIT#408)
  Fixed Redis Replicate Cache bug (OT-CONTAINER-KIT#424)
  [Feature] : Add Replication Mode to the Redis Operator (OT-CONTAINER-KIT#417)
  [Development][Add] Added recreation logic for statefulset (OT-CONTAINER-KIT#411)
  Fixes issue with arm64 support. (OT-CONTAINER-KIT#404)
  [Development][Add] Added nodeSelector and tolerations for cluster (OT-CONTAINER-KIT#410)
  Add Label Selector to pod anti affinity  (OT-CONTAINER-KIT#407)
  When cr annotation update,sts annotations will not updated! (OT-CONTAINER-KIT#398)
  fix: invalid memory address or nil pointer dereference (OT-CONTAINER-KIT#395)
  export redis exporter as a container port (OT-CONTAINER-KIT#393)
  [Development][Add] Added feature for additional volume mounts (OT-CONTAINER-KIT#389)
  fix crash with go panic (OT-CONTAINER-KIT#385)
  Add check PersistenceEnabled not nil (OT-CONTAINER-KIT#380)
  [feature]add serviceType functionality for standalone and cluster with annotations (OT-CONTAINER-KIT#376)