Cluster in fail state after creation #12

naude-r · 2020-06-30T09:14:45Z

the created cluster is in a failed state after following the Helm installation instructions:

127.0.0.1:6379> cluster nodes
a7f489e84b6f4428ccd2deafaff5c17378152608 10.42.0.27:6379@16379 myself,master - 0 0 1 connected 0-5460
127.0.0.1:6379> cluster info
cluster_state:fail
cluster_slots_assigned:5461
cluster_slots_ok:5461
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:1
cluster_current_epoch:1
cluster_my_epoch:1
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
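
For readers comparing against their own output: a Redis Cluster reports `cluster_state:ok` only once all 16384 hash slots are covered (with the default `cluster-require-full-coverage yes`), so 5461 assigned slots and a single known node suggest this node never joined the others. A minimal sketch of that check, reading `cluster info` output on stdin; the function name `check_cluster_info` is hypothetical:

```shell
# Sketch only: summarise `cluster info` output and fail unless the cluster
# is healthy. check_cluster_info is a hypothetical helper name.
check_cluster_info() {
  awk -F: '
    { sub(/\r$/, "") }                      # redis-cli output may end lines with CR
    $1 == "cluster_state"          { state = $2 }
    $1 == "cluster_slots_assigned" { slots = $2 }
    $1 == "cluster_known_nodes"    { nodes = $2 }
    END {
      printf "state=%s slots=%d/16384 known_nodes=%d\n", state, slots, nodes
      # exit status 0 only when the state is ok and every slot is assigned
      exit !(state == "ok" && slots == 16384)
    }'
}
```

Assuming `redis-cli` can reach the node, it could be used as `redis-cli -h 127.0.0.1 -p 6379 cluster info | check_cluster_info`.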

all pods are running:

./kubectl get pods -n xmf-infra -l app=opstree-redis-master
NAME                     READY   STATUS    RESTARTS   AGE
opstree-redis-master-0   2/2     Running   0          9m37s
opstree-redis-master-1   2/2     Running   0          9m16s
opstree-redis-master-2   2/2     Running   0          8m48s

./kubectl get pods -n xmf-infra -l app=opstree-redis-slave
NAME                    READY   STATUS    RESTARTS   AGE
opstree-redis-slave-0   2/2     Running   0          9m41s
opstree-redis-slave-1   2/2     Running   0          9m11s
opstree-redis-slave-2   2/2     Running   0          8m48s

logs are the same for every instance:

./kubectl logs -n xmf-infra opstree-redis-master-0 -c opstree-redis-master
Redis is running without password which is not recommended
Starting redis service.....
7:C 30 Jun 2020 08:13:43.409 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7:C 30 Jun 2020 08:13:43.409 # Redis version=5.0.8, bits=64, commit=00000000, modified=0, pid=7, just started
7:C 30 Jun 2020 08:13:43.409 # Configuration loaded
7:M 30 Jun 2020 08:13:43.410 * Node configuration loaded, I'm a7f489e84b6f4428ccd2deafaff5c17378152608
7:M 30 Jun 2020 08:13:43.410 * Running mode=cluster, port=6379.
7:M 30 Jun 2020 08:13:43.410 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
7:M 30 Jun 2020 08:13:43.410 # Server initialized
7:M 30 Jun 2020 08:13:43.410 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
7:M 30 Jun 2020 08:13:43.411 * Ready to accept connections

the operator log provides an indication of the issue:

{"level":"info","ts":1593504943.6889641,"logger":"controller_redis","msg":"Successfully executed the command","Request.Namespace":"xmf-infra","Request.Name":"opstree-redis","Command":["redis-cli","--cluster","add-node","10.42.0.49:6379","10.42.0.48:6379","--cluster-slave","-a",""],"Output":">>> Adding node 10.42.0.49:6379 to cluster 10.42.0.48:6379\nNode 10.42.0.48:6379 replied with error:\nERR Client sent AUTH, but no password is set\n"}
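
The logged command passes `-a ""`, and the error shows the server rejecting an AUTH it did not expect. One way to avoid this is to add `-a` only when a password is actually configured. This is a minimal sketch, not the operator's code; the function name `build_add_node_cmd` is hypothetical and the addresses are the ones from the log line above:

```shell
# Sketch only: print the add-node command, appending -a only for a
# non-empty password. build_add_node_cmd is a hypothetical helper name.
build_add_node_cmd() {
  new_node="$1"
  existing_node="$2"
  password="${3:-}"
  set -- redis-cli --cluster add-node "$new_node" "$existing_node" --cluster-slave
  if [ -n "$password" ]; then
    set -- "$@" -a "$password"
  fi
  echo "$*"
}

# No password configured, so no -a flag is emitted.
build_add_node_cmd 10.42.0.49:6379 10.42.0.48:6379 ""
```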

at this point the cluster was re-created with a password. this still did not work:

{"level":"info","ts":1593506080.7510774,"logger":"controller_redis","msg":"Successfully executed the command","Request.Namespace":"xmf-infra","Request.Name":"opstree-redis","Command":["redis-cli","--cluster","create","10.42.0.50:6379","10.42.0.53:6379","10.42.0.55:6379","--cluster-yes","-a","test"],"Output":"[ERR] Node 10.42.0.53:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.\n"}

cluster was re-created again after removing previous persistent storage.

the operator now appears to hang; its last log line is:

{"level":"info","ts":1593507552.3035424,"logger":"controller_redis","msg":"Redis cluster creation command is","Request.Namespace":"xmf-infra","Request.Name":"dev-redis","Command":["redis-cli","--cluster","create","10.42.0.62:6379","10.42.0.66:6379","10.42.0.70:6379","--cluster-yes","-a","test"]}

performing the cluster creation manually results in:

redis-cli --cluster create 10.42.0.62:6379 10.42.0.66:6379 10.42.0.70:6379 --cluster-yes -a "test"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
[ERR] Node 10.42.0.66:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

any idea how this can be resolved?
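
The `[ERR] Node ... is not empty` message means the node already knows other nodes or holds keys in database 0. A minimal sketch of how that state could be inspected and, if it really is stale, cleared before retrying `--cluster create`. The helper names (`run`, `inspect_node`, `reset_node`) are hypothetical, the address and password are the ones from the command above, and whether a reset is appropriate in an operator-managed deployment is an assumption to verify:

```shell
# Sketch only: helper names are hypothetical.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show what the node already knows: other cluster members and its key count.
# Arguments: host, port, password.
inspect_node() {
  run redis-cli -h "$1" -p "$2" -a "$3" cluster nodes
  run redis-cli -h "$1" -p "$2" -a "$3" dbsize
}

# DESTRUCTIVE: drops all keys, then discards the node's cluster state.
# FLUSHALL goes first because CLUSTER RESET is refused on a master holding keys.
reset_node() {
  run redis-cli -h "$1" -p "$2" -a "$3" flushall
  run redis-cli -h "$1" -p "$2" -a "$3" cluster reset hard
}
```

Setting `DRY_RUN=1` and calling `reset_node 10.42.0.66 6379 test` prints the two commands without touching the node.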


wangyp0701 · 2020-07-03T05:55:03Z

I had the same problem. All pods were destroyed and rebuilt, but the configuration file was not updated along with them.

iamabhishek-dubey · 2020-07-23T15:29:21Z

Thanks for reporting the issue. I will look into this.

iamabhishek-dubey · 2020-07-23T15:36:36Z

Hmm, this seems like a bug. We will try to fix it by the weekend.

naude-r · 2020-07-28T08:04:40Z

in a different installation we had to add "--cluster-announce-ip $(POD_IP)". i suspect a similar change may be required here.
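
For context, `cluster-announce-ip` is a Redis option that sets the address a node advertises to its peers, and `$(POD_IP)` is the Kubernetes container-args form of an environment variable reference. In a shell entrypoint the same idea might look like the sketch below. It assumes `POD_IP` is injected into the container (for example via the Downward API field `status.podIP`); the function name `start_redis_cmd` is hypothetical:

```shell
# Sketch only: print the redis-server command line with the pod IP announced.
# Assumes POD_IP is set in the container environment.
start_redis_cmd() {
  : "${POD_IP:?POD_IP must be set}"
  echo "redis-server --cluster-enabled yes --cluster-announce-ip ${POD_IP}"
}
```

An entrypoint script could then start the server with `exec $(start_redis_cmd)`.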

iamabhishek-dubey · 2020-07-28T10:39:30Z

We have tested this internally. In a few days there will be a new release, 0.2.

iamabhishek-dubey · 2020-07-30T15:19:03Z

This will be fixed in PR #14.

iamabhishek-dubey · 2020-08-01T13:02:57Z

This is fixed in #14.

iamabhishek-dubey closed this as completed Aug 1, 2020
