[argo-cd] Switch to bitnami/redis and bitnami/redis-cluster chart #1111

Closed
aslafy-z opened this issue Jan 31, 2022 · 15 comments
Labels
argo-cd · awaiting-upstream · enhancement · no-issue-activity

Comments

@aslafy-z
Contributor

Is your feature request related to a problem?

I have some issues with the redis-ha chart. If some pods are destroyed, they don't synchronize back well and I have to delete all the pods and wait for all of them to become ready again.

Related helm chart

argo-cd

Describe the solution you'd like

I feel like this chart should use bitnami maintained charts which are now the "default" for a major part of the community.

See
https://artifacthub.io/packages/helm/bitnami/redis
https://artifacthub.io/packages/helm/bitnami/redis-cluster
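For illustration only (a sketch, not part of the original request): one way to try this today is to run bitnami/redis as a separate release and point Argo CD at it, assuming your argo-cd chart version exposes the redis.enabled, redis-ha.enabled and externalRedis.* values and that the argo repo alias points at https://argoproj.github.io/argo-helm. Release names, namespace and value names below are assumptions; verify them against your chart's values.yaml.

# Sketch only: release names, namespace and value names are assumptions.
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install argocd-redis bitnami/redis \
    --namespace argocd \
    --set architecture=replication \
    --set auth.enabled=false
# Disable the bundled redis and point the argo-cd chart at the new release.
$ helm upgrade argocd argo/argo-cd \
    --namespace argocd \
    --reuse-values \
    --set redis.enabled=false \
    --set redis-ha.enabled=false \
    --set externalRedis.host=argocd-redis-master.argocd.svc.cluster.local \
    --set externalRedis.port=6379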

Describe alternatives you've considered

No response

Additional context

No response

@aslafy-z added the enhancement label on Jan 31, 2022
@gmoirod

gmoirod commented Mar 7, 2022

👍
In fact, it does not work on OpenShift out of the box: you need to create a Redis-specific ServiceAccount and RoleBindings first.
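A rough illustration of what that could look like (the ServiceAccount name, namespace and SCC are assumptions; the redis-ha chart usually names its ServiceAccount something like <release>-redis-ha):

# Hypothetical example: adjust ServiceAccount name, namespace and SCC to your setup.
$ oc create serviceaccount argocd-redis-ha -n argocd    # only if the chart does not create one
$ oc adm policy add-scc-to-user anyuid -z argocd-redis-ha -n argocd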

@mkilchhofer
Member

The kustomize manifests living in the upstream project over there use the rendered YAMLs from @DandyDeveloper's chart: https://github.com/argoproj/argo-cd/blob/v2.3.3/manifests/ha/base/redis-ha/chart/requirements.yaml

The intent of this Helm repository is to use the same architecture as the upstream projects (Argo CD, Workflows, etc.). IMHO you should file an issue over there: https://github.com/argoproj/argo-cd/issues/new/choose

@mkilchhofer added the awaiting-upstream label on Apr 22, 2022
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@aslafy-z
Contributor Author

No-stale

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@DandyDeveloper

DandyDeveloper commented Aug 22, 2022

@mkilchhofer What's the issue here? Redis clustering and Sentinel are very different, so this change could impact Argo quite a lot.

Feel free to raise the issue in my Redis chart; I'm pretty active and try my best to maintain it. Right now, I'm the only really active maintainer.

@mkilchhofer
Member

mkilchhofer commented Sep 29, 2022

Hi @DandyDeveloper,

Context: We at @swisspost tried switching on redis-ha in the Argo CD chart. We used it for 1-2 months or so on our AWS EKS clusters. We use cluster autoscaling and also upgrade our clusters once a week (new AWS AMIs for the workers).

Issue: One problem we saw is that one of the 3 redis pods becomes unhappy; the argocd-server then logs errors like this:

$ kubectl logs argocd-server-6499778d-2n56j
(..)
redis: 2022/03/11 10:35:37 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2022/03/11 10:35:37 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2022/03/11 10:35:38 pubsub.go:168: redis: discarding bad PubSub connection: EOF
time="2022-03-11T10:35:38Z" level=warning msg="Failed to resync revoked tokens. retrying again in 1 minute: EOF"
redis: 2022/03/11 10:35:38 pubsub.go:168: redis: discarding bad PubSub connection: write tcp 10.116.191.151:46704->172.20.36.205:6379: write: broken pipe
redis: 2022/03/11 10:35:38 pubsub.go:168: redis: discarding bad PubSub connection: write tcp 10.116.191.151:46704->172.20.36.205:6379: write: broken pipe
redis: 2022/03/11 10:35:38 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2022/03/11 10:35:38 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2022/03/11 10:35:38 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2022/03/11 10:35:38 pubsub.go:168: redis: discarding bad PubSub connection: EOF

The redis logs of at least one replica were full of:

kubectl logs argocd-redis-ha-server-1 -c redis
(..)
1:S 11 Mar 2022 07:09:31.312 * Non blocking connect for SYNC fired the event.
1:S 11 Mar 2022 07:09:31.312 * Master replied to PING, replication can continue...
1:S 11 Mar 2022 07:09:31.313 * Partial resynchronization not possible (no cached master)
1:S 11 Mar 2022 07:09:31.313 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 11 Mar 2022 07:09:32.324 * Connecting to MASTER 172.20.145.11:6379
1:S 11 Mar 2022 07:09:32.324 * MASTER <-> REPLICA sync started
1:S 11 Mar 2022 07:09:32.324 * Non blocking connect for SYNC fired the event.
1:S 11 Mar 2022 07:09:32.325 * Master replied to PING, replication can continue...
1:S 11 Mar 2022 07:09:32.325 * Partial resynchronization not possible (no cached master)
1:S 11 Mar 2022 07:09:32.326 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 11 Mar 2022 07:09:33.328 * Connecting to MASTER 172.20.145.11:6379
1:S 11 Mar 2022 07:09:33.328 * MASTER <-> REPLICA sync started
1:S 11 Mar 2022 07:09:33.329 * Non blocking connect for SYNC fired the event.
(..)
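One way to narrow a state like this down is to ask the sentinels which address they currently consider the master and compare it with the IP the replica keeps retrying. This is a sketch only: the sentinel container name, port 26379 and the master group name argocd are assumptions based on the redis-ha defaults used by the Argo CD chart, so check your sentinel.conf first.

# Sketch: container name, port and master group name are assumptions.
$ kubectl -n argocd exec argocd-redis-ha-server-0 -c sentinel -- \
    redis-cli -p 26379 sentinel get-master-addr-by-name argocd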

Resolution: We then always fixed it like this:

$ kubectl -n argocd delete po -l app=redis-ha
pod "argocd-redis-ha-server-0" deleted
pod "argocd-redis-ha-server-1" deleted
pod "argocd-redis-ha-server-2" deleted

And after 2 months of annoying redis issues we switched back to single-replica redis. After that we never faced a redis-related issue again.
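For reference, switching back to the bundled single-replica redis is roughly this (a sketch assuming the standard redis.enabled / redis-ha.enabled chart values; release name, repo alias and namespace are assumptions, so verify against your chart version):

# Sketch only: release name, repo alias and namespace are assumptions.
$ helm upgrade argocd argo/argo-cd \
    --namespace argocd \
    --reuse-values \
    --set redis-ha.enabled=false \
    --set redis.enabled=true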

@DandyDeveloper

@mkilchhofer How long ago was this?

We had a split-brain scenario that was the result of a bad Sentinel election. It's been resolved permanently a while back by introducing a pod that checks for this and explicitly resolves split-brain issues like the one above.

This is a surface-level assumption; I'd need more logs from the elected master / cluster state to provide more context.

The latest Argo should include the latest Redis chart, so I would highly recommend trying this again.
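To see which redis-ha chart version a given argo-cd chart release bundles, something like this should work (assuming the argo repo alias points at https://argoproj.github.io/argo-helm; the grep is just for readability):

$ helm repo add argo https://argoproj.github.io/argo-helm
$ helm repo update
$ helm show chart argo/argo-cd | grep -A 3 'redis-ha'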

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@pierluigilenoci

🎛️

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot closed this as not planned on Feb 6, 2023
@pierluigilenoci

Why closed???

@pierluigilenoci

@mkilchhofer ?

@pdeva

pdeva commented Apr 23, 2023

We are seeing this issue too. Not sure why it's been closed.

@DandyDeveloper

I maintain the Redis chart being used; the problem in question should have been resolved long ago.

If people are experiencing problems, throw me a link to the issue or describe the issue so I can investigate.

I believe they closed this because my reply indicated things are fixed and we had no follow-up.
