
Reflector not watching secrets after period of time (still happening) #467

Open
gwenael-lebarzic opened this issue Oct 16, 2024 · 10 comments

Comments

@gwenael-lebarzic

Hello.

As issue #341 is closed, I am opening a new one.

As described in #341, we encountered the same problem on 7 October 2024: Reflector stopped replicating secrets.
Reflector no longer logged anything (neither the namespace watcher, the configmap watcher, nor the secret watcher).

Here is the end of the log:

2024-10-06 21:36:26.009 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected my-ns1/my-secret where permitted. Created 0 - Updated 19 - Deleted 0 - Validated 0.
2024-10-06 21:36:50.648 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:35:36.4365983. Faulted: False.
2024-10-06 21:36:50.649 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Requesting V1Secret resources
2024-10-06 21:36:50.755 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected my-ns1/my-secret2 where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 18.
2024-10-06 21:36:50.803 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected my-ns1/my-secret3 where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 19.
2024-10-06 21:36:50.868 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected my-ns1/my-secret where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 19.
2024-10-06 22:04:18.618 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Session closed. Duration: 00:56:30.0338703. Faulted: False.
2024-10-06 22:04:18.618 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Requesting V1Namespace resources
2024-10-06 22:08:14.073 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:44:21.4212938. Faulted: False.
2024-10-06 22:08:14.073 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources

After this point, there were no logs at all. Concerning the metrics, the Reflector pod's CPU usage was almost zero (which seems normal, since it was no longer doing anything), and there was nothing unusual about memory usage just before the incident.

Here is the version information:

  • Reflector: emberstack/kubernetes-reflector:7.1.256
  • cluster: GKE - 1.29.8-gke.1096000
  • Cloud Provider: GCP

Could this problem be looked into, please? It unfortunately makes the Reflector solution unstable :(.

@gwenael-lebarzic

Hello. Could someone look into this problem?

@gwenael-lebarzic

Up

@enterdv

enterdv commented Nov 1, 2024

Same for me

  • Reflector: emberstack/kubernetes-reflector:7.1.288
  • cluster: GKE - v1.30.5-gke.1014001
  • Cloud Provider: GCP

@gwenael-lebarzic

Hello.

Could we get a status update on this behaviour, please?

Best regards.

@iusergii

Started to observe the same issue once our cluster reached 5k secrets across all namespaces. My guess is that the Kubernetes API paginates ListSecretForAllNamespacesAsync responses and the code is not handling pagination.
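
For reference, here is a rough sketch of how a list call has to follow the continue token when the API server paginates the response, using the .NET KubernetesClient. This is not Reflector code; the helper name, the page size, and the CoreV1 call style (which depends on the client version) are my own assumptions:

// Hypothetical helper, not taken from Reflector: lists all secrets across
// namespaces and follows the `continue` token returned when the API server
// paginates the result.
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using k8s;
using k8s.Models;

static async Task<List<V1Secret>> ListAllSecretsAsync(IKubernetes client, CancellationToken token)
{
    var secrets = new List<V1Secret>();
    string continueToken = null;
    do
    {
        // limit is an arbitrary example page size.
        var page = await client.CoreV1.ListSecretForAllNamespacesAsync(
            limit: 500,
            continueParameter: continueToken,
            cancellationToken: token);
        secrets.AddRange(page.Items);
        // The C# model exposes the list's `continue` field as ContinueProperty.
        continueToken = page.Metadata?.ContinueProperty;
    } while (!string.IsNullOrEmpty(continueToken));
    return secrets;
}

If the list calls that feed the watcher do not loop like this, anything beyond the first page would simply be missed once the server starts paginating.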

@gwenael-lebarzic

> Started to observe the same issue once our cluster reached 5k secrets across all namespaces. My guess is that the Kubernetes API paginates ListSecretForAllNamespacesAsync responses and the code is not handling pagination.

In the Kubernetes cluster where we have this problem, we have a total of 62 secrets.

@enterdv

enterdv commented Nov 28, 2024

> Started to observe the same issue once our cluster reached 5k secrets across all namespaces. My guess is that the Kubernetes API paginates ListSecretForAllNamespacesAsync responses and the code is not handling pagination.

I have 4 secrets in my cluster and the issue still occurs.

@iusergii

Did you try setting the watcher timeout?

@aDisplayName

> Did you try setting the watcher timeout?

We had such a case today. In our case, we had set the watcher timeout to 900. The only logs we got were the following three lines:

2024-12-17 14:34:44.247 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Requesting V1Secret resources
2024-12-17 14:34:44.247 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Requesting V1Namespace resources
2024-12-17 14:34:44.247 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources

No log entry like Session closed. Duration: {duration}. Faulted: {faulted} ever showed up afterwards.

It seems the following code somehow got stuck at:

Logger.LogInformation("Requesting {type} resources", typeof(TResource).Name);
using var watcher = OnGetWatcher(stoppingToken);
var watchList = watcher.WatchAsync<TResource, TResourceList>(cancellationToken: stoppingToken);

await foreach (var (type, item) in watchList
                   .WithCancellation(stoppingToken))
    await Mediator.Publish(new WatcherEvent
    {
        Item = item,
        Type = type
    }, stoppingToken);

version: emberstack/kubernetes-reflector:7.1.288
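
If the await foreach really is where it hangs (the server keeps the connection open but never delivers another event and never closes the session), one possible workaround would be to link an inactivity timeout to the watch so a stalled session gets cancelled and re-requested instead of hanging forever. This is only an idea, not Reflector code; it reuses the identifiers from the snippet above, and the timeout value is an arbitrary example:

// Hypothetical change, not part of Reflector: cancel the watch if no event
// arrives within the inactivity window, so the surrounding loop logs the
// session as closed and requests a new watch instead of hanging forever.
var inactivityWindow = TimeSpan.FromMinutes(15); // arbitrary example value

using var watchCts = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken);
watchCts.CancelAfter(inactivityWindow);

try
{
    await foreach (var (type, item) in watchList.WithCancellation(watchCts.Token))
    {
        // Every received event resets the inactivity timer.
        watchCts.CancelAfter(inactivityWindow);
        await Mediator.Publish(new WatcherEvent { Item = item, Type = type }, stoppingToken);
    }
}
catch (OperationCanceledException) when (!stoppingToken.IsCancellationRequested)
{
    // Inactivity timeout hit: fall through so the outer loop restarts the watch.
}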

@vojtechspacir
Copy link

I have the same issue.

We were previously using version 6.x and have now updated to the latest 7.x. As recommended, I adjusted the watcher timeout to "900", but this did not resolve the problem.

It doesn’t matter how many namespaces or secrets I have; after a short period of time, the synchronization of new secrets completely stops.

I tested with up to 10000 namespaces, 50000 secrets, and 50000 configmaps. The test used a simple secret containing a username and password and a configmap with four values.

Once the race condition is hit, the synchronization of new secrets stops. However, the deletion of secrets still works, and the creation and deletion of configmaps also work. The number of Kubernetes Reflector pod replicas doesn’t seem to affect the issue; only CPU usage increases with the large number of secrets/configmaps I use.

The only working solution is to restart the pod, after which the mirroring of secrets resumes.

AKS version: 1.28.x
