Potential memory leak in the operator #1984
I restarted the operator with the latest bc12 and pprof profiling turned on. Memory still seems to be increasing over time. I took a heap profile right after restarting the operator and then again multiple times afterwards.
I did the same test and I also saw a steady increase of the RSS memory (I have not been able to configure Metricbeat to collect the Go metrics). I might be wrong, but there is something suspicious in the source code of the Go client (client-go). Before I dive into the theory, some facts about the operator and the "work queue":
`cloud-on-k8s/pkg/controller/common/certificates/expiration.go`, lines 27 to 37 at `fb2e693`
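The referenced lines are not reproduced here; as a rough illustration only, they compute how long to wait before the next certificate-rotation reconciliation, which with a roughly one-year certificate validity yields requeue delays on the order of 365 days. A hypothetical sketch (the function name and signature are mine, not ECK's actual code):

```go
// Hypothetical sketch of the kind of computation done in expiration.go:
// derive a requeue delay from the certificate expiry. With certificates valid
// for about a year, this produces delays on the order of 365 days.
package certificates

import "time"

func shouldRotateIn(now, certExpiry time.Time, rotateBefore time.Duration) time.Duration {
	untilRotation := certExpiry.Add(-rotateBefore).Sub(now)
	if untilRotation < 0 {
		return 0 // the certificate should already have been rotated
	}
	return untilRotation
}
```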
Knowing that, here is the part that seems suspicious to me:

```go
// waitingLoop runs until the workqueue is shutdown and keeps a check on the list of items to be added.
func (q *delayingType) waitingLoop() {
    /* .... */
    for {
        /* .... */
        if waitingForQueue.Len() > 0 {
            entry := waitingForQueue.Peek().(*waitFor)
            nextReadyAt = q.clock.After(entry.readyAt.Sub(now)) // <=== Here
        }

        select {
        case <-q.stopCh:
            return
        case <-q.heartbeat.C():
            // continue the loop, which will add ready items
        case <-nextReadyAt:
            // continue the loop, which will add ready items
        /* ... */
```

A loop iteration is scheduled at least every 10 seconds because of the heartbeat. Unless I'm missing something, it means that at least every 10 seconds we are creating timers that will start to expire (and be removed from memory) in 365 days...
Excellent analysis @barkbay! I think this is spot on. Just one addition: we are also scheduling long-running requeues in the license controller to re-issue new licenses after the expiry of the current ones.

`cloud-on-k8s/pkg/controller/license/license_controller.go`, lines 74 to 89 at `25caaf1`
I am thinking we can work around this issue by managing long-running re-queues in a dedicated scheduler structure (managed by ECK, not client-go's workqueue) that correctly handles a single timer per key (typically the namespaced name plus the controller that initiated the timer), stopping existing timers before creating new ones. We could then inject an event into the correct workqueue just in time once a timer fires.
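A hedged sketch of that idea with illustrative names only (this is one possible shape of a per-key timer scheduler, not the code drafted in the PR mentioned below):

```go
// requeueScheduler keeps at most one live timer per key; scheduling again for
// the same key stops the previous timer before creating a new one.
package scheduler

import (
	"sync"
	"time"
)

type requeueScheduler struct {
	mu     sync.Mutex
	timers map[string]*time.Timer
}

func newRequeueScheduler() *requeueScheduler {
	return &requeueScheduler{timers: map[string]*time.Timer{}}
}

// schedule arranges for fire() to run after d for the given key
// (e.g. "controller-name/namespace/name"), replacing any previous timer.
func (s *requeueScheduler) schedule(key string, d time.Duration, fire func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if t, ok := s.timers[key]; ok {
		t.Stop() // at most one live timer per key
	}
	s.timers[key] = time.AfterFunc(d, func() {
		s.mu.Lock()
		delete(s.timers, key)
		s.mu.Unlock()
		fire() // e.g. inject an event into the controller's workqueue
	})
}
```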
I have drafted the idea in #1985
Unassigning myself as I am out for the next couple of days and we are probably going with a simpler solution of not scheduling long-running re-queues.
Let's implement the easy workaround we discussed: schedule a requeue in 10 hours instead of 365 days. This should mitigate the memory leak problem (timers still accumulate, but they are garbage-collected after 10 hours).
Let's not forget the license controller, where something similar happens.
We have an issue where the underlying timer used by the client-go work queue implementation stays in memory until it expires. Since one gets created at every reconciliation attempt, we end up with a big bunch of timers in memory that will expire in 365 days by default. To mitigate the memory leak, let's wait for no more than 10 hours to reconcile. For more details, see elastic#1984.
We have an issue where the underlying timer used by the client-go work queue implementation stays in memory until it expires. Since one gets created at every reconciliation attempt, we end up with a big bunch of timers in memory that will expire in 365 days by default. To mitigate the memory leak, let's wait for no more than 10 hours to reconcile. This is done at the level of the aggregated results, to decouple this workaround from any business logic like certificate expiration. For more details, see elastic#1984.
* Ensure we don't RequeueAfter for more than 10 hours

  We have an issue where the underlying timer used by the client-go work queue implementation stays in memory until it expires. Since one gets created at every reconciliation attempt, we end up with a big bunch of timers in memory that will expire in 365 days by default. To mitigate the memory leak, let's wait for no more than 10 hours to reconcile. This is done at the level of the aggregated results, to decouple this workaround from any business logic like certificate expiration. For more details, see #1984.

* Use aggregated results in the license controller
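For illustration, a minimal sketch of what such a cap on the aggregated result could look like, assuming controller-runtime's `reconcile.Result`; the constant and helper name are placeholders, not necessarily the shape of the actual change:

```go
package workaround

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// maxRequeueAfter bounds how far in the future a requeue is scheduled, so the
// underlying workqueue timer is released within hours rather than in a year.
const maxRequeueAfter = 10 * time.Hour

// capRequeueAfter is a hypothetical helper applied to the aggregated result
// before returning it from Reconcile.
func capRequeueAfter(res reconcile.Result) reconcile.Result {
	if res.RequeueAfter > maxRequeueAfter {
		res.RequeueAfter = maxRequeueAfter
	}
	return res
}
```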
I'm not sure either is necessary since this works for now, and when we land a fix upstream (or someone else does) we can revisit this. But there were two other options we discussed out of band that I thought might be worth noting here:
Memory usage seems to have stabilised over the last 24 hours. I'm not sure yet that the memory leak is fixed, though. Also, I realized I should probably generate some reconciliation events (e.g. patch the Elasticsearch annotations with some randomness every X seconds), since the memory leak we observed was directly related to reconciliations happening.
I set up a k8s Job that patches the Elasticsearch resource annotation every second, to trigger reconciliations:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: generate-reconciliations
  namespace: elastic-system
spec:
  template:
    metadata:
      name: generate-reconciliations
    spec:
      serviceAccount: elastic-operator
      containers:
      - name: generate-reconciliations
        image: bitnami/kubectl:latest
        command:
        - bash
        args:
        - -c
        - while true; do kubectl patch elasticsearch monitor -n beats -p "{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}" --type merge; sleep 1; done
      restartPolicy: Never
```

And I restarted the operator to use a recent nightly build that includes the memory leak fix. I expect to see memory growing for the first 10 hours, then staying approximately constant for the remaining timeframe.
Finally observed what I wanted to! 🎉 This is when triggering around 1 reconciliation per second with the above Job. In this scenario, the additional stable memory usage seems to be in the 20-30MB range.
Hey, I'm curious what tools you used to collect the performance data and view the graphs? I haven't done much performance monitoring in K8s but am wanting to learn.
Hey @ljdelight, we are running Metricbeat on Kubernetes, which pushes metrics into the configured Elasticsearch cluster, which we then visualize with Kibana.
What's your process when you "collect a heap profile"? Like, do you have profiling code in the binary that writes the heap data on some interval, or do you pprof the binary (which needs specific compile flags)? I'm thinking of approaches and I imagine it's tricky in K8s if there's no pprof in the container.
We expose pprof endpoints in dev mode: https://github.com/elastic/cloud-on-k8s/blob/master/cmd/manager/main.go#L162
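For reference, a minimal sketch of how pprof endpoints are typically exposed in a Go binary; the listen address is an assumption and this is not the operator's exact setup:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serving the default mux is enough to expose heap, goroutine, CPU, etc.
	// profiles; no special compile flags are required.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```

A heap profile can then be pulled from the running process with `go tool pprof http://localhost:6060/debug/pprof/heap`.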
After running a recent `1.0.0-beta1-b10` for two days with metrics reporting turned on and fetching those via Metricbeat into Elasticsearch, it seems that memory consumption has roughly doubled even though the number of clusters managed has been more or less constant (I have created and destroyed a few test clusters in that time).

Worth noting regarding the severity of this issue: the process's memory after two days of uptime is still at 25Mi, which is half of what we have as `requests` and far below the limit of 150Mi.

Operator manifest used:
https://github.com/pebrc/cloud-on-k8s/blob/dd77e135df28a80c10d603fd272009928eee0618/config/samples/beats/all-in-one.yaml#L3015-L3054
Go mstats struct with comments: https://golang.org/src/runtime/mstats.go
- Visualisation: Heap Alloc bytes, averaged in 10m intervals
- Visualisation: Heap Objects, averaged in 10m intervals
- Goroutines, averaged in 10m intervals: constant 👍
- Number of reconciliations per controller, max in 10m intervals (note this is a counter, so an increase over time is expected)