Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix goroutine leak on config reload #508

Closed
wants to merge 1 commit into from

Conversation

camathieu
Copy link

Promxy is leaking goroutines on config reloads leading to OOM kills in the long run.

From : https://instance-that-reload-its-config-way-too-often/debug/pprof/goroutine?debug=1

goroutine profile: total 63342
21111 @ 0x4374c0 0x447763 0x9bb257 0x467571
#	0x9bb256	github.com/prometheus/prometheus/discovery.(*Manager).sender+0x116	/home/jacksontj/workspace/golang/src/github.com/jacksontj/promxy/vendor/github.com/prometheus/prometheus/discovery/manager.go:234

21110 @ 0x4374c0 0x405f77 0x405c7b 0x20cc731 0x467571
#	0x20cc730	github.com/jacksontj/promxy/pkg/servergroup.(*ServerGroup).Sync+0x16b0	/home/jacksontj/workspace/golang/src/github.com/jacksontj/promxy/pkg/servergroup/servergroup.go:103

21110 @ 0x4374c0 0x405f77 0x405c7b 0x9ba052 0x467571
#	0x9ba051	github.com/prometheus/prometheus/discovery.(*Manager).Run+0x71	/home/jacksontj/workspace/golang/src/github.com/jacksontj/promxy/vendor/github.com/prometheus/prometheus/discovery/manager.go:142
...

To reproduce start a Promxy instance with the default configuration from the repository and send some SIGHUPs.


I found two issues :

  • oldState ServerGroups are not cancelled if appenders didn't change
  • Prometheus discovery target managers never close SyncCh blocking the ServerGroup Sync() goroutine forever

Promxy is leaking goroutines on config reloads leading to OOM kills in
the long run.

two issues :
 - oldState ServerGroups are not cancelled if appenders didn't change
 - Prometheus discovery target managers never close SyncCh blocking the ServerGroup Sync() goroutine forever
@camathieu
Copy link
Author

ping @jacksontj

@jacksontj jacksontj closed this May 19, 2022
@jacksontj jacksontj reopened this May 19, 2022
@jacksontj
Copy link
Owner

@camathieu thanks for the contribution! This looks good, but there was a problem with CI. I have pushed up a fix for that, so I'll need a rebase of this so CI can pass (so we can merge!)

@jacksontj
Copy link
Owner

@camathieu A friendly reminder here -- this should work after a rebase (to include the fix for CI) :)

@jacksontj
Copy link
Owner

I have rebased this commit into a new PR (#551) -- so closing this one out.

@jacksontj jacksontj closed this Mar 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants