This repository was archived by the owner on Jun 20, 2024. It is now read-only.

Goroutine leak in overlaySwitchForwarder #3807

Closed
bboreham opened this issue May 27, 2020 · 2 comments

Comments

@bboreham
Contributor

A customer experiencing occasional out-of-memory problems sent this goroutine dump:

File: weaver
Build ID: 7a18060a2796ffec016c133f2cb7fe98a402dd26
Type: goroutine
Time: May 26, 2020 at 2:30pm (UTC)
Showing nodes accounting for 3866, 100% of 3866 total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                              3500 92.49% |   github.com/weaveworks/weave/router.(*overlaySwitchForwarder).run
                                                80  2.11% |   github.com/weaveworks/weave/router.monitorForwarder
                                                78  2.06% |   github.com/weaveworks/weave/vendor/github.com/weaveworks/mesh.(*gossipSender).run
                                                41  1.08% |   github.com/weaveworks/weave/router.(*fastDatapathForwarder).doHeartbeats
                                                39  1.03% |   github.com/weaveworks/weave/router.(*sleeveForwarder).run
                                                39  1.03% |   github.com/weaveworks/weave/vendor/github.com/weaveworks/mesh.(*LocalConnection).actorLoop
                                                 1 0.026% |   github.com/weaveworks/weave/ipam.(*Allocator).actorLoop
                                                 1 0.026% |   github.com/weaveworks/weave/router.(*FastDatapath).run
                                                 1 0.026% |   github.com/weaveworks/weave/vendor/github.com/weaveworks/common/signals.(*Handler).Loop
                                                 1 0.026% |   github.com/weaveworks/weave/vendor/github.com/weaveworks/go-checkpoint.CheckInterval.func1
                                                 1 0.026% |   github.com/weaveworks/weave/vendor/github.com/weaveworks/mesh.(*connectionMaker).queryLoop
                                                 1 0.026% |   github.com/weaveworks/weave/vendor/github.com/weaveworks/mesh.(*localPeer).actorLoop
                                                 1 0.026% |   github.com/weaveworks/weave/vendor/github.com/weaveworks/mesh.(*routes).run
         0     0%     0%       3784 97.88%                | runtime.selectgo
                                              3784   100% |   runtime.gopark
----------------------------------------------------------+-------------

With ~19 peers in the network they shouldn't have 3500 of anything, so I am currently puzzled as to how it gets into this state.

@bboreham
Contributor Author

If some issue causes two peers to break their connection after it is formed, e.g. a firewall blocking UDP traffic between them, they will repeatedly make and break connections, which drives up the number of goroutines.

In tests I couldn't find a way to leak faster than about one per minute, but this customer had cases where the growth was thousands per hour.

The leak should be fixed by #3808.

@bboreham
Contributor Author

Fixed by #3808
