FleetAutoscaler keeps all TLS connections alive permanently, causing a memory leak on the webhook server #2278

@craftyc0der


What happened:
Over time, the HTTPS server that hosts my FleetAutoscaler webhook runs out of memory (OOM). The cause is thousands of sockets on the server that are never closed. This does NOT happen when I call the endpoint with cURL or a browser; it only happens when Agones calls it.

/app $ lsof -p $PID | grep socket
...
1       /app/zeus-rest  socket:[289294771]
1       /app/zeus-rest  socket:[289294783]
1       /app/zeus-rest  socket:[289292336]
1       /app/zeus-rest  socket:[289291653]
1       /app/zeus-rest  socket:[289291654]
1       /app/zeus-rest  socket:[289293769]
1       /app/zeus-rest  socket:[289294780]
...

/app $ lsof -p $PID | grep socket | wc -l
6397
/app $ lsof -p $PID | grep socket | wc -l
6403
/app $ lsof -p $PID | grep socket | wc -l
6418

What you expected to happen:

I expect that when Agones calls the FleetAutoscaler webhook, it either reuses the TLS connection or closes it. Keeping it alive and then making a new one seems naughty.

How to reproduce it (as minimally and precisely as possible):
Create a TLS FleetAutoscaler webhook endpoint with keep-alive enabled and no idle timeout, then watch the sockets multiply.

Anything else we need to know?:
I suspect this could be fixed by disabling keep-alive on the HTTP client in
pkg/fleetautoscalers/fleetautoscalers.go:

var client = http.Client{
	Timeout: 15 * time.Second,
	// Proposed addition: close each connection once its request completes
	// instead of keeping it alive.
	Transport: &http.Transport{
		DisableKeepAlives: true,
	},
}

I worked around it by disabling keep-alive on the server side, but it took me several hours to find the problem because I could not reproduce it with any of my own clients.

Environment:

  • Agones version: 1.16
  • Kubernetes version (use kubectl version): 1.21
  • Cloud provider or hardware configuration: EKS and Minikube
  • Install method (yaml/helm): helm
  • Troubleshooting guide log(s):
  • Others:

Metadata

Assignees: No one assigned

Labels: good first issue, help wanted, kind/bug
