Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GameServer marked unhealthy immediately when using REST API #1000

Closed
Omegastick opened this issue Aug 15, 2019 · 8 comments
Closed

GameServer marked unhealthy immediately when using REST API #1000

Omegastick opened this issue Aug 15, 2019 · 8 comments
Labels
area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/bug These are bugs. question I have a question!

Comments

@Omegastick
Copy link

Omegastick commented Aug 15, 2019

What happened:
When starting a GameServer using the REST API for communicating with the sidecar, the GameServer is immediately marked as unhealthy. This continues even if the health check service is disabled.

What you expected to happen:
The game server should be marked as unhealthy only after the initial health check waiting period.

How to reproduce it (as minimally and precisely as possible):
Create an application using the REST API to communicate with the sidecar.
I can throw together a minimal example application if you like.

Anything else we need to know?:
After #999, I decided to switch over to using the REST API (the SDK build time was making my Docker builds very long, anyway). I don't think this is related, because in #999 the GameServer briefly becomes ready before it crashes.

Looking at the sidecar logs (in the Pastebin below), the GameServer is being marked as unhealthy roughly two seconds after the sidecar starts.

Is there an example of the REST API being used anywhere?

Environment:

@Omegastick Omegastick added the kind/bug These are bugs. label Aug 15, 2019
@markmandel
Copy link
Collaborator

Can I please have copies of the kubectl describe gameserver <gs name> so we can see the events, and also the logs from the sidecar

@markmandel markmandel added the area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc label Aug 15, 2019
@markmandel
Copy link
Collaborator

Oh, I just saw the logs - awesome. The describe would also be helpful.

@markmandel
Copy link
Collaborator

I don't see anything about singularity-trainer-snnhs in the Agones controller log. Is this from the same time?

@roberthbailey
Copy link
Member

Can you verify that the gameserver container isn't restarting? There is currently a bug I encountered where if the gameserver restarts before ever becoming healthy, then it will be unhealthy indefinitely, regardless of the health checks. @aLekSer is looking into this.

@aLekSer
Copy link
Collaborator

aLekSer commented Aug 15, 2019

Yes, I will look at it as part of this issue.
#956

@Omegastick
Copy link
Author

Sorry, those two logs were actually from different runs. Here's two logs from the same run:
Sidecar:

{"ctlConf":{"Address":"localhost","IsLocal":false,"LocalFile":"","Timeout":0,"Test":""},"grpcPort":59357,"httpPort":59358,"message":"Starting sdk sidecar","severity":"info","source":"main","time":"2019-08-16T02:49:13.123397435Z","version":"0.12.0"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"created GameServer sidecar","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:13.139292491Z"}
{"message":"Starting SDKServer grpc service...","severity":"info","source":"main","time":"2019-08-16T02:49:13.13958613Z"}
{"message":"Starting SDKServer grpc-gateway...","severity":"info","source":"main","time":"2019-08-16T02:49:13.151124398Z"}
{"gsKey":"default/singularity-trainer-d2dv7","health":{"periodSeconds":5,"failureThreshold":3,"initialDelaySeconds":10},"message":"setting health configuration","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:13.239773391Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Starting GameServer health checking","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:13.239940314Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Starting SDKServer http health check...","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:13.239955493Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Starting workers...","queue":"agones.dev.default.singularity-trainer-d2dv7","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:13.239967738Z","workers":1}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Sending GameServer Event to connectedStreams","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:14.242782802Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Received Ready request, adding to queue","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:15.161620545Z"}
{"gsKey":"updateState","message":"Enqueuing","queue":"agones.dev.default.singularity-trainer-d2dv7","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:15.161857572Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:15.162167526Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:15.162273065Z"}
{"gsKey":"updateState","message":"Processing","queue":"agones.dev.default.singularity-trainer-d2dv7","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:15.171115581Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Updating state","severity":"info","source":"*sdkserver.SDKServer","state":"RequestReady","time":"2019-08-16T02:49:15.171414377Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"GameServerState already unhealthy. Skipping update.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:15.171601786Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:17.166464069Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:17.16810641Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:19.174507529Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:19.174846326Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:21.180557931Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:21.180889293Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:23.18658772Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:23.186926274Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:25.19245239Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:25.192794401Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:27.199602541Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:27.199941503Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:29.205562877Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:29.205910589Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:31.211613139Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:31.211951815Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:33.217552367Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:33.217855922Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:35.223563695Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:35.223855933Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:37.228546955Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:37.228844627Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:39.240113654Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:39.240474631Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:41.245543016Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:41.245841127Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Sending GameServer Event to connectedStreams","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:43.196217642Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health Ping Received","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:43.252670115Z"}
{"gsKey":"default/singularity-trainer-d2dv7","message":"Health stream closed.","severity":"info","source":"*sdkserver.SDKServer","time":"2019-08-16T02:49:43.25291413Z"}
...

Game:

[02:49:15    info] Serving on port: 7654
[02:49:15    info] Marking server as ready
[02:49:15    info] Health ping sent
[02:49:17    info] Health ping sent
[02:49:19    info] Health ping sent
[02:49:21    info] Health ping sent
[02:49:23    info] Health ping sent
[02:49:25    info] Health ping sent
[02:49:27    info] Health ping sent
[02:49:29    info] Health ping sent
[02:49:31    info] Health ping sent
[02:49:33    info] Health ping sent
[02:49:35    info] Health ping sent
[02:49:37    info] Health ping sent
[02:49:39    info] Health ping sent
[02:49:41    info] Health ping sent
[02:49:43    info] Health ping sent
[02:49:45    info] Health ping sent
[02:49:47    info] Health ping sent
[02:49:49    info] Health ping sent
[02:49:51    info] Health ping sent
[02:49:53    info] Health ping sent
[02:49:55    info] Health ping sent
[02:49:57    info] Health ping sent
[02:49:59    info] Health ping sent
[02:50:01    info] Health ping sent
...

kubectl describe gameserver:

Name:         singularity-trainer-d2dv7
Namespace:    default
Labels:       <none>
Annotations:  agones.dev/sdk-version: 0.12.0
API Version:  agones.dev/v1
Kind:         GameServer
Metadata:
  Creation Timestamp:  2019-08-16T02:49:11Z
  Finalizers:
    agones.dev
  Generate Name:     singularity-trainer-
  Generation:        1
  Resource Version:  231644
  Self Link:         /apis/agones.dev/v1/namespaces/default/gameservers/singularity-trainer-d2dv7
  UID:               6cbf97e3-bfd0-11e9-b99b-42010a920fe8
Spec:
  Container:  singularity-trainer
  Health:
    Failure Threshold:      3
    Initial Delay Seconds:  10
    Period Seconds:         5
  Ports:
    Container Port:  7654
    Host Port:       7565
    Name:            default
    Port Policy:     Dynamic
    Protocol:        TCP
  Scheduling:        Packed
  Template:
    Metadata:
      Creation Timestamp:  <nil>
    Spec:
      Containers:
        Image:              gcr.io/carbide-kite-208413/st-server
        Image Pull Policy:  Always
        Name:               singularity-trainer
        Resources:
Status:
  Address:    35.243.77.172
  Node Name:  gke-st-test-test-f1844041-133b
  Ports:
    Name:          default
    Port:          7565
  Reserved Until:  <nil>
  State:           Unhealthy
Events:
  Type     Reason          Age   From                   Message
  ----     ------          ----  ----                   -------
  Normal   PortAllocation  59s   gameserver-controller  Port allocated
  Normal   Creating        58s   gameserver-controller  Pod singularity-trainer-d2dv7 created
  Normal   Scheduled       58s   gameserver-controller  Address and port populated
  Warning  Unhealthy       55s   health-controller      Issue with Gameserver pod

@roberthbailey, I think you're onto something with the container restarting. Here's the relevant output of kubectl describe <pod-name>:

Containers:
  singularity-trainer:
    Container ID:   docker://19ea8686f4859628ef9a84e9ca34d81d022f53517959415fbe453b5df845c724
    Image:          gcr.io/carbide-kite-208413/st-server
    Image ID:       docker-pullable://gcr.io/carbide-kite-208413/st-server@sha256:f3bafc4c0715ea2587088225b42fb5fa3b1f0a1ac3e26545f4b6618e8df26733
    Port:           7654/TCP
    Host Port:      7565/TCP
    State:          Running
      Started:      Fri, 16 Aug 2019 11:49:15 +0900
    Last State:     Terminated
      Reason:       Error
      Exit Code:    139
      Started:      Fri, 16 Aug 2019 11:49:12 +0900
      Finished:     Fri, 16 Aug 2019 11:49:13 +0900
    Ready:          True
    Restart Count:  1

I'm not an expert on Kubernetes, but I assume this means that it segfaulted immediately after starting, then restarted and was fine. Does anyone know how I'd debug this?

@Omegastick
Copy link
Author

My server application was starting up before the sidecar, then exiting when it couldn't connect (because the sidecar hadn't started yet). On restart #956 was happening. I'll close this issue.

@markmandel
Copy link
Collaborator

Just created a PR #1004 to highlight this issue, so it doesn't impact other people in the same way.

@markmandel markmandel added the question I have a question! label Sep 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/bug These are bugs. question I have a question!
Projects
None yet
Development

No branches or pull requests

4 participants