Agones health check shouldn't fail during game server container image pull #2966

mtcode · 2023-02-13T19:17:07Z

What happened:

The gameserver-sidecar health check fails while the game server container image is being pulled.

What you expected to happen:

The health check should not fail at this time, allowing image pull to complete without terminating the game server.

How to reproduce it (as minimally and precisely as possible):

Given the following example config, the health check will fail 105 seconds after start if the game server isn't healthy.

health:
    failureThreshold: 3
    initialDelaySeconds: 60
    periodSeconds: 15

Unfortunately, this doesn't account for the amount of time that it takes for the container image to be pulled, which can exceed 105 seconds in some cases. For example, if an image pull takes 3 minutes, the health check will fail and attempt to terminate the pod after 105 seconds.

Anything else we need to know?:

This behavior traces back to Agones v1.19, when #2355 was merged which altered the order of container startup so that Agones starts first and the game server starts second. This causes the delay timer to start earlier than previously.

A workaround is to increase the initialDelaySeconds to a larger value, the longest we expect image pull to take. This prevents the health check from failing and terminating the game server, but configuring the delay that large introduces a blind spot in monitoring. If image pulls take less than the delay, such as in the case where the image already exists locally, then there is no health monitoring during the remaining duration until the initialDelaySeconds expires.

Environment:

Agones version: v1.28
Kubernetes version (use kubectl version): v1.23
Cloud provider or hardware configuration: AWS
Install method (yaml/helm): Helm
Troubleshooting guide log(s):
Others:

The text was updated successfully, but these errors were encountered:

zmerlynn · 2023-03-14T16:05:00Z

Initially, my thinking on this bug was that initialDelaySeconds was absolutely the right thing to use here and that the "blind" healthcheck period wasn't super important. I've been really going back and forth, though.

Pat of the problem is there's no "right" order to start the sidecar vs the game server. There are pros and cons to each:

(Current) Sidecar starts before game server:
- (Pro) The SDK is reachable when the game server starts.
- (Con) Health settings have to include the time for the game server to start healthchecks, which may include cold start delays for e.g. image pulls or game server binary start.
- (Con) Complicates Autopilot resource adjustment because the game server is second in the manifest.
(before Move SDK sidecar to first position in container list #2355) Game server starts before sidecar:
- (Pro) Health checks flow immediately when game server is ready.
- (Con) The SDK may not be reachable when the game server starts. This is actually documented in the REST client documentation.

Frankly I'm leaning towards the following:

We move back to pre-Move SDK sidecar to first position in container list #2355 under a feature gate, and
we either add an SDK call or document an existing one that can be used as "are you there yet?", to allow game servers to wait on the SDK startup.

cc @roberthbailey @markmandel who were involved in #2355 and #2351

markmandel · 2023-03-14T17:05:19Z

I personally wouldn't go down the path of whether / what order to start the sidecar in.

I think we need a way to know what state the gameserver container is in -- much llike the (hackery) we do for health checking (we set annotations on the GameServer).

agones/pkg/gameservers/health.go

Lines 252 to 296 in 71c90a4

    
           func (hc *HealthController) skipUnhealthyGameContainer(gs *agonesv1.GameServer, pod *corev1.Pod) (bool, error) { 
        
           	if !metav1.IsControlledBy(pod, gs) { 
        
           		// This is not the Pod we are looking for 🤖 
        
           		return false, nil 
        
           	} 
        
           	// If the GameServer is before Ready, both annotation values should be "" 
        
           	// If the GameServer is past Ready, both the annotations should be exactly the same. 
        
           	// If they are annotations are different, then the data between the GameServer and the Pod is out of sync, 
        
           	// in which case, send it back to the queue to try again. 
        
           	gsReadyContainerID := gs.ObjectMeta.Annotations[agonesv1.GameServerReadyContainerIDAnnotation] 
        
           	if pod.ObjectMeta.Annotations[agonesv1.GameServerReadyContainerIDAnnotation] != gsReadyContainerID { 
        
           		return false, workerqueue.NewDebugError(errors.Errorf("pod and gameserver %s data are out of sync, retrying", gs.ObjectMeta.Name)) 
        
           	} 
        
           	if gs.IsBeforeReady() { 
        
           		hc.baseLogger.WithField("gs", gs.ObjectMeta.Name).WithField("state", gs.Status.State).Debug("skipUnhealthyGameContainer: Is Before Ready. Checking failed container") 
        
           		// If the reason for failure was a container failure, then we can skip moving to Unhealthy. 
        
           		// otherwise, we know it was one of the other reasons (eviction, lack of ports), so we should definitely go to Unhealthy. 
        
           		return hc.failedContainer(pod), nil 
        
           	} 
        
           	// finally, we need to check if the failed container happened after the gameserver was ready or before. 
        
           	for _, cs := range pod.Status.ContainerStatuses { 
        
           		if cs.Name == gs.Spec.Container { 
        
           			if cs.State.Terminated != nil { 
        
           				hc.baseLogger.WithField("gs", gs.ObjectMeta.Name).WithField("podStatus", pod.Status).Debug("skipUnhealthyGameContainer: Container is terminated, returning false") 
        
           				return false, nil 
        
           			} 
        
           			if cs.LastTerminationState.Terminated != nil { 
        
           				// if the current container is running, and is the ready container, then we know this is some 
        
           				// other pod update, and we previously had a restart before we got to being Ready, and therefore 
        
           				// shouldn't move to Unhealthy. 
        
           				check := cs.ContainerID == gsReadyContainerID 
        
           				if !check { 
        
           					hc.baseLogger.WithField("gs", gs.ObjectMeta.Name).WithField("gsMeta", gs.ObjectMeta).WithField("podStatus", pod.Status).Debug("skipUnhealthyGameContainer: Container crashed after Ready, returning false") 
        
           				} 
        
           				return check, nil 
        
           			} 
        
           			break 
        
           		} 
        
           	} 
        
           	hc.baseLogger.WithField("gs", gs.ObjectMeta.Name).WithField("gsMeta", gs.ObjectMeta).WithField("podStatus", pod.Status).Debug("skipUnhealthyGameContainer: Game Container has not crashed, game container may be healthy") 
        
           	return false, nil

Maybe we take a similar approach (or allow the sidecar to see Pods as well as GameServers?)

As an interesting approach - the sidecar can create and patch events! And events will let us know if a container is pulling -- that may be a good way to do this - maybe rather than allowing visibility into pods, we just allow visibility into events? (is that better / worse?)

zmerlynn · 2023-03-14T18:05:42Z

🤔 I spent a while looking into this at some point, and there's really no good way for one container to learn that another is up in a clean way:

You can jump through hoops so SDK can monitor the game server container as you're describing, but they are mostly inherently racy, or have other drawbacks like load on kube-apiserver, which we already induce a lot of.
You can do something hacky with shared process namespace to see the other container come up, i.e. start sidecar, sidecar waits for a new PID to appear.
Or you can do something network based between the game server and the SDK. But once you go down this route, you quickly realize "something network based" is basically health/readiness checks.

The last, "something network based" is by far the cleanest approach, since each side of the process is allowed flexbility on when to claim it's started or not. The simplicity and flexibility are one of the reasons k8s does it that way, too: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

zmerlynn · 2023-03-15T16:32:50Z

I've thought about this more, and I am back to thinking that InitialDelaySeconds is the correct abstraction to use here. Here's my argument:

The game server container can already use liveness probes from kubelet. kubelet knows exactly when the container starts and already has rich configuration for liveness and startup probes.
initialDelaySeconds is only relevant until the first healthcheck is received. So it should be thought of as not a forced delay but rather "how long to wait before Agones will start considering failing a GameServer". Other processes, like kubelet container restart, can still restart the container, and there's plenty of valid use cases where I can imagine a long initialDelaySeconds and expect a flapping container, even outside of image pulls - e.g. crashing because some other dependency hasn't started.

A workaround is to increase the initialDelaySeconds to a larger value, the longest we expect image pull to take. This prevents the health check from failing and terminating the game server, but configuring the delay that large introduces a blind spot in monitoring.

As long as you have liveness probes from kubelet (which is already aware of container pull/start), what's the blind spot in monitoring? Remember that we are mostly likely talking about a container that has not yet called Ready() [1], so if something were to fail, it's likely to just result in the container restarting pre-Ready() due to failed liveness checks, after the long pull. If it succeeds, no harm no foul.

I think I'm missing the problem with a long initialDelaySeconds. Can somewhere describe the scenario with a timeline?

[1] Sidebar: I think there is currently nothing technically stopping the game server from going Ready() without a single call to Health(), but a typical game server is going to start its health-ping routine before calling Ready(). We should perhaps close this possibility, though, and call touchHealthLastUpdated on Ready() as well.

mtcode · 2023-03-16T18:14:29Z

initialDelaySeconds is only relevant until the first healthcheck is received. So it should be thought of as not a forced delay but rather "how long to wait before Agones will start considering failing a GameServer".

While I agree with the statement, the Agones health check doesn't really have much intrinsic value if there is no action taken upon failure. Setting initialDelaySeconds to a large value where the Agones health check will not act for the duration if there is an actual internal liveness issue that the kubelet won't detect is the blind spot that I referred to.

Here's an example:

initialDelaySeconds is set to 300 to allow an additional 5 minutes for images to be pulled from a container repository, and prevent the Agones health check from evicting game servers while they are in ImagePullBackoff. The Agones sidecar starts while the main container is waiting for its image to be pulled, and the delay prevents the health check from running (and failing) and the pod from being killed. After 5 minutes, the main container starts. Now if there are any health issues, Agones will act on them and restart the game server as necessary. All is well.

The second time around, initialDelaySeconds is still set to 300, but the image already exists locally in the cluster, so both the sidecar and the main container start immediately. However, maybe there's an internal issue and the game server becomes unhealthy, without crashing, and its health endpoint returns error codes, or maybe the health endpoint doesn't respond at all. Now, we have to wait 5 minutes before the health check starts failing and the pod gets evicted.

This time period where the Agones health check cannot act because of the forced artificial delay before it can consider failing a GameServer is problematic.

zmerlynn · 2023-03-16T22:54:52Z

Thanks for the detailed reply, @mtcode!

However, maybe there's an internal issue and the game server becomes unhealthy, without crashing, and its health endpoint returns error codes, or maybe the health endpoint doesn't respond at all. Now, we have to wait 5 minutes before the health check starts failing and the pod gets evicted.

I'm arguing this condition should be covered by the kubelet liveness check instead. If liveness probes fail, kubelet will kill the pod (and then either restart or not, depending on the restartPolicy), then agones-controller will see the container termination and the GameServer will go Unhealthy - which is what we want.

While I agree with the statement, the Agones health check doesn't really have much intrinsic value if there is no action taken upon failure.

This statement might actually be true. I'm not seeing a lot of value for Agones health checks over kubelet probes, and Agones container management will always be inherently racy when compared to kubelet. I'm actually wondering if it would be better just to advocate for kubelet probes instead and let Agones pick up the failed container, but I realize this is .. kind of a radical position to take. I will do a little more research to understand why we have our own health check system.

zmerlynn · 2023-03-16T23:20:22Z

Ok, I had an internal discussion with @markmandel and now I get what's going on. The sidecar is proxying the liveness probe anyways, to avoid the game server having to establish its own probes. Let me think on this, but I still think a network based solution is about the only way forward.

zmerlynn · 2023-03-17T00:13:28Z

Okay! After talking about this longer with @markmandel and @mtcode, I think we might have a plan! Sorry for the confusion above, I really didn't understand that the Agones sidecar was proxying /gshealthz for the game server, which absolutely makes sense.

Here's the thinking, hat tip to @markmandel for connecting the dots. Background:

We spin up the sidecar first, and the game server second. [1] So the SDK is running by the time the GS is up.

On the game server container, we today tack on a liveness probe to /gshealthz - this liveness probe is served by the SDK server. Code here:

agones/pkg/gameservers/controller.go

Lines 699 to 715 in a5af677

    
           return gs.ApplyToPodContainer(pod, gs.Spec.Container, func(c corev1.Container) corev1.Container { 
        
           	if c.LivenessProbe == nil { 
        
           		c.LivenessProbe = &corev1.Probe{ 
        
           			ProbeHandler: corev1.ProbeHandler{ 
        
           				HTTPGet: &corev1.HTTPGetAction{ 
        
           					Path: "/gshealthz", 
        
           					Port: intstr.FromInt(8080), 
        
           				}, 
        
           			}, 
        
           			InitialDelaySeconds: gs.Spec.Health.InitialDelaySeconds, 
        
           			PeriodSeconds:       gs.Spec.Health.PeriodSeconds, 
        
           			FailureThreshold:    gs.Spec.Health.FailureThreshold, 
        
           		} 
        
           	} 
        
           	return c 
        
           })

But on line 708 of that code, we set initialDelaySeconds. kubelet waits for image pull and container start, then just sleeps initialDelaySeconds before sending any probes (documented here). Buuut.. in our setup, this results in a huge gap (as @mtcode insisted, and was correct) - with a large initialDelaySeconds, kubelet doesn't even start monitoring until well after the container start.

Solution:

We don't need the initialDelaySeconds on container configuration. We haven't needed it since Move SDK sidecar to first position in container list #2355 changed the container ordering. Why not? Because the game server's liveness probes are going to the SDK server, which is already up!
And, in fact, we can use the kubelet liveness probe to determine whether the container is up or not. Rather than have the SDK server start its own initialDelaySeconds when the SDK starts, we can instead have the SDK wait for the first kubelet probe. When we first see /gshealthz, we can start the SDK initialDelaySeconds and then wait for the game server to initialize.

[1] Sidebar: There's some nuance as to why the way we currently do it is a little off and not totally guaranteed, but it generally works fine because the game server binaries take a while to pull.

markmandel · 2023-03-17T03:09:49Z

Thanks for the comprehensive writeup!

When we first see /gshealthz, we can start the SDK initialDelaySeconds and then wait for the game server to initialize.

Question on this point. My thought here was that we pass the initialDelaySeconds down to the Pod's configured health check:

agones/pkg/gameservers/controller.go

Lines 699 to 712 in dc592e9

    
           return gs.ApplyToPodContainer(pod, gs.Spec.Container, func(c corev1.Container) corev1.Container { 
        
           	if c.LivenessProbe == nil { 
        
           		c.LivenessProbe = &corev1.Probe{ 
        
           			ProbeHandler: corev1.ProbeHandler{ 
        
           				HTTPGet: &corev1.HTTPGetAction{ 
        
           					Path: "/gshealthz", 
        
           					Port: intstr.FromInt(8080), 
        
           				}, 
        
           			}, 
        
           			InitialDelaySeconds: gs.Spec.Health.InitialDelaySeconds, 
        
           			PeriodSeconds:       gs.Spec.Health.PeriodSeconds, 
        
           			FailureThreshold:    gs.Spec.Health.FailureThreshold, 
        
           		} 
        
           	}

That way the SDK itself doesn't even need to track or be aware of the initialDelaySeconds, since it's baked into that initial ping on /gshealthz from the kubelet. It becomes a matter of "just start health checking on first hit on /gshealthz, whenever that happens.

Or are we saying the same thing?

roberthbailey · 2023-03-17T09:48:35Z

[1] Sidebar: There's some nuance as to why the way we currently do it is a little off and not totally guaranteed, but it generally works fine because the game server binaries take a while to pull.

This doesn't sound entirely correct to me. While the first pull onto a node can be slow, subsequent game server pods that start on the same machine shouldn't incur any pull time as the container image will be cached. So in cases where you have lots of game servers per machine (like in the simple game server load tests we run) most of the game server binaries will have next to 0 pull time at startup.

zmerlynn · 2023-03-21T17:21:37Z

That way the SDK itself doesn't even need to track or be aware of the initialDelaySeconds, since it's baked into that initial ping on /gshealthz from the kubelet. It becomes a matter of "just start health checking on first hit on /gshealthz, whenever that happens.

I'm a little confused - InitialDelaySeconds is already baked into that healthcheck, and that's the issue. Since the game server container starts second, we can assume that the SDK should already be started (which means we don't need to tell the kubelet to delay checking) - so I think it should not be in the healthcheck config for the container.

However, initialDelaySeconds still serves a purpose - even measuring the startup from the time the container starts to the time of initial healthcheck, the game server may have a slow startup. i.e. with more concrete example timing:

Let's take a more concrete example with:

health:
    initialDelaySeconds: 30
    periodSeconds: 5
    failureThreshold: 3

and propose flow, example with a game server that takes 25s to start up:

time	event
0	SDK starts
2m	game server container is pulled and starts
2m	kubelet sends check to `/gshealthz`, SDK records container start time and replies `ok`
2m5s	kubelet sends check to `/gshealthz`, SDK replies `ok`
2m10s	kubelet sends check to `/gshealthz`, SDK replies `ok`
2m15s	kubelet sends check to `/gshealthz`, SDK replies `ok`
2m20s	kubelet sends check to `/gshealthz`, SDK replies `ok`
2m25	game server calls `Health()`, and continues - SDK continues replying `ok` after this.

The point here is that the GS might still have other conditions where initialDelaySeconds makes sense, but it also makes sense to track it in case the game server hangs. Above, if we had gone to 2m30s without hearing from the gameserver, we would have started failing kubelet healthchecks.

This doesn't sound entirely correct to me. While the first pull onto a node can be slow, subsequent game server pods that start on the same machine shouldn't incur any pull time as the container image will be cached. So in cases where you have lots of game servers per machine (like in the simple game server load tests we run) most of the game server binaries will have next to 0 pull time at startup.

I confirmed with our internal sig-node team previously that the way we are currently doing it is less guaranteed than the original blog post and may still race the container startup. I think we mostly don't see it because the SDK starts quickly.

zmerlynn · 2023-03-27T21:30:15Z

I was able to repro this by creating a large image simple-game-server. In fact the pull time was comically large with a 3G image: 69eeaab

This took about 2m12 on a GKE Autopilot cluster, even with Image Streaming: Successfully pulled image "gcr.io/zml-gke-dev/simple-game-server:0.15-big" in 2m12.508554656s (2m12.508571775s including waiting)

Unfortunately this was good for about one pull, as Image Streaming does successfully cache it after, for every node in the project. So I'll need to test with it disabled, or re-push each time (either works, really).

Implements googleforgames#2966 (comment): * Remove the InitialDelaySeconds from the game server container configuration. The SDK will be available prior to the game server starting. * Rework how InitialDelaySeconds works in the SDK: Rather than starting the timer in Run(), start the timer on first /gshealthz, the URL for the kubelet liveness probe for the game server. kubelet will not send a liveness probe until after the container has started, so we can use the first /gshealthz to indicate the container is actually running. * We still need the concept of InitialDelaySeconds to handle the case that, after container creation, the game server takes a while to initialize before calling Health(). This is more-or-less what the field meant prior to googleforgames#2355, so this PR is more returning it to that state.

zmerlynn · 2023-03-31T17:37:06Z

@markmandel and I talked about this more yesterday, and settled on a different model:

(echoing Rework game server health initial delay handling #3046 (comment)): I missed the point he made above that we could just rely on kubelet to drive the initial timing, and just don't check until then
I noticed we could actually further simplify and remove our own go routine that was periodically monitoring the health state. Instead we can rely on the /gshealthz signal from kubelet.

So describing a bit more thoroughly:

We remove any knowledge in the SDK of InitialDelaySeconds
We remove the runHealth goroutine from main and shift this responsibility to the /gshealthz handler
Along the way, I noted that the FailureThreshold doesn't need to be enforced on both the kubelet and SDK side, so in the injected liveness probe, I dropped that to 1. Previously we were waiting more probes than we needed to. In practice this is not terribly relevant since the SDK pushes it into Unhealthy.

See googleforgames#2966 (comment): * We remove any knowledge in the SDK of InitialDelaySeconds * We remove the runHealth goroutine from main and shift this responsibility to the /gshealthz handler Along the way: * I noted that the FailureThreshold doesn't need to be enforced on both the kubelet and SDK side, so in the injected liveness probe, I dropped that to 1. Previously we were waiting more probes than we needed to. In practice this is not terribly relevant since the SDK pushes it into Unhealthy. * I was glancing at how time was used through the SDK and noticed one place where we don't cast to UTC - adjusted that.

See googleforgames#2966 (comment): * We remove any knowledge in the SDK of InitialDelaySeconds * We remove the runHealth goroutine from main and shift this responsibility to the /gshealthz handler Along the way: * I noted that the FailureThreshold doesn't need to be enforced on both the kubelet and SDK side, so in the injected liveness probe, I dropped that to 1. Previously we were waiting more probes than we needed to. In practice this is not terribly relevant since the SDK pushes it into Unhealthy.

* Rework health check handling of InitialDelaySeconds See #2966 (comment): * We remove any knowledge in the SDK of InitialDelaySeconds * We remove the runHealth goroutine from main and shift this responsibility to the /gshealthz handler Along the way: * I noted that the FailureThreshold doesn't need to be enforced on both the kubelet and SDK side, so in the injected liveness probe, I dropped that to 1. Previously we were waiting more probes than we needed to. In practice this is not terribly relevant since the SDK pushes it into Unhealthy. * Close race if enqueueState is called rapidly before update can succeed * Re-add Autopilot 1.26 to test matrix (removed in #3059)

zmerlynn · 2023-04-04T20:06:42Z

Send revert #3068, will close when we get it back in.

This is a redrive of googleforgames#3046, which was reverted in googleforgames#3068 Rework health check handling of InitialDelaySeconds. See googleforgames#2966 (comment): * We remove any knowledge in the SDK of InitialDelaySeconds * We remove the runHealth goroutine from main and shift this responsibility to the /gshealthz handler Along the way: * I noted that the FailureThreshold doesn't need to be enforced on both the kubelet and SDK side, so in the injected liveness probe, I dropped that to 1. Previously we were waiting more probes than we needed to. In practice this is not terribly relevant since the SDK pushes it into Unhealthy. * Close race if enqueueState is called rapidly before update can succeed * Re-add Autopilot 1.26 to test matrix (removed in googleforgames#3059)

* Rework game server health initial delay handling This is a redrive of #3046, which was reverted in #3068 Rework health check handling of InitialDelaySeconds. See #2966 (comment): * We remove any knowledge in the SDK of InitialDelaySeconds * We remove the runHealth goroutine from main and shift this responsibility to the /gshealthz handler Along the way: * I noted that the FailureThreshold doesn't need to be enforced on both the kubelet and SDK side, so in the injected liveness probe, I dropped that to 1. Previously we were waiting more probes than we needed to. In practice this is not terribly relevant since the SDK pushes it into Unhealthy. * Close race if enqueueState is called rapidly before update can succeed * Re-add Autopilot 1.26 to test matrix (removed in #3059) * Close consistency race in syncGameServerRequestReadyState: If the SDK and controller win the race to update the Pod with the GameServerReadyContainerIDAnnotation before kubelet even gets a chance to add the running containers to the Pod, the controller may update the pod with an empty annotation, which then confuses further runs. * Fixes TestPlayerConnectWithCapacityZero flakes May fully fix #2445 as well

* Rework health check handling of InitialDelaySeconds See googleforgames#2966 (comment): * We remove any knowledge in the SDK of InitialDelaySeconds * We remove the runHealth goroutine from main and shift this responsibility to the /gshealthz handler Along the way: * I noted that the FailureThreshold doesn't need to be enforced on both the kubelet and SDK side, so in the injected liveness probe, I dropped that to 1. Previously we were waiting more probes than we needed to. In practice this is not terribly relevant since the SDK pushes it into Unhealthy. * Close race if enqueueState is called rapidly before update can succeed * Re-add Autopilot 1.26 to test matrix (removed in googleforgames#3059)

* Rework game server health initial delay handling This is a redrive of googleforgames#3046, which was reverted in googleforgames#3068 Rework health check handling of InitialDelaySeconds. See googleforgames#2966 (comment): * We remove any knowledge in the SDK of InitialDelaySeconds * We remove the runHealth goroutine from main and shift this responsibility to the /gshealthz handler Along the way: * I noted that the FailureThreshold doesn't need to be enforced on both the kubelet and SDK side, so in the injected liveness probe, I dropped that to 1. Previously we were waiting more probes than we needed to. In practice this is not terribly relevant since the SDK pushes it into Unhealthy. * Close race if enqueueState is called rapidly before update can succeed * Re-add Autopilot 1.26 to test matrix (removed in googleforgames#3059) * Close consistency race in syncGameServerRequestReadyState: If the SDK and controller win the race to update the Pod with the GameServerReadyContainerIDAnnotation before kubelet even gets a chance to add the running containers to the Pod, the controller may update the pod with an empty annotation, which then confuses further runs. * Fixes TestPlayerConnectWithCapacityZero flakes May fully fix googleforgames#2445 as well

mtcode · 2023-04-12T18:43:35Z

Thank you! I see that this was released in Agones v1.31.0.

zmerlynn · 2023-04-12T19:02:23Z

@mtcode Yup! Give it a whirl, feedback quite welcome!

mtcode added the kind/bug These are bugs. label Feb 13, 2023

markmandel added the area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc label Feb 13, 2023

zmerlynn self-assigned this Mar 14, 2023

zmerlynn mentioned this issue Mar 28, 2023

Rework game server health initial delay handling #3046

Merged

zmerlynn closed this as completed in #3046 Apr 4, 2023

zmerlynn reopened this Apr 4, 2023

zmerlynn mentioned this issue Apr 5, 2023

Rework game server health initial delay handling #3072

Merged

zmerlynn closed this as completed in #3072 Apr 6, 2023

Kalaiselvi84 added this to the 1.31.0 milestone Apr 10, 2023

markmandel mentioned this issue Nov 10, 2023

GameServer going to Unhealthy before either container has started + Shutdown() fails when unhealthy #2549

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agones health check shouldn't fail during game server container image pull #2966

Agones health check shouldn't fail during game server container image pull #2966

mtcode commented Feb 13, 2023

zmerlynn commented Mar 14, 2023 •

edited

Loading

markmandel commented Mar 14, 2023

zmerlynn commented Mar 14, 2023

zmerlynn commented Mar 15, 2023 •

edited

Loading

mtcode commented Mar 16, 2023

zmerlynn commented Mar 16, 2023

zmerlynn commented Mar 16, 2023

zmerlynn commented Mar 17, 2023

markmandel commented Mar 17, 2023

roberthbailey commented Mar 17, 2023

zmerlynn commented Mar 21, 2023

zmerlynn commented Mar 27, 2023

zmerlynn commented Mar 31, 2023 •

edited

Loading

zmerlynn commented Apr 4, 2023

mtcode commented Apr 12, 2023

zmerlynn commented Apr 12, 2023

Agones health check shouldn't fail during game server container image pull #2966

Agones health check shouldn't fail during game server container image pull #2966

Comments

mtcode commented Feb 13, 2023

zmerlynn commented Mar 14, 2023 • edited Loading

markmandel commented Mar 14, 2023

zmerlynn commented Mar 14, 2023

zmerlynn commented Mar 15, 2023 • edited Loading

mtcode commented Mar 16, 2023

zmerlynn commented Mar 16, 2023

zmerlynn commented Mar 16, 2023

zmerlynn commented Mar 17, 2023

markmandel commented Mar 17, 2023

roberthbailey commented Mar 17, 2023

zmerlynn commented Mar 21, 2023

zmerlynn commented Mar 27, 2023

zmerlynn commented Mar 31, 2023 • edited Loading

zmerlynn commented Apr 4, 2023

mtcode commented Apr 12, 2023

zmerlynn commented Apr 12, 2023

zmerlynn commented Mar 14, 2023 •

edited

Loading

zmerlynn commented Mar 15, 2023 •

edited

Loading

zmerlynn commented Mar 31, 2023 •

edited

Loading