
agones-system gets stuck in "Terminating" #1778

Closed
domgreen opened this issue Sep 1, 2020 · 12 comments · Fixed by #1783
Labels
kind/bug These are bugs.

@domgreen
Contributor

domgreen commented Sep 1, 2020

What happened:

When deleting the agones-system namespace, it got stuck in the Terminating state.

What you expected to happen:

The namespace terminates successfully without manual intervention.

How to reproduce it (as minimally and precisely as possible):

I'm not 100% sure what, if anything, special happened in the cluster to make it get stuck in Terminating, but in general:

  • Install Agones via YAML
  • Run a gameserver
  • Delete the agones-system namespace.

Anything else we need to know?:
Some commands I used to get it to delete:

kubectl get ns

NAME              STATUS        AGE
agones-system     Terminating   4d

kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n agones-system

error: unable to retrieve the complete list of server APIs: allocation.agones.dev/v1: the server is currently unable to handle the request

kubectl get ns agones-system -o json | jq

{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "creationTimestamp": "...",
    "deletionTimestamp": "...",
    "name": "agones-system",
    "resourceVersion": "15278949",
    "selfLink": "/api/v1/namespaces/agones-system",
    "uid": "..."
  },
  "spec": {
    "finalizers": [
      "kubernetes"
    ]
  },
  "status": {
    "conditions": [
      {
        "lastTransitionTime": "...",
        "message": "Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: allocation.agones.dev/v1: the server is currently unable to handle the request",
        "reason": "DiscoveryFailed",
        "status": "True",
        "type": "NamespaceDeletionDiscoveryFailure"
      },
      {
        "lastTransitionTime": "...",
        "message": "All legacy kube types successfully parsed",
        "reason": "ParsedGroupVersions",
        "status": "False",
        "type": "NamespaceDeletionGroupVersionParsingFailure"
      },
      {
        "lastTransitionTime": "...",
        "message": "All content successfully deleted",
        "reason": "ContentDeleted",
        "status": "False",
        "type": "NamespaceDeletionContentFailure"
      }
    ],
    "phase": "Terminating"
  }
}

kubectl delete apiservice -n agones-system v1.allocation.agones.dev

warning: deleting cluster-scoped resources, not scoped to the provided namespace
apiservice.apiregistration.k8s.io "v1.allocation.agones.dev" deleted
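The APIService deleted above is cluster-scoped (hence the warning about -n), and Kubernetes names every APIService <version>.<group>, so the failing group/version in the discovery error maps directly to the object that needs deleting. A minimal Python sketch of that mapping, assuming the client-go message format shown above; apiservice_names_from_discovery_error is a hypothetical helper:

```python
import re

# Assumed client-go format:
#   "unable to retrieve the complete list of server APIs: <group>/<version>: <error>"
# with several "<group>/<version>: <error>" entries joined by ", " when more than
# one group fails. Only named groups (containing a dot) are matched.
_FAILED_GV = re.compile(r"([a-z0-9][a-z0-9.-]*\.[a-z0-9.-]+)/(v[0-9a-z]+):")


def apiservice_names_from_discovery_error(message):
    """Map each failing <group>/<version> to its APIService name, <version>.<group>."""
    names = []
    for group, version in _FAILED_GV.findall(message):
        name = f"{version}.{group}"
        if name not in names:  # keep first-seen order, drop duplicates
            names.append(name)
    return names


if __name__ == "__main__":
    msg = ("unable to retrieve the complete list of server APIs: "
           "allocation.agones.dev/v1: the server is currently unable to handle the request")
    print(apiservice_names_from_discovery_error(msg))
```

For the error in this issue it yields v1.allocation.agones.dev, the same name passed to kubectl delete apiservice above.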

Finally, I followed this guide to remove the namespace: https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.1.1/troubleshoot/ns_terminating.html
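The linked guide works by clearing the namespace's spec.finalizers (here just kubernetes) and sending the edited object to the namespace's finalize subresource. A minimal Python sketch that builds that request, assuming you already have the Namespace as a dict (for example from kubectl get ns agones-system -o json); build_finalize_request is a hypothetical helper:

```python
import copy
import json


def build_finalize_request(namespace):
    """Return (path, body) for a PUT to the namespace's finalize subresource.

    The body is a copy of the Namespace object with spec.finalizers emptied;
    the input dict is left unchanged.
    """
    name = namespace["metadata"]["name"]
    body = copy.deepcopy(namespace)
    body.setdefault("spec", {})["finalizers"] = []
    return f"/api/v1/namespaces/{name}/finalize", json.dumps(body, indent=2)


if __name__ == "__main__":
    ns = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": "agones-system"},
        "spec": {"finalizers": ["kubernetes"]},
    }
    path, body = build_finalize_request(ns)
    print(path)
    print(body)
```

The body can then be PUT to the path, either through kubectl proxy as the guide does or with kubectl replace --raw <path> -f <file>. This skips the cleanup the kubernetes finalizer would have done, so anything still left in the namespace is not removed for you.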

Environment:

  • Agones version: 1.7.0
  • Kubernetes version (use kubectl version): 1.16.12-gke.3
  • Cloud provider or hardware configuration: gke
  • Install method (yaml/helm): yaml
  • Troubleshooting guide log(s):
  • Others:
@domgreen domgreen added the kind/bug These are bugs. label Sep 1, 2020
@aLekSer
Collaborator

aLekSer commented Sep 1, 2020

I am able to reproduce this.
I'm not sure whether it is related to the issue, but there are some warnings in the events:

kubectl get events
LAST SEEN   TYPE      REASON                   OBJECT                                MESSAGE
4m28s       Warning   FailedToCreateEndpoint   endpoints/agones-allocator            Failed to create endpoint for service agones-system/agones-allocator: endpoints "agones-allocator" is forbidden: unable to create new content in namespace agones-system because it is being terminated
4m50s       Warning   FailedToCreateEndpoint   endpoints/agones-controller-service   Failed to create endpoint for service agones-system/agones-controller-service: endpoints "agones-controller-service" is forbidden: unable to create new content in namespace agones-system because it is being terminated
4m29s       Warning   FailedToCreateEndpoint   endpoints/agones-ping-http-service    Failed to create endpoint for service agones-system/agones-ping-http-service: endpoints "agones-ping-http-service" is forbidden: unable to create new content in namespace agones-system because it is being terminated
4m29s       Warning   FailedToCreateEndpoint   endpoints/agones-ping-udp-service     Failed to create endpoint for service agones-system/agones-ping-udp-service: endpoints "agones-ping-udp-service" is forbidden: unable to create new content in namespace agones-system because it is being terminated

This might help us understand the situation better. I expect Kubernetes 1.16 would give more details in kubectl get ns agones-system (I initially tested on a 1.15 GKE cluster).
kubernetes/kubernetes#70916

@aLekSer
Collaborator

aLekSer commented Sep 1, 2020

I installed Agones with the Terraform Helm module (latest master) on GKE 1.16.13-gke.1 and got a different kubectl get ns output:

kubectl get ns agones-system -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2020-09-01T16:06:03Z"
  deletionTimestamp: "2020-09-01T16:11:29Z"
  labels:
    name: agones-system
  name: agones-system
  resourceVersion: "3057"
  selfLink: /api/v1/namespaces/agones-system
  uid: 4b3d77b9-8765-40f6-a472-2b74a46e84fe
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2020-09-01T16:11:41Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: allocation.agones.dev/v1: the server is currently
      unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2020-09-01T16:11:35Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2020-09-01T16:12:05Z"
    message: 'Failed to delete all resource types, 1 remaining: unexpected items still
      remain in namespace: agones-system for gvr: /v1, Resource=pods'
    reason: ContentDeletionFailed
    status: "True"
    type: NamespaceDeletionContentFailure
  phase: Terminating
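In both outputs, the conditions that actually hold the namespace in Terminating are the ones with status "True"; for the NamespaceDeletion*Failure types that means the failure is active. A minimal Python sketch that filters for them, assuming the namespace has been fetched as JSON; failing_deletion_conditions is a hypothetical name:

```python
import json


def failing_deletion_conditions(namespace):
    """Return (type, reason, message) for each status condition that is "True".

    For the NamespaceDeletion*Failure conditions, "True" means the failure is
    currently active, i.e. it is what keeps the namespace in Terminating.
    """
    conditions = (namespace.get("status") or {}).get("conditions") or []
    return [
        (c.get("type", ""), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") == "True"
    ]


if __name__ == "__main__":
    # Trimmed copy of the conditions from the output above, written as JSON
    # (kubectl get ns agones-system -o json has the same shape).
    ns = json.loads("""
    {"status": {"phase": "Terminating", "conditions": [
      {"type": "NamespaceDeletionDiscoveryFailure", "status": "True",
       "reason": "DiscoveryFailed", "message": "Discovery failed for some groups"},
      {"type": "NamespaceDeletionGroupVersionParsingFailure", "status": "False",
       "reason": "ParsedGroupVersions", "message": "All legacy kube types successfully parsed"},
      {"type": "NamespaceDeletionContentFailure", "status": "True",
       "reason": "ContentDeletionFailed", "message": "Failed to delete all resource types"}
    ]}}
    """)
    for ctype, reason, _message in failing_deletion_conditions(ns):
        print(f"{ctype}: {reason}")
```

On the trimmed data it prints NamespaceDeletionDiscoveryFailure: DiscoveryFailed and NamespaceDeletionContentFailure: ContentDeletionFailed, matching the two "True" conditions above.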

@markmandel
Collaborator

Couple of questions:

  1. Which namespaces are you creating Agones and the GameServer in?
  2. Do you delete the GameServer before deleting Agones?

@domgreen
Contributor Author

domgreen commented Sep 1, 2020

Couple of questions:

  1. Which namespaces are you creating Agones and the GameServer in?
  • Agones: agones-system
  • GameServer: default
  2. Do you delete the GameServer before deleting Agones?

Nope, I was basically trashing the cluster, so I wasn't being very gentle 😟

@markmandel
Collaborator

Hmnn. Interesting.

Usually when I've run into this, it's because of a Finaliser issue - but we only set a Finaliser on the GameServer - which is not in the agones-system namespace. 🤔

@aLekSer
Collaborator

aLekSer commented Sep 1, 2020

Well, this bug is about deleting the Agones controller in an unusual way that is not documented on agones.dev: by simply removing the agones-system namespace. If you run kubectl delete -f install.yaml before removing the namespace, it works.
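The key point is ordering: the cluster-scoped objects from install.yaml that point into agones-system should be gone before the namespace itself is deleted. A minimal Python sketch of a "namespace last" ordering over parsed manifest objects, assuming each YAML document has been loaded into a dict; namespace_last and delete_commands are hypothetical helpers, and this does not describe how kubectl delete -f orders deletions:

```python
def namespace_last(objects):
    """Stable reordering of manifest objects so Namespace objects come last."""
    others = [o for o in objects if o.get("kind") != "Namespace"]
    namespaces = [o for o in objects if o.get("kind") == "Namespace"]
    return others + namespaces


def delete_commands(objects):
    """Render one kubectl delete command per object, Namespace objects last."""
    commands = []
    for obj in namespace_last(objects):
        meta = obj.get("metadata", {})
        cmd = f"kubectl delete {obj['kind'].lower()} {meta['name']}"
        if meta.get("namespace"):  # namespaced objects need -n
            cmd += f" -n {meta['namespace']}"
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical, heavily trimmed stand-in for the install.yaml documents.
    manifest = [
        {"kind": "Namespace", "metadata": {"name": "agones-system"}},
        {"kind": "Deployment",
         "metadata": {"name": "agones-controller", "namespace": "agones-system"}},
        {"kind": "APIService", "metadata": {"name": "v1.allocation.agones.dev"}},
        {"kind": "ValidatingWebhookConfiguration",
         "metadata": {"name": "agones-validation-webhook"}},
    ]
    for line in delete_commands(manifest):
        print(line)
```

To use it on the real file, the documents could be loaded with PyYAML's yaml.safe_load_all. It only handles the namespace ordering; it does not address GameServers with finalizers, which is why deleting them first still matters.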

@roberthbailey
Member

I think the finalizer in the agones-system namespace is doing the right thing.

You need to uninstall agones before deleting the namespace, because there are CRDs installed with webhooks referencing the namespace where the agones controller is running.
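Those references live on cluster-scoped objects, so deleting agones-system does not remove them; once the Services they point at are gone, discovery for the allocation APIService fails, which matches the DiscoveryFailed condition above. A minimal Python sketch that lists such objects, assuming the items from kubectl get apiservices,validatingwebhookconfigurations,mutatingwebhookconfigurations -o json; objects_referencing_namespace is a hypothetical name and only these three kinds are checked:

```python
def objects_referencing_namespace(objects, namespace):
    """Return "<Kind>/<name>" for APIServices and webhook configurations whose
    backing Service lives in the given namespace."""
    hits = []
    for obj in objects:
        kind = obj.get("kind")
        name = obj.get("metadata", {}).get("name", "")
        if kind == "APIService":
            # spec.service is null for APIServices served by kube-apiserver itself.
            service = obj.get("spec", {}).get("service") or {}
            if service.get("namespace") == namespace:
                hits.append(f"{kind}/{name}")
        elif kind in ("ValidatingWebhookConfiguration", "MutatingWebhookConfiguration"):
            # Webhook configurations keep their list under top-level "webhooks".
            for webhook in obj.get("webhooks") or []:
                service = (webhook.get("clientConfig") or {}).get("service") or {}
                if service.get("namespace") == namespace:
                    hits.append(f"{kind}/{name}")
                    break  # one hit per configuration is enough
    return hits


if __name__ == "__main__":
    # Hypothetical sample items shaped like the kubectl -o json output.
    items = [
        {"kind": "APIService", "metadata": {"name": "v1.allocation.agones.dev"},
         "spec": {"service": {"namespace": "agones-system", "name": "example"}}},
        {"kind": "APIService", "metadata": {"name": "v1.apps"}, "spec": {"service": None}},
    ]
    print(objects_referencing_namespace(items, "agones-system"))
```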

@markmandel
Collaborator

Oooooh! That would make sense actually.

@domgreen
Contributor Author

domgreen commented Sep 1, 2020

Yep, makes a lot of sense. Worth adding something to the docs or FAQ?

Will see if I can find a way around it for my use case (terraform destroy).

@aLekSer
Collaborator

aLekSer commented Sep 1, 2020

We don't have a section about uninstalling Agones on the Install with YAML page, unlike the Install using Helm page.
https://agones.dev/site/docs/installation/install-agones/yaml/

@markmandel
Collaborator

^ That definitely seems like a good addition!

@aLekSer
Collaborator

aLekSer commented Sep 1, 2020

Well, I will create a PR soon. Simply changing agones-system to agones-system2 (and 1.9.0-dev to 1.8.0) in install.yaml was enough to create the Agones controller in a new namespace. The only catch is that the certificate is valid for agones-controller-service.agones-system.svc, not agones-controller-service.agones-system2.svc. After these changes, running kubectl apply -f ./install.yaml and then kubectl delete -f ./install.yaml got stuck on:

validatingwebhookconfiguration.admissionregistration.k8s.io "agones-validation-webhook" deleted

However, kubectl delete ns agones-system2 did not time out and succeeded.

kubectl get ns agones-system2  -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2020-09-01T19:53:25Z"
  deletionTimestamp: "2020-09-01T19:56:25Z"
  name: agones-system2
  resourceVersion: "64933"
  selfLink: /api/v1/namespaces/agones-system2
  uid: ...
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2020-09-01T19:56:31Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2020-09-01T19:56:31Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2020-09-01T19:56:31Z"
    message: All content successfully deleted
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  phase: Terminating
kubectl get ns agones-system2  -o yaml
Error from server (NotFound): namespaces "agones-system2" not found
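The certificate mismatch noted above matters because the API server verifies a webhook's serving certificate against the Service's in-cluster name, <service>.<namespace>.svc. A minimal Python sketch of that check, assuming you already have the certificate's DNS names as a list; both helpers are hypothetical, and whether this mismatch is what stalled kubectl delete -f here is not confirmed in the thread:

```python
def service_dns_name(service, namespace):
    """In-cluster DNS name the API server expects on a webhook's serving certificate."""
    return f"{service}.{namespace}.svc"


def cert_covers_service(cert_dns_names, service, namespace):
    """True if the certificate lists the Service's <service>.<namespace>.svc name."""
    return service_dns_name(service, namespace) in cert_dns_names


if __name__ == "__main__":
    # DNS name taken from the comment above: the bundled certificate covers
    # the original agones-system namespace but not agones-system2.
    cert_names = ["agones-controller-service.agones-system.svc"]
    for ns in ("agones-system", "agones-system2"):
        print(ns, cert_covers_service(cert_names, "agones-controller-service", ns))
```

It prints True for agones-system and False for agones-system2, so moving install.yaml to a new namespace also means regenerating the certificate for the new Service name.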
