
P0: Dependent objects are unexpectedly deleted #4117

Closed
yqwang-ms opened this issue Jan 8, 2020 · 2 comments

yqwang-ms (Member) commented Jan 8, 2020

Short summary of the issue/question:
In PAI, we have sometimes seen on the V100LP and int beds that a job's priorityclass and configmap got unexpectedly deleted.

Root Cause Summary:
We made the namespace-scoped framework the owner of the cluster-scoped priorityclass, which is undefined behavior.

Note: Cross-namespace owner references are disallowed by design. This means: 1) Namespace-scoped dependents can only specify owners in the same namespace, and owners that are cluster-scoped. 2) Cluster-scoped dependents can only specify cluster-scoped owners, but not namespace-scoped owners.
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/
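
To make the summary concrete, below is a minimal Go sketch of the offending pattern: a cluster-scoped PriorityClass whose ownerReference points at the namespace-scoped Framework. This is only an illustration built from client-go types (the real objects are created by the PAI RestServer, and the priority value is made up), not the actual PAI code.

```go
package main

import (
	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// buildPriorityClass reproduces the shape of the offending metadata:
// a cluster-scoped PriorityClass declaring a namespace-scoped Framework
// as its owner, which the owner-reference rules quoted above disallow.
func buildPriorityClass(frameworkName string, frameworkUID types.UID) *schedulingv1.PriorityClass {
	return &schedulingv1.PriorityClass{
		ObjectMeta: metav1.ObjectMeta{
			Name: frameworkName + "-priority",
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "frameworkcontroller.microsoft.com/v1",
				Kind:       "Framework",
				Name:       frameworkName, // the Framework lives in namespace "default"
				UID:        frameworkUID,  // a cluster-scoped dependent cannot reference it
			}},
		},
		Value: 1000, // illustrative value only
	}
}
```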

For the previous version, v1.14.2 deployed by paictl, the behavior MAY be that the job hangs during deletion.
For the current version, v1.15.6 deployed by kubespray, the behavior MAY be that the job's dependent objects are unexpectedly deleted.

Fix Plan:
We should not make the framework the owner of the priorityclass; a sketch of this direction follows below.

In the future (long term), we may need to replace the priorityclass with a queue sort plugin to achieve FIFO; however, the queue sort plugin is still an alpha feature.
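
A minimal sketch of the short-term fix direction, assuming hypothetical helper names (the actual change lives in the OpenPAI code that creates the job's priorityclass, and the context-taking Create/Delete signatures below come from newer client-go releases than the clusters discussed here): create the PriorityClass with no ownerReferences at all, and delete it explicitly when the job is deleted.

```go
package main

import (
	"context"

	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createJobPriorityClass creates the job's PriorityClass with no
// ownerReferences, so the garbage collector never reasons about a
// cross-scope owner and cannot collect it by mistake.
func createJobPriorityClass(ctx context.Context, client kubernetes.Interface, frameworkName string) error {
	pc := &schedulingv1.PriorityClass{
		ObjectMeta: metav1.ObjectMeta{
			Name: frameworkName + "-priority",
			// Intentionally no OwnerReferences: the lifecycle is managed
			// explicitly instead of by the garbage collector.
		},
		Value: 1000, // illustrative value only
	}
	_, err := client.SchedulingV1().PriorityClasses().Create(ctx, pc, metav1.CreateOptions{})
	return err
}

// deleteJobPriorityClass removes the PriorityClass explicitly when the
// job (Framework) itself is deleted.
func deleteJobPriorityClass(ctx context.Context, client kubernetes.Interface, frameworkName string) error {
	return client.SchedulingV1().PriorityClasses().Delete(ctx, frameworkName+"-priority", metav1.DeleteOptions{})
}
```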

Root Cause Analysis:
The whole story is:

  1. RestServer PATCHes the priorityclass's owner to be the framework.

  2. The GC controller checks whether the priorityclass's owner exists. Because the GC assumes the cluster-scoped priorityclass's owner is also cluster-scoped, it checks the ApiServer with the URL
    /apis/frameworkcontroller.microsoft.com/v1/frameworks/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg, which is definitely wrong (see the path sketch after this list).

  3. For the previous version (v1.14.2, deployed by paictl), the ApiServer response is 400:
    I1107 08:57:17.913195 1 wrap.go:47] GET /apis/frameworkcontroller.microsoft.com/v1/frameworks/f5mqjubyf5mqjuazehjq6x2zehgq6uuze9qprtazdhmq6x2z6r: (151.3µs) 400 [kube-controller-manager/v1.14.2 (linux/amd64) kubernetes/66049e3/generic-garbage-collector 10.151.41.16:49658]
    For the current version (v1.15.6, deployed by kubespray), the ApiServer response is 404:
    I0108 09:20:02.289055 1 wrap.go:47] GET /apis/frameworkcontroller.microsoft.com/v1/frameworks/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg: (183.698µs) 404 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
    The response changed due to PR kubernetes/kubernetes#80750 ("apiextensions: check request scope against CRD scope correctly"), which fixes CVE-2019-11247 ("API server allows access to custom resources via wrong scope", kubernetes/kubernetes#80983).

  4. The GC controller treats 400 as an error, so it always retries, and we may therefore see the job hang during deletion.
    The GC controller treats 404 as "object not found", so it directly believes the framework object (assume its UID is U1) does not exist, and deletes the priorityclass.

  5. The GC-believed "fact" that object U1 does not exist is cached. After this, when the GC controller starts to check the framework's configmap, it just uses the cache and believes the configmap's owner does not exist either (even though it could now check the ApiServer with the correct URL), so it deletes the configmap too.
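
To make step 2 concrete, the sketch below approximates how the owner-lookup path differs by scope. This is not the real GC code (the garbage collector resolves scope through a RESTMapper and a dynamic client rather than building URL strings): because the dependent priorityclass is cluster-scoped, the GC assumed its owner Framework was cluster-scoped too, dropped the namespace segment, and produced exactly the 404 URL seen in the logs.

```go
package main

import "fmt"

// ownerLookupPath approximates the GET path used to verify an owner.
// The namespace segment is only included when the owner is treated as
// namespace-scoped; the GC infers that scope from the dependent.
func ownerLookupPath(group, version, resource, namespace, name string, clusterScoped bool) string {
	if clusterScoped {
		return fmt.Sprintf("/apis/%s/%s/%s/%s", group, version, resource, name)
	}
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s/%s", group, version, namespace, resource, name)
}

func main() {
	const fw = "erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg"
	// What the GC actually requested (scope wrongly inferred from the
	// cluster-scoped priorityclass), which returned 404 on v1.15.6:
	fmt.Println(ownerLookupPath("frameworkcontroller.microsoft.com", "v1", "frameworks", "", fw, true))
	// The path that would actually locate the namespace-scoped Framework:
	fmt.Println(ownerLookupPath("frameworkcontroller.microsoft.com", "v1", "frameworks", "default", fw, false))
}
```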

Most of the investigation time was spent because:

  1. It is hard to repro, and the previous log level was not enough, so only today did we get the logs and find the root cause. Created "Set and deploy v=2 log level for all k8s components" #4123 to refine the log level.
  2. Our knowledge of ApiServer behavior was based on the previous version, which returns 400 and thus never causes unexpected deletion. This stale knowledge misled the inference during debugging. The lesson: we should not upgrade the released k8s version after already developing and testing against an old version for a long time. @ydye for awareness.
  3. We have enabled ApiServer HA ("P0.5: Disallow ApiServer HA for Pod safety" #4120), which could be another potential root cause if the GC controller were connected to the ApiServers through a load balancer. But that is not the case here (it is the case for FC and Hived, though, so its priority was just lowered), and it is much harder to hit. In any case, it distracted the investigation from the real cause.

All Related Logs:
Job: https://v100lp.openp.ai/job-detail.html?username=v-yugzh&jobName=nbgtval0107a_s2_a12_65d1
framework name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg

GC Controller v4 log:

I0108 09:20:02.036690       1 garbagecollector.go:403] processing item [frameworkcontroller.microsoft.com/v1/Framework, namespace: , name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg, uid: fe4cc369-395c-46e5-beae-7a89ab42ac86]
I0108 09:20:02.050164       1 disruption.go:397] updatePod called on pod "erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv160tnyr9g60w0-taskrole-0"
I0108 09:20:02.050204       1 disruption.go:460] No PodDisruptionBudgets found for pod erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv160tnyr9g60w0-taskrole-0, PodDisruptionBudget controller will avoid syncing.
I0108 09:20:02.050212       1 disruption.go:400] No matching pdb for pod "erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv160tnyr9g60w0-taskrole-0"
I0108 09:20:02.289530       1 garbagecollector.go:403] processing item [scheduling.k8s.io/v1/PriorityClass, namespace: , name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority, uid: d6218e03-a339-467a-927b-6ff243df404a]
I0108 09:20:02.289544       1 garbagecollector.go:403] processing item [v1/Secret, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred, uid: a9f19d42-9445-49de-a10b-f3393c4decf4]
I0108 09:20:02.289564       1 garbagecollector.go:403] processing item [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb]

I0108 09:20:02.294078       1 garbagecollector.go:516] delete object [scheduling.k8s.io/v1/PriorityClass, namespace: , name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority, uid: d6218e03-a339-467a-927b-6ff243df404a] with propagation policy Background
I0108 09:20:02.295615       1 garbagecollector.go:516] delete object [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb] with propagation policy Foreground
I0108 09:20:02.295626       1 garbagecollector.go:516] delete object [v1/Secret, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred, uid: a9f19d42-9445-49de-a10b-f3393c4decf4] with propagation policy Background

I0108 09:20:02.318945       1 graph_builder.go:517] add [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb] to the attemptToDelete, because it's waiting for its dependents to be deleted
I0108 09:20:02.319193       1 garbagecollector.go:403] processing item [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb]
I0108 09:20:02.324780       1 resource_quota_monitor.go:354] QuotaMonitor process object: /v1, Resource=secrets, namespace default, name erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred, uid a9f19d42-9445-49de-a10b-f3393c4decf4, event type delete
I0108 09:20:02.325717       1 garbagecollector.go:525] remove DeleteDependents finalizer for item [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb]
I0108 09:20:02.340638       1 resource_quota_monitor.go:354] QuotaMonitor process object: /v1, Resource=configmaps, namespace default, name erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid 3d8de90a-1b96-44ad-95db-d3b78f0988eb, event type delete

ApiServer v4 log:

I0108 09:20:02.289055       1 wrap.go:47] GET /apis/frameworkcontroller.microsoft.com/v1/frameworks/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg: (183.698µs) 404 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.293743       1 wrap.go:47] GET /apis/scheduling.k8s.io/v1/priorityclasses/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority: (3.683953ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.295187       1 wrap.go:47] GET /api/v1/namespaces/default/secrets/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred: (4.989737ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.295290       1 wrap.go:47] GET /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (5.058236ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.318786       1 wrap.go:47] DELETE /apis/scheduling.k8s.io/v1/priorityclasses/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority: (24.406491ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.318966       1 wrap.go:47] DELETE /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (22.980508ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.324623       1 wrap.go:47] DELETE /api/v1/namespaces/default/secrets/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred: (28.600338ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.325443       1 wrap.go:47] GET /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (6.026623ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.328800       1 wrap.go:47] GET /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (2.765865ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.340556       1 wrap.go:47] PATCH /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (11.186558ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]

FC v2 log:

I0108 09:20:02.069054       9 controller.go:324] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkObj: Framework ConfigMap Added: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:20:02.069919       9 controller.go:1145] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Succeeded to create ConfigMap erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:20:02.069958       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCreationPending] to [AttemptCreationRequested]
I0108 09:20:02.069977       9 controller.go:675] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkTimeoutCheck after 4m59.9999905s: FrameworkAttemptCreationTimeoutCheck
I0108 09:20:02.070001       9 controller.go:943] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: syncFrameworkState: Waiting ConfigMap to appear in the local cache or timeout

I0108 09:20:02.318923       9 controller.go:324] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkObj: Framework ConfigMap Updated: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:20:02.341065       9 controller.go:324] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkObj: Framework ConfigMap Deleted: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt

W0108 09:25:02.070561       9 controller.go:755] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: syncFrameworkState: ConfigMap does not appear in the local cache within timeout 5m0s, so consider it was deleted and explicitly delete it
I0108 09:25:02.075686       9 controller.go:1116] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Succeeded to delete ConfigMap erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, 3d8de90a-1b96-44ad-95db-d3b78f0988eb: confirm: true
I0108 09:25:02.075834       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCreationRequested] to [AttemptCompleted]
I0108 09:25:02.096900       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCompleted] to [AttemptCreationPending]
I0108 09:25:02.107065       9 controller.go:324] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkObj: Framework ConfigMap Added: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:25:02.107638       9 controller.go:1145] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Succeeded to create ConfigMap erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:25:02.107659       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCreationPending] to [AttemptCreationRequested]
I0108 09:25:02.141901       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCreationRequested] to [AttemptPreparing]
W0108 09:25:02.185634       9 controller.go:487] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg][taskrole][0]: Failed to create Pod erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-taskrole-0: pods "erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-taskrole-0" is forbidden: no PriorityClass with name erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority was found
@yqwang-ms yqwang-ms changed the title Dependent object are unexpected deleetd Dependent objects are unexpected deleetd Jan 8, 2020
@yqwang-ms yqwang-ms changed the title Dependent objects are unexpected deleetd Dependent objects are unexpectedly deleted Jan 8, 2020
@yqwang-ms yqwang-ms changed the title Dependent objects are unexpectedly deleted P0: Dependent objects are unexpectedly deleted Jan 8, 2020
@yqwang-ms yqwang-ms assigned abuccts and unassigned ydye Jan 8, 2020
abuccts added a commit that referenced this issue Jan 8, 2020
Remove framework owner reference for priority class, ref #4117.
@yqwang-ms yqwang-ms self-assigned this Jan 8, 2020
abuccts added a commit that referenced this issue Jan 9, 2020
Remove framework owner reference for priority class, ref #4117.
abuccts added a commit that referenced this issue Jan 10, 2020
* add switch to enable priority class for job FIFO
* remove framework owner reference for priority class, ref #4117
abuccts added a commit that referenced this issue Jan 10, 2020
Update job priority class

* add switch to enable priority class for job FIFO
* remove framework owner reference for priority class, ref #4117
@fanyangCS (Contributor) commented:

@yqwang-ms, shall we close this issue?

@yqwang-ms (Member, Author) commented:

All sub-items are done, closing it.
