
P0: Dependent objects are unexpectedly deleted #4117

Closed
yqwang-ms opened this issue Jan 8, 2020 · 2 comments

yqwang-ms (Member) commented Jan 8, 2020

Short summary of the issue/question:
In PAI, we have sometimes seen on the V100LP and int beds that a job's priorityclass and configmap got unexpectedly deleted.

Root Cause Summary:
We made the namespace-scoped framework the owner of the cluster-scoped priorityclass, which is undefined behavior.

Note: Cross-namespace owner references are disallowed by design. This means: 1) Namespace-scoped dependents can only specify owners in the same namespace, and owners that are cluster-scoped. 2) Cluster-scoped dependents can only specify cluster-scoped owners, but not namespace-scoped owners.
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/
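
To make the summary concrete, below is a minimal Go sketch of the offending pattern: a cluster-scoped PriorityClass whose ownerReference points at the namespace-scoped Framework. This is only an illustration built from client-go types (the real objects are created by the PAI RestServer, and the priority value is made up), not the actual PAI code.

```go
package main

import (
	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// buildPriorityClass reproduces the shape of the offending metadata:
// a cluster-scoped PriorityClass declaring a namespace-scoped Framework
// as its owner, which the owner-reference rules quoted above disallow.
func buildPriorityClass(frameworkName string, frameworkUID types.UID) *schedulingv1.PriorityClass {
	return &schedulingv1.PriorityClass{
		ObjectMeta: metav1.ObjectMeta{
			Name: frameworkName + "-priority",
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "frameworkcontroller.microsoft.com/v1",
				Kind:       "Framework",
				Name:       frameworkName, // the Framework lives in namespace "default"
				UID:        frameworkUID,  // a cluster-scoped dependent cannot reference it
			}},
		},
		Value: 1000, // illustrative value only
	}
}
```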

For the previous version, v1.14.2 deployed by paictl, the behavior MAY be that the job hangs during deletion.
For the current version, v1.15.6 deployed by kubespray, the behavior MAY be that the job's dependent objects are unexpectedly deleted.

Fix Plan:
We should not make the framework the owner of the priorityclass; a sketch of this direction follows below.

In the future (long term), we may need to replace the priorityclass with a queue sort plugin to achieve FIFO; however, the queue sort plugin is still an alpha feature.
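
A minimal sketch of the short-term fix direction, assuming hypothetical helper names (the actual change lives in the OpenPAI code that creates the job's priorityclass, and the context-taking Create/Delete signatures below come from newer client-go releases than the clusters discussed here): create the PriorityClass with no ownerReferences at all, and delete it explicitly when the job is deleted.

```go
package main

import (
	"context"

	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createJobPriorityClass creates the job's PriorityClass with no
// ownerReferences, so the garbage collector never reasons about a
// cross-scope owner and cannot collect it by mistake.
func createJobPriorityClass(ctx context.Context, client kubernetes.Interface, frameworkName string) error {
	pc := &schedulingv1.PriorityClass{
		ObjectMeta: metav1.ObjectMeta{
			Name: frameworkName + "-priority",
			// Intentionally no OwnerReferences: the lifecycle is managed
			// explicitly instead of by the garbage collector.
		},
		Value: 1000, // illustrative value only
	}
	_, err := client.SchedulingV1().PriorityClasses().Create(ctx, pc, metav1.CreateOptions{})
	return err
}

// deleteJobPriorityClass removes the PriorityClass explicitly when the
// job (Framework) itself is deleted.
func deleteJobPriorityClass(ctx context.Context, client kubernetes.Interface, frameworkName string) error {
	return client.SchedulingV1().PriorityClasses().Delete(ctx, frameworkName+"-priority", metav1.DeleteOptions{})
}
```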

Root Cause Analysis:
The whole story is:

  1. RestServer PATCHes the priorityclass's owner to be the framework.

  2. The GC controller checks whether the priorityclass's owner exists. Because the GC assumes the cluster-scoped priorityclass's owner is also cluster-scoped, it checks the ApiServer with the URL
    /apis/frameworkcontroller.microsoft.com/v1/frameworks/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg, which is definitely wrong (see the path sketch after this list).

  3. For the previous version (v1.14.2, deployed by paictl), the ApiServer response is 400:
    I1107 08:57:17.913195 1 wrap.go:47] GET /apis/frameworkcontroller.microsoft.com/v1/frameworks/f5mqjubyf5mqjuazehjq6x2zehgq6uuze9qprtazdhmq6x2z6r: (151.3µs) 400 [kube-controller-manager/v1.14.2 (linux/amd64) kubernetes/66049e3/generic-garbage-collector 10.151.41.16:49658]
    For the current version (v1.15.6, deployed by kubespray), the ApiServer response is 404:
    I0108 09:20:02.289055 1 wrap.go:47] GET /apis/frameworkcontroller.microsoft.com/v1/frameworks/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg: (183.698µs) 404 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
    The response changed due to PR kubernetes/kubernetes#80750 ("apiextensions: check request scope against CRD scope correctly"), which fixes CVE-2019-11247 ("API server allows access to custom resources via wrong scope", kubernetes/kubernetes#80983).

  4. The GC controller treats 400 as an error, so it always retries, and we may therefore see the job hang during deletion.
    The GC controller treats 404 as "object not found", so it directly believes the framework object (assume its UID is U1) does not exist, and deletes the priorityclass.

  5. The GC-believed "fact" that object U1 does not exist is cached. After this, when the GC controller starts to check the framework's configmap, it just uses the cache and believes the configmap's owner does not exist either (even though it could now check the ApiServer with the correct URL), so it deletes the configmap too.
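
To make step 2 concrete, the sketch below approximates how the owner-lookup path differs by scope. This is not the real GC code (the garbage collector resolves scope through a RESTMapper and a dynamic client rather than building URL strings): because the dependent priorityclass is cluster-scoped, the GC assumed its owner Framework was cluster-scoped too, dropped the namespace segment, and produced exactly the 404 URL seen in the logs.

```go
package main

import "fmt"

// ownerLookupPath approximates the GET path used to verify an owner.
// The namespace segment is only included when the owner is treated as
// namespace-scoped; the GC infers that scope from the dependent.
func ownerLookupPath(group, version, resource, namespace, name string, clusterScoped bool) string {
	if clusterScoped {
		return fmt.Sprintf("/apis/%s/%s/%s/%s", group, version, resource, name)
	}
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s/%s", group, version, namespace, resource, name)
}

func main() {
	const fw = "erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg"
	// What the GC actually requested (scope wrongly inferred from the
	// cluster-scoped priorityclass), which returned 404 on v1.15.6:
	fmt.Println(ownerLookupPath("frameworkcontroller.microsoft.com", "v1", "frameworks", "", fw, true))
	// The path that would actually locate the namespace-scoped Framework:
	fmt.Println(ownerLookupPath("frameworkcontroller.microsoft.com", "v1", "frameworks", "default", fw, false))
}
```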

Most of the investigation time was spent because:

  1. It is hard to repro, and the previous log level was not enough, so only today did we get the logs and find the root cause. Created "Set and deploy v=2 log level for all k8s components" #4123 to refine the log level.
  2. Our knowledge of ApiServer behavior was based on the previous version, which returns 400 and thus never causes unexpected deletion. This stale knowledge misled the inference during debugging. The lesson: we should not upgrade the released k8s version after already developing and testing against an old version for a long time. @ydye for awareness.
  3. We have enabled ApiServer HA ("P0.5: Disallow ApiServer HA for Pod safety" #4120), which could be another potential root cause if the GC controller were connected to the ApiServers through a load balancer. But that is not the case here (it is the case for FC and Hived, though, so its priority was just lowered), and it is much harder to hit. In any case, it distracted the investigation from the real cause.

All Related Logs:
Job: https://v100lp.openp.ai/job-detail.html?username=v-yugzh&jobName=nbgtval0107a_s2_a12_65d1
framework name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg

GC Controller v4 log:

I0108 09:20:02.036690       1 garbagecollector.go:403] processing item [frameworkcontroller.microsoft.com/v1/Framework, namespace: , name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg, uid: fe4cc369-395c-46e5-beae-7a89ab42ac86]
I0108 09:20:02.050164       1 disruption.go:397] updatePod called on pod "erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv160tnyr9g60w0-taskrole-0"
I0108 09:20:02.050204       1 disruption.go:460] No PodDisruptionBudgets found for pod erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv160tnyr9g60w0-taskrole-0, PodDisruptionBudget controller will avoid syncing.
I0108 09:20:02.050212       1 disruption.go:400] No matching pdb for pod "erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv160tnyr9g60w0-taskrole-0"
I0108 09:20:02.289530       1 garbagecollector.go:403] processing item [scheduling.k8s.io/v1/PriorityClass, namespace: , name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority, uid: d6218e03-a339-467a-927b-6ff243df404a]
I0108 09:20:02.289544       1 garbagecollector.go:403] processing item [v1/Secret, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred, uid: a9f19d42-9445-49de-a10b-f3393c4decf4]
I0108 09:20:02.289564       1 garbagecollector.go:403] processing item [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb]

I0108 09:20:02.294078       1 garbagecollector.go:516] delete object [scheduling.k8s.io/v1/PriorityClass, namespace: , name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority, uid: d6218e03-a339-467a-927b-6ff243df404a] with propagation policy Background
I0108 09:20:02.295615       1 garbagecollector.go:516] delete object [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb] with propagation policy Foreground
I0108 09:20:02.295626       1 garbagecollector.go:516] delete object [v1/Secret, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred, uid: a9f19d42-9445-49de-a10b-f3393c4decf4] with propagation policy Background

I0108 09:20:02.318945       1 graph_builder.go:517] add [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb] to the attemptToDelete, because it's waiting for its dependents to be deleted
I0108 09:20:02.319193       1 garbagecollector.go:403] processing item [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb]
I0108 09:20:02.324780       1 resource_quota_monitor.go:354] QuotaMonitor process object: /v1, Resource=secrets, namespace default, name erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred, uid a9f19d42-9445-49de-a10b-f3393c4decf4, event type delete
I0108 09:20:02.325717       1 garbagecollector.go:525] remove DeleteDependents finalizer for item [v1/ConfigMap, namespace: default, name: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid: 3d8de90a-1b96-44ad-95db-d3b78f0988eb]
I0108 09:20:02.340638       1 resource_quota_monitor.go:354] QuotaMonitor process object: /v1, Resource=configmaps, namespace default, name erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, uid 3d8de90a-1b96-44ad-95db-d3b78f0988eb, event type delete

ApiServer v4 log:

I0108 09:20:02.289055       1 wrap.go:47] GET /apis/frameworkcontroller.microsoft.com/v1/frameworks/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg: (183.698µs) 404 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.293743       1 wrap.go:47] GET /apis/scheduling.k8s.io/v1/priorityclasses/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority: (3.683953ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.295187       1 wrap.go:47] GET /api/v1/namespaces/default/secrets/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred: (4.989737ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.295290       1 wrap.go:47] GET /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (5.058236ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.318786       1 wrap.go:47] DELETE /apis/scheduling.k8s.io/v1/priorityclasses/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority: (24.406491ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.318966       1 wrap.go:47] DELETE /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (22.980508ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.324623       1 wrap.go:47] DELETE /api/v1/namespaces/default/secrets/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-regcred: (28.600338ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.325443       1 wrap.go:47] GET /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (6.026623ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.328800       1 wrap.go:47] GET /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (2.765865ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]
I0108 09:20:02.340556       1 wrap.go:47] PATCH /api/v1/namespaces/default/configmaps/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt: (11.186558ms) 200 [kube-controller-manager/v1.15.6 (linux/amd64) kubernetes/7015f71/system:serviceaccount:kube-system:generic-garbage-collector 10.8.1.7:34546]

FC v2 log:

I0108 09:20:02.069054       9 controller.go:324] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkObj: Framework ConfigMap Added: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:20:02.069919       9 controller.go:1145] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Succeeded to create ConfigMap erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:20:02.069958       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCreationPending] to [AttemptCreationRequested]
I0108 09:20:02.069977       9 controller.go:675] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkTimeoutCheck after 4m59.9999905s: FrameworkAttemptCreationTimeoutCheck
I0108 09:20:02.070001       9 controller.go:943] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: syncFrameworkState: Waiting ConfigMap to appear in the local cache or timeout

I0108 09:20:02.318923       9 controller.go:324] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkObj: Framework ConfigMap Updated: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:20:02.341065       9 controller.go:324] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkObj: Framework ConfigMap Deleted: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt

W0108 09:25:02.070561       9 controller.go:755] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: syncFrameworkState: ConfigMap does not appear in the local cache within timeout 5m0s, so consider it was deleted and explicitly delete it
I0108 09:25:02.075686       9 controller.go:1116] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Succeeded to delete ConfigMap erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt, 3d8de90a-1b96-44ad-95db-d3b78f0988eb: confirm: true
I0108 09:25:02.075834       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCreationRequested] to [AttemptCompleted]
I0108 09:25:02.096900       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCompleted] to [AttemptCreationPending]
I0108 09:25:02.107065       9 controller.go:324] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: enqueueFrameworkObj: Framework ConfigMap Added: erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:25:02.107638       9 controller.go:1145] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Succeeded to create ConfigMap erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-attempt
I0108 09:25:02.107659       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCreationPending] to [AttemptCreationRequested]
I0108 09:25:02.141901       9 funcs.go:588] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg]: Transitioned Framework from [AttemptCreationRequested] to [AttemptPreparing]
W0108 09:25:02.185634       9 controller.go:487] [default/erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg][taskrole][0]: Failed to create Pod erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-taskrole-0: pods "erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-taskrole-0" is forbidden: no PriorityClass with name erpqjxb7f9m7wvk2cxu7crbc60rk0dv1bxtk4qv164t5ydhncgrg-priority was found
@yqwang-ms yqwang-ms changed the title Dependent object are unexpected deleetd Dependent objects are unexpected deleetd Jan 8, 2020
@yqwang-ms yqwang-ms changed the title Dependent objects are unexpected deleetd Dependent objects are unexpectedly deleted Jan 8, 2020
@yqwang-ms yqwang-ms changed the title Dependent objects are unexpectedly deleted P0: Dependent objects are unexpectedly deleted Jan 8, 2020
@yqwang-ms yqwang-ms assigned abuccts and unassigned ydye Jan 8, 2020
abuccts added a commit that referenced this issue Jan 8, 2020
Remove framework owner reference for priority class, ref #4117.
@yqwang-ms yqwang-ms self-assigned this Jan 8, 2020
abuccts added a commit that referenced this issue Jan 9, 2020
Remove framework owner reference for priority class, ref #4117.
abuccts added a commit that referenced this issue Jan 10, 2020
* add switch to enable priority class for job FIFO
* remove framework owner reference for priority class, ref #4117
abuccts added a commit that referenced this issue Jan 10, 2020
Update job priority class

* add switch to enable priority class for job FIFO
* remove framework owner reference for priority class, ref #4117
@fanyangCS (Contributor) commented:

@yqwang-ms, shall we close this issue?

@yqwang-ms (Member, Author) commented:

All sub-items are done, closing it.
