TestKubeletConfigCreate flake? #417

Closed
runcom opened this issue Feb 12, 2019 · 13 comments · Fixed by #437 or #457

@runcom
Member

runcom commented Feb 12, 2019

This test fails intermittently; let's track it down:

--- FAIL: TestKubeletConfigCreate (0.12s)
    --- PASS: TestKubeletConfigCreate/aws (0.10s)
    --- FAIL: TestKubeletConfigCreate/none (0.01s)
    	kubelet_config_controller_test.go:207: Expected
    			testing.CreateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"create", Resource:schema.GroupVersionResource{Group:"", Version:"", Resource:"machineconfigs"}, Subresource:""}, Name:"", Object:(*v1.MachineConfig)(0xc4202f3180)}
    		got
    			testing.UpdateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"update", Resource:schema.GroupVersionResource{Group:"machineconfiguration.openshift.io", Version:"v1", Resource:"kubeletconfigs"}, Subresource:"status"}, Object:(*v1.KubeletConfig)(0xc4202f0780)}
    	kubelet_config_controller_test.go:176: 2 additional expected actions:[{ActionImpl:{Namespace: Verb:patch Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:} Name:smaller-max-pods Patch:[123 34 109 101 116 97 100 97 116 97 34 58 123 34 102 105 110 97 108 105 122 101 114 115 34 58 91 34 57 57 45 109 97 115 116 101 114 45 104 53 53 50 109 45 115 109 97 108 108 101 114 45 109 97 120 45 112 111 100 115 45 107 117 98 101 108 101 116 34 93 125 125]} {ActionImpl:{Namespace: Verb:update Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:status} Object:0xc420133900}]
--- FAIL: TestKubeletConfigUpdates (0.17s)
    --- PASS: TestKubeletConfigUpdates/aws (0.07s)
    --- PASS: TestKubeletConfigUpdates/none (0.05s)
    --- FAIL: TestKubeletConfigUpdates/unrecognized (0.05s)
    	kubelet_config_controller_test.go:213: Expected
    			testing.GetActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"get", Resource:schema.GroupVersionResource{Group:"", Version:"", Resource:"machineconfigs"}, Subresource:""}, Name:"99-master-p9l4d-kubelet"}
    		got
    			testing.UpdateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"update", Resource:schema.GroupVersionResource{Group:"machineconfiguration.openshift.io", Version:"v1", Resource:"kubeletconfigs"}, Subresource:"status"}, Object:(*v1.KubeletConfig)(0xc4203b1b80)}
    	kubelet_config_controller_test.go:76: 3 additional expected actions:[{ActionImpl:{Namespace: Verb:create Resource:{Group: Version: Resource:machineconfigs} Subresource:} Name: Object:0xc42003cf00} {ActionImpl:{Namespace: Verb:patch Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:} Name:smaller-max-pods Patch:[123 34 109 101 116 97 100 97 116 97 34 58 123 34 102 105 110 97 108 105 122 101 114 115 34 58 91 34 57 57 45 109 97 115 116 101 114 45 104 53 53 50 109 45 115 109 97 108 108 101 114 45 109 97 120 45 112 111 100 115 45 107 117 98 101 108 101 116 34 93 125 125]} {ActionImpl:{Namespace: Verb:update Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:status} Object:0xc4203b0f00}]
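
The `Patch:[123 34 109 ...]` field in the log above is a byte slice printed as decimal values, which makes it hard to see what the expected patch action contains. As a rough reading aid (not part of the test suite; `decodePatch` is a hypothetical helper name), the values can be turned back into text like this:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodePatch is a hypothetical helper: it takes the space-separated decimal
// byte values exactly as they are printed in the log and returns them as text.
func decodePatch(logBytes string) (string, error) {
	fields := strings.Fields(logBytes)
	buf := make([]byte, 0, len(fields))
	for _, f := range fields {
		// Each field must fit in one byte (0-255).
		n, err := strconv.ParseUint(f, 10, 8)
		if err != nil {
			return "", fmt.Errorf("bad byte value %q: %w", f, err)
		}
		buf = append(buf, byte(n))
	}
	return string(buf), nil
}

func main() {
	// Patch bytes copied from the TestKubeletConfigCreate/none failure above.
	patch := "123 34 109 101 116 97 100 97 116 97 34 58 123 34 102 105 110 97 108 105 122 101 114 115 34 58 91 34 57 57 45 109 97 115 116 101 114 45 104 53 53 50 109 45 115 109 97 108 108 101 114 45 109 97 120 45 112 111 100 115 45 107 117 98 101 108 101 116 34 93 125 125"
	decoded, err := decodePatch(patch)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(decoded)
	// prints {"metadata":{"finalizers":["99-master-h552m-smaller-max-pods-kubelet"]}}
}
```

So the expected-but-missing patch is the one that adds the finalizer to the `smaller-max-pods` KubeletConfig.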
@runcom
Member Author

runcom commented Feb 12, 2019

/cc @sjenning

@ashcrow
Member

ashcrow commented Feb 12, 2019

Example: #415

rphillips added a commit to rphillips/machine-config-operator that referenced this issue Feb 15, 2019
@runcom runcom changed the title TestKubeletConfigCreate/none flake? TestKubeletConfigCreate flake? Feb 17, 2019
@runcom
Member Author

runcom commented Feb 17, 2019

This is still happening, @rphillips: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/446/pull-ci-openshift-machine-config-operator-master-unit/1182

--- FAIL: TestKubeletConfigUpdates (0.17s)
    --- PASS: TestKubeletConfigUpdates/aws (0.07s)
    --- PASS: TestKubeletConfigUpdates/none (0.05s)
    --- FAIL: TestKubeletConfigUpdates/unrecognized (0.05s)
    	kubelet_config_controller_test.go:213: Expected
    			testing.GetActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"get", Resource:schema.GroupVersionResource{Group:"", Version:"", Resource:"machineconfigs"}, Subresource:""}, Name:"99-master-p9l4d-kubelet"}
    		got
    			testing.UpdateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"update", Resource:schema.GroupVersionResource{Group:"machineconfiguration.openshift.io", Version:"v1", Resource:"kubeletconfigs"}, Subresource:"status"}, Object:(*v1.KubeletConfig)(0xc4203b1b80)}
    	kubelet_config_controller_test.go:76: 3 additional expected actions:[{ActionImpl:{Namespace: Verb:create Resource:{Group: Version: Resource:machineconfigs} Subresource:} Name: Object:0xc42003cf00} {ActionImpl:{Namespace: Verb:patch Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:} Name:smaller-max-pods Patch:[123 34 109 101 116 97 100 97 116 97 34 58 123 34 102 105 110 97 108 105 122 101 114 115 34 58 91 34 57 57 45 109 97 115 116 101 114 45 104 53 53 50 109 45 115 109 97 108 108 101 114 45 109 97 120 45 112 111 100 115 45 107 117 98 101 108 101 116 34 93 125 125]} {ActionImpl:{Namespace: Verb:update Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:status} Object:0xc4203b0f00}]

@runcom
Member Author

runcom commented Feb 18, 2019

OK, I found out what's happening, at least in the tests: either all the *Lister objects used in the tests are racy or the logic is flawed somewhere, so an object can fail to be listed after it has been added. I added some debug output, and I can confirm that running with -race shows that behavior:

=== RUN   TestKubeletConfigUpdates/aws
runcom <nil> []runcom <nil> []=== RUN   TestKubeletConfigUpdates/none
runcom <nil> []runcom could not find any MachineConfigPool set for KubeletConfig smaller-max-pods []=== RUN   TestKubeletConfigUpdates/unrecognized
runcom <nil> []runcom <nil> []--- FAIL: TestKubeletConfigUpdates (0.59s)
    --- PASS: TestKubeletConfigUpdates/aws (0.22s)
    --- FAIL: TestKubeletConfigUpdates/none (0.13s)
        kubelet_config_controller_test.go:213: Expected
            	testing.GetActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"get", Resource:schema.GroupVersionResource{Group:"", Version:"", Resource:"machineconfigs"}, Subresource:""}, Name:"99-master-5fvff-kubelet"}
            got
            	testing.UpdateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"update", Resource:schema.GroupVersionResource{Group:"machineconfiguration.openshift.io", Version:"v1", Resource:"kubeletconfigs"}, Subresource:"status"}, Object:(*v1.KubeletConfig)(0xc0002c2280)}
        kubelet_config_controller_test.go:76: 3 additional expected actions:[{ActionImpl:{Namespace: Verb:update Resource:{Group: Version: Resource:machineconfigs} Subresource:} Object:0xc00011d900} {ActionImpl:{Namespace: Verb:patch Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:} Name:smaller-max-pods Patch:[123 34 109 101 116 97 100 97 116 97 34 58 123 34 102 105 110 97 108 105 122 101 114 115 34 58 91 34 57 57 45 109 97 115 116 101 114 45 109 119 119 116 103 45 107 117 98 101 108 101 116 34 93 125 125]} {ActionImpl:{Namespace: Verb:update Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:status} Object:0xc0002c2f00}]

You can see that the mcpLister failed to return the expected object (see the log lines containing "runcom ").

@runcom
Member Author

runcom commented Feb 18, 2019

The informers' store can also hold bad data as a result of a bug in either the test or the code.

@runcom
Member Author

runcom commented Feb 18, 2019

Reopening since we still get this

@runcom runcom reopened this Feb 18, 2019
@sjenning
Contributor

/assign @rphillips

@runcom
Member Author

runcom commented Feb 19, 2019

I know what's happening: the listers for the mcp and the controllerconfig aren't in sync by the time we call syncHandler in the tests, so they return an empty list. That makes syncHandler call syncStatusOnly, which in turn issues an update action that the test does not expect. I don't understand why the listers aren't in sync: they're just a store with a map and an RWLock, and there's no parallelism involved, AFAICT.

umohnani8 added a commit to umohnani8/machine-config-operator that referenced this issue Feb 26, 2019
This is to ensure that the image is injected when the
installer is run.
This is from an already approved & lgtm'ed PR openshift#417

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
@kikisdeliveryservice
Contributor

@runcom I think I'm actually seeing this flake in my PR today:


--- FAIL: TestKubeletConfigCreate (1.06s)
    --- FAIL: TestKubeletConfigCreate/aws (0.37s)
    	kubelet_config_controller_test.go:251: Expected
    			testing.CreateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"create", Resource:schema.GroupVersionResource{Group:"", Version:"", Resource:"machineconfigs"}, Subresource:""}, Name:"", Object:(*v1.MachineConfig)(0xc420408a00)}
    		got
    			testing.UpdateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"update", Resource:schema.GroupVersionResource{Group:"machineconfiguration.openshift.io", Version:"v1", Resource:"kubeletconfigs"}, Subresource:"status"}, Object:(*v1.KubeletConfig)(0xc420132500)}
    	kubelet_config_controller_test.go:84: 2 additional expected actions:[{ActionImpl:{Namespace: Verb:patch Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:} Name:smaller-max-pods Patch:[123 34 109 101 116 97 100 97 116 97 34 58 123 34 102 105 110 97 108 105 122 101 114 115 34 58 91 34 57 57 45 109 97 115 116 101 114 45 104 53 53 50 109 45 115 109 97 108 108 101 114 45 109 97 120 45 112 111 100 115 45 107 117 98 101 108 101 116 34 93 125 125]} {ActionImpl:{Namespace: Verb:update Resource:{Group:machineconfiguration.openshift.io Version:v1 Resource:kubeletconfigs} Subresource:status} Object:0xc42041adc0}]
    --- FAIL: TestKubeletConfigCreate/none (0.35s)

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/583/pull-ci-openshift-machine-config-operator-master-unit/1752

@runcom
Member Author

runcom commented Mar 28, 2019

#583 (comment)

This is no longer happening as a flake. The failure there is a real test failure, unrelated to this issue and related to the Ignition transition.

@runcom runcom closed this as completed Mar 28, 2019