Integration tests flaky when run in parallel #5255

Closed
ianlewis opened this issue Sep 4, 2019 · 3 comments
Labels: area/testing, priority/important-soon (must be staffed and worked on either currently, or very soon, ideally in time for the next release)

ianlewis (Contributor) commented Sep 4, 2019

I've found that integration tests that use a MinikubeRunner are pretty flaky when run in parallel.
Certain tests will fail with strange errors when run in parallel but complete successfully when run serially.

Example failure:

          - error: unable to recognize "STDIN": Get https://localhost:8443/api?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused

                 With STDERR:
                 ! The installed version of 'docker-machine-driver-kvm2' (1.3.1) is no longer current. Upgrade: https://minikube.sigs.k8s.io/docs/reference/drivers/kvm2#driver-installation
        *
        X Error starting cluster: timed out waiting to elevate kube-system RBAC privileges: creating clusterrolebinding: Post https://192.168.39.169:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?timeout=1m0s: x509: certificate is valid for 192.168.39.249, 10.96.0.1, 10.0.0.1, not 192.168.39.169
        *
        * Sorry that minikube crashed. If this was unexpected, we would love to hear from you:
          - https://github.com/kubernetes/minikube/issues/new/choose

                 <------------ End of Start (TestGvisorRuntimeClass) log block

It seems that a static name is used for the VM network, so running multiple VMs in parallel doesn't work:

    minikube_runner.go:232: TestGvisorWorkload Failed to start minikube With error: exit status 70
                 begin Start log block ------------>
                 With Profile: TestGvisorWorkload
                 With Args: --container-runtime=containerd --docker-opt containerd=/var/run/containerd/containerd.sock
                 With Global Args:
                 With Driver Args: --vm-driver=kvm2 --wait-timeout=13m --wait=false
                 With STDOUT:
                 * [TestGvisorWorkload] minikube v1.4.0-beta.0 on Ubuntu 18.04
        * Creating kvm2 VM (CPUs=2, Memory=2000MB, Disk=20000MB) ...
        Running pre-create checks...
        Creating machine...
        (TestGvisorWorkload) Creating KVM machine...
        (TestGvisorWorkload) KVM machine creation complete!

                 With STDERR:
                 ! The installed version of 'docker-machine-driver-kvm2' (1.3.1) is no longer current. Upgrade: https://minikube.sigs.k8s.io/docs/reference/drivers/kvm2#driver-installation
        E0904 08:34:21.187775   29774 start.go:801] StartHost: create: Error creating machine: Error in driver during machine creation: creating network: defining network from xml:
        <network>
          <name>minikube-net</name>
          <dns enable='no'/>
          <ip address='192.168.39.1' netmask='255.255.255.0'>
            <dhcp>
              <range start='192.168.39.2' end='192.168.39.254'/>
            </dhcp>
          </ip>
        </network>
        : virError(Code=9, Domain=19, Message='operation failed: network 'minikube-net' already exists with uuid b236695e-df5e-403c-b58b-07bde41518d1')
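The collision in the log above can be sketched in Go. This is a minimal illustration, not minikube's driver code: it assumes the network name is chosen per VM by a naming function, and both helper names and the "mk-" prefix are hypothetical. A constant name means every profile requests the same libvirt network, while deriving the name from the profile keeps parallel profiles apart.

```go
package main

import "fmt"

// staticNetworkName mirrors what the log shows: every profile asks libvirt
// for the same network, so the second VM created in parallel fails with
// "network 'minikube-net' already exists".
func staticNetworkName(profile string) string {
	return "minikube-net"
}

// perProfileNetworkName is a hypothetical alternative that derives the
// network name from the profile, so parallel profiles request distinct
// networks.
func perProfileNetworkName(profile string) string {
	return "mk-" + profile
}

func main() {
	for _, p := range []string{"TestGvisorWorkload", "TestGvisorRuntimeClass"} {
		fmt.Printf("%s: static=%s per-profile=%s\n", p, staticNetworkName(p), perProfileNetworkName(p))
	}
}
```

The XML in the log also hard-codes the 192.168.39.0/24 range, so distinct names alone may not be enough; parallel networks would presumably also need non-overlapping address ranges.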
ianlewis (Contributor, Author) commented Sep 4, 2019

I've also noticed some strange behavior in past test runs. For instance, during the TestStartStopcni test, the minikube runner runs 'minikube start' with the right profile name but with the arguments for the crio test.

https://storage.googleapis.com/minikube-builds/logs/5247/KVM_Linux.txt

=== CONT  TestStartStop/group/cni
 10:02:08 | Run: [/home/jenkins/workspace/KVM_Linux_integration/out/minikube-linux-amd64 -p=TestStartStopcni config set WantReportErrorPrompt false]
 10:02:08 | RunWithContext: [/home/jenkins/workspace/KVM_Linux_integration/out/minikube-linux-amd64 -p=TestStartStopcni start --vm-driver=kvm2  --wait-timeout=13m --wait=false --v=10 --logtostderr  --container-runtime=crio --disable-driver-mounts --extra-config=kubeadm.ignore-preflight-errors=SystemVerification]

I think this comes from some incorrect fundamental assumptions about how the testing package handles parallelism, but I haven't looked into it deeply.
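One well-known way table-driven tests go wrong under t.Parallel() is offered here as an assumption, not a confirmed diagnosis of this issue. A parallel subtest pauses until the parent's loop has finished, so before Go 1.22 a closure that captures the loop variable can see the last table entry. That would fit the log above: the profile name is right (it could come from something per-subtest such as t.Name()), but the arguments belong to another test. The sketch below emulates that scheduling with goroutines rather than the testing package; the testCase type and the table entries are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// testCase is a hypothetical table entry for a parallel subtest.
type testCase struct {
	profile string
	args    string
}

// startCommands emulates table-driven subtests that call t.Parallel(): the
// loop only schedules the bodies, and they run concurrently after the loop
// has finished.
func startCommands(cases []testCase) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		cmds []string
		jobs []func()
	)
	for _, tc := range cases {
		// Give each closure its own copy. Before Go 1.22 every closure
		// shared one loop variable, so bodies that ran after the loop
		// could all see the last entry's arguments.
		tc := tc
		jobs = append(jobs, func() {
			defer wg.Done()
			cmd := fmt.Sprintf("-p=%s start %s", tc.profile, tc.args)
			mu.Lock()
			cmds = append(cmds, cmd)
			mu.Unlock()
		})
	}
	wg.Add(len(jobs))
	for _, job := range jobs {
		go job()
	}
	wg.Wait()
	sort.Strings(cmds) // goroutines finish in arbitrary order
	return cmds
}

func main() {
	cases := []testCase{
		{"TestStartStopcontainerd", "--container-runtime=containerd"},
		{"TestStartStopcrio", "--container-runtime=crio"},
	}
	for _, cmd := range startCommands(cases) {
		fmt.Println(cmd)
	}
}
```

In a real _test.go file the same `tc := tc` line would go right before `t.Run(tc.profile, func(t *testing.T) { t.Parallel(); ... })`. From Go 1.22 onward each iteration gets a fresh variable, so the copy is harmless but no longer required.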

tstromberg self-assigned this Sep 4, 2019

tstromberg (Contributor) commented:

Thanks for the confirmation! I was already planning to remove the MinikubeRunner magic today and make the tests more explicit.

Related: #5254

tstromberg added the priority/important-soon label Sep 4, 2019
tstromberg added this to the v1.4.0 milestone Sep 4, 2019

tstromberg (Contributor) commented Sep 13, 2019

Thanks for reporting in!

The "x509: certificate is valid for <ip>" error was a TOCTOU (time-of-check to time-of-use) issue; it has been addressed in the short term by mutex-locking generateCerts between processes. The complete solution, per-cluster certs, is a bit more intensive: #4968

The "network 'minikube-net' already exists" error was hopefully resolved by forcing tests to use unique cluster names.

More work is required for the rarer issues, but parallel tests are now reliable for me when run locally. If you see any new unexplained failures, please open a new issue for them.
