This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Set stability iterations on Windows tests to 1 #4246

Closed
wants to merge 2 commits into from

Conversation

PatrickLang
Contributor

The tests with cfg.stabilityIterations checks are important and shouldn't be skipped. If they're too flaky to run 10x, at least run 1x.

I heard a report that service IPs were broken in acs-engine deployments, so I went to check on the test cases and found that they weren't run. This sets it back to run at least once.

Resolves #4245
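
A minimal sketch of the idea in this description, not the actual acs-engine change: on clusters with Windows agents, run the stability checks once instead of skipping them. The helper name effectiveIterations and its parameters are hypothetical; only cfg.stabilityIterations and the 10x/1x counts come from the discussion.

```go
package main

import "fmt"

// effectiveIterations is a hypothetical helper (not taken from acs-engine).
// It returns how many times a stability check should run:
//   - never fewer than 1, so the check is not silently skipped
//   - exactly 1 on clusters with Windows agents, where 10x is too flaky
//   - the configured count otherwise
func effectiveIterations(configured int, hasWindowsAgents bool) int {
	if configured < 1 || hasWindowsAgents {
		return 1
	}
	return configured
}

func main() {
	fmt.Println(effectiveIterations(10, true))  // Windows cluster
	fmt.Println(effectiveIterations(10, false)) // Linux-only cluster
	fmt.Println(effectiveIterations(0, false))  // unset or zero config
}
```

The design choice is to clamp rather than gate: a check that runs once can still catch a broken service IP, while a skipped check cannot.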

@ghost ghost assigned PatrickLang Nov 14, 2018
@ghost ghost added the in progress label Nov 14, 2018
Collaborator

@jsturtevant jsturtevant left a comment

Does this only run them on Linux (!eng.HasWindowsAgents())?

@codecov

codecov bot commented Nov 14, 2018

Codecov Report

Merging #4246 into master will decrease coverage by 4.95%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #4246      +/-   ##
=========================================
- Coverage   55.45%   50.5%   -4.96%     
=========================================
  Files         109     109              
  Lines       16054   17160    +1106     
=========================================
- Hits         8903    8666     -237     
- Misses       6368    7710    +1342     
- Partials      783     784       +1

@PatrickLang
Contributor Author

/hold
Thanks @jsturtevant. Will check and fix it tomorrow.

@PatrickLang
Contributor Author

Fixed & tests scheduled

@PatrickLang
Contributor Author

PatrickLang commented Nov 14, 2018

Ok - this is actually failing now in all the windows runs with Azure-CNI v1.0.13. @jsturtevant @daschott and I thought it would fail, and it might be a real bug

STEP: Exposing a internal service for the linux nginx deployment
2018/11/14 16:41:51 $ kubectl expose deployment nginx-dns-kubernetes-southcentralus-1699-3293 --type ClusterIP -n default --target-port 80 --port 80
2018/11/14 16:41:51 #### $ kubectl expose deployment nginx-dns-kubernetes-southcentralus-1699-3293 --type ClusterIP -n default --target-port 80 --port 80 completed in 356.913314ms
STEP: Exposing a internal service for the windows iis deployment
2018/11/14 16:41:52 $ kubectl expose deployment iis-dns-kubernetes-southcentralus-1699-88446 --type ClusterIP -n default --target-port 80 --port 80
2018/11/14 16:41:52 #### $ kubectl expose deployment iis-dns-kubernetes-southcentralus-1699-88446 --type ClusterIP -n default --target-port 80 --port 80 completed in 399.014003ms
STEP: Connecting to Windows from another Windows deployment
2018/11/14 16:41:52 $ kubectl run windows-2-windows-kubernetes-southcentralus-1699-66436 -n default --image microsoft/windowsservercore:1803 --image-pull-policy=IfNotPresent --restart=Never --overrides { "spec": {"nodeSelector":{"beta.kubernetes.io/os":"windows"}}} --command -- powershell iwr -UseBasicParsing -TimeoutSec 60 iis-dns-kubernetes-southcentralus-1699-88446
2018/11/14 16:41:53 #### $ kubectl run windows-2-windows-kubernetes-southcentralus-1699-66436 -n default --image microsoft/windowsservercore:1803 --image-pull-policy=IfNotPresent --restart=Never --overrides { "spec": {"nodeSelector":{"beta.kubernetes.io/os":"windows"}}} --command -- powershell iwr -UseBasicParsing -TimeoutSec 60 iis-dns-kubernetes-southcentralus-1699-88446 completed in 314.919451ms
.
2018/11/14 16:42:29 iwr : The remote name could not be resolved: 'iis-dns-kubernetes-southcentralus-1699-88446'
At line:1 char:1
+ iwr -UseBasicParsing -TimeoutSec 60 iis-dns-kubernetes-southcentralus ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

2018/11/14 16:42:29 $ kubectl delete po -n default windows-2-windows-kubernetes-southcentralus-1699-66436
2018/11/14 16:42:30 #### $ kubectl delete po -n default windows-2-windows-kubernetes-southcentralus-1699-66436 completed in 367.684252ms
2018/11/14 16:42:30 Ran command on 1 of 1 desired attempts with 0 successes


• Failure [102.187 seconds]
Azure Container Cluster using the Kubernetes Orchestrator
/go/src/github.com/Azure/acs-engine/test/e2e/kubernetes/kubernetes_test.go:82
  with a windows agent pool
  /go/src/github.com/Azure/acs-engine/test/e2e/kubernetes/kubernetes_test.go:969
    should be able to resolve DNS across windows and linux deployments [It]
    /go/src/github.com/Azure/acs-engine/test/e2e/kubernetes/kubernetes_test.go:1015

    Expected
        <int>: 0
    to equal
        <int>: 1

    /go/src/github.com/Azure/acs-engine/test/e2e/kubernetes/kubernetes_test.go:1058
------------------------------

@PatrickLang
Contributor Author

PatrickLang commented Nov 14, 2018

With Azure-CNI v1.0.12, k8s 1.12 and 1.13 failed on the first attempt. Doing a second run to see whether the failure is consistent.

STEP: Connecting to Linux from Windows deployment
2018/11/14 19:52:47 #### $ kubectl delete po -n default windows-2-windows-kubernetes-westus2-69593-1765 completed in 541.202353ms
2018/11/14 19:52:47 Ran command on 1 of 1 desired attempts with 1 successes

2018/11/14 19:52:47 $ kubectl run windows-2-linux-kubernetes-westus2-69593-73235 -n default --image microsoft/windowsservercore:1803 --image-pull-policy=IfNotPresent --restart=Never --overrides { "spec": {"nodeSelector":{"beta.kubernetes.io/os":"windows"}}} --command -- powershell iwr -UseBasicParsing -TimeoutSec 60 nginx-dns-kubernetes-westus2-69593-5913
2018/11/14 19:52:48 #### $ kubectl run windows-2-linux-kubernetes-westus2-69593-73235 -n default --image microsoft/windowsservercore:1803 --image-pull-policy=IfNotPresent --restart=Never --overrides { "spec": {"nodeSelector":{"beta.kubernetes.io/os":"windows"}}} --command -- powershell iwr -UseBasicParsing -TimeoutSec 60 nginx-dns-kubernetes-westus2-69593-5913 completed in 436.065763ms
.
2018/11/14 19:54:24 iwr : The remote name could not be resolved: 'nginx-dns-kubernetes-westus2-69593-5913'
At line:1 char:1
+ iwr -UseBasicParsing -TimeoutSec 60 nginx-dns-kubernetes-westus2-6959 ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

2018/11/14 19:54:24 $ kubectl delete po -n default windows-2-linux-kubernetes-westus2-69593-73235
2018/11/14 19:54:24 #### $ kubectl delete po -n default windows-2-linux-kubernetes-westus2-69593-73235 completed in 580.798093ms
2018/11/14 19:54:24 Ran command on 1 of 1 desired attempts with 0 successes


• Failure [164.412 seconds]

Failed on retry too. Going to test v1.0.12 across Windows+Linux next

@PatrickLang
Contributor Author

The failed Windows <-> Linux traffic issue could be #4188.

@PatrickLang
Contributor Author

I took the azure-cni version changes back out. Looks like this will need some more manual testing and maybe a new azure-cni version before it should merge.

@PatrickLang
Contributor Author

Ok - just added the new Azure-CNI v1.0.14 for Windows. Let's try again.

@PatrickLang
Contributor Author

This needs some manual investigation. All Windows tests failed - the nodes did not join the cluster.

@PatrickLang
Contributor Author

Azure CNI 1.0.14 merged in #4297, so I'm retesting without that as part of the changelist.

@jsturtevant
Collaborator

I was able to get these tests to pass on my test cluster using cni v1.14. I am going to look at this a bit closer to see what is happening.

@PatrickLang
Contributor Author

Windows Server 2019 support merged to acs-engine master, so the next attempt will run on Windows Server 2019.

@daschott

@PatrickLang does this mean the Dockerfile for kubletwin/pause (notice the typo btw) is also updated? I've found that the Dockerfile still references the microsoft registry, which doesn't have images for 1809. It should be updated to the new registry: mcr.microsoft.com/windows/nanoserver:1809

@PatrickLang
Contributor Author

PatrickLang commented Nov 30, 2018 via email

@CecileRobertMichon
Contributor

Closing. @PatrickLang did this make it to https://github.com/Azure/aks-engine?

@ghost ghost removed the in progress label Jan 7, 2019
4 participants