This repository was archived by the owner on Oct 24, 2023. It is now read-only.

chore: Install Windows Server 2019 10C updates in Windows VHD #3956

Merged: 1 commit into Azure:master from win-vhd-10C, Nov 7, 2020


@marosset (Contributor) commented Oct 21, 2020

Reason for Change:

Issue Fixed:

Credit Where Due:

Does this change contain code from or inspired by another project?

  • No
  • Yes

If "Yes," did you notify that project's maintainers and provide attribution?

  • No
  • Yes

Requirements:

Notes:

@acs-bot added size/M and removed size/S labels Oct 26, 2020
@jsturtevant (Contributor)

@daschott We are seeing consistent failures in our service-to-service tests with the October optional patch (10.0.17763.1554): https://dev.azure.com/AzureContainerUpstream/Kubernetes/_build/results?buildId=13542&view=logs&j=49d03c1b-89df-5eb4-98f6-9a6bbfef7d4e&t=aa477d3b-e842-5070-3943-3b7afd505a42&l=2954

If I run the test by itself, for example with export GINKGO_FOCUS=${GINKGO_FOCUS:-"should be able to resolve DNS across windows and linux deployments"}, I do not see the failure.

We have been running this test for a couple of years, and it passes regularly in our other e2e tests on the August patch (10.0.17763.1397). Have you received any reports, or do you know of any issues? What are the next steps to debug this?

@AbelHu fyi

@AbelHu previously approved these changes Oct 30, 2020
@jsturtevant (Contributor)

I was able to reproduce this on a local deployment and grab logs.

Connectivity via IP addresses is OK, and the CNI configuration looks OK as well. I am seeing HNS load balancers programmed with the DNS entry.

But the following error appears in the kube-proxy logs:

E1030 21:55:03.060596    6812 proxier.go:1142] Policy creation failed: hcnCreateLoadBalancer failed in Win32: The specified port already exists. (0x803b0013) {"Success":false,"Error":"The specified port already exists. ","ErrorCode":2151350291}

The ports don't look exhausted:

Rough estimation of the ephemeral port availability: up to 760 allocations of 64 contiguous TCP ports may be possible
Successfully reserved 10 ranges of 64 ports
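The decimal ErrorCode in the kube-proxy message is the same HRESULT shown in parentheses: 2151350291 is 0x803B0013. The sketch below uses a hypothetical helper name, Split-HResult (not part of any Windows module), to convert such a code to hex and split it into the standard HRESULT fields (failure bit 31, facility in bits 16-26, code in bits 0-15). It only decodes the layout; it does not look up what the facility or code mean.

```powershell
# Hypothetical helper: split an HRESULT (as reported in HNS/HCN error JSON)
# into its standard fields. Runs in Windows PowerShell 5.1 and PowerShell 7.
function Split-HResult {
    param(
        # Decimal ErrorCode from the HNS JSON (e.g. 2151350291) or a hex literal
        [Parameter(Mandatory)][long]$Code
    )
    # PowerShell parses hex literals such as 0x803B0013 as negative Int32
    # values, so fold them back into the unsigned 32-bit range.
    if ($Code -lt 0) { $Code += 4294967296 }

    [pscustomobject]@{
        Hex      = '0x{0:X8}' -f $Code
        Failure  = ($Code -shr 31) -eq 1       # severity bit 31
        Facility = ($Code -shr 16) -band 0x7FF # bits 16-26
        Code     = $Code -band 0xFFFF          # bits 0-15
    }
}
```

For example, Split-HResult -Code 2151350291 reports Hex 0x803B0013, Facility 59 and Code 19, and the ErrorCode 2151350273 from the later "The network was not found" error decodes to 0x803B0001, i.e. the same facility.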

From within a pod:

PS C:\> Resolve-DnsName kubernetes.default.svc.cluster.local -QuickTimeout
Resolve-DnsName : kubernetes.default.svc.cluster.local : This operation returned because the timeout period expired
At line:1 char:1
+ Resolve-DnsName kubernetes.default.svc.cluster.local -QuickTimeout
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationTimeout: (kubernetes.default.svc.cluster.local:String) [Resolve-DnsName], Win32Exception
    + FullyQualifiedErrorId : ERROR_TIMEOUT,Microsoft.DnsClient.Commands.ResolveDnsName
 
PS C:\> Resolve-DnsName google.com -QuickTimeout                          
Resolve-DnsName : google.com : This operation returned because the timeout period expired
At line:1 char:1
+ Resolve-DnsName google.com -QuickTimeout
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationTimeout: (google.com:String) [Resolve-DnsName], Win32Exception
    + FullyQualifiedErrorId : ERROR_TIMEOUT,Microsoft.DnsClient.Commands.ResolveDnsName
 
PS C:\> Resolve-DnsName iis-dns-kubernetes-westus2-90206-11480 -server 10.240.0.7 -QuickTimeout
Resolve-DnsName : iis-dns-kubernetes-westus2-90206-11480 : DNS name does not exist
At line:1 char:1
+ Resolve-DnsName iis-dns-kubernetes-westus2-90206-11480 -server 10.240 ...        
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceUnavailable: (iis-dns-kubernetes-westus2-90206-11480:String) [Resolve-DnsName], Win32Exception
    + FullyQualifiedErrorId : DNS_ERROR_RCODE_NAME_ERROR,Microsoft.DnsClient.Commands.ResolveDnsName

Querying the DNS pod directly works for the API server name:

PS C:\> Resolve-DnsName kubernetes.default.svc.cluster.local -server 10.240.0.7 -QuickTimeout

Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
kubernetes.default.svc.cluster.local           A      5     Answer     10.0.0.1
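Running the two lookups side by side helps separate a DNS pod problem from a service VIP problem. The sketch below is a hypothetical helper (Test-DnsPath is not part of any module); it assumes it runs inside a Windows pod where the DnsClient cmdlet Resolve-DnsName is available, and it treats any Resolve-DnsName error, including "DNS name does not exist", as a failure.

```powershell
# Hypothetical helper: resolve a name through the pod's configured resolver
# (normally the cluster DNS service VIP) and directly against a DNS pod IP.
function Test-DnsPath {
    param(
        [Parameter(Mandatory)][string]$Name,
        [Parameter(Mandatory)][string]$DnsPodIp
    )
    $viaDefault = $true
    $viaPod = $true

    # Path 1: default resolver from the pod's DNS configuration
    try {
        Resolve-DnsName -Name $Name -QuickTimeout -ErrorAction Stop | Out-Null
    } catch { $viaDefault = $false }

    # Path 2: bypass the service VIP and query the DNS pod itself
    try {
        Resolve-DnsName -Name $Name -Server $DnsPodIp -QuickTimeout -ErrorAction Stop | Out-Null
    } catch { $viaPod = $false }

    [pscustomobject]@{
        Name               = $Name
        ViaDefaultResolver = $viaDefault
        ViaDnsPod          = $viaPod
    }
}
```

Given the output above, Test-DnsPath -Name kubernetes.default.svc.cluster.local -DnsPodIp 10.240.0.7 would be expected to report ViaDefaultResolver False and ViaDnsPod True, which points at the service load-balancing path (consistent with the hcnCreateLoadBalancer failure) rather than at the DNS pod itself.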

@codecov bot commented Nov 2, 2020

Codecov Report

Merging #3956 (7db4ce0) into master (d400c00) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #3956   +/-   ##
=======================================
  Coverage   73.69%   73.69%           
=======================================
  Files         147      147           
  Lines       23164    23164           
=======================================
  Hits        17070    17070           
  Misses       4979     4979           
  Partials     1115     1115           

Powered by Codecov. Last update d400c00...7db4ce0.

@jsturtevant (Contributor) commented Nov 3, 2020

In 10.0.17763.1397 (August), the following works:

PS C:\k> $hnsNetwork = Get-HnsNetwork | Where-Object Name -EQ azure
PS C:\k> Remove-HnsNetwork $hnsNetwork  
PS C:\k> Get-HnsPolicyList | Remove-HnsPolicyList

In 10.0.17763.1554 (October), if the network is removed first, the VFP policy lists can no longer be deleted because the network doesn't exist:

PS C:\k> $hnsNetwork = Get-HnsNetwork | Where-Object Name -EQ azure
PS C:\k> Remove-HnsNetwork $hnsNetwork  
PS C:\k> Get-HnsPolicyList | Remove-HnsPolicyList
Invoke-HnsRequest : @{Error=The network was not found. ; ErrorCode=2151350273; Success=False}
At C:\windows\system32\WindowsPowerShell\v1.0\Modules\HostNetworkingService\HostNetworkingService.psm1:130 char:30
+ ... | foreach { Invoke-HnsRequest -Method DELETE -Type  policylists -Id $ ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Invoke-HnsRequest

To unblock this, I have reordered the calls so that the policy lists are removed before the network.
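A minimal sketch of the reordered cleanup, assuming the HostNetworkingService module cmdlets used in the transcripts above and a network named azure. Reset-HnsNetworkState is a hypothetical wrapper for illustration, not the code from the actual change.

```powershell
# Hypothetical wrapper illustrating the reordered cleanup.
function Reset-HnsNetworkState {
    param([string]$NetworkName = 'azure')

    # On 10.0.17763.1554 (October 2020), deleting policy lists after the
    # network has been removed fails with "The network was not found",
    # so remove the persisted policy lists while the network still exists.
    Get-HnsPolicyList | Remove-HnsPolicyList

    # Then remove the network itself, if it is present.
    $hnsNetwork = Get-HnsNetwork | Where-Object Name -EQ $NetworkName
    if ($hnsNetwork) {
        Remove-HnsNetwork $hnsNetwork
    }
}
```

The same sequence also works on 10.0.17763.1397 (August), where either order succeeded, so the reordering does not need to be conditional on the OS build.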

@@ -49,6 +49,11 @@ if ($global:EnableHostsConfigAgent) {
# Perform cleanup
#

Write-Log "Cleaning up persisted HNS policy lists"
# Workaround for https://github.com/kubernetes/kubernetes/pull/68923 in < 1.14,
# and https://github.com/kubernetes/kubernetes/pull/78612 for <= 1.15
@marosset (Contributor Author) review comment:

Since this "workaround" is still needed, let's update the comments again to point to the behavior you found in kube-proxy.

@marosset (Contributor Author) commented Nov 3, 2020

Changes look good to me.
Can we update the comments per my suggestion above?

@jsturtevant (Contributor)

It looks like it still failed, even though these steps work when I run them manually. Looking into it.

@jsturtevant (Contributor)

I was able to get this to pass with the changes in #4002. Once that merges, I will rebase these changes.

@marosset merged commit b8b1291 into Azure:master Nov 7, 2020
@marosset deleted the win-vhd-10C branch November 7, 2020 00:26
@acs-bot commented Nov 7, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsturtevant, marosset


Needs approval from an approver in each of these files:
  • OWNERS [jsturtevant,marosset]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jsturtevant (Contributor)

FYI @AbelHu
