This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Updating Azure-CNI to v1.0.11 for Windows #3660

Merged
jackfrancis merged 3 commits into Azure:master from the azurecni-1.0.11 branch on Aug 11, 2018

Conversation

PatrickLang
Contributor

Resolves #3389 / #3447 / #3153

Includes two important Azure-CNI changes for Windows:
  Fix for unparseable error returned by CNI (#195)
  Fix for IP address leak in HNS failure scenario in Windows CNI (#218)
Full notes at https://github.com/Azure/azure-container-networking/releases
@ghost ghost assigned PatrickLang Aug 10, 2018
@ghost ghost added the in progress label Aug 10, 2018
@jackfrancis jackfrancis changed the title Updating Azure-CNI to v1.0.11 Updating Azure-CNI to v1.0.11 for Windows Aug 10, 2018
@codecov

codecov bot commented Aug 11, 2018

Codecov Report

Merging #3660 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3660      +/-   ##
==========================================
- Coverage   55.69%   55.68%   -0.01%     
==========================================
  Files         108      108              
  Lines       16023    16023              
==========================================
- Hits         8924     8923       -1     
- Misses       6329     6334       +5     
+ Partials      770      766       -4

@acs-bot acs-bot added size/S and removed size/XS labels Aug 11, 2018
@jackfrancis
Member

/lgtm

@acs-bot

acs-bot commented Aug 11, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, PatrickLang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis jackfrancis merged commit a841cca into Azure:master Aug 11, 2018
@ghost ghost removed the in progress label Aug 11, 2018
PatrickLang added a commit to PatrickLang/acs-engine that referenced this pull request Aug 13, 2018
PatrickLang added a commit to PatrickLang/acs-engine that referenced this pull request Aug 13, 2018
@PatrickLang
Contributor Author

PatrickLang commented Aug 13, 2018

kubetest results

https://console.cloud.google.com/storage/browser/e2e-win-acs-engine/acs-engine_Azure/3660/pull-acs-engine/9af052c0-9cfb-11e8-8a41-0ae181cc06cf/1028074231952314368/?project=win-e2e-test&pli=1

Ran 155 of 1012 Specs in 6644.591 seconds
FAIL! -- 128 Passed | 27 Failed | 0 Pending | 857 Skipped

CecileRobertMichon pushed a commit to CecileRobertMichon/acs-engine that referenced this pull request Aug 13, 2018
@PatrickLang PatrickLang deleted the azurecni-1.0.11 branch August 14, 2018 15:41
@carlpett
Contributor

@jackfrancis @PatrickLang When will there be a matching version bump on Linux? We're running into ipam panics (azure-vnet#176) on our Linux-only clusters, and have been instructed by support to upgrade to 1.0.11. We're about to stand up a few new clusters, and it would be helpful not to have to bump them manually too (not to mention that scaling existing clusters requires the same manual step).

@jackfrancis
Member

Hi @carlpett, we'll get this into master today: #3722

Can you outline some repro behaviors to help us induce these panics? Would love to incorporate into our E2E tests.

@carlpett
Contributor

Ah, great!
Sadly, no. It just started happening one day, first on one node, then on two. Somehow IPs weren't released properly, I would guess? This is what we had in the vnet logs:

2018/08/13 12:53:23 [cni-net] Plugin stopped.
2018/08/13 12:53:26 [cni-net] Plugin azure-vnet version v1.0.4-1-gf0f090e.
2018/08/13 12:53:26 [cni-net] Running on Linux version 4.13.0-1016-azure (buildd@lgw01-amd64-050) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)) #19-Ubuntu SMP Thu May 3 17:29:51 UTC 2018
2018/08/13 12:53:26 [net] Network interface: {Index:1 MTU:65536 Name:lo HardwareAddr: Flags:up|loopback} with IP addresses: [127.0.0.1/8 ::1/128]
2018/08/13 12:53:26 [net] Network interface: {Index:2 MTU:1500 Name:eth0 HardwareAddr:00:0d:3a:38:cc:6f Flags:up|broadcast} with IP addresses: [fe80::20d:3aff:fe38:cc6f/64]
2018/08/13 12:53:26 [net] Network interface: {Index:3 MTU:1500 Name:docker0 HardwareAddr:02:42:9b:d9:3e:e3 Flags:up|broadcast|multicast} with IP addresses: [172.17.0.1/16 fe80::42:9bff:fed9:3ee3/64]
2018/08/13 12:53:26 [net] Network interface: {Index:6 MTU:1500 Name:azure0 HardwareAddr:00:0d:3a:38:cc:6f Flags:up|broadcast|multicast} with IP addresses: [10.240.0.4/12 fe80::20d:3aff:fe38:cc6f/64]
2018/08/13 12:53:26 [net] Network interface: {Index:7 MTU:1500 Name:azveth95fac9b HardwareAddr:ba:24:7e:3e:72:b8 Flags:up|broadcast} with IP addresses: [fe80::b824:7eff:fe3e:72b8/64]
2018/08/13 12:53:26 [net] Network interface: {Index:9 MTU:1500 Name:azvethaff2be7 HardwareAddr:7a:12:9d:ec:18:4a Flags:up|broadcast} with IP addresses: [fe80::7812:9dff:feec:184a/64]
2018/08/13 12:53:26 [net] Network interface: {Index:11 MTU:1500 Name:azvethd21ec9d HardwareAddr:3e:30:f0:87:45:99 Flags:up|broadcast} with IP addresses: [fe80::3c30:f0ff:fe87:4599/64]
2018/08/13 12:53:26 [net] Network interface: {Index:13 MTU:1500 Name:azvetheadec43 HardwareAddr:0e:b8:01:21:29:cb Flags:up|broadcast} with IP addresses: [fe80::cb8:1ff:fe21:29cb/64]
2018/08/13 12:53:26 [net] Network interface: {Index:15 MTU:1500 Name:azveth78bdf7a HardwareAddr:2a:ab:64:c8:ab:81 Flags:up|broadcast} with IP addresses: [fe80::28ab:64ff:fec8:ab81/64]
2018/08/13 12:53:26 [net] Network interface: {Index:17 MTU:1500 Name:azvethba6d9e2 HardwareAddr:46:9f:30:6d:4c:53 Flags:up|broadcast} with IP addresses: [fe80::449f:30ff:fe6d:4c53/64]
2018/08/13 12:53:26 [net] Network interface: {Index:1143 MTU:1500 Name:azvethf42f8b8 HardwareAddr:be:c5:f6:ae:76:5a Flags:up|broadcast} with IP addresses: [fe80::bcc5:f6ff:feae:765a/64]
2018/08/13 12:53:26 [net] Network interface: {Index:1145 MTU:1500 Name:azveth579d9ed HardwareAddr:16:f5:d1:74:8b:b3 Flags:up|broadcast} with IP addresses: [fe80::14f5:d1ff:fe74:8bb3/64]
2018/08/13 12:53:26 [net] Network interface: {Index:1147 MTU:1500 Name:azveth23ac8ad HardwareAddr:ca:a9:23:40:a6:7a Flags:up|broadcast} with IP addresses: [fe80::c8a9:23ff:fe40:a67a/64]
2018/08/13 12:53:26 [net] Network interface: {Index:173 MTU:1500 Name:azveth0315cac HardwareAddr:96:88:83:55:79:db Flags:up|broadcast} with IP addresses: [fe80::9488:83ff:fe55:79db/64]
2018/08/13 12:53:26 [net] Network interface: {Index:723 MTU:1500 Name:azvethb71f7e7 HardwareAddr:ea:e0:1b:fc:8f:ce Flags:up|broadcast} with IP addresses: [fe80::e8e0:1bff:fefc:8fce/64]
2018/08/13 12:53:26 [net] Network interface: {Index:731 MTU:1500 Name:azveth8c7d9cb HardwareAddr:8a:77:03:03:c4:5d Flags:up|broadcast} with IP addresses: [fe80::8877:3ff:fe03:c45d/64]
2018/08/13 12:53:26 [net] Network interface: {Index:489 MTU:1500 Name:azveth0df63a7 HardwareAddr:e2:06:ed:9b:39:25 Flags:up|broadcast} with IP addresses: [fe80::e006:edff:fe9b:3925/64]
2018/08/13 12:53:26 [net] Store timestamp is 2018-08-13 12:03:27.421452313 +0000 UTC.
2018/08/13 12:53:26 [net] Restored state, &{Version:v1.0.4-1-gf0f090e TimeStamp:2018-08-13 12:03:27.424541211 +0000 UTC ExternalInterfaces:map[eth0:0xc42015e300] store:0xc4200197d0 Mutex:{state:0 sema:0}}
2018/08/13 12:53:26 [cni-net] Plugin started.
2018/08/13 12:53:26 [cni-net] Processing ADD command with args {ContainerID:20a71ccbc5a3935994de39b3eb71685dc849a0502eb8a23ae1ca3df5ef6a596a Netns:/proc/85520/ns/net IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=device;K8S_POD_NAME=curl;K8S_POD_INFRA_CONTAINER_ID=20a71ccbc5a3935994de39b3eb71685dc849a0502eb8a23ae1ca3df5ef6a596a Path:/opt/azure-vnet/bin:/opt/cni/bin}.
2018/08/13 12:53:26 [cni-net] Read network configuration &{CNIVersion:0.3.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet: Address: QueryInterval:} DNS:{Nameservers:[] Domain: Search:[] Options:[]} AdditionalArgs:[]}.
2018/08/13 12:53:26 [cni-net] Found network azure with subnet 10.240.0.0/12.
2018/08/13 12:53:26 [cni] Calling plugin azure-vnet-ipam ADD nwCfg:&{CNIVersion:0.3.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet:10.240.0.0/12 Address: QueryInterval:} DNS:{Nameservers:[] Domain: Search:[] Options:[]} AdditionalArgs:[]}.
2018/08/13 12:53:26 [cni] Plugin azure-vnet-ipam returned result:<nil>, err:Failed to allocate address: No available addresses.
2018/08/13 12:53:26 [azure-vnet] Failed to allocate address: Failed to delegate: Failed to allocate address: No available addresses.
2018/08/13 12:53:26 [cni] Recovered panic: runtime error: invalid memory address or nil pointer dereference goroutine 1 [running]:
github.com/Azure/azure-container-networking/cni.(*Plugin).Execute.func1(0xc42016dec0)
        /go/src/github.com/Azure/azure-container-networking/cni/plugin.go:94 +0xbc
panic(0x69b3c0, 0x83ed90)
        /usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/Azure/azure-container-networking/cni/network.(*netPlugin).Add.func1(0xc42016d910, 0xc420212770, 0xc42016d8d0, 0xc42016d8f0, 0xc420176fc0)
        /go/src/github.com/Azure/azure-container-networking/cni/network/network.go:150 +0x9f
github.com/Azure/azure-container-networking/cni/network.(*netPlugin).Add(0xc420176fc0, 0xc420212770, 0x818c20, 0xc420240bd0)
        /go/src/github.com/Azure/azure-container-networking/cni/network/network.go:285 +0x13b7
github.com/Azure/azure-container-networking/cni.(PluginApi).Add-fm(0xc420212770, 0xc42021e456, 0x5)
        /go/src/github.com/Azure/azure-container-networking/cni/plugin.go:112 +0x39
github.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc420155800, 0xc420212770, 0x81b220, 0xc420151e30, 0xc42016de80, 0x0, 0xc420154000)
        /go/src/github.com/containernetworking/cni/pkg/skel/skel.go:168 +0x19f
github.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc420155800, 0xc42016de80, 0xc42016de68, 0x81b220, 0xc420151e30, 0xc42016de38)
        /go/src/github.com/containernetworking/cni/pkg/skel/skel.go:199 +0x384
github.com/containernetworking/cni/pkg/skel.PluginMainWithError(0xc42016de80, 0xc42016de68, 0x81b220, 0xc420151e30, 0xc420151e30)
        /go/src/github.com/containernetworking/cni/pkg/skel/skel.go:236 +0xed
github.com/Azure/azure-container-networking/cni.(*Plugin).Execute(0xc42000e108, 0x81b0e0, 0xc420176fc0, 0x0, 0x0)
        /go/src/github.com/Azure/azure-container-networking/cni/plugin.go:112 +0x127
main.main()
        /go/src/github.com/Azure/azure-container-networking/cni/network/plugin/main.go:93 +0x4d4

2018/08/13 12:53:26 Failed to execute network plugin, err:runtime error: invalid memory address or nil pointer dereference; goroutine 1 [running]:
github.com/Azure/azure-container-networking/cni.(*Plugin).Execute.func1(0xc42016dec0)
        /go/src/github.com/Azure/azure-container-networking/cni/plugin.go:94 +0xbc
panic(0x69b3c0, 0x83ed90)
        /usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/Azure/azure-container-networking/cni/network.(*netPlugin).Add.func1(0xc42016d910, 0xc420212770, 0xc42016d8d0, 0xc42016d8f0, 0xc420176fc0)
        /go/src/github.com/Azure/azure-container-networking/cni/network/network.go:150 +0x9f
github.com/Azure/azure-container-networking/cni/network.(*netPlugin).Add(0xc420176fc0, 0xc420212770, 0x818c20, 0xc420240bd0)
        /go/src/github.com/Azure/azure-container-networking/cni/network/network.go:285 +0x13b7
github.com/Azure/azure-container-networking/cni.(PluginApi).Add-fm(0xc420212770, 0xc42021e456, 0x5)
        /go/src/github.com/Azure/azure-container-networking/cni/plugin.go:112 +0x39
github.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc420155800, 0xc420212770, 0x81b220, 0xc420151e30, 0xc42016de80, 0x0, 0xc420154000)
        /go/src/github.com/containernetworking/cni/pkg/skel/skel.go:168 +0x19f
github.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc420155800, 0xc42016de80, 0xc42016de68, 0x81b220, 0xc420151e30, 0xc42016de38)
        /go/src/github.com/containernetworking/cni/pkg/skel/skel.go:199 +0x384
github.com/containernetworking/cni/pkg/skel.PluginMainWithError(0xc42016de80, 0xc42016de68, 0x81b220, 0xc420151e30, 0xc420151e30)
        /go/src/github.com/containernetworking/cni/pkg/skel/skel.go:236 +0xed
github.com/Azure/azure-container-networking/cni.(*Plugin).Execute(0xc42000e108, 0x81b0e0, 0xc420176fc0, 0x0, 0x0)
        /go/src/github.com/Azure/azure-container-networking/cni/plugin.go:112 +0x127
main.main()
        /go/src/github.com/Azure/azure-container-networking/cni/network/plugin/main.go:93 +0x4d4
.
2018/08/13 12:53:26 Report plugin error
2018/08/13 12:53:28 SendReport failed due to [Azure CNI] HTTP Post returned statuscode 500
2018/08/13 12:53:28 [cni-net] Plugin stopped.

(And then this was looping over and over)

The cluster had been running fine for ~77d when it started happening, in case there is any significant boundary around that point.

@sharmasushant
Contributor

@carlpett
2018/08/13 12:53:26 [cni-net] Plugin azure-vnet version v1.0.4-1-gf0f090e.

You would need to upgrade to a newer version. The nil-pointer dereference above has already been fixed. It happens when the node runs out of IPs.

The fix returns the correct error code to the runtime, so that the pod gets scheduled on a different node.

@carlpett
Contributor

@sharmasushant Yes, as we've already discussed in our support case thread :)
The reason I commented here was that the 1.0.11 version I upgraded to wasn't being installed by default, so I saw a need to repeat this manual process every time we set up a new cluster, scaled, etc.
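
For anyone auditing nodes by hand in the meantime, one option is to read the plugin version from the startup line in the vnet log and compare it with 1.0.11. The sketch below assumes the log line format shown earlier in this thread (`Plugin azure-vnet version v1.0.4-1-gf0f090e.`); `parseVersion` and `olderThan` are hypothetical helpers, not part of any Azure tooling.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionRe matches the major.minor.patch part of a startup log line such as
// "[cni-net] Plugin azure-vnet version v1.0.4-1-gf0f090e."
var versionRe = regexp.MustCompile(`azure-vnet version v(\d+)\.(\d+)\.(\d+)`)

// parseVersion extracts major, minor and patch from a log line.
// The boolean is false when the line carries no recognizable version.
func parseVersion(line string) ([3]int, bool) {
	var v [3]int
	m := versionRe.FindStringSubmatch(line)
	if m == nil {
		return v, false
	}
	for i := range v {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// olderThan reports whether version a is strictly older than version b,
// comparing the components numerically from major to patch.
func olderThan(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

func main() {
	line := "2018/08/13 12:53:26 [cni-net] Plugin azure-vnet version v1.0.4-1-gf0f090e."
	target := [3]int{1, 0, 11}

	if v, ok := parseVersion(line); ok && olderThan(v, target) {
		fmt.Printf("node runs v%d.%d.%d, older than v1.0.11\n", v[0], v[1], v[2])
	}
}
```

The comparison is numeric on purpose: a plain string comparison would rank "1.0.4" above "1.0.11".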
