This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

[WIP] Fix #1767 Custom VNET support for RS3 Windows #1810

Closed
wants to merge 3 commits into from

Conversation

JiangtianLi
Contributor

What this PR does / why we need it:
Fix #1767 Custom VNET support for RS3 Windows

@ghost ghost assigned JiangtianLi Nov 21, 2017
@ghost ghost added the in progress label Nov 21, 2017
@jackfrancis
Member

@JiangtianLi Could you add a deployment for vnet/kubernetesvnet-windows.json to the deployments array in test/acse-conf/acse-regression.json? This way we can continue to verify this cluster deployment pattern in periodic E2E regression tests.

@JiangtianLi
Contributor Author

@jackfrancis Sure thing. Will update. Note that custom VNET for Windows doesn't fully work yet because of an issue with the Windows container network config, so this is still WIP.

@sharmasushant
Contributor

@jackfrancis @JiangtianLi @tamilmani1989 We know that Azure CNI integration for Windows is not finished. We will take up Windows/Azure CNI once we sort out any remaining issues with Linux.

@JiangtianLi
Contributor Author

@sharmasushant Thanks. The issue I referred to does not involve Azure network policy or Azure CNI; it is in the Windows CNI config.

@JiangtianLi JiangtianLi changed the title Fix #1767 Custom VNET support for RS3 Windows [WIP] Fix #1767 Custom VNET support for RS3 Windows Dec 4, 2017
@jackfrancis
Member

@JiangtianLi rebase should be relatively easy, it's due to this reorganization of the parts/ directory:

30ce15e

@JiangtianLi
Contributor Author

@jackfrancis Thanks! Will rebase after fixing the Windows subnet issue.

@feiskyer
Member

feiskyer commented Dec 15, 2017

Will rebase after fixing the Windows subnet issue.

@JiangtianLi Could you explain the problem in detail?

@jay-stillman

@JiangtianLi Can you update us on the estimated time to get the checks completed? This appears to be on hold currently.

@JiangtianLi
Contributor Author

JiangtianLi commented Dec 21, 2017

@feiskyer The root cause is that the subnet configured for the Windows container network only supports a /24 CIDR, and a custom VNET usually places the master and agent subnets in ranges that don't fit this Windows container network limitation.
@jay-stillman This is a Windows platform issue and I am working with the Windows networking team to address it. There is no ETA yet, but I will update ASAP.

@jay-stillman

@JiangtianLi With regard to the Windows container network only supporting a /24 subnet: surely this shouldn't be an issue, given that when creating an agent pool with a custom VNET we define each agent pool subnet? So if we have the Windows agent pool configured to a /24, this will work? Or is this a larger issue based on the networking from the master to the nodes?

@JiangtianLi
Contributor Author

@jay-stillman To clarify, in order for a Windows node to communicate with a master node or an agent node, all the nodes need to be in the same /24 subnet. For example, if the Windows node has IP address 10.240.0.4, the master node has IP address 10.240.255.5 and the other agent node has 10.240.0.5, the master node can't talk to the Windows node while the agent node can. You need to configure the master and agent subnets to be in the same /24 range.
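
As an illustration of the constraint described in this comment, the following is a minimal sketch using Python's standard `ipaddress` module. It only checks whether two addresses share a /24 network; the helper name `same_slash24` is made up for this example, and the check says nothing about how Windows container networking actually routes traffic.

```python
import ipaddress

# Addresses taken from the comment above.
windows_node = ipaddress.ip_address("10.240.0.4")
master_node = ipaddress.ip_address("10.240.255.5")
linux_agent = ipaddress.ip_address("10.240.0.5")


def same_slash24(a, b):
    """Return True if address b falls in the /24 network that contains a."""
    # strict=False masks off the host bits, e.g. 10.240.0.4/24 -> 10.240.0.0/24
    net = ipaddress.ip_network(f"{a}/24", strict=False)
    return b in net


print(same_slash24(windows_node, master_node))  # False
print(same_slash24(windows_node, linux_agent))  # True
```

Under this check, the master at 10.240.255.5 falls outside the Windows node's /24 while the agent at 10.240.0.5 falls inside it, which matches the reachability described in the comment.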

@jay-stillman

@JiangtianLi This is not correct. The VNET CIDR can be something such as 10.2.0.0/16, while the master subnet can be something like 10.2.10.0/24 and the Linux pool can be something like 10.2.16.0/21 if running 5 agent nodes (each node requires its own /24 subnet). This config works with a custom subnet, even with multiple agent pools.
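
The address plan in this comment can be sanity-checked with a short sketch, again using only Python's standard `ipaddress` module. The CIDR values come from the comment; the statement that each node consumes one /24 is the commenter's, and the code merely counts how many /24 blocks the /21 pool contains.

```python
import ipaddress

# CIDR ranges quoted in the comment above.
vnet = ipaddress.ip_network("10.2.0.0/16")
master_subnet = ipaddress.ip_network("10.2.10.0/24")
linux_pool = ipaddress.ip_network("10.2.16.0/21")

# Both subnets must sit inside the VNET address space.
print(master_subnet.subnet_of(vnet), linux_pool.subnet_of(vnet))  # True True

# The two subnets must not overlap each other.
print(master_subnet.overlaps(linux_pool))  # False

# Number of /24 blocks available inside the /21 agent pool subnet.
per_node_blocks = list(linux_pool.subnets(new_prefix=24))
print(len(per_node_blocks))  # 8
```

A /21 holds eight /24 blocks, so it has room for the five agent nodes mentioned if each one takes a /24.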

So I am guessing you are actually referring to something else, i.e. something related to Windows networking. However, if all the subnets are in the VNET and the agent pool subnets are bound to the route table, then they can still route.

Can you provide some clearer detail on this? The above works, and it is how we, for one, run our various ACS environments.

@JiangtianLi
Contributor Author

JiangtianLi commented Dec 21, 2017

@jay-stillman Sorry for the confusion. There is no limitation on configuring the VNET CIDR in a custom VNET on Azure. The limitation is in Windows container networking on the Windows node, and it limits connection/routing to a Windows pod from another subnet, i.e., from the master node. Linux agents have no such issue. It affects Windows container networking only.

@jay-stillman

@JiangtianLi Can you please provide any indication of when this will be resolved?

We are currently unable to use Windows containers. Is there any workaround for this?

@JiangtianLi
Contributor Author

/cc @madhanrm @dineshgovindasamy

@jay-stillman Sorry about the delay. We are working with the networking team (cc-ed) on this. I don't see a straightforward workaround at this point, but @madhanrm @dineshgovindasamy can chime in.

@lastcoolnameleft

Given the limitation, I understand that it's currently not possible to deploy Windows containers to an existing VNET that is not properly sized. However, is it possible to deploy a hybrid cluster into an existing VNET that does match how Windows container networking expects the subnet to look?

I tried doing that with:

az network vnet create -g acs-engine-vnet-hybrid -n acs-engine-hybrid-vnet --address-prefixes 10.0.0.0/16 --subnet-name mgmt --subnet-prefix 10.0.0.0/24

And this as my template:

    "masterProfile": {
      "count": 1,
      "dnsPrefix": "acs-engine-vnet-hybrid",
      "vmSize": "Standard_A2_v2",
      "vnetSubnetId": "/subscriptions/<sub-id-removed>/resourceGroups/acs-engine-vnet-hybrid/providers/Microsoft.Network/virtualNetworks/acs-engine-hybrid-vnet/subnets/mgmt",
      "storageProfile" : "ManagedDisks",
      "firstConsecutiveStaticIP": "10.0.0.5",
      "vnetCidr": "10.0.0.0/16"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 1,
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/<sub-id-removed>/resourceGroups/acs-engine-vnet-hybrid/providers/Microsoft.Network/virtualNetworks/acs-engine-hybrid-vnet/subnets/mgmt",
        "storageProfile" : "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "customNodeLabels": {
          "services": "linux"
        }
      },
      {
        "name": "agentpool2",
        "count": 1,
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/<sub-id-removed>/resourceGroups/acs-engine-vnet-hybrid/providers/Microsoft.Network/virtualNetworks/acs-engine-hybrid-vnet/subnets/mgmt",
        "storageProfile" : "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "osType" : "Windows",
        "customNodeLabels": {
          "services": "windows"
        }
      }

But instead got the following error:

•100% ➜ acs-engine deploy --api-model kubernetes-hybrid.json --subscription-id $SUBSCRIPTION_ID --location southcentralus --resource-group acs-engine-vnet-hybrid
INFO[0014] Starting ARM Deployment (acs-engine-vnet-hybrid-1925927182). This will take some time...
INFO[0594] Finished ARM Deployment (acs-engine-vnet-hybrid-1925927182).
ERRO[0594] {"status":"Failed","error":{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.","details":[{"code":"BadRequest","message":"{\r\n  \"error\": {\r\n    \"code\": \"InvalidTemplate\",\r\n    \"message\": \"Unable to process template language expressions for resource '/subscriptions/<sub-id-removed>/resourceGroups/acs-engine-vnet-hybrid/providers/Microsoft.Compute/virtualMachines/18238k8s9010' at line '1' and column '39328'. 'The template variable 'subnet' is not found. Please see https://aka.ms/arm-template/#variables for usage details.'\"\r\n  }\r\n}"}]}}
FATA[0594] resources.DeploymentsClient#CreateOrUpdate: Failure sending request: StatusCode=200 -- Original Error: Long running operation terminated with status 'Failed': Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details."

@JiangtianLi
Contributor Author

@lastcoolnameleft You are using master HEAD, not this PR, right? The error comes from https://github.com/Azure/acs-engine/blob/master/parts/k8s/kuberneteswindowssetup.ps1#L57 and is the same as #1767. It happens because, when your JSON defines a custom VNET, the variable subnet is not defined: https://github.com/Azure/acs-engine/blob/master/parts/k8s/kubernetesmastervars.t#L199

@lastcoolnameleft

My apologies, I posted my comment in the PR, when I meant to post it in the issue.

That said, I believe that there's still an issue: the variable subnet is not defined in the output template. The acs-engine validator should catch this and fail the generation before any deployment is attempted.
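
The kind of pre-deployment check suggested here could look roughly like the sketch below. It is a hypothetical illustration, not acs-engine's actual validator: the function name `undefined_variables` and the trimmed sample template are invented for this example. It scans a generated ARM template for `variables('...')` references and reports any name missing from the template's `variables` section.

```python
import json
import re

# Matches ARM template expressions such as variables('subnet').
VARIABLE_REF = re.compile(r"variables\('([^']+)'\)")


def undefined_variables(template: dict) -> set:
    """Return variable names referenced in the template but never defined."""
    defined = set(template.get("variables", {}))
    # Serialize the whole template so references anywhere in it are found.
    referenced = set(VARIABLE_REF.findall(json.dumps(template)))
    return referenced - defined


# Hypothetical, heavily trimmed template for illustration only.
sample = {
    "variables": {"vnetSubnetID": "..."},
    "resources": [
        {
            "name": "vm0",
            "properties": {
                "customData": "[concat('-Subnet ', variables('subnet'))]"
            },
        },
        {
            "name": "nic0",
            "properties": {"subnet": "[variables('vnetSubnetID')]"},
        },
    ],
}

print(undefined_variables(sample))  # {'subnet'}
```

Running a check like this on the generated template before calling the deploy step would surface the missing `subnet` variable locally instead of after a ten-minute ARM deployment. The sketch is deliberately naive and does not handle nested deployments or other template features.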

@mboret

mboret commented Mar 7, 2018

@JiangtianLi I've tested this PR with a hybrid cluster and a custom VNET, and I was able to create a cluster. I'm using this template:

{
    "apiVersion": "vlabs",
    "properties": {
      "orchestratorProfile": {
        "orchestratorType": "Kubernetes",
        "orchestratorRelease": "1.8",
        "kubernetesConfig": {
          "networkPolicy": "none",
          "enableRbac": true
        }
      },
      "masterProfile": {
        "count": 2,
        "dnsPrefix": "DNSPREFIX",
        "vmSize": "Standard_D2_v2",
        "vnetSubnetId": "/subscriptions/SUBSCRIPTION/resourceGroups/RESOURCEGROUP/providers/Microsoft.Network/virtualNetworks/VNETNAME/subnets/SUBNETNAME",
        "firstConsecutiveStaticIP": "10.198.3.239",
        "OSDiskSizeGB": 50
      },
      "agentPoolProfiles": [
        {
          "name": "linuxpool",
          "count": 2,
          "vmSize": "Standard_D2_v2",
          "availabilityProfile": "AvailabilitySet",
          "OSDiskSizeGB": 50,
          "distro": "ubuntu",
          "vnetSubnetId": "/subscriptions/SUBSCRIPTION/resourceGroups/RESOURCEGROUP/providers/Microsoft.Network/virtualNetworks/VNETNAME/subnets/SUBNETNAME"          
        },
        {
          "name": "windowspool",
          "count": 2,
          "vmSize": "Standard_D2_v3",
          "availabilityProfile": "AvailabilitySet",
          "osType": "Windows",
          "OSDiskSizeGB": 100,
          "vnetSubnetId": "/subscriptions/SUBSCRIPTION/resourceGroups/RESOURCEGROUP/providers/Microsoft.Network/virtualNetworks/VNETNAME/subnets/SUBNETNAME"                
        }
      ],
      "windowsProfile": {
        "adminUsername": "ADMINUSERWINDOWS",
        "adminPassword": "WINDOWSPASSWORD"
      },
      "linuxProfile": {
        "adminUsername": "ADMINUSERLINUX",
        "ssh": {
          "publicKeys": [
            {
              "keyData": "LINUXPUBLICKEY"
            }
          ]
        }
      },
      "servicePrincipalProfile": {
        "clientId": "SPNAME",
        "secret": "SPPASSWORD"
      }
    }
  }

After the cluster creation, I updated the VNET subnets to add the k8s cluster route table.

A few points:

  1. The most recent k8s version that I was able to deploy was 1.8.3.
  2. I have an issue deploying images on the Windows node, but it seems related to the RS3 issue, because I'm able to deploy the microsoft/windowsservercore:1709 image.
  3. I cannot see the Windows node resource consumption (this seems to be fixed in k8s 1.9.3).
  4. The Windows node description shows "Container Runtime Version: docker://Unknown" (???), but I'm able to deploy containers (e.g. microsoft/windowsservercore:1709) on this node.

@mboret

mboret commented Apr 3, 2018

@JiangtianLi Any update about this issue?

@JiangtianLi
Contributor Author

@mboret acs-engine has switched to using Azure CNI as the default for Windows clusters, and custom VNET is supported with Azure CNI. With kubenet, custom VNET is still under investigation.

@mboret

mboret commented Apr 5, 2018

@JiangtianLi This still doesn't work for me with another acs-engine version and a hybrid cluster with custom VNET, even when using Azure CNI (I ran into the same issue, #2565, with acs-engine v0.15.0 and Kubernetes 1.10 and the same definition I posted previously).

In summary, I'm able to deploy a hybrid cluster with a custom VNET only if I'm using this PR with kubenet.

@JiangtianLi
Contributor Author

@mboret Thanks for reporting this to us. If you use Azure CNI, which is the default in latest master, you wouldn't need this PR for custom VNET. If you use kubenet (networkPolicy: none), then you need this PR, but custom VNET doesn't completely work yet.

If Azure CNI doesn't work for custom VNET in a hybrid cluster, could you please file a separate issue?

@mboret

mboret commented Apr 6, 2018

For sure. Created: #2612

@ghost ghost removed the in progress label May 15, 2018

7 participants