This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

How to deploy Windows Server 2016 agents on Kubernetes 1.9 ? #2094

Closed
ghost opened this issue Jan 18, 2018 · 13 comments

@ghost

ghost commented Jan 18, 2018

Is this a request for help?:
YES

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE, I guess. It looks like a regression, since it works with Kubernetes 1.8.x.

What version of acs-engine?:
Version: v0.12.0
GitCommit: 1d33229
GitTreeState: clean

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.9.1

What happened:
When using this Kubernetes 1.8 acs-engine orchestratorProfile, I end up with Windows Server 2016 agents having a 10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534) kernel.

    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.8",
      "kubernetesConfig": {
        "dockerEngineVersion": "17.05.*"
      }
    }

But when I use this Kubernetes 1.9 acs-engine orchestratorProfile, I end up with Windows Server 2012 (!) agents having a 6.2.09200.192 kernel.

    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.9",
      "kubernetesConfig": {
        "dockerEngineVersion": "17.05.*"
      }
    }

What you expected to happen:
Getting Windows Server 2016 agents on Kubernetes 1.9 like I did with Kubernetes 1.8.

How to reproduce it (as minimally and precisely as possible):
1. Deploy this acs-engine template:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.8",
      "kubernetesConfig": {
        "dockerEngineVersion": "17.05.*"
      }
    },

    "masterProfile": {
      "count": 1,
      "dnsPrefix": "k8s-sl31061",
      "vmSize": "Standard_D2s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "linuxpool1",
        "count": 1,
        "vmSize": "Standard_D2s_v3",
        "storageProfile": "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Linux"
      },
      {
        "name": "windowspool1",
        "count": 1,
        "vmSize": "Standard_D2s_v3",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 100,
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
      }
    ],

    "linuxProfile": {
      "adminUsername": "[REDACTED]",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa [REDACTED]"
          }
        ]
      }
    },
    "windowsProfile": {
      "adminUsername": "[REDACTED]",
      "adminPassword": "[REDACTED]"
    },
    "servicePrincipalProfile": {
      "clientId": "[REDACTED]",
      "secret": "[REDACTED]"
    }
  }
}

2. You get Windows Server 2016 agents:

$ kubectl get nodes -o wide
NAME                        STATUS    ROLES     AGE       VERSION                          EXTERNAL-IP   OS-IMAGE                      KERNEL-VERSION                                           CONTAINER-RUNTIME
37061k8s9010                Ready     <none>    16m       v1.8.6-21+aad85df0a0abed-dirty   <none>        <unknown>                     10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534)   docker://17.6.2
k8s-linuxpool1-37061256-0   Ready     agent     20m       v1.8.6                           <none>        Debian GNU/Linux 8 (jessie)   4.11.0-1016-azure                                        docker://17.5.0
k8s-master-37061256-0       Ready     master    20m       v1.8.6                           <none>        Debian GNU/Linux 8 (jessie)   4.11.0-1016-azure                                        docker://17.5.0

3. Deploy this acs-engine template:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.9",
      "kubernetesConfig": {
        "dockerEngineVersion": "17.05.*"
      }
    },

    "masterProfile": {
      "count": 1,
      "dnsPrefix": "k8s-sl31062",
      "vmSize": "Standard_D2s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "linuxpool1",
        "count": 1,
        "vmSize": "Standard_D2s_v3",
        "storageProfile": "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Linux"
      },
      {
        "name": "windowspool1",
        "count": 1,
        "vmSize": "Standard_D2s_v3",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 100,
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
      }
    ],

    "linuxProfile": {
      "adminUsername": "[REDACTED]",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa [REDACTED]"
          }
        ]
      }
    },
    "windowsProfile": {
      "adminUsername": "[REDACTED]",
      "adminPassword": "[REDACTED]"
    },
    "servicePrincipalProfile": {
      "clientId": "[REDACTED]",
      "secret": "[REDACTED]"
    }
  }
}

4. You get Windows Server 2012 agents:

$ kubectl get nodes -o wide
NAME                        STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                    KERNEL-VERSION   CONTAINER-RUNTIME
30771k8s9010                Ready     <none>    23s       v1.9.1    <none>        Windows Server Datacenter   6.2.09200.192       docker://17.6.2
k8s-linuxpool1-30771201-0   NotReady   agent     4m        v1.9.1    <none>    Debian GNU/Linux 9 (stretch)   4.11.0-1016-azure   docker://17.5.0
k8s-master-30771201-0       NotReady   master    4m        v1.9.1    <none>    Debian GNU/Linux 9 (stretch)   4.11.0-1016-azure   docker://17.5.0
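To spot affected nodes without reading the table by eye, the same information can be pulled from `kubectl get nodes -o json`. The following is only a minimal sketch: the helper names are hypothetical, and treating a kernel version that starts with "6.2." as suspicious is an assumption based on the output above, not an official check.

```python
import json
from typing import Dict, List


def kernel_versions(nodes_json: str) -> Dict[str, str]:
    """Map node name -> reported kernel version from `kubectl get nodes -o json`."""
    doc = json.loads(nodes_json)
    return {
        item["metadata"]["name"]: item["status"]["nodeInfo"]["kernelVersion"]
        for item in doc.get("items", [])
    }


def suspicious_nodes(nodes_json: str) -> List[str]:
    """Return nodes whose kernel version looks like the Windows 8 value (6.2.x).

    Assumption: a "6.2." prefix indicates the misreported version seen above.
    """
    versions = kernel_versions(nodes_json)
    return sorted(name for name, kv in versions.items() if kv.startswith("6.2."))


if __name__ == "__main__":
    # Sample built from the node names and kernel versions shown above.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "30771k8s9010"},
             "status": {"nodeInfo": {"kernelVersion": "6.2.09200.192"}}},
            {"metadata": {"name": "k8s-master-30771201-0"},
             "status": {"nodeInfo": {"kernelVersion": "4.11.0-1016-azure"}}},
        ]
    })
    print(suspicious_nodes(sample))
```

In practice the JSON string would come from running `kubectl get nodes -o json` and capturing its output.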

Anything else we need to know:

@JiangtianLi
Contributor

@odauby Can you verify if you have the following in your azuredeploy.json:

    "agentWindowsOffer": "WindowsServerSemiAnnual",
    "agentWindowsPublisher": "MicrosoftWindowsServer",
    "agentWindowsSku": "Datacenter-Core-1709-with-Containers-smalldisk",
    "agentWindowsVersion": "[parameters('agentWindowsVersion')]",

    "agentWindowsVersion": {
      "defaultValue": "latest",
      "metadata": {
        "description": "Version of the Windows Server 2016 OS image to use for the agent virtual machines."
      },
      "type": "string"
    },

and if you have agentWindowsVersion in your azuredeploy.parameters.json?

@JiangtianLi
Contributor

JiangtianLi commented Jan 19, 2018

@odauby This is a bug in kubelet: it reads the wrong Windows kernel version.

@feiskyer @taylorb-microsoft It seems that kubernetes/kubernetes#55143 doesn't work on Windows 10. According to https://msdn.microsoft.com/en-us/library/windows/desktop/ms724439(v=vs.85).aspx, GetVersion only reports the real version when the application is manifested for it: applications not manifested for Windows 8.1 or Windows 10 get the Windows 8 OS version value (6.2). I tried a toy Go program that calls GetVersion on Windows 10, and it returns 0x23f00206.
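To make the returned value easier to read, here is a small sketch that unpacks the DWORD the way the GetVersion documentation describes: major version in the low byte of the low word, minor version in the high byte of the low word, and build number in the high word. The function name is hypothetical, and this only decodes the number; it does not call the Win32 API.

```python
from typing import Tuple


def decode_getversion(value: int) -> Tuple[int, int, int]:
    """Split a GetVersion() DWORD into (major, minor, build)."""
    low_word = value & 0xFFFF
    major = low_word & 0xFF          # low byte of the low word
    minor = (low_word >> 8) & 0xFF   # high byte of the low word
    # The build number sits in the high word when the top bit is clear.
    build = (value >> 16) & 0xFFFF if value < 0x80000000 else 0
    return major, minor, build


if __name__ == "__main__":
    major, minor, build = decode_getversion(0x23F00206)
    print(f"{major}.{minor}.{build}")  # prints "6.2.9200"
```

So 0x23f00206 decodes to 6.2 build 9200, which lines up with the 6.2.09200 kernel version shown by kubectl.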

Given the limited Win32 functions available in Go, maybe we can read the version from the registry directly?

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion
    CurrentBuildNumber    REG_SZ    16299
    CurrentMajorVersionNumber    REG_DWORD    0xa
    CurrentMinorVersionNumber    REG_DWORD    0x0
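As a rough illustration of the registry idea (written in Python rather than Go, and not the code of any upstream fix), the three values above could be combined as follows. The `major.minor.build` output format and both function names are assumptions made for this sketch; `read_current_version` uses the standard `winreg` module and only works on Windows.

```python
from typing import Dict, Union

KEY_PATH = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion"
VALUE_NAMES = (
    "CurrentMajorVersionNumber",
    "CurrentMinorVersionNumber",
    "CurrentBuildNumber",
)


def read_current_version() -> Dict[str, Union[int, str]]:
    """Read the three values from the registry (Windows only)."""
    import winreg  # standard library, available only on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        return {name: winreg.QueryValueEx(key, name)[0] for name in VALUE_NAMES}


def format_kernel_version(values: Dict[str, Union[int, str]]) -> str:
    """Build 'major.minor.build' from the registry values (format is an assumption)."""
    major = int(values["CurrentMajorVersionNumber"])
    minor = int(values["CurrentMinorVersionNumber"])
    build = int(values["CurrentBuildNumber"])  # stored as REG_SZ, e.g. "16299"
    return f"{major}.{minor}.{build}"


if __name__ == "__main__":
    # Values copied from the registry listing above.
    sample = {
        "CurrentMajorVersionNumber": 0xA,
        "CurrentMinorVersionNumber": 0x0,
        "CurrentBuildNumber": "16299",
    }
    print(format_kernel_version(sample))  # prints "10.0.16299"
```

Unlike GetVersion, these registry values do not depend on how the calling application is manifested.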

@feiskyer
Member

@JiangtianLi Sorry, didn't notice this problem. Will file a PR to fix this.

@jsturtevant
Collaborator

Is there a workaround for this?

@JiangtianLi
Contributor

@jsturtevant The issue only affects the Windows kernel version reported by kubelet. The Windows node still has the correct OS image, and Windows containers should run as is. Besides, @feiskyer already has a PR for it: kubernetes/kubernetes#58498.

@jsturtevant
Collaborator

I missed that it was only the reported value. I was able to confirm the Windows node is the correct OS image. Thanks for the clarification!

@masroorhasan

masroorhasan commented Jan 31, 2018

I'm having a similar problem with the Windows kernel deployed by acs-engine. I deployed a cluster with the Kubernetes orchestrator, version 1.7.

Here is the template I used:

"apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.7",
      "kubernetesConfig": {
        "dockerEngineVersion": "17.05.*"
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "---",
      "vmSize": "Standard_D2s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "windowspool2",
        "count": 2,
        "vmSize": "Standard_D2s_v3",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
      }
    ],

For both Standard_D2s_v3 and Standard_DS2_v2 I would expect Windows Server 2016 Datacenter, but the agents are being deployed with Windows Server Datacenter, which seems to have trouble running Windows containers.

cc @JiangtianLi @jsturtevant

@JiangtianLi
Contributor

JiangtianLi commented Jan 31, 2018

@masroorhasan There are two issues here. 1. kubelet on Windows reports the incorrect Windows version, and there is a fix upstream: kubernetes/kubernetes#58498. 2. The Windows version actually deployed should be RS3 Windows, a.k.a. 1709. Windows containers have a compatibility requirement (https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility), so a Windows Server 2016 (RS1) container will not run on RS3 Windows and you need to rebuild the container image.

@masroorhasan

masroorhasan commented Jan 31, 2018

Thanks for the quick response @JiangtianLi.

The kernel version I reported actually comes from Docker on the host machine. The acs-engine deployment produces agent machines with this profile:

Default Isolation: process
Kernel Version: 10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534)
Operating System: Windows Server Datacenter
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 8GiB

I've previously used ACS (from the portal) to deploy a Kubernetes cluster, and the profile of the Windows machines was the following:

Default Isolation: process
Kernel Version: 10.0 14393 (14393.1715.amd64fre.rs1_release_inmarket.170906-1810)
Operating System: Windows Server 2016 Datacenter
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 7GiB

For your second point - I ran two windows containers to test it out:

  1. microsoft/iis:windowsservercore-1709 - this one ended up running fine, but there is no outbound connection from within the container (separate issue). I looked at issue #1903 ("nslookup (dns) for windows services fails after stopping vms and restarting a hybrid kubernetes cluster in azure") and tried Resolve-DnsName, Test-NetConnection 10.0.0.10 -Port 80, etc., all of which eventually time out.

  2. An image derived from microsoft/dotnet-framework:4.7 fails with the following error. Am I right to assume that dotnet-framework:4.7 is one of the RS3-compliant images?

Error: failed to start container "btscriptrunner-demo": Error response from daemon: {"message":"container btscriptrunner-demo encountered an error during CreateContainer: failure in a Windows system call: The operating system of the container does not match the operating system of the host. (0xc0370101) extra info: {\"SystemType\":\"Container\",\"Name\":\"btscriptrunner-demo\",\"Owner\":\"docker\",\"IsDummy\":false,\"VolumePath\":\"\\\\\\\\?\\\\Volume{3cb5e93f-46b3-4d61-ba64-c7aaba21c821}\",\"IgnoreFlushesDuringBoot\":true,\"LayerFolderPath\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\btscriptrunner-demo\",\"Layers\":[{\"ID\":\"39e7853a-3a44-561b-8201-8aac86732383\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\b88f091fc925249111991e370a0dfb415ef4a2e7f43870d3ea1557b0494fcd83\"},{\"ID\":\"d34130a6-c740-53e0-8a8f-345bad24c235\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\bcf633ece53230ea7a528b397a9540a8b63c1f202deb0f9b80446bf55e6b360b\"},{\"ID\":\"b1cb3ccb-d75d-5f53-adf9-608f8a006bbe\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\79d53e5a713fd2ff17591f64521b33f6fe7fa729bd38d6eee7fc616beaea20cb\"},{\"ID\":\"9a8b1f3d-c3b5-53aa-80d6-a461829d36c8\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\f9307ef99284378b513dd7ddfe91405565f11d9de15415d067d224e0a1f54434\"},{\"ID\":\"f4afcc79-02f8-5b7b-96e5-1c8d3e8a628c\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\2e10ddfbf3f7a3bcd9e9422d94e4a9eaa92aaa00bc720023b888cab95b616c32\"},{\"ID\":\"59f60e70-3e14-526b-8f3d-4c41216eae5c\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\cd3b476d14f55089fd0d27e924376adc849cf37963d9ecea584753ab013d2286\"},{\"ID\":\"891d5307-c9d3-5e44-a901-68506a368d6a\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\2b6e79eb72b056199fd2bf505befe4ed4e85a25d930a585021a5465e929bdedd\"},{\"ID\":\"22fd1eb4-e1dd-5050-9c95-8cc524d1bff4\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\ba2f4e9c2a803d5a43034df4900
0b7a0c4c533db0a7038c46abde2bd8557f54e\"},{\"ID\":\"9ce0bde7-7ed2-5801-bc54-a67f7b696f8a\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\dca42b46b498fc98350b23a61f0e4eb839ad3f1f9b476b0464810208aaba8e16\"},{\"ID\":\"a4ba00fd-86e2-5d3f-8ffb-4c04d3efb4e4\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\72aacb547890e612cd9d06e9fc377e07dbe78d7f87837ff17069e3bacc2745e2\"},{\"ID\":\"be9ffd70-9684-5e65-8611-9315bce0d9f2\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\55c7d5a0419af834f2026093fc7d78f700fe789f1696821a8755399897ca4667\"},{\"ID\":\"30d05282-95ba-5616-b9cb-445bffd8b86e\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\67811f51d6f5d8bc9c2580f900f1cb10be622e3b21416c7efa234e52d8de7e75\"},{\"ID\":\"e7a6d21d-bafb-5123-82ea-1e776a79bdaa\",\"Path\":\"C:\\\\ProgramData\\\\docker\\\\windowsfilter\\\\1698be012c25e8b500f326af909cb41f3b1eaa1e8b716f164ae24a12ed581246\"}],\"HostName\":\"btscriptrunner-demo\",\"MappedDirectories\":[{\"HostPath\":\"c:\\\\var\\\\lib\\\\kubelet\\\\pods\\\\7bd79d66-06b2-11e8-aa2a-000d3a2927de\\\\volumes\\\\kubernetes.io~secret\\\\default-token-hxpts\",\"ContainerPath\":\"c:\\\\var\\\\run\\\\secrets\\\\kubernetes.io\\\\serviceaccount\",\"ReadOnly\":true,\"BandwidthMaximum\":0,\"IOPSMaximum\":0}],\"HvPartition\":false,\"EndpointList\":null,\"NetworkSharedContainerName\":\"69cf0b7d632f2a4ad33b7a9e402f6e7e7e657108d91f9d7458c9c82599f08910\",\"Servicing\":false,\"AllowUnqualifiedDNSQuery\":false}"}
Error syncing pod

@JiangtianLi
Contributor

@masroorhasan For the first point, acs-engine uses RS3 Windows while some regions in ACS still use RS1 Windows. Your kernel version, 10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534), is 1709, so you need to use a container image with a 1709 tag.

For the second point, the error is "The operating system of the container does not match the operating system of the host", which means microsoft/dotnet-framework:4.7 is not based on RS3 Windows and is therefore not compatible with an RS3 Windows cluster.
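The check can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the build table contains just the two builds mentioned in this thread, and the rule "host build must equal image build" is the simplified process-isolation requirement described above, not a full implementation of the compatibility matrix.

```python
import re
from typing import Optional

# Only the builds named in this thread; not an exhaustive list.
KNOWN_BUILDS = {
    14393: "RS1 (1607, Windows Server 2016)",
    16299: "RS3 (1709)",
}

# Matches the start of Docker's "Kernel Version" string, e.g. "10.0 16299 (...)".
_KERNEL_RE = re.compile(r"^\s*(\d+)\.(\d+)\s+(\d+)")


def host_build(kernel_version: str) -> Optional[int]:
    """Extract the build number from a Docker 'Kernel Version' string."""
    match = _KERNEL_RE.match(kernel_version)
    return int(match.group(3)) if match else None


def can_run_process_isolated(image_build: int, kernel_version: str) -> bool:
    """Assumption: process isolation needs identical host and image builds."""
    return host_build(kernel_version) == image_build


if __name__ == "__main__":
    host = "10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534)"
    print(KNOWN_BUILDS.get(host_build(host), "unknown"))
    print(can_run_process_isolated(14393, host))  # RS1 image on this host
    print(can_run_process_isolated(16299, host))  # 1709 image on this host
```

Under these assumptions, an RS1-based image is rejected on the RS3 host while an image built from a 1709 tag is accepted.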

@masroorhasan

Thanks @JiangtianLi - that makes sense. FYI, I switched to acs-engine version 0.8 and the westus2 region. That combination seems to use Windows Server 2016, and it resolves the outbound connection problem as well.

@ghost

ghost commented Apr 6, 2018

I am forced to use Canadian data centers and I want to create a hybrid Kubernetes 1.9.6 cluster. I'm using acs-engine v0.14.6 and am running into this issue. Are there any workarounds? @JiangtianLi

@stale

stale bot commented Mar 9, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. Note that acs-engine is deprecated--see https://github.com/Azure/aks-engine instead.

@stale stale bot added the stale label Mar 9, 2019
@stale stale bot closed this as completed Mar 16, 2019