k8s 1.21 failing on Windows 20H2 Standard #72

Closed
sirredbeard opened this issue Aug 3, 2021 · 17 comments
Labels: bug, documentation

Comments

@sirredbeard
Contributor

sirredbeard commented Aug 3, 2021

Tracking issue for rancher/rancher#32932

Workaround:

Document that only Windows 20H2 Datacenter is supported for 1.21.

sirredbeard added the bug and documentation labels Aug 3, 2021
@phillipsj
Contributor

phillipsj commented Aug 3, 2021

You will need an rke2 server and a Windows agent to reproduce the errors we are seeing. What is happening is that when on a 20H2 Standard Edition server, containerd starts up and the containerd go client fails to connect to the fully qualified npipe address.

npipe:\\\\.pipe\containerd-containerd

If I start containerd manually and use crictl, everything works as expected. If I change the code to use:

\\.pipe\containerd-containerd

Then it works as expected. When on 20H2 Datacenter Edition everything works and executes as expected.

RKE2 Server Install:

curl -sfL https://get.rke2.io | sh -
echo "RKE2_CNI=calico" >> /usr/local/lib/systemd/system/rke2-server.env
systemctl enable rke2-server.service
systemctl start rke2-server.service

curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL="testing" sh -
echo "RKE2_CNI=calico" >> /usr/local/lib/systemd/system/rke2-server.env
systemctl enable rke2-server.service
systemctl start rke2-server.service

#after rke2 has started
crictl config --set runtime-endpoint=unix:///run/k3s/containerd/containerd.sock

curl -sL -o /usr/local/bin/calicoctl https://github.com/projectcalico/calicoctl/releases/download/v3.19.0/calicoctl
chmod +x /usr/local/bin/calicoctl
calicoctl ipam configure --strictaffinity=true

You will need a 20H2 Standard Edition server and a 20H2 Datacenter Edition server, each with the Windows Containers feature enabled. Then execute the following on each:

Invoke-WebRequest -Uri https://raw.githubusercontent.com/rancher/rke2/master/install.ps1 -Outfile install.ps1
New-Item -Type Directory c:/etc/rancher/rke2 -Force
Set-Content -Path c:/etc/rancher/rke2/config.yaml -Value @"
server: https://<server>:9345
token: <token from server node>
"@

$env:PATH+=";c:\var\lib\rancher\rke2\bin;c:\usr\local\bin"

[Environment]::SetEnvironmentVariable(
    "Path",
    [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine) + ";c:\var\lib\rancher\rke2\bin;c:\usr\local\bin",
    [EnvironmentVariableTarget]::Machine)

./install.ps1
rke2.exe agent

Original issue: rancher/rke2#1454

@kevpar

kevpar commented Aug 6, 2021

Hi @phillipsj I'd like to understand better what issue you are seeing connecting to containerd via named pipe. Can you share some more details please?

  • What address is containerd configured to listen on?
  • What client is connecting to containerd?
  • What address is the client configured to connect to?
  • What error message is observed from the client?

One note is that I would expect the named pipe path to look like either npipe://\\.\pipe\containerd-containerd (in k8s or crictl config) or just \\.\pipe\containerd-containerd (in containerd config). I see the format of your address is slightly different above, but not sure if that's just a typo in the issue or not.

@phillipsj
Contributor

@kevpar here are the details:

  • The address is \.\pipe\containerd-containerd
  • The containerd go client
  • The containerd go client is using : npipe:\\.pipe\containerd-containerd
  • The error message is below.
failed to dial \"npipe:////./pipe/containerd-containerd\": connection error: desc = \"transport: error while dialing: open npipe:////./pipe/containerd-containerd: The filename, directory name, or volume label syntax is incorrect.\"

I can't imagine it is as simple as the address I am using. The exact same configuration works on 1809, 2004, and 20H2 Datacenter, but not on 20H2 Standard. If the address were incorrect, I would expect it to fail on 20H2 Datacenter as well.

@kevpar

kevpar commented Aug 6, 2021

If you're using the containerd Go client code and not customizing which dialer is used (via WithDialOpts), then the default dialer will connect with winio.DialPipe. DialPipe expects the pipe path to be in the format \\.\pipe\<pipename>, and doesn't support the npipe:// prefix. I was able to repro on a Win11 machine that \\.\pipe\containerd-containerd works and npipe:////./pipe/containerd-containerd does not.

To summarize, \\.\pipe\containerd-containerd is the expected format to use with the containerd Go client code. The npipe:// prefixed format is something added by Kubernetes tooling, and not supported by containerd or Windows directly.

What's still unclear to me is how the npipe:// prefixed format would have worked on other OS releases, but I think the answer here is to stick with the \\.\pipe\containerd-containerd path with the containerd Go client instead.
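
To illustrate the suggestion above, here is a minimal sketch of normalizing an endpoint before handing it to the containerd Go client. The helper name pipePathFromEndpoint is hypothetical (it is not part of containerd, go-winio, or Kubernetes), and the sketch assumes the only prefix that needs handling is npipe://.

```go
package main

import (
	"fmt"
	"strings"
)

// pipePathFromEndpoint is a hypothetical helper, not an API of containerd or
// Kubernetes. It strips an optional npipe:// prefix and converts forward
// slashes to backslashes, producing the \\.\pipe\<name> form.
func pipePathFromEndpoint(endpoint string) string {
	p := strings.TrimPrefix(endpoint, "npipe://")
	return strings.ReplaceAll(p, "/", `\`)
}

func main() {
	endpoints := []string{
		"npipe:////./pipe/containerd-containerd",
		`npipe://\\.\pipe\containerd-containerd`,
		`\\.\pipe\containerd-containerd`,
	}
	for _, e := range endpoints {
		// Each input is reduced to the same bare pipe path.
		fmt.Println(pipePathFromEndpoint(e))
	}
}
```

The resulting string is the kind of address that would then be passed to the containerd client constructor when the default dialer is in use.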

@phillipsj
Contributor

That's good to know and interesting that crictl requires the npipe:// prefix.

@phillipsj
Contributor

@kevpar could you please try to repro on 20H2 Standard vs Datacenter? That is where I am seeing the issue.

@phillipsj
Contributor

@kevpar I just tested the same release on Windows Server 2022 Datacenter preview and it behaves the same as 2019, 2004, and 20H2 Datacenter. It seems Standard is still the only one that has an issue. I will test 2022 Standard out of curiosity.

@kevpar

kevpar commented Aug 17, 2021

@phillipsj, are you saying that the npipe:// prefixed path works on the 2022/20H2/2004/2019 Datacenter editions, but doesn't work on 20H2 Standard (and 2022/2004/2019 Standard are as yet untested)?

Can you provide the exact Go client code that is making the connection in the cases where npipe:// works?

@phillipsj
Contributor

@kevpar It is line 103 here that is causing the issue.

@kevpar

kevpar commented Aug 18, 2021

I set up a 20H2 Standard VM and tested using a simplified test program.

PS C:\Users\Administrator\desktop> cmd /c ver

Microsoft Windows [Version 10.0.19042.1165]
PS C:\Users\Administrator\desktop> ./ctrdconnect
PS C:\Users\Administrator\desktop> $LASTEXITCODE
0

I'm not seeing the same issue repro. Can you try this same test program?

@phillipsj
Contributor

2022 Standard has been confirmed to work with the same version.

@kevpar

kevpar commented Aug 19, 2021

Is everything working as expected then? Or are you still seeing issues on some platform?

@phillipsj
Contributor

I still need to retest Windows 20H2 Standard. I will retest using the same ISO and an up-to-date one to see if I get different results. Several people have independently verified that 20H2 Standard Edition works, yet your initial test showed it would not have worked on Windows 11, so something still seems to be going on.

@kevpar

kevpar commented Aug 20, 2021

The scenario is actually a little different than I understood at first. Using the standard containerd.New function to create a connection will not work if given an address with the npipe:// prefix. However, in your case you're using the Kubernetes util.GetAddressAndDialer function and passing it the npipe:// path, which is expected to work as the function will correctly handle the prefix. So everything I have observed so far indicates we are seeing correct behavior.
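
As a rough illustration of why a prefix-aware helper accepts the npipe:// form while a plain pipe dialer does not, the sketch below splits an endpoint into a protocol and an address using only the Go standard library. The function splitEndpoint is hypothetical and simplified. It is not the Kubernetes implementation, and its exact rules (which schemes are accepted, how the path is checked) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitEndpoint is a hypothetical, simplified stand-in for the parsing a
// prefix-aware helper performs. It is not the Kubernetes implementation.
func splitEndpoint(endpoint string) (protocol, addr string, err error) {
	// url.Parse does not treat backslashes as separators, so normalize first.
	normalized := strings.ReplaceAll(endpoint, `\`, "/")
	u, err := url.Parse(normalized)
	if err != nil {
		return "", "", err
	}
	switch u.Scheme {
	case "npipe":
		// For npipe:////./pipe/<name> the host is empty and the path
		// already carries the //./pipe/<name> form.
		if strings.HasPrefix(u.Path, "//./pipe/") {
			return "npipe", u.Path, nil
		}
		return "", "", fmt.Errorf("unexpected pipe path %q", u.Path)
	case "tcp":
		return "tcp", u.Host, nil
	default:
		// A bare pipe path has no scheme, so this sketch rejects it.
		return "", "", fmt.Errorf("unsupported scheme %q in %q", u.Scheme, endpoint)
	}
}

func main() {
	for _, e := range []string{
		"npipe:////./pipe/containerd-containerd",
		`\\.\pipe\containerd-containerd`,
	} {
		protocol, addr, err := splitEndpoint(e)
		fmt.Println(protocol, addr, err)
	}
}
```

In this sketch the scheme is consumed by the parser and only the pipe path is left for the dialer, which is the step that is missing when the prefixed string is passed straight to a dialer expecting a bare pipe path.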

@kevpar

kevpar commented Aug 20, 2021

The two methods of establishing a connection are the equivalent of testing with ctr versus crictl, for what it's worth.

@sirredbeard
Contributor Author

@phillipsj Can this issue be closed with the latest Windows Server patches?

@phillipsj
Contributor

Yes.

6 participants