WSL: Need to force net.ipv4.ip_forward #5341

Closed
mook-as opened this issue Aug 14, 2023 · 2 comments
Labels: kind/bug, platform/windows, triage/need-to-repro

mook-as commented Aug 14, 2023

Actual Behavior

Sometimes net.ipv4.ip_forward defaults to 0, causing traefik to fail to start up. It's unclear what circumstances lead to this.

Steps to Reproduce

  • Clean install (VM) of Windows 10 with updates
  • WSL (inbox version) installed manually, with openSUSE Leap 15.5
  • Rancher Desktop installed system-wide
  • Run traefik.bat and notice that the test gets stuck waiting for k8s to start

Result

The traefik lb pod goes into CrashLoopBackOff; its logs indicate that /proc/sys/net/ipv4/ip_forward was set to 0 (instead of the expected 1).

Expected Behavior

Rancher Desktop should do the necessary setup so that the WSL VM is in a state that can run our workloads.

Additional Information

Manually running sysctl -w net.ipv4.ip_forward=1 (in a different WSL distribution) and then restarting Rancher Desktop appears to fix the issue.
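
The manual workaround above can be sketched as a small check-and-set function. This is only an illustration, assuming a POSIX shell running as root inside a WSL distribution; the function name `ensure_ip_forward` and its optional path argument are hypothetical and not part of Rancher Desktop.

```shell
# Enable IPv4 forwarding if it is currently off.
# $1 (optional, hypothetical): file holding the flag; defaults to the real
# procfs entry. The override exists so the logic can be tried on a scratch file.
ensure_ip_forward() {
    ip_forward_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    current="$(cat "$ip_forward_file")" || return 1
    if [ "$current" = "1" ]; then
        echo "ip_forward already enabled"
        return 0
    fi
    # Same effect as `sysctl -w net.ipv4.ip_forward=1`; needs root on a real system.
    echo 1 > "$ip_forward_file" || return 1
    echo "ip_forward changed from $current to 1"
}
```

Calling `ensure_ip_forward` with no argument acts on the live kernel setting; as described above, Rancher Desktop still has to be restarted afterwards.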

Rancher Desktop Version

1.9.1-512-g48956782

Rancher Desktop K8s Version

1.22.7

Which container engine are you using?

containerd (nerdctl)

What operating system are you using?

Windows

Operating System / Build Version

Windows 10 Pro 22H2 (Build 19045.3324)

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

N/A

mook-as added the kind/bug, platform/windows, and triage/need-to-repro labels Aug 14, 2023
jandubois added this to the 1.15 milestone May 25, 2024

jandubois commented May 25, 2024

I've been able to reproduce this issue on Windows 11 while @Nino-K did not observe it on Windows 10. We made sure that we were using the same WSL2 versions and the same kernel version.

net.ipv4.ip_forward is set to 1 inside the WSL distro and in the regular traefik pod, but is 0 in the svclb pod, at least during startup. The container is stopped right away, so there is no chance to inspect it manually.

> Manually running sysctl -w net.ipv4.ip_forward=1 (in a different WSL distribution) and then restarting Rancher Desktop appears to fix the issue.

Given that this is already enabled in the rancher-desktop distro, I'm surprised this makes a difference. Maybe it needs to be enabled in the default namespace?

There is code that forces net.ipv4.ip_forward=1 in svclb in k3s 1.25.3 that has been backported to the corresponding patch releases of 1.23 and 1.24, but not to any earlier versions (part of k3s-io/k3s#6181).

So running Kubernetes 1.25.3+ is a workaround to avoid this problem.

We should still try to find a workaround, e.g. by enabling this option before creating our own separate namespace. Or maybe both before and after?

If this doesn't help, then we should create a diagnostic instead to tell the user that Traefik isn't working, and recommend upgrading to a non-obsolete version of Kubernetes. See also #6342.

In that case we also need to update all BATS tests using Traefik to either require a newer Kubernetes version, or skip the test if the requested version is too old.
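
A version guard of that kind might look like the following sketch. It assumes a `sort` that supports `-V` (such as GNU coreutils) and plain `MAJOR.MINOR.PATCH` version strings. Both function names are hypothetical. Only the 1.25.3 threshold comes from the comment above; the backported 1.23 and 1.24 patch releases are not handled because their exact versions are not stated here.

```shell
# Returns 0 when version $1 is greater than or equal to version $2.
version_at_least() {
    # The smaller of the two versions sorts first; $1 >= $2 exactly when
    # that smallest entry is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: true when svclb in this Kubernetes version is expected
# to force net.ipv4.ip_forward=1 on its own. A leading "v" is stripped.
svclb_forces_ip_forward() {
    version_at_least "${1#v}" "1.25.3"
}
```

In a BATS test this could gate the Traefik checks, for example `svclb_forces_ip_forward "$version" || skip "Kubernetes version too old"`, where `$version` stands in for however the test suite exposes the requested Kubernetes version.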

We may also want to increase the default version used for testing to something more recent.

@jandubois

Fixed by #7110
