kind-with-registry.sh breaks with podman #3729

Open
wbrefvem opened this issue Sep 2, 2024 · 15 comments
Labels
area/provider/podman: Issues or PRs related to podman
kind/external: upstream bugs
kind/support: Categorizes issue or PR as a support question.

Comments

@wbrefvem

wbrefvem commented Sep 2, 2024

What should be cleaned up or changed:

The script site/static/examples/kind-with-registry.sh uses kind get nodes and then iterates over the raw output in order to run docker exec to update each node's hosts.toml. The output when using podman looks something like this:

$ kind get nodes
using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
kind-control-plane
kind-worker2
kind-worker3
kind-worker

This breaks because the script assumes no text in the output other than node names.

On the other hand,

$ kubectl get nodes -o jsonpath='{.items[*].metadata.name}'
kind-control-plane kind-worker kind-worker2 kind-worker3

achieves the desired result without breaking podman users.
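A minimal sketch of how a loop could consume that kubectl output, assuming the names come back whitespace-separated on one line as shown above. Both functions are hypothetical stand-ins so the sketch runs without a cluster: `list_nodes` replaces the real kubectl call and `update_node` replaces the docker exec step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for:
#   kubectl get nodes -o jsonpath='{.items[*].metadata.name}'
list_nodes() {
  echo "kind-control-plane kind-worker kind-worker2 kind-worker3"
}

# Hypothetical stand-in for the per-node docker exec step.
update_node() {
  echo "would update hosts.toml on $1"
}

# Word splitting on the unquoted substitution yields one name per iteration.
for node in $(list_nodes); do
  update_node "${node}"
done
```

Note that this approach depends on kubectl pointing at the right cluster context, which the kind get nodes call does not require.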

Why is this needed:

It's common for podman users to alias docker to podman. This small change allows them to use the script without any fiddling.

wbrefvem added the kind/cleanup label (Categorizes issue or PR as related to cleaning up code, process, or technical debt.) on Sep 2, 2024
@wbrefvem
Author

wbrefvem commented Sep 2, 2024

This is a one-line fix that I'm happy to contribute.

@aojea
Contributor

aojea commented Sep 2, 2024

> This is a one-line fix that I'm happy to contribute.

please do

@aojea
Contributor

aojea commented Sep 3, 2024

I apologize, I didn't notice that your proposal was to use kubectl. I don't think that is valid; let's keep discussing on the PR.

Never mind, let's wait for Ben.

@BenTheElder
Member

Hmm, the other text should be going to stderr. Is it not, or are we capturing both?

@BenTheElder
Member

#3731 (comment)

Can you share more details about the script failure, the kind and podman versions, etc?

I'm confused as to why this wouldn't already be working. We should only be capturing stdout, and we should be logging to stderr.
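To illustrate the stdout/stderr point, here is a small sketch. `fake_kind_get_nodes` is a hypothetical stand-in that mimics the output shown in the issue, under the assumption that the informational lines go to stderr and the node names go to stdout; it is not the real kind binary.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for `kind get nodes` under the podman provider:
# informational lines go to stderr, node names go to stdout.
fake_kind_get_nodes() {
  echo "using podman due to KIND_EXPERIMENTAL_PROVIDER" >&2
  echo "enabling experimental podman provider" >&2
  printf '%s\n' kind-control-plane kind-worker
}

# Command substitution captures stdout only; stderr still reaches the terminal.
stdout_only="$(fake_kind_get_nodes)"

# Redirecting stderr into stdout is what would mix log lines into the list.
both_streams="$(fake_kind_get_nodes 2>&1)"

echo "lines captured from stdout only:   $(grep -c . <<<"${stdout_only}")"
echo "lines captured with 2>&1 applied:  $(grep -c . <<<"${both_streams}")"
```

If the real command behaves like the stand-in, a plain `$(kind get nodes)` loop would only ever see node names, even though the terminal shows the extra lines.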

@aojea
Contributor

aojea commented Sep 4, 2024

it seems there is no problem with the script #3731 (comment)

should we close?

@wbrefvem
Author

wbrefvem commented Sep 4, 2024

@aojea It breaks consistently for me with podman, but I'll leave it to the maintainers to decide what to do about it. I now believe the issue is that the script tries to exec into the containers without knowing if they're ready, and sometimes they aren't. It's unclear under what circumstances this is possible, but it could probably be fixed by retrying with a backoff. Again, I'll leave it to you to decide if it's worth doing.
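For reference, retrying with a backoff could look roughly like the sketch below. `retry_with_backoff` is a hypothetical helper, not part of kind or the registry script, and `flaky` is a stand-in command that fails twice before succeeding so the sketch can run without a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it succeeds or attempts run out.
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    sleep "${delay}"
    delay=$(( delay * 2 ))       # double the wait before the next try
    attempt=$(( attempt + 1 ))
  done
}

# Demo: a stand-in command that fails twice, then succeeds.
calls=0
flaky() {
  calls=$(( calls + 1 ))
  (( calls >= 3 ))
}

# A zero initial delay keeps the demo instant; real use would pass e.g. 1.
retry_with_backoff 5 0 flaky
echo "succeeded after ${calls} calls"
```

In the script, the wrapped command would presumably be the docker exec call for each node, though whether a retry is the right fix depends on why exec fails in the first place.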

@aojea
Contributor

aojea commented Sep 4, 2024

I think with more verbosity you'll get a more meaningful error

@BenTheElder
Member

Let's insert kind export logs right before the kind get nodes call, and please upload the log files if you can. Something is wrong if exec is broken right after kind create cluster, which internally performs many exec calls after waiting for the containers to be ready.

It sounds like the kubectl call delays things enough that they're healthy again, but that shouldn't be happening and doesn't make much sense; maybe they're restarting?

@wbrefvem
Author

wbrefvem commented Sep 6, 2024

@BenTheElder I now suspect that the error is in my config file, but I can't see how. I can't reproduce the error using the script as provided. In any case, I've attached it along with the kind logs. I've also attached a diff of the changes I've made to the script (which should be inconsequential and are just for local debugging).
kind-logs.tar.gz
diff-and-config.tar.gz

@wbrefvem
Author

wbrefvem commented Sep 6, 2024

> maybe they're restarting?

They're not.

@aojea
Contributor

aojea commented Sep 7, 2024

the serial log has this message:

Cannot connect to Podman. Please verify your connection to the Linux system using podman system connection list, or try podman machine init and podman machine start to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: ssh: handshake failed: EOF

also I notice this is running in arm64, can this be a problem? is podman creating a VM?

@wbrefvem
Author

wbrefvem commented Sep 7, 2024

> also I notice this is running in arm64, can this be a problem? is podman creating a VM?

Yeah, this is running on macOS arm64. The standard podman machine VM is up and running (otherwise the script would have failed earlier). That VM is a Fedora CoreOS box that's provisioned with ssh keys by ignition. The fact that the ssh handshake is failing tells me that perhaps the correct private key is not being read.

@BenTheElder
Member

> The fact that the ssh handshake is failing tells me that perhaps the correct private key is not being read.

I'm not sure what we should do about this. This implementation detail in the podman setup is not something kind is aware of, and I don't think it should be.

@wbrefvem
Author

Agreed. I'm going to try and find a fix on the podman side.

BenTheElder added the kind/support, kind/external, and area/provider/podman labels and removed the kind/cleanup label on Sep 10, 2024
3 participants