kubeadm panic during phase based init #1382
Comments
I cannot reproduce this problem on an Ubuntu 17.10 x86_64 setup:
Execution continues fine after writing this file:
Init succeeds:
Some questions:
Please note that we don't have test signal for ARM, so ARM support is experimental.
Hmm. What's the content of the file for you? As for the questions:
A properly populated kubeconfig file for the scheduler.
I am going through the code, but so far I just don't see how it could fail after the write at https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L93
Hmmmmmmmm. It worked on the other RPi.
I'll try another fresh install on the other one.
Still the same on the other one. Crazy!
I tried this on CentOS, but execution continued after [kubeconfig] and init succeeded.
@tcurdt @neolit123 I was not able to reproduce it in my environment. After looking at the traces I can see the difference. It breaks here https://github.com/kubernetes/kubernetes/blob/v1.13.3/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L235 which means that it was able to load the configuration file. However, on my machine it always exits here https://github.com/kubernetes/kubernetes/blob/v1.13.3/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L224 which means that the configuration file doesn't exist. @tcurdt if you can still reproduce this issue, can you do the following:
@bart0sh I suspect memory corruption on ARM, but this could be isolated to the RPi CPU. I still plan to send a minor patch related to this.
@neolit123 it could be so, but it doesn't explain the fact that it was able to load the config file, does it?
It loads a zero-byte config, which is a valid operation.
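To illustrate the point above, here is a minimal sketch of the "missing file vs. zero-byte file" distinction. It is not kubeadm's code: `Config`, `Context`, `loadConfig` and `loadOrCreate` are hypothetical, simplified stand-ins, and JSON replaces the real YAML kubeconfig format so the example needs only the standard library. `loadConfig` treats empty input as a valid, empty config (client-go's `clientcmd.Load` appears to behave this way for empty data), so a missing file takes the "create" branch while a zero-byte file is "loaded" and any context lookup on it returns nil.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Context and Config are hypothetical, heavily simplified stand-ins
// for client-go's kubeconfig types.
type Context struct {
	Cluster string `json:"cluster"`
}

type Config struct {
	CurrentContext string              `json:"current-context"`
	Contexts       map[string]*Context `json:"contexts"`
}

// loadConfig treats zero-byte input as a valid, empty config
// instead of returning an error.
func loadConfig(data []byte) (*Config, error) {
	cfg := &Config{Contexts: map[string]*Context{}}
	if len(data) == 0 {
		return cfg, nil
	}
	if err := json.Unmarshal(data, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

// loadOrCreate reports which branch was taken: "create" when the file
// does not exist, "loaded" when it exists (even if it is empty).
func loadOrCreate(path string) (string, *Config, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return "create", nil, nil
	}
	if err != nil {
		return "", nil, err
	}
	cfg, err := loadConfig(data)
	return "loaded", cfg, err
}

func main() {
	dir, err := os.MkdirTemp("", "kubeconfig-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	missing := filepath.Join(dir, "missing.conf")
	empty := filepath.Join(dir, "scheduler.conf")
	if err := os.WriteFile(empty, nil, 0o600); err != nil {
		panic(err)
	}

	branch, _, _ := loadOrCreate(missing)
	fmt.Println("missing file:", branch)

	branch, cfg, err := loadOrCreate(empty)
	fmt.Println("zero-byte file:", branch, err)
	// The lookup itself succeeds but yields nil ("scheduler-context" is an
	// illustrative name); dereferencing the result would panic.
	fmt.Println("context is nil:", cfg.Contexts["scheduler-context"] == nil)
}
```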
@neolit123 where does the zero-byte config come from? I don't see that happening in my setup.
Reproduced with an empty config:
@neolit123 Do you have any idea why there is an empty config in @tcurdt's setup?
From my earlier comment:
We need to reproduce and debug this on a Raspberry Pi. If you don't have one, just leave this issue for now.
@neolit123 Would it make sense to fix it by checking whether the config.Contexts map contains expectedCtx/currentCtx before using it? We can also investigate it further with @tcurdt's help and find out where those empty or broken files come from.
Yes, my idea was to apply a similar change across kubeadm (AFAIK we assume a valid config like that in more than one place). But as you can understand, this will not fix the problem, only the panic. Please feel free to send a patch for the above if you'd like.
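A minimal sketch of the guard discussed above, again with hypothetical, simplified types instead of client-go's `clientcmdapi.Config`. `serverFor` is a hypothetical helper that checks both map lookups before dereferencing, so an empty or broken kubeconfig yields an error rather than a nil-pointer panic.

```go
package main

import (
	"fmt"
)

// Hypothetical, simplified stand-ins for client-go's kubeconfig types.
type Cluster struct {
	Server string
}

type Context struct {
	Cluster string
}

type Config struct {
	CurrentContext string
	Contexts       map[string]*Context
	Clusters       map[string]*Cluster
}

// serverFor returns the API server URL for the given context name.
// It returns an error instead of panicking when the config, the
// context, or the referenced cluster is missing (e.g. empty file).
func serverFor(cfg *Config, ctxName string) (string, error) {
	if cfg == nil {
		return "", fmt.Errorf("kubeconfig is nil")
	}
	// Lookups on a nil map are safe and simply report ok == false.
	ctx, ok := cfg.Contexts[ctxName]
	if !ok || ctx == nil {
		return "", fmt.Errorf("context %q not found in kubeconfig", ctxName)
	}
	cluster, ok := cfg.Clusters[ctx.Cluster]
	if !ok || cluster == nil {
		return "", fmt.Errorf("cluster %q referenced by context %q not found", ctx.Cluster, ctxName)
	}
	return cluster.Server, nil
}

func main() {
	// An empty config, as loaded from a zero-byte file.
	empty := &Config{}
	if _, err := serverFor(empty, empty.CurrentContext); err != nil {
		fmt.Println("error:", err)
	}

	valid := &Config{
		CurrentContext: "default",
		Contexts:       map[string]*Context{"default": {Cluster: "kubernetes"}},
		Clusters:       map[string]*Cluster{"kubernetes": {Server: "https://192.168.0.10:6443"}},
	}
	server, err := serverFor(valid, valid.CurrentContext)
	fmt.Println(server, err)
}
```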
@bart0sh Have a look here #1380 (comment) and also read the progress leading up to it. Unfortunately I don't have the full output of Of course there is no way to rule out ARM memory corruption, but given that it is very reproducible (at least on my RPi) and that it works when all phases are run separately, I'd rather bet on some kind of race condition. I just wouldn't expect memory corruption to be that reproducible. But I don't know the Go compiler/runtime well enough to make an educated guess. The time I can spend on this is limited, but I am happy to help dig further into this.
@tcurdt can you please run kubeadm through something like delve or your debugger of choice:
Also, please check whether your RPi distro has valgrind and run the binary through that.
@tcurdt Yes, the issue is that after
Works for me:
@gbailey46 can you please also add the exact environment you ran this on? Otherwise "works for me" is not exactly helpful.
RPi3B
@gbailey46 @tcurdt
Well, it worked for me on an RPi3B, too. It did not work on an RPi3B+.
Yes, it works repeatedly.
So it technically exhibits the same SIGSEGV behavior. I also see that the CPUs of the two boards are the same; can someone test on a board without an ARM Cortex-A53 (if such an RPi even exists)? Also, we still need someone to help debug the root cause of the problem.
I was out of action for the past few days, but taking another, closer look is on my todo list.
Can you tell when they're created? Was it after running |
They are created when you execute:
@gbailey46 thanks. That's very interesting. Looks like a race condition to me. I don't see where in the code that could happen. Will look again.
@neolit123 I was just trying to give it another shot but apparently Anyone willing to pair on this via IRC/Discord/whatever?
Hi, we are prepping for the 1.14 release and I won't have time anytime soon.
@neolit123 too bad. Someone else? Any other suggestions for a debugger? In theory this should just work:
So shall I run this through valgrind
or is the primary objective to find out when the file gets created with size 0? Should I then run through all the phases and list the contents?
If the debuggers are not very useful, I would start adding fmt.Print(...) calls in a lot of places until I find where and why that config ends up with zero bytes.
I'd be happy to help you with this. I'm Ed@kubernetes.slack.com
Big thanks to @bart0sh. Unfortunately we could no longer reproduce it. Neither with the current version (maybe I should have done a full re-install instead of just a reset) nor with the latest master.
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version):
Environment:
Kubernetes version (use kubectl version):
Cloud provider or hardware configuration: RPi3B+
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
What happened?
I was trying to work around the race conditions mentioned in #413 and #1380 by executing kubeadm init in phases. Instead, it crashed on the second call to init.
What you expected to happen?
I expected to see the join information.
How to reproduce it (as minimally and precisely as possible)?
Fresh install of hypriotos-rpi-v1.9.0, then:
Anything else we need to know?
This is the output with the panic information:
It seems like some nil checks are missing: https://github.com/kubernetes/kubernetes/blob/v1.13.3/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L236
But the kubeconfig file looks OK to me.