Verified workaround for kube-proxy hostnameOverride issue #9663
Conversation
/sig cluster-lifecycle
/approve
Needs technical LGTM
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mistyhacks. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/assign @detiber
},
{
  "op": "add",
  "path": "/spec/template/spec/containers/0/command/-",
  "value": "--hostname-override=${NODE_NAME}"
  "path": "/spec/template/spec/initContainers",
Why the switch to an initContainer here? Is the --hostname-override command-line option no longer taking precedence over the component config?
That's exactly right. I found that --hostname-override did not take precedence over the component config. I tested with v1.10.4 and v1.11.1 and reported my findings in kubernetes/kubeadm#857.
Thanks for taking a look!
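For anyone who wants to poke at this on their own cluster, here is one quick way to compare the two sources of truth (a sketch; it assumes a kubeadm-provisioned cluster where kube-proxy runs as a DaemonSet in kube-system):

kubectl --namespace=kube-system get configmap kube-proxy \
  -o jsonpath='{.data.config\.conf}' | grep hostnameOverride
# The (empty) hostnameOverride above is what kube-proxy actually uses, even if
# the DaemonSet's command line also passes --hostname-override:
kubectl --namespace=kube-system get daemonset kube-proxy \
  -o jsonpath='{.spec.template.spec.containers[0].command}'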
@dlipovetsky @detiber
that's very odd. so for kube-proxy --hostname-override just overrides the default config:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-proxy/app/server.go#L144
i don't think anything should modify it at that point, right?
@stewart-yu any ideas about this?
@neolit123 I ran experiments and found that the config overrides flags (Please see my findings in kubernetes/kubeadm#857 (comment)).
Looking at the code, I find that the flags update a config struct, but later the configuration file is unmarshalled into a new config struct, and the pointer to the original config struct is overwritten, effectively discarding all the flag values. (Maybe this is by design, since the flags are deprecated.)
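To make the failure mode concrete, here is a sketch of the kind of experiment described above (the file contents, paths, and node name are illustrative assumptions, not a definitive reproduction):

# A minimal KubeProxyConfiguration with an empty hostnameOverride:
cat > /tmp/config.conf <<EOF
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
hostnameOverride: ""
EOF
# Pass both the file and the flag. Because the file is unmarshalled into a
# fresh config struct after flag parsing, the empty hostnameOverride from the
# file wins and the flag value is silently discarded:
kube-proxy --config=/tmp/config.conf \
  --hostname-override=ip-10-0-0-1.ec2.internal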
hm, even if the flags are deprecated they have a period of N releases where they should still work as expected. if this is by design, then we have a bit of an issue here.
my problem is that i don't know who to poke about kube-proxy.
i usually ask @stewart-yu but let's ask the whole sig-network about this...
/cc @kubernetes/sig-network-bugs
please, have a look at the observations by @dlipovetsky above.
The config file and flags appear to have been mutually exclusive starting with the commit that introduced support for the config file. That was in v1.7.
}
}
],
"image": "k8s.gcr.io/kube-proxy-amd64:v1.11.1",
Might be good to add a note that the arch and version are hardcoded here.
JSON doesn't allow comments, so it should go below the JSON example. Something along these lines, perhaps?
Please replace "amd64" and "v1.11.1" in "kube-proxy-amd64:v1.11.1" with the architecture and version you wish to use for the kube-proxy image.
We could use "busybox" instead, since all it's doing is invoking sed.
I chose to use the same image for the init container and kube-proxy container because the busybox image might not be available. I am applying this patch on an air-gapped cluster where busybox is not present.
Air-gapped clusters are not the common use case, so I think it would be ok to use busybox by default and add a note to replace busybox with the kube-proxy image if busybox is not available.
I can also think of some way to derive the correct kube-proxy image at the time the patch is applied, but that might be over-engineering.
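If we did want to derive it, a small sketch along these lines could reuse whatever image the DaemonSet already runs (the patch-template.json file and its INIT_CONTAINER_IMAGE placeholder are hypothetical names for illustration):

# Read the image the kube-proxy DaemonSet already uses...
KUBE_PROXY_IMAGE=$(kubectl --namespace=kube-system get daemonset kube-proxy \
  -o jsonpath='{.spec.template.spec.containers[0].image}')
# ...and substitute it for the placeholder when rendering the JSON patch:
sed "s|INIT_CONTAINER_IMAGE|${KUBE_PROXY_IMAGE}|" patch-template.json > patch.json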
@detiber Do you have an opinion on whether the patch should use the busybox image?
@dlipovetsky On one hand I like the idea of using a minimal image for the initContainer, but on the other I do like the idea of not requiring an additional image for air-gapped environments. I'm not sure that either approach has a clear advantage for a temporary workaround...
"Temporary" to a point: as of now we don't know who will work on making kube-proxy's --hostname-override
flag work again.
Regarding the air-gapped situation, I build my machine images with busybox pulled, and use the "imagePullPolicy" field with value "IfNotPresent," so that the init container here won't pull the image again. We could achieve the same thing by specifying an image tag.
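For example, pinning a tag on the init container image (the tag here is only illustrative) keeps IfNotPresent deterministic, since a bare busybox implicitly means busybox:latest:

initContainers:
- name: update-config-file
  image: busybox:1.29        # pinned tag rather than the implicit :latest
  imagePullPolicy: IfNotPresent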
@seh are you able to submit a PR with your proposed changes?
Yes, I can do that tomorrow morning.
Please see #9878.
"path": "/spec/template/spec/volumes/-", | ||
"value": { | ||
"name": "shared-data", | ||
"mountPath": "/shared-data" |
Should this volume definition mention it being of type "emptyDir"? The presence of the "mountPath" field looks wrong here.
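For comparison, a JSON patch operation that declares the volume as an emptyDir (mirroring the emptyDir volume in the strategic merge patch further down this thread) would look something like:

{
  "op": "add",
  "path": "/spec/template/spec/volumes/-",
  "value": {
    "name": "shared-data",
    "emptyDir": {}
  }
}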
@dlipovetsky i will leave it to @detiber for the LGTM.
It is unfortunate that we have kube-proxy trying to use a single, common configuration file on all nodes in the cluster, yet it gets every one of those node names wrong, so we have to override its use of the hostname, replacing it with the node name (which can't match the hostname in AWS). A flag-handling bug in kube-proxy then forces us to make a unique copy of that common configuration file on every node. We're back where we'd be if kube-proxy never read a common configuration file at all, only with the brittleness of trying to patch that file. While we're here, I'll note that kube-proxy's
Referenced commit messages:
When kube-proxy was refactored to use a configuration file, the ability to use 0 for conntrack min, max, max per core, and tcp timeouts was inadvertently broken; if you specified 0, it would instead apply the default value from defaults.go. This change restores the ability to use 0 to mean 0.
Add support for configuring kube-proxy via a config file instead of command line flags.
/assign
Leaving docs approval in place, but it looks to me as though we could use some explanatory words in addition to the examples, whenever and however you decide how to proceed. If you add words, you could tag me and I'll take a look (or I can follow up after merge, either way).
Since I apply this patch when each master instance starts, I wanted an idempotent operation, and hence needed to avoid a JSON patch like the one proposed here. I found that I could use a strategic merge patch instead, so long as I am willing to repeat the command-line arguments for the "kube-proxy" container.

kubectl --namespace=kube-system patch daemonset kube-proxy \
  --patch='
spec:
  template:
    spec:
      volumes:
      - name: shared-data
        emptyDir: {}
      initContainers:
      - name: update-config-file
        image: busybox
        imagePullPolicy: IfNotPresent
        command:
        - sh
        - -c
        - "/bin/sed \"s/hostnameOverride: \\\"\\\"/hostnameOverride: $(NODE_NAME)/\" /var/lib/kube-proxy/config.conf > /shared-data/config.conf"
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        volumeMounts:
        - name: kube-proxy
          mountPath: /var/lib/kube-proxy
          readOnly: true
        - name: shared-data
          mountPath: /shared-data
      containers:
      - name: kube-proxy
        command:
        - /usr/local/bin/kube-proxy
        - --config=/shared-data/config.conf
        volumeMounts:
        - name: shared-data
          mountPath: /shared-data
          readOnly: true'

Omitted here are the other parts of this DaemonSet I've needed to patch to keep up with kubernetes/kubernetes#65931. Notable differences from @dlipovetsky's version: it uses the busybox image with imagePullPolicy set to IfNotPresent, declares the shared-data volume as an emptyDir, and is a strategic merge patch rather than a JSON patch, so that applying it is idempotent.
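After applying either patch, one way to confirm the rewrite worked (a sketch; the pod name is a placeholder, and it assumes the container image provides grep):

kubectl --namespace=kube-system rollout status daemonset kube-proxy
# Pick any kube-proxy pod and check the rewritten config:
kubectl --namespace=kube-system exec <kube-proxy-pod> -- \
  grep hostnameOverride /shared-data/config.conf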
@seh Thanks a lot for contributing your version of the patch. Because applying it should be idempotent, I think it's a better solution than the JSON patch in this PR. However, when I tested your patch, I found it could not be applied more than once, and kubectl exited with a non-zero exit code. Please see the logs here: https://gist.github.com/dlipovetsky/f9d84c20827b025f646e2bf7193df75c
Yes, @dlipovetsky, I did observe that, even with the previous version of my strategic merge patch workaround for kubernetes/kubernetes#65931. I didn't include it here since I thought it would obscure the focus of the proposal, but here's the additional harness that I used to overcome kubectl patch's annoying exit code choice.

patch_output=$(
  kubectl --namespace=kube-system patch daemonset kube-proxy \
    --patch='
spec:
  template:
    # ...
')
code=$?
if [ ${code} -ne 0 ] && [ "${patch_output}" != 'daemonset.extensions/kube-proxy not patched' ]; then
  echo "Failed to patch kube-proxy DaemonSet: ${patch_output}" 1>&2
  exit ${code}
fi

The script feels clumsy to me, so I'm open to suggestions for how to improve it. In short, when kubectl patch doesn't apply a patch because it determines that no change is necessary, it exits with code 1, even though it ostensibly succeeded in ensuring the target object conforms to the patch. I'm assuming that if kubectl patch fails for some more grave reason, it will print a different message than the one I've captured there. Note that kubectl delete has the --ignore-not-found flag to address a similar problem.
@seh Thanks for explaining! Like you, I apply the patch across multiple masters, and I have a similar wrapper that makes applying the patch in the PR idempotent. (I agree that having kubectl patch exit non-zero when no change was needed is clumsy.) Both the JSON patch and the strategic merge patch require a script in order to make the apply idempotent. If the strategic merge did not require a script, I would prefer it. As it stands, I prefer the JSON patch because it makes very explicit what is changed, added, or removed. If it's ok with you, I will update the JSON patch (in the PR) with your suggestions to use busybox and imagePullPolicy: IfNotPresent.
@detiber, @neolit123: If you prefer the strategic merge patch @seh suggested in #9663 (comment), I am happy to close this PR and perhaps @seh can create a PR with the strategic merge patch.
@dlipovetsky I'm fine with the current approach as it is only meant to be a workaround.
@detiber Ok, thanks!
Closing in favor of #9878.
Updates the patch that is a workaround for kubernetes/kubeadm#857.