Support Graceful Shutdown #416
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -93,6 +93,7 @@ spec: | |
volumeMounts: | ||
- name: cnibin | ||
mountPath: /host/opt/cni/bin | ||
terminationGracePeriodSeconds: 900 | ||
Review comment: Why do we need 15 minutes to terminate? |
Review comment: also I think kubelet so in reboot/shutdown we would probably time out and be forcefully killed. |
||
volumes: | ||
- name: host | ||
hostPath: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,36 @@ | ||
#!/bin/bash | ||
|
||
chroot_path="/proc/1/root" | ||
Review comment: Why? This was /host before. |
||
delay_shutdown_path="$chroot_path/tmp/sriov-delay-shutdown" | ||
Review comment: preStop shares the same file system as the container it's defined on, no? |
||
kubelet_config_path="$chroot_path/etc/kubernetes/kubelet.conf" | ||
|
||
# 10 minutes - this should be shorter than the time that is specified for the | ||
# terminationGracePeriodSeconds in the daemonset's pod spec, so that everything | ||
# else in the preStop hook has time to run and the Pod can be terminated properly. | ||
wait_time=600 | ||
Review comment: I don't like this assumption. This should be configurable, with the value provided to the config daemon and propagated here (e.g. the controller reads the terminationGracePeriod value and adds an env var for the daemon to use). |
||
|
||
# If the kubelet is configured to shutdown gracefully (>0s shutdownGracePeriod), we need to wait for | ||
# things to settle before shutting down the node. | ||
if [ -f "$delay_shutdown_path" ]; then | ||
if grep "$kubelet_config_path" -e shutdownGracePeriod | grep -qv \"0s\"; then | ||
start=$(date +%s) | ||
touch "$chroot_path/var/log/sriov-delay-start" | ||
while [ $(( $(date +%s) - $start )) -lt $wait_time ]; do | ||
Review comment: It's not clear to me why we need this loop. The file at delay_shutdown_path is created when the config daemon needs a reboot. |
||
if [ ! -f "$delay_shutdown_path" ]; then # don't have to wait anymore | ||
break | ||
Review comment: What is the meaning of NOT exiting this loop through this break statement? |
||
fi | ||
sleep 1 | ||
done | ||
rm -f "$delay_shutdown_path" | ||
touch "$chroot_path/var/log/sriov-delay-end" | ||
fi | ||
fi | ||
|
||
if [ "$CLUSTER_TYPE" == "openshift" ]; then | ||
echo "openshift cluster" | ||
exit | ||
fi | ||
|
||
chroot_path="/host" | ||
|
||
function clean_services() { | ||
# Remove switchdev service files | ||
rm -f $chroot_path/etc/systemd/system/switchdev-configuration-after-nm.service | ||
|
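The review comments on the preStop script ask for the wait time to be configurable and question what it means when the loop ends without reaching `break`. A minimal sketch of one way to address both, assuming a hypothetical environment variable `SRIOV_PRESTOP_WAIT_SECONDS` set on the daemon pod (this variable and the helper names `resolve_wait_time` and `wait_for_marker_removal` are illustrative and not part of the PR):

```shell
#!/bin/bash

# Fallback used when no valid override is provided (matches the PR's hardcoded value).
default_wait_time=600

# Print the wait time in seconds: the value of the hypothetical
# SRIOV_PRESTOP_WAIT_SECONDS variable if it consists only of digits,
# otherwise the default.
resolve_wait_time() {
    local value="${SRIOV_PRESTOP_WAIT_SECONDS:-}"
    case "$value" in
        ''|*[!0-9]*) echo "$default_wait_time" ;;
        *) echo "$value" ;;
    esac
}

# Poll once per second until the marker file disappears or the timeout elapses.
# Returns 0 if the marker was removed, 1 if the wait timed out.
wait_for_marker_removal() {
    local marker="$1"
    local timeout="$2"
    local start
    start=$(date +%s)
    while [ -f "$marker" ]; do
        if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

With helpers like these, the hook could call `wait_for_marker_removal "$delay_shutdown_path" "$(resolve_wait_time)"` and log a timeout when it returns non-zero, which makes the two ways of leaving the loop distinguishable.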
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,6 +52,9 @@ const ( | |
// maxUpdateBackoff is the maximum time to react to a change as we back off | ||
// in the face of errors. | ||
maxUpdateBackoff = 60 * time.Second | ||
|
||
// the presence of this file indicates that the sriov shutdown should be delayed | ||
delayShutdownPath = "/host/tmp/sriov-delay-shutdown" | ||
Review comment: It's not clear to me why we need this file on the host filesystem. |
||
) | ||
|
||
type Message struct { | ||
|
@@ -612,6 +615,22 @@ func (dn *Daemon) completeDrain() error { | |
glog.Errorf("completeDrain(): failed to annotate node: %v", err) | ||
return err | ||
} | ||
|
||
if _, err := os.Stat(delayShutdownPath); err != nil { | ||
if os.IsNotExist(err) { | ||
// delayShutdownPath does not exist, so we don't need to do anything | ||
return nil | ||
} | ||
|
||
glog.Errorf("completeDrain(): error checking file status %v: %v", delayShutdownPath, err) | ||
return err | ||
} | ||
|
||
if err := os.Remove(delayShutdownPath); err != nil { | ||
glog.Errorf("completeDrain(): failed to remove file %v: %v", delayShutdownPath, err) | ||
return err | ||
} | ||
|
||
return nil | ||
} | ||
|
||
|
@@ -679,15 +698,16 @@ func rebootNode() { | |
glog.Errorf("rebootNode(): %v", err) | ||
} | ||
defer exit() | ||
// creates a new transient systemd unit to reboot the system. | ||
// We explicitly try to stop kubelet.service first, before anything else; this | ||
// way we ensure the rest of system stays running, because kubelet may need | ||
// to do "graceful" shutdown by e.g. de-registering with a load balancer. | ||
// However note we use `;` instead of `&&` so we keep rebooting even | ||
// if kubelet failed to shutdown - that way the machine will still eventually reboot | ||
// as systemd will time out the stop invocation. | ||
// creates a new transient systemd unit that reboots the system | ||
// using `systemctl reboot`. | ||
// By shutting down the system this way instead of via `reboot`, | ||
// a kubelet that is configured with a shutdownGracePeriod will | ||
// give pods some time to run their preStop scripts and respond to | ||
// SIGTERM by terminating gracefully before being forcefully killed via | ||
// SIGKILL. Stopping the kubelet service and then immediately running | ||
// `reboot` just results in all pods being immediately killed. | ||
cmd := exec.Command("systemd-run", "--unit", "sriov-network-config-daemon-reboot", | ||
"--description", "sriov-network-config-daemon reboot node", "/bin/sh", "-c", "systemctl stop kubelet.service; reboot") | ||
"--description", "sriov-network-config-daemon reboot node", "/bin/sh", "-c", "systemctl reboot") | ||
|
||
if err := cmd.Run(); err != nil { | ||
glog.Errorf("failed to reboot node: %v", err) | ||
|
@@ -933,6 +953,14 @@ func (dn *Daemon) drainNode() error { | |
return err | ||
} | ||
glog.Info("drainNode(): drain complete") | ||
|
||
file, err := os.Create(delayShutdownPath) | ||
if err != nil { | ||
glog.Errorf("drainNode(): failed to create file %v %v", delayShutdownPath, err) | ||
return err | ||
} | ||
defer file.Close() | ||
|
||
return nil | ||
} | ||
|
||
|
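The new reboot path only helps when kubelet is configured with a non-zero `shutdownGracePeriod`. The preStop script detects this with a `grep` pipeline that also matches `shutdownGracePeriodCriticalPods` lines and only recognises a quoted `"0s"`. A sketch of a somewhat stricter check, assuming the config stores the value on the same line as the key in either JSON or YAML form (the function name `kubelet_graceful_shutdown_enabled` is hypothetical and not part of the PR):

```shell
#!/bin/bash

# Return 0 if the given kubelet config sets shutdownGracePeriod to a non-zero
# duration, 1 otherwise (including when the file or the key is missing).
kubelet_graceful_shutdown_enabled() {
    local config="$1"
    local value
    [ -f "$config" ] || return 1
    # Select the shutdownGracePeriod key itself (not shutdownGracePeriodCriticalPods)
    # and extract its value, with or without surrounding double quotes.
    value=$(grep -E '"?shutdownGracePeriod"?[[:space:]]*:' "$config" | head -n 1 \
        | sed -E 's/.*shutdownGracePeriod"?[[:space:]]*:[[:space:]]*"?([^",[:space:]]*).*/\1/')
    case "$value" in
        ''|0|0s|0m|0h|0m0s) return 1 ;;
        *) return 0 ;;
    esac
}
```

The list of zero spellings in the `case` statement is an assumption about how a disabled grace period might be written; it is not exhaustive.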
Did you run `make manifests bundle` too?
I did not. Here's what I get when I run `make manifests bundle`:
Please commit that too in a separate commit.
Please fix `make: *** [Makefile:173: bundle] Error 1` first.
@jerpeter1 any news on this?
I thought I had resolved this and pushed a branch, but that doesn't appear to be the case. Today, I'm getting the same error running `make manifests bundle` from the head of the current master branch (without any of my changes):
Any suggestions?
I've investigated this a bit, and I think we don't support CSV in upstream. I'd like others to chime in too, but I think we should remove `make bundle` completely from upstream (only).
@SchSeba We need to get this one merged. Please chime in regarding CSV upstream support. I think we should remove `make bundle` to avoid confusion. Then I'm OK to merge this.
Is that going to happen in another PR?
Yes, @jerpeter1 please prep a PR that does exactly that. We can get that one merged first.