Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Graceful Shutdown #416

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions bindata/manifests/daemon/daemonset.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ spec:
volumeMounts:
- name: cnibin
mountPath: /host/opt/cni/bin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you run make manifests bundle too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not.

Here's what I get when I run make manifests bundle:

[jerpeter@fedora sriov-network-operator-upstream]$ make manifests bundle
/home/jerpeter/src/sriov-network-operator-upstream/bin/controller-gen "crd:crdVersions={v1}" webhook paths="./..." output:crd:artifacts:config=./config/crd/bases
cp ./config/crd/bases/* ./deployment/sriov-network-operator/crds/
operator-sdk generate kustomize manifests --interactive=false -q
cd config/manager && /home/jerpeter/src/sriov-network-operator-upstream/bin/kustomize edit set image controller=controller:latest
/home/jerpeter/src/sriov-network-operator-upstream/bin/kustomize build config/manifests | operator-sdk generate bundle -q --overwrite --version 4.7.0
Error: accumulating resources: accumulation err='accumulating resources from '../samples': '/home/jerpeter/src/sriov-network-operator-upstream/config/samples' must resolve to a file': couldn't make target for path '/home/jerpeter/src/sriov-network-operator-upstream/config/samples': unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/jerpeter/src/sriov-network-operator-upstream/config/samples'
INFO[0000] Creating bundle.Dockerfile
INFO[0000] Creating bundle/metadata/annotations.yaml
INFO[0000] Bundle metadata generated suceessfully
make: *** [Makefile:173: bundle] Error 1

[jerpeter@fedora sriov-network-operator-upstream]$ git status
On branch support-graceful-shutdown
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   bundle.Dockerfile
	modified:   config/manager/kustomization.yaml

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	bundle/
	config/manifests/

no changes added to commit (use "git add" and/or "git commit -a")

[jerpeter@fedora sriov-network-operator-upstream]$ git diff
diff --git a/bundle.Dockerfile b/bundle.Dockerfile
index bf295a04..7b73150d 100644
--- a/bundle.Dockerfile
+++ b/bundle.Dockerfile
@@ -1,16 +1,20 @@
 FROM scratch

+# Core bundle labels.
 LABEL operators.operatorframework.io.bundle.mediatype.v1=registry+v1
 LABEL operators.operatorframework.io.bundle.manifests.v1=manifests/
 LABEL operators.operatorframework.io.bundle.metadata.v1=metadata/
 LABEL operators.operatorframework.io.bundle.package.v1=sriov-network-operator
 LABEL operators.operatorframework.io.bundle.channels.v1=alpha
-LABEL operators.operatorframework.io.bundle.channel.default.v1=
-LABEL operators.operatorframework.io.metrics.builder=operator-sdk-v1.0.1
+LABEL operators.operatorframework.io.metrics.builder=operator-sdk-v1.12.0+git
 LABEL operators.operatorframework.io.metrics.mediatype.v1=metrics+v1
-LABEL operators.operatorframework.io.metrics.project_layout=go.kubebuilder.io/v2
-LABEL operators.operatorframework.io.test.config.v1=tests/scorecard/
+LABEL operators.operatorframework.io.metrics.project_layout=go.kubebuilder.io/v3
+
+# Labels for testing.
 LABEL operators.operatorframework.io.test.mediatype.v1=scorecard+v1
+LABEL operators.operatorframework.io.test.config.v1=tests/scorecard/

+# Copy files to locations specified by labels.
 COPY bundle/manifests /manifests/
 COPY bundle/metadata /metadata/
+COPY bundle/tests/scorecard /tests/scorecard/
diff --git a/config/manager/kustomization.yaml b/config/manager/kustomization.yaml
index 2bcd3eea..5e793dd1 100644
--- a/config/manager/kustomization.yaml
+++ b/config/manager/kustomization.yaml
@@ -5,6 +5,12 @@ generatorOptions:
   disableNameSuffixHash: true

 configMapGenerator:
-- name: manager-config
-  files:
+- files:
   - controller_manager_config.yaml
+  name: manager-config
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+images:
+- name: controller
+  newName: controller
+  newTag: latest

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please commit that too in a separate commit.

Please fix

make: *** [Makefile:173: bundle] Error 1 first

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jerpeter1 any news on this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had resolved this and pushed a branch, but that doesn't appear to be the case. Today, I'm getting the same error running make manifests bundle from the head of the current master branch (without any of my changes):

[jerpeter@fedora sriov-network-operator-upstream]$ git rev-parse --short HEAD
990648ee
[jerpeter@fedora sriov-network-operator-upstream]$ git reset --hard upstream/master
HEAD is now at 990648ee Merge pull request #451 from zeeke/exclude-topology-conformance-test
[jerpeter@fedora sriov-network-operator-upstream]$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
[jerpeter@fedora sriov-network-operator-upstream]$ make manifests bundle
/home/jerpeter/src/sriov-network-operator-upstream/bin/controller-gen "crd:crdVersions={v1}" webhook paths="./..." output:crd:artifacts:config=./config/crd/bases
cp ./config/crd/bases/* ./deployment/sriov-network-operator/crds/
operator-sdk generate kustomize manifests --interactive=false -q
cd config/manager && /home/jerpeter/src/sriov-network-operator-upstream/bin/kustomize edit set image controller=controller:latest
/home/jerpeter/src/sriov-network-operator-upstream/bin/kustomize build config/manifests | operator-sdk generate bundle -q --overwrite --version 4.7.0
Error: unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/jerpeter/src/sriov-network-operator-upstream/config/manifests'
INFO[0000] Creating bundle.Dockerfile
INFO[0000] Creating bundle/metadata/annotations.yaml
INFO[0000] Bundle metadata generated successfully
make: *** [Makefile:180: bundle] Error 1
[jerpeter@fedora sriov-network-operator-upstream]$ operator-sdk version
operator-sdk version: "v1.30.0", commit: "b794fe909abc1affa1f28cfb75ceaf3bf79187e6", kubernetes version: "v1.26.0", go version: "go1.20.5", GOOS: "linux", GOARCH: "amd64"

Any suggestions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've investigated this a bit, I think we don't support csv in upstream. I like other's chime in too but I think we should remove make bundle completely from upstream (only).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SchSeba We need to get this one merged. Please chime in wrt to csv u/s support. I think we should remove make bundle to avoid confusion. Then I'm ok to merge this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove make bundle to avoid confusion.

Is that going to happen in another PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @jerpeter1 please prep a PR that does exactly that. We can get that one merged first.

terminationGracePeriodSeconds: 900
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need 15 minutes to terminate ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also i think kubelet ShutdownGracePeriod is intended to be in seconds not minutes.

so in reboot/shutdown we would probably time out and forcefully killed.

volumes:
- name: host
hostPath:
Expand Down
28 changes: 26 additions & 2 deletions bindata/scripts/clean-k8s-services.sh
Original file line number Diff line number Diff line change
@@ -1,12 +1,36 @@
#!/bin/bash

chroot_path="/proc/1/root"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ?

this what /host before.

delay_shutdown_path="$chroot_path/tmp/sriov-delay-shutdown"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prestop shares the same file system as the container its defined on no?
if yes, why do you need to write this file on the host file system?

kubelet_config_path="$chroot_path/etc/kubernetes/kubelet.conf"

# 10 minutes - this should be shorter than the time that is specifed for the
# terminationGracePeriodSeconds in the daemonset's pod spec, so that everything
# else in the preStop hook has time to run and the Pod can be terminated properly.
wait_time=600
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont like this assumption, this should be configurable, value provided to config daemon and propagated here.

(e.g the controller reads terminationGracePeriod value and adds env var to the daemon to use)


# If the kubelet is configured to shutdown gracefully (>0s shutdownGracePeriod), we need to wait for
# things to settle before shutting down the node.
if [ -f "$delay_shutdown_path" ]; then
if grep "$kubelet_config_path" -e shutdownGracePeriod | grep -qv \"0s\"; then
start=$(date +%s)
touch "$chroot_path/var/log/sriov-delay-start"
while [ $(( $(date +%s) - $start )) -lt $wait_time ]; do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not clear to me why do we need this loop.

the file in delay_shutdown_path is created when config daemon needs reboot.
after witch the daemon exits.

if [ ! -f "$delay_shutdown_path" ]; then # don't have to wait anymore
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the meaning of NOT exiting this loop through this break statement ?

fi
sleep 1
done
rm -f "$delay_shutdown_path"
touch "$chroot_path/var/log/sriov-delay-end"
fi
fi

if [ "$CLUSTER_TYPE" == "openshift" ]; then
echo "openshift cluster"
exit
fi

chroot_path="/host"

function clean_services() {
# Remove switchdev service files
rm -f $chroot_path/etc/systemd/system/switchdev-configuration-after-nm.service
Expand Down
44 changes: 36 additions & 8 deletions pkg/daemon/daemon.go
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,9 @@ const (
// maxUpdateBackoff is the maximum time to react to a change as we back off
// in the face of errors.
maxUpdateBackoff = 60 * time.Second

// the presence of this file indicates that the sriov shutdown should be delayed
delayShutdownPath = "/host/tmp/sriov-delay-shutdown"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear to me why we need this file on the host filesystem

)

type Message struct {
Expand Down Expand Up @@ -612,6 +615,22 @@ func (dn *Daemon) completeDrain() error {
glog.Errorf("completeDrain(): failed to annotate node: %v", err)
return err
}

if _, err := os.Stat(delayShutdownPath); err != nil {
if os.IsNotExist(err) {
// delayShutdownPath does not exist, so we don't need to do anything
return nil
}

glog.Errorf("completeDrain(): error checking file status %v: %v", delayShutdownPath, err)
return err
}

if err := os.Remove(delayShutdownPath); err != nil {
glog.Errorf("completeDrain(): failed to remove file %v: %v", delayShutdownPath, err)
return err
}

return nil
}

Expand Down Expand Up @@ -679,15 +698,16 @@ func rebootNode() {
glog.Errorf("rebootNode(): %v", err)
}
defer exit()
// creates a new transient systemd unit to reboot the system.
// We explictily try to stop kubelet.service first, before anything else; this
// way we ensure the rest of system stays running, because kubelet may need
// to do "graceful" shutdown by e.g. de-registering with a load balancer.
// However note we use `;` instead of `&&` so we keep rebooting even
// if kubelet failed to shutdown - that way the machine will still eventually reboot
// as systemd will time out the stop invocation.
// creates a new transient systemd unit to reboot the system that
// reboots the system using `systemctl reboot``
// by shutting down the system this way instead via `reboot`,
// when kubelet is configured with a shutdownGracePeriod, then it will
// be give some time to pods to run their preStop scripts and respond to
// SIGTERM by terminating gracefully before being forcefully killed via
// SIGKILL. Stopping the kubelet service and then immediately running
// `reboot` just results in all pods being immediately killed
cmd := exec.Command("systemd-run", "--unit", "sriov-network-config-daemon-reboot",
"--description", "sriov-network-config-daemon reboot node", "/bin/sh", "-c", "systemctl stop kubelet.service; reboot")
"--description", "sriov-network-config-daemon reboot node", "/bin/sh", "-c", "systemctl reboot")

if err := cmd.Run(); err != nil {
glog.Errorf("failed to reboot node: %v", err)
Expand Down Expand Up @@ -933,6 +953,14 @@ func (dn *Daemon) drainNode() error {
return err
}
glog.Info("drainNode(): drain complete")

file, err := os.Create(delayShutdownPath)
if err != nil {
glog.Errorf("drainNode(): failed to create file %v %v", delayShutdownPath, err)
return err
}
defer file.Close()

return nil
}

Expand Down
Loading