-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Graceful Shutdown #416
base: master
Are you sure you want to change the base?
Support Graceful Shutdown #416
Conversation
Thanks for your PR,
To skip the vendors CIs use one of:
|
1 similar comment
Thanks for your PR,
To skip the vendors CIs use one of:
|
08ff040
to
f947aa4
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
Pull Request Test Coverage Report for Build 4371184613
💛 - Coveralls |
/test-all |
/assign @bn222 |
/assign @SchSeba |
Thanks for your PR,
To skip the vendors CIs use one of:
|
/test-all |
memory: 10Mi | ||
volumeMounts: | ||
- name: cnibin | ||
mountPath: /host/opt/cni/bin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run make manifests bundle
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not.
Here's what I get when I run make manifests bundle
:
[jerpeter@fedora sriov-network-operator-upstream]$ make manifests bundle
/home/jerpeter/src/sriov-network-operator-upstream/bin/controller-gen "crd:crdVersions={v1}" webhook paths="./..." output:crd:artifacts:config=./config/crd/bases
cp ./config/crd/bases/* ./deployment/sriov-network-operator/crds/
operator-sdk generate kustomize manifests --interactive=false -q
cd config/manager && /home/jerpeter/src/sriov-network-operator-upstream/bin/kustomize edit set image controller=controller:latest
/home/jerpeter/src/sriov-network-operator-upstream/bin/kustomize build config/manifests | operator-sdk generate bundle -q --overwrite --version 4.7.0
Error: accumulating resources: accumulation err='accumulating resources from '../samples': '/home/jerpeter/src/sriov-network-operator-upstream/config/samples' must resolve to a file': couldn't make target for path '/home/jerpeter/src/sriov-network-operator-upstream/config/samples': unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/jerpeter/src/sriov-network-operator-upstream/config/samples'
INFO[0000] Creating bundle.Dockerfile
INFO[0000] Creating bundle/metadata/annotations.yaml
INFO[0000] Bundle metadata generated suceessfully
make: *** [Makefile:173: bundle] Error 1
[jerpeter@fedora sriov-network-operator-upstream]$ git status
On branch support-graceful-shutdown
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: bundle.Dockerfile
modified: config/manager/kustomization.yaml
Untracked files:
(use "git add <file>..." to include in what will be committed)
bundle/
config/manifests/
no changes added to commit (use "git add" and/or "git commit -a")
[jerpeter@fedora sriov-network-operator-upstream]$ git diff
diff --git a/bundle.Dockerfile b/bundle.Dockerfile
index bf295a04..7b73150d 100644
--- a/bundle.Dockerfile
+++ b/bundle.Dockerfile
@@ -1,16 +1,20 @@
FROM scratch
+# Core bundle labels.
LABEL operators.operatorframework.io.bundle.mediatype.v1=registry+v1
LABEL operators.operatorframework.io.bundle.manifests.v1=manifests/
LABEL operators.operatorframework.io.bundle.metadata.v1=metadata/
LABEL operators.operatorframework.io.bundle.package.v1=sriov-network-operator
LABEL operators.operatorframework.io.bundle.channels.v1=alpha
-LABEL operators.operatorframework.io.bundle.channel.default.v1=
-LABEL operators.operatorframework.io.metrics.builder=operator-sdk-v1.0.1
+LABEL operators.operatorframework.io.metrics.builder=operator-sdk-v1.12.0+git
LABEL operators.operatorframework.io.metrics.mediatype.v1=metrics+v1
-LABEL operators.operatorframework.io.metrics.project_layout=go.kubebuilder.io/v2
-LABEL operators.operatorframework.io.test.config.v1=tests/scorecard/
+LABEL operators.operatorframework.io.metrics.project_layout=go.kubebuilder.io/v3
+
+# Labels for testing.
LABEL operators.operatorframework.io.test.mediatype.v1=scorecard+v1
+LABEL operators.operatorframework.io.test.config.v1=tests/scorecard/
+# Copy files to locations specified by labels.
COPY bundle/manifests /manifests/
COPY bundle/metadata /metadata/
+COPY bundle/tests/scorecard /tests/scorecard/
diff --git a/config/manager/kustomization.yaml b/config/manager/kustomization.yaml
index 2bcd3eea..5e793dd1 100644
--- a/config/manager/kustomization.yaml
+++ b/config/manager/kustomization.yaml
@@ -5,6 +5,12 @@ generatorOptions:
disableNameSuffixHash: true
configMapGenerator:
-- name: manager-config
- files:
+- files:
- controller_manager_config.yaml
+ name: manager-config
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+images:
+- name: controller
+ newName: controller
+ newTag: latest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please commit that too in a separate commit.
Please fix
make: *** [Makefile:173: bundle] Error 1
first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jerpeter1 any news on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I had resolved this and pushed a branch, but that doesn't appear to be the case. Today, I'm getting the same error running make manifests bundle
from the head of the current master branch (without any of my changes):
[jerpeter@fedora sriov-network-operator-upstream]$ git rev-parse --short HEAD
990648ee
[jerpeter@fedora sriov-network-operator-upstream]$ git reset --hard upstream/master
HEAD is now at 990648ee Merge pull request #451 from zeeke/exclude-topology-conformance-test
[jerpeter@fedora sriov-network-operator-upstream]$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
[jerpeter@fedora sriov-network-operator-upstream]$ make manifests bundle
/home/jerpeter/src/sriov-network-operator-upstream/bin/controller-gen "crd:crdVersions={v1}" webhook paths="./..." output:crd:artifacts:config=./config/crd/bases
cp ./config/crd/bases/* ./deployment/sriov-network-operator/crds/
operator-sdk generate kustomize manifests --interactive=false -q
cd config/manager && /home/jerpeter/src/sriov-network-operator-upstream/bin/kustomize edit set image controller=controller:latest
/home/jerpeter/src/sriov-network-operator-upstream/bin/kustomize build config/manifests | operator-sdk generate bundle -q --overwrite --version 4.7.0
Error: unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/jerpeter/src/sriov-network-operator-upstream/config/manifests'
INFO[0000] Creating bundle.Dockerfile
INFO[0000] Creating bundle/metadata/annotations.yaml
INFO[0000] Bundle metadata generated successfully
make: *** [Makefile:180: bundle] Error 1
[jerpeter@fedora sriov-network-operator-upstream]$ operator-sdk version
operator-sdk version: "v1.30.0", commit: "b794fe909abc1affa1f28cfb75ceaf3bf79187e6", kubernetes version: "v1.26.0", go version: "go1.20.5", GOOS: "linux", GOARCH: "amd64"
Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've investigated this a bit, I think we don't support csv in upstream. I like other's chime in too but I think we should remove make bundle
completely from upstream (only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SchSeba We need to get this one merged. Please chime in wrt to csv u/s support. I think we should remove make bundle
to avoid confusion. Then I'm ok to merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove make bundle to avoid confusion.
Is that going to happen in another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @jerpeter1 please prep a PR that does exactly that. We can get that one merged first.
If the kubelet where the the sriov pod is running has gracefulShutdown configured, we'll delay in preStop for a while if and while /tmp/sriov-delay-shutdown exists, up to a maximum wait of 10 minutes (less than the 15 minutes which is specified in the daemonset pod's terminationGracePeriodSeconds).
4a4a629
to
f6f872a
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
@@ -93,6 +93,7 @@ spec: | |||
volumeMounts: | |||
- name: cnibin | |||
mountPath: /host/opt/cni/bin | |||
terminationGracePeriodSeconds: 900 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need 15 minutes to terminate ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also i think kubelet ShutdownGracePeriod
is intended to be in seconds not minutes.
so in reboot/shutdown we would probably time out and forcefully killed.
@@ -1,12 +1,36 @@ | |||
#!/bin/bash | |||
|
|||
chroot_path="/proc/1/root" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why ?
this what /host before.
touch "$chroot_path/var/log/sriov-delay-start" | ||
while [ $(( $(date +%s) - $start )) -lt $wait_time ]; do | ||
if [ ! -f "$delay_shutdown_path" ]; then # don't have to wait anymore | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the meaning of NOT exiting this loop through this break statement ?
@@ -1,12 +1,36 @@ | |||
#!/bin/bash | |||
|
|||
chroot_path="/proc/1/root" | |||
delay_shutdown_path="$chroot_path/tmp/sriov-delay-shutdown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prestop shares the same file system as the container its defined on no?
if yes, why do you need to write this file on the host file system?
# 10 minutes - this should be shorter than the time that is specifed for the | ||
# terminationGracePeriodSeconds in the daemonset's pod spec, so that everything | ||
# else in the preStop hook has time to run and the Pod can be terminated properly. | ||
wait_time=600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont like this assumption, this should be configurable, value provided to config daemon and propagated here.
(e.g the controller reads terminationGracePeriod value and adds env var to the daemon to use)
@@ -52,6 +52,9 @@ const ( | |||
// maxUpdateBackoff is the maximum time to react to a change as we back off | |||
// in the face of errors. | |||
maxUpdateBackoff = 60 * time.Second | |||
|
|||
// the presence of this file indicates that the sriov shutdown should be delayed | |||
delayShutdownPath = "/host/tmp/sriov-delay-shutdown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not clear to me why we need this file on the host filesystem
if grep "$kubelet_config_path" -e shutdownGracePeriod | grep -qv \"0s\"; then | ||
start=$(date +%s) | ||
touch "$chroot_path/var/log/sriov-delay-start" | ||
while [ $(( $(date +%s) - $start )) -lt $wait_time ]; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not clear to me why do we need this loop.
the file in delay_shutdown_path is created when config daemon needs reboot.
after witch the daemon exits.
info on kubernetes graceful node shutdown (for others if needed ...): https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/ |
Hi @jerpeter1 do you still want to push this one? there are some comments from @adrianchiris and this PR needs a rebase |
Thanks for your PR,
To skip the vendors CIs use one of:
|
If the kubelet where the the sriov pod is running has gracefulShutdown configured, we'll delay in preStop for a while if and while /tmp/sriov-delay-shutdown exists, up to a maximum wait that's less than 15 minutes (which is now specified in the daemonset pod's terminationGracePeriodSeconds).
I wrote this as a possible alternative approach to the implementation in this PR, that was reverted last year.