kubectl wait for a non-existent resource #1516

Comments
/sig cli
I can take a look into this. /assign @rikatz
@kvokka just a question: are deleted conditions also something you're looking for? I imagine a situation where you want a 'deleted' condition for an object that wasn't even created. This seems pretty strange to me :) but let me know if this is also a scenario. Tks
@rikatz Thank you for your response! For me, a simple wait until timeout is more than enough. If the developer wants to control the object's persistence/deletion, just let them do it. Sounds reasonable? The example scenario of the expected behaviour is described in this article.
Right. It might be trickier than I thought, but I'm already taking a look. The biggest problem is that the function used to 'visit' an object expects it to exist (ResourceFinder.Do().Visit), so I'm taking a look to check whether it's possible to 'bypass/loop' into it ;)
I've made an initial and dirty PR just to see if this is the path to follow :D
Thank you for the contribution! I hope the code will be reviewed and merged soon! :)
OK, so I need some review :/ What I did is the dumbest way: a sleep of 1s. I'm not sure why ResourceFinder is used here, or whether something more "flexible" could be used, so I need someone with more experience in the Kubernetes code to review that.
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale
Got some time to resolve some other stuff, but this is still a thing.
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale
Had the same problem. I am using a script to create all my k8s objects, then wait for a particular pod to be ready. I have a race condition: when the wait executes, the object apparently is not yet created. This issue forces me to sleep for a few seconds before issuing the wait.

kubectl apply -f foo.yaml
sleep 5 # Just to avoid the wait error below
kubectl -n ns wait pod --for=condition=ready -l name=pod --timeout=120s
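A hedged alternative to the fixed sleep is to poll until at least one matching pod exists before handing off to kubectl wait. This is only a minimal sketch reusing the namespace and label from the snippet above; it has no upper bound on the polling (the Bash workaround in a later comment adds one):

kubectl apply -f foo.yaml
# poll (instead of a blind sleep) until at least one pod with the label exists
until kubectl -n ns get pod -l name=pod -o name | grep -q .; do
  sleep 1
done
kubectl -n ns wait pod --for=condition=ready -l name=pod --timeout=120s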
/remove-lifecycle stale
Here is my current workaround using Bash. It differs from the other workarounds published here in that it has a max wait time. It uses the return code from kubectl describe to detect whether the Pod exists yet.

# Wait for Pods with a certain name prefix to exist.
# The wait is over when at least one Pod exists or the max wait time is reached.
#
# $1 : namespace
# $2 : pod name prefix (wildcards not allowed)
# $3 : the maximum time to wait in seconds
#
# The command is useful in combination with 'kubectl wait', which can wait for a certain condition,
# but cannot wait for existence.
wait_for_pods_to_exist() {
  local ns=$1
  local pod_name_prefix=$2
  local max_wait_secs=$3
  local interval_secs=2
  local start_time
  start_time=$(date +%s)
  while true; do
    current_time=$(date +%s)
    if (( (current_time - start_time) > max_wait_secs )); then
      echo "Waited for pods in namespace \"$ns\" with name prefix \"$pod_name_prefix\" to exist for $max_wait_secs seconds without luck. Returning with error."
      return 1
    fi
    if kubectl -n "$ns" describe pod "$pod_name_prefix" --request-timeout "5s" &> /dev/null; then
      break
    else
      sleep "$interval_secs"
    fi
  done
}
...and use it like this in combination with kubectl wait:

# wait up to 20 secs for Pod to exist:
wait_for_pods_to_exist "mynamespace" "mypodname" 20
# ...and then wait for state 'Ready' for up to 30 secs:
kubectl --namespace "mynamespace" wait --for=condition=ready pod/mypodname --timeout=30s

(I haven't found a need to wait for any other resource type than pods, but the above can easily be generalized.) I truly believe that 'wait-for-existence' should be something kubectl supports out of the box.
Programmatically, it is easier done in code.
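Since the comment notes the approach can easily be generalized, here is a sketch of a generalized variant for arbitrary resource kinds. The function name and argument layout are my own invention, not part of the original workaround:

# Wait for a resource of a given kind and name to exist, with a max wait time.
# $1 : namespace
# $2 : resource kind (e.g. "pod", "deployment", "job")
# $3 : resource name
# $4 : maximum time to wait in seconds
wait_for_resource_to_exist() {
  local ns=$1 kind=$2 name=$3 max_wait_secs=$4
  local interval_secs=2
  local start_time
  start_time=$(date +%s)
  while true; do
    if (( $(date +%s) - start_time > max_wait_secs )); then
      echo "Timed out after $max_wait_secs seconds waiting for $kind/$name in namespace \"$ns\" to exist." >&2
      return 1
    fi
    # 'kubectl get' returns non-zero while the resource does not exist yet
    if kubectl -n "$ns" get "$kind" "$name" --request-timeout "5s" &> /dev/null; then
      return 0
    fi
    sleep "$interval_secs"
  done
}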
@lbruun thx a lot for the script. A little better could be to use the following instead of waiting only for the ready state:
This waits for the pod to be completely ready. When waiting only for the ready state, I had problems because it was not completely deployed.
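One way to wait for a workload to be "completely deployed" rather than just pod-Ready is to wait on the owning workload's rollout. This is only a sketch assuming the pods belong to a Deployment named myapp in namespace mynamespace (both hypothetical names, not taken from the comment):

kubectl -n mynamespace rollout status deployment/myapp --timeout=120s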
It seems that there is high demand for this feature, and I'd like to take a shot at proposing a viable solution. For backwards compatibility, it would be better to introduce a new flag for this functionality and keep the current behavior without it. /assign
/triage accepted
@ardaguclu - any update? Thanks :)
Thanks for the reminder; I had forgotten about this issue :). I'll prioritize it.
This would be great to have. I just ran into this and had to cook up some ugly bash scripting.
@seastco as you can see, this issue has already been fixed and closed. It's available in v1.32.0-alpha.0, v1.31.1, v1.31.0, v1.31.0-rc.1, v1.31.0-rc.0, v1.31.0-beta.0, and v1.31.0-alpha.3; see commit kubernetes/kubernetes@b95fce1.
Note that kubernetes/kubernetes@e24b9a0 (the actual commit implementing this feature) was reverted with kubernetes/kubernetes#125630, and the follow-up attempt (kubernetes/kubernetes#125632) was closed with the plan to instead expand the existing kubectl wait --for conditions. So this issue probably should be reopened.
kubernetes/kubernetes#125868 (kubectl wait --for=create) was the final decision and can be used.
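For reference, a minimal sketch of the resulting usage on kubectl v1.31 or newer; the resource name, namespace, and timeouts here are placeholders, not taken from the thread:

# First wait for the object to exist at all (supported since kubectl 1.31)...
kubectl -n mynamespace wait pod/mypod --for=create --timeout=60s
# ...then wait for the condition you actually care about.
kubectl -n mynamespace wait pod/mypod --for=condition=Ready --timeout=120s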
In the UDN tests we were using the wait command to poll the condition of the UDN to see if it is ready or created. However, if the UDN doesn't exist, wait won't poll and retry; it will immediately exit because it assumes the resource already exists. See kubernetes/kubectl#1516 for details. The recommended fix is to use kubernetes/kubernetes#125868, which was added in Kubernetes 1.31. This PR changes that.

Found during CI debugging. See sample flake: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-cgroupsv1-techpreview/1877195896546922496
blob:https://prow.ci.openshift.org/b8f5e892-9512-4f54-9176-772d131df102

STEP: create tests UserDefinedNetwork @ 01/09/25 04:54:51.335
I0109 04:54:51.335566 82622 builder.go:121] Running '/usr/bin/kubectl --server=https://api.ci-op-i9rwlkcc-1d795.aws-2.ci.openshift.org:6443 --kubeconfig=/tmp/kubeconfig-916539295 --namespace=e2e-test-network-segmentation-e2e-xxcxf create -f /tmp/udn-test1009896207/test-ovn-k-udn-hr84p.yaml'
I0109 04:54:51.683868 82622 builder.go:146] stderr: ""
I0109 04:54:51.683898 82622 builder.go:147] stdout: "userdefinednetwork.k8s.ovn.org/test-net created\n"
I0109 04:54:51.684017 82622 builder.go:121] Running '/usr/bin/kubectl --server=https://api.ci-op-i9rwlkcc-1d795.aws-2.ci.openshift.org:6443 --kubeconfig=/tmp/kubeconfig-916539295 --namespace=e2e-test-network-segmentation-e2e-xxcxf wait userdefinednetwork test-net --for condition=NetworkReady=True --timeout 5s'
I0109 04:54:52.088420 82622 builder.go:135] rc: 1
[FAILED] in [BeforeEach] - github.com/openshift/origin/test/extended/networking/network_segmentation.go:568 @ 01/09/25 04:54:52.088
STEP: Collecting events from namespace "e2e-test-network-segmentation-e2e-xxcxf". @ 01/09/25 04:54:52.088
STEP: Found 0 events. @ 01/09/25 04:54:52.118
I0109 04:54:52.136664 82622 resource.go:168] POD NODE PHASE GRACE CONDITIONS
I0109 04:54:52.136718 82622 resource.go:178]
I0109 04:54:52.199221 82622 dump.go:81] skipping dumping cluster info - cluster too large
I0109 04:54:52.272872 82622 client.go:638] Deleted {user.openshift.io/v1, Resource=users e2e-test-network-segmentation-e2e-xxcxf-user}, err: <nil>
I0109 04:54:52.322560 82622 client.go:638] Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-network-segmentation-e2e-xxcxf}, err: <nil>
I0109 04:54:52.365090 82622 client.go:638] Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens sha256~s7HMlKu1k8aprv5mU8JP_uzaAAuujg4_SOH_ds56VPk}, err: <nil>
STEP: Collecting events from namespace "e2e-test-monitoring-collection-profiles-vcqhj". @ 01/09/25 04:54:52.365
STEP: Found 0 events. @ 01/09/25 04:54:52.399
I0109 04:54:52.432784 82622 resource.go:168] POD NODE PHASE GRACE CONDITIONS
I0109 04:54:52.432808 82622 resource.go:178]
I0109 04:54:52.461376 82622 dump.go:81] skipping dumping cluster info - cluster too large
I0109 04:54:52.487845 82622 client.go:638] Deleted {user.openshift.io/v1, Resource=users e2e-test-monitoring-collection-profiles-vcqhj-user}, err: <nil>
I0109 04:54:52.536344 82622 client.go:638] Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-monitoring-collection-profiles-vcqhj}, err: <nil>
I0109 04:54:52.599493 82622 client.go:638] Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens sha256~hBHLTqI3PVJQevO8vllBEBy8FwOsd3l1fvTZcYz9Xxw}, err: <nil>
STEP: Destroying namespace "e2e-test-network-segmentation-e2e-xxcxf" for this suite. @ 01/09/25 04:54:52.599
STEP: Destroying namespace "e2e-test-monitoring-collection-profiles-vcqhj" for this suite. @ 01/09/25 04:54:52.62
• [FAILED] [4.347 seconds]
[sig-network][OCPFeatureGate:NetworkSegmentation][Feature:UserDefinedPrimaryNetworks] when using openshift ovn-kubernetes UserDefinedNetwork [BeforeEach] pod connected to UserDefinedNetwork cannot be deleted when being used [Suite:openshift/conformance/parallel]
[BeforeEach] github.com/openshift/origin/test/extended/networking/network_segmentation.go:563
[It] github.com/openshift/origin/test/extended/networking/network_segmentation.go:612
[FAILED] Expected success, but got an error:
<exec.CodeExitError>:
error running /usr/bin/kubectl --server=https://api.ci-op-i9rwlkcc-1d795.aws-2.ci.openshift.org:6443 --kubeconfig=/tmp/kubeconfig-916539295 --namespace=e2e-test-network-segmentation-e2e-xxcxf wait userdefinednetwork test-net --for condition=NetworkReady=True --timeout 5s:
Command stdout:
stderr:
Error from server (NotFound): userdefinednetworks.k8s.ovn.org "test-net" not found
error: exit status 1
{
    Err: <*errors.errorString | 0xc001de3990>{
        s: "error running /usr/bin/kubectl --server=https://api.ci-op-i9rwlkcc-1d795.aws-2.ci.openshift.org:6443 --kubeconfig=/tmp/kubeconfig-916539295 --namespace=e2e-test-network-segmentation-e2e-xxcxf wait userdefinednetwork test-net --for condition=NetworkReady=True --timeout 5s:\nCommand stdout:\n\nstderr:\nError from server (NotFound): userdefinednetworks.k8s.ovn.org \"test-net\" not found\n\nerror:\nexit status 1",
    },
    Code: 1,
}

As you can see, the UDN was created and applied, yet when it was fetched it returned "not found", and consequently we quit immediately instead of retrying.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
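Applied to the failing test above, the recommended fix amounts to a two-step wait. This is only a sketch reusing the resource name, namespace, and condition from the log; the timeout values are illustrative:

# Step 1: wait for the UserDefinedNetwork object to exist (requires kubectl 1.31+).
kubectl -n e2e-test-network-segmentation-e2e-xxcxf wait userdefinednetwork test-net --for=create --timeout=30s
# Step 2: once it exists, wait for it to become ready.
kubectl -n e2e-test-network-segmentation-e2e-xxcxf wait userdefinednetwork test-net --for=condition=NetworkReady=True --timeout=5s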
What happened:
I cannot avoid exiting with an error for a resource that has not been created yet, which is misleading given the command name:
kubectl wait --selector=foo=bar --for=condition=complete jobs
kubectl wait --for=condition=complete jobs/foo
Even if this behavior is intentional, the user should have an option to continue waiting instead of exiting with an error code.
In contrast, if the resource already exists, everything works as it should.
What you expected to happen:
kubectl should at least have the ability (an option) to wait for a resource that does not exist yet.
Anything else we need to know?:
Related: kubernetes/kubernetes#75227
Environment:
- Kubernetes version (kubectl version): 1.15.2
- OS (e.g. cat /etc/os-release): macOS 10.14.6
- Kernel (e.g. uname -a): Darwin Kernel Version 18.7.0