Update demo #48
base: main
Conversation
@@ -15,6 +17,10 @@ kubectl apply -f charts/overrides/kwok/pod-complete.yml
kubectl apply -f https://github.com/${KWOK_REPO}/raw/main/kustomize/stage/pod/chaos/pod-init-container-running-failed.yaml
kubectl apply -f https://github.com/${KWOK_REPO}/raw/main/kustomize/stage/pod/chaos/pod-container-running-failed.yaml

# Set up virtual nodes
We don't need to deploy nodes separately if we are using the Configure task.
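For context, the idea is that the test file itself can provision the virtual nodes. The fragment below is a hypothetical sketch of what such a Configure task might look like — the field names (`nodes`, `type`, `count`, `timeout`) are assumptions and should be checked against resources/tests/k8s/test-job.yml in the repo:

```yaml
# Hypothetical sketch: a knavigator test that creates virtual nodes
# via a Configure task instead of a separate helm deployment.
# Field names are assumptions; verify against the actual test files.
name: test-k8s-job
tasks:
- id: configure
  type: Configure
  params:
    nodes:
    - type: dgxa100.80g   # assumed node flavor name
      count: 2            # would yield virtual-dgxa100.80g-0 and -1
    timeout: 1m
```

If this works, the helm-based node deployment step in the demo would be redundant.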
Could you try to rebase/rebuild? I'm not seeing this issue.
$ ./bin/knavigator -tasks ./resources/tests/k8s/test-job.yml
I0520 12:08:39.910353 1099652 k8s_config.go:42] "Using external kubeconfig"
I0520 12:08:39.915986 1099652 main.go:84] "Starting test" name="test-k8s-job"
I0520 12:08:39.916034 1099652 engine.go:111] "Creating task" name="RegisterObj" id="register"
I0520 12:08:39.916580 1099652 engine.go:247] "Starting task" id="RegisterObj/register"
I0520 12:08:39.916600 1099652 engine.go:253] "Task completed" id="RegisterObj/register" duration="3.535µs"
I0520 12:08:39.916612 1099652 engine.go:111] "Creating task" name="Configure" id="configure"
I0520 12:08:39.916795 1099652 engine.go:247] "Starting task" id="Configure/configure"
I0520 12:08:40.802256 1099652 engine.go:253] "Task completed" id="Configure/configure" duration="885.42569ms"
I0520 12:08:40.802304 1099652 engine.go:111] "Creating task" name="SubmitObj" id="job"
I0520 12:08:40.802636 1099652 engine.go:247] "Starting task" id="SubmitObj/job"
I0520 12:08:40.850344 1099652 engine.go:253] "Task completed" id="SubmitObj/job" duration="47.67867ms"
I0520 12:08:40.850383 1099652 engine.go:111] "Creating task" name="CheckPod" id="status"
I0520 12:08:40.850559 1099652 engine.go:247] "Starting task" id="CheckPod/status"
I0520 12:08:40.850576 1099652 check_pod_task.go:158] "Create pod informer" #pod=2 timeout="5s"
I0520 12:08:40.971440 1099652 check_pod_task.go:256] "Accounted for all pods"
I0520 12:08:40.971488 1099652 engine.go:253] "Task completed" id="CheckPod/status" duration="120.910655ms"
I0520 12:08:40.971503 1099652 engine.go:259] "Reset Engine"
$ k get no
NAME STATUS ROLES AGE VERSION
test-control-plane Ready control-plane 11d v1.29.2
virtual-dgxa100.80g-0 Ready agent 9d fake
virtual-dgxa100.80g-1 Ready agent 3h58m fake
I didn't realize you'd added a Configure task to the test, but I've noticed the following issues with the latest virtual node configuration. Can you take a look?
- Virtual nodes created in the task are NotReady.
$ k get nodes
NAME STATUS ROLES AGE VERSION
minikube Ready control-plane 2m12s v1.30.0
virtual-dgxa100.80g-0 NotReady agent 118s fake
virtual-dgxa100.80g-1 NotReady agent 118s fake
- The job shows Running while the pods are Pending.
$ k get job
NAME STATUS COMPLETIONS DURATION AGE
job1 Running 0/2 15s 15s
$ k get pods
NAME READY STATUS RESTARTS AGE
job1-0-254qd 0/1 Pending 0 18s
job1-1-vsgkl 0/1 Pending 0 18s
- Running the test with the Configure task will remove the virtual nodes that were created by helm before.
- Deleting the test job will make all virtual nodes (from helm and task) become NotReady.
- Uninstalling the virtual node helm chart will remove all virtual nodes, including those that were configured in a task.
Can you create a brand new kind cluster and try it?
The same. The nodes were Ready first and then became NotReady. The pods are Pending. The test failed. BTW, I ran the test in minikube before.
$ k get nodes
NAME STATUS ROLES AGE VERSION
test-control-plane Ready control-plane 106s v1.29.2
virtual-dgxa100.80g-0 NotReady agent 48s fake
virtual-dgxa100.80g-1 NotReady agent 48s fake
$ k get pods
NAME READY STATUS RESTARTS AGE
job1-0-892gm 0/1 Pending 0 61s
job1-1-r8gcf 0/1 Pending 0 61s
$ k get jobs
NAME COMPLETIONS DURATION AGE
job1 0/2 65s 65s
Signed-off-by: Yuan Chen <yuanc@nvidia.com>
This PR updates the demo script and recreates the demo SVG file using the latest helm chart for creating virtual nodes.