
Add a timeout flag for the network command destroy subcommand #815

Closed · 5 tasks done · Tracked by #47 · Fixed by #821
jeromy-cannon opened this issue Nov 8, 2024 · 3 comments
Labels: P0 (an issue impacting production environments or impacting multiple releases or multiple individuals), Requested by Stakeholder (requested by an individual or team that uses Solo)

Comments


jeromy-cannon commented Nov 8, 2024

Per Alex Kuzmin, when there are pods currently in the Pending state, the following command will hang indefinitely:

```
solo network destroy --namespace "${SOLO_NAMESPACE}" --delete-pvcs --delete-secrets --force
```

Note: he is using the Taskfile clean target in the examples folder, so we should also update the [solo:network:destroy](https://github.com/hashgraph/solo/blob/ec63a659ae325ab1631409a76e6894deccdb0ed4/examples/custom-network-config/Taskfile.yml#L110-L110) target in examples/custom-network-config/Taskfile.yml with the recommended timeout.
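For illustration, the kind of guard such a flag could install is a simple race between the destroy work and a timer. This is only a sketch: `destroyNetworkResources`, `destroyWithTimeout`, and the flag wiring are assumptions for this example, not Solo's actual internals.

```typescript
// Hypothetical stand-in for Solo's real destroy logic (assumption for this sketch).
async function destroyNetworkResources(): Promise<void> {
  // ... delete network resources, PVCs, and secrets via the Kubernetes API ...
}

// Race the destroy work against a timer so the CLI fails fast with a clear
// error instead of waiting forever on pods stuck in Pending.
async function destroyWithTimeout(timeoutMs: number): Promise<void> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`network destroy timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    await Promise.race([destroyNetworkResources(), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // don't keep the event loop alive
  }
}
```

A parsed timeout value from the CLI (whatever the flag ends up being named in #821) would feed `timeoutMs`, so on expiry the command exits with an error instead of waiting on Kubernetes indefinitely.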


jeromy-cannon added the P0 and Requested by Stakeholder labels on Nov 8, 2024
alex-kuzmin-hg (Contributor) commented:

New symptom: it is now hanging while cleaning healthy nodes. This step always worked fine before:

```
hashsphere1@s05:~/workspaces/10nodes/solo$ task -t Taskfile.yml clean
task: [solo:node:stop] npm run solo-test -- node stop --namespace "${SOLO_NAMESPACE}" --node-aliases-unparsed node0,node1,node2,node3,node4,node5,node6 
[solo:node:stop] 
[solo:node:stop] > @hashgraph/solo@0.99.0 solo-test
[solo:node:stop] > node --no-deprecation --no-warnings --loader ts-node/esm solo.ts node stop --namespace solo-hashsphere1 --node-aliases-unparsed node0,node1,node2,node3,node4,node5,node6
[solo:node:stop] 
[solo:node:stop] 
[solo:node:stop] ******************************* Solo *********************************************
[solo:node:stop] Version			: 0.99.0
[solo:node:stop] Kubernetes Context	: gke_hashsphere-staging_us-central1_sphere-load-test-us-central
[solo:node:stop] Kubernetes Cluster	: gke_hashsphere-staging_us-central1_sphere-load-test-us-central
[solo:node:stop] Kubernetes Namespace	: solo-hashsphere1
[solo:node:stop] **********************************************************************************
[solo:node:stop] ❯ Initialize
[solo:node:stop] ❯ Acquire lease
[solo:node:stop] ✔ Acquire lease - lease acquired successfully, attempt: 1/10
[solo:node:stop] ✔ Initialize
[solo:node:stop] ❯ Identify network pods
[solo:node:stop] ❯ Check network pod: node0
[solo:node:stop] ❯ Check network pod: node1
[solo:node:stop] ❯ Check network pod: node2
[solo:node:stop] ❯ Check network pod: node3
[solo:node:stop] ❯ Check network pod: node4
[solo:node:stop] ❯ Check network pod: node5
[solo:node:stop] ❯ Check network pod: node6
^\SIGQUIT: quit
PC=0x473721 m=0 sigcode=128

goroutine 7 gp=0xc000133c00 m=0 mp=0x135b960 [syscall]:
runtime.notetsleepg(0x13bc500, 0xffffffffffffffff)
	runtime/lock_futex.go:246 +0x29 fp=0xc00049a7a0 sp=0xc00049a778 pc=0x4105a9
os/signal.signal_recv()
	runtime/sigqueue.go:152 +0x29 fp=0xc00049a7c0 sp=0xc00049a7a0 pc=0x46e589
os/signal.loop()
	os/signal/signal_unix.go:23 +0x13 fp=0xc00049a7e0 sp=0xc00049a7c0 pc=0xa6c093
runtime.goexit({})
	runtime/asm_amd64.s:1695 +0x1 fp=0xc00049a7e8 sp=0xc00049a7e0 pc=0x471921
created by os/signal.Notify.func1.1 in goroutine 1
	os/signal/signal.go:151 +0x1f

goroutine 1 gp=0xc0000061c0 m=3 mp=0xc0000b3008 [syscall]:
syscall.Syscall6(0xf7, 0x1, 0xba98b, 0xc00010d978, 0x1000004, 0x0, 0x0)
	syscall/syscall_linux.go:91 +0x39 fp=0xc00010d940 sp=0xc00010d8e0 pc=0x4886f9
os.(*Process).blockUntilWaitable(0xc0003da3c0)
	os/wait_waitid.go:32 +0x76 fp=0xc00010da18 sp=0xc00010d940 pc=0x4f65b6
os.(*Process).wait(0xc0003da3c0)
	os/exec_unix.go:22 +0x25 fp=0xc00010da78 sp=0xc00010da18 pc=0x4f04a5
os.(*Process).Wait(...)
	os/exec.go:134
os/exec.(*Cmd).Wait(0xc00001e180)
	os/exec/exec.go:906 +0x45 fp=0xc00010dad8 sp=0xc00010da78 pc=0x6e0b45
```

(The SIGQUIT dump above comes from the Go task runner itself: goroutine 1 is blocked in os/exec.(*Cmd).Wait, i.e. task is still waiting on its npm run solo-test child process, which suggests the hang is inside Solo's node stop flow rather than in task.)


JeffreyDallas commented Nov 13, 2024

So it starts hanging at the node stop step, before the node destroy step?

Can you attach ~/.solo/logs/solo.log, and also use k9s to check the status of the network pods?
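For a non-interactive version of the k9s check, a plain kubectl query (standard kubectl flags, using the same namespace variable as the commands above) would show which pods are Pending versus Running:

```
kubectl get pods -n "${SOLO_NAMESPACE}" -o wide
```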

alex-kuzmin-hg (Contributor) replied:

[screenshot attachment: image (1)]
