-
Notifications
You must be signed in to change notification settings - Fork 837
Description
Describe the bug:
This is tested on Fuji Testnet.
NOTE: A stacktrace generated via kill -SIGABRT $(pgrep -f avalanchego) is attached to the issue.
When wanting to abruptly cancel the process (with Ctrl+C for example) on:
[11-24|14:54:43.361] INFO <X Chain> bootstrap/bootstrapper.go:612 executing vertices
or
[11-25|13:23:02.162] INFO <X Chain> bootstrap/bootstrapper.go:600 executing transactions
the graceful shutdown procedure shown at this line:
[11-24|14:49:41.845] INFO chains/manager.go:1523 shutting down chain manager
takes around 8 minutes to complete. There is no preconfigured timeout and no possible way to cancel executing operations.
Full logs:
[11-24|14:49:40.274] INFO <X Chain> bootstrap/bootstrapper.go:600 executing transactions
^C[11-24|14:49:41.844] INFO node/node.go:1830 shutting down node {"exitCode": 0}
[11-24|14:49:41.845] INFO health/worker.go:115 registered new check and initialized its state to failing {"name": "health", "name": "shuttingDown", "tags": ["application"]}
[11-24|14:49:41.845] INFO chains/manager.go:1523 shutting down chain manager
[11-24|14:50:10.274] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 56359, "numToExecute": 504344, "eta": "3m58s"}
[11-24|14:50:40.275] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 117438, "numToExecute": 504344, "eta": "3m18s"}
[11-24|14:51:10.702] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 179305, "numToExecute": 504344, "eta": "2m44s"}
[11-24|14:51:40.693] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 226314, "numToExecute": 504344, "eta": "2m28s"}
[11-24|14:52:10.689] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 282297, "numToExecute": 504344, "eta": "1m58s"}
[11-24|14:52:40.689] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 331026, "numToExecute": 504344, "eta": "1m34s"}
[11-24|14:53:10.695] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 372588, "numToExecute": 504344, "eta": "1m14s"}
[11-24|14:53:40.696] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 412489, "numToExecute": 504344, "eta": "54s"}
[11-24|14:54:10.697] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 457616, "numToExecute": 504344, "eta": "28s"}
[11-24|14:54:40.698] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 498126, "numToExecute": 504344, "eta": "4s"}
[11-24|14:54:43.361] INFO <X Chain> queue/jobs.go:223 executed operations {"numExecuted": 504344}
[11-24|14:54:43.361] INFO <X Chain> bootstrap/bootstrapper.go:612 executing vertices
[11-24|14:55:13.361] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 115235, "numToExecute": 449999, "eta": "1m27s"}
[11-24|14:55:43.362] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 225740, "numToExecute": 449999, "eta": "1m0s"}
[11-24|14:56:13.362] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 333288, "numToExecute": 449999, "eta": "32s"}
[11-24|14:56:43.363] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 420723, "numToExecute": 449999, "eta": "8s"}
[11-24|14:56:52.678] INFO <X Chain> queue/jobs.go:223 executed operations {"numExecuted": 449999}
[11-24|14:56:52.680] INFO <X Chain> proposervm/vm.go:207 initialized proposervm {"state": "before fork"}
[11-24|14:56:52.680] INFO <X Chain> bootstrap/bootstrapper.go:192 starting bootstrapper {"lastAcceptedID": "2tCdy3dRsf6rYG7jtBNSyANQ6StrAEYpCDMLBQyGRLF8wG5KPK", "lastAcceptedHeight": 0}
[11-24|14:56:52.681] INFO router/chain_router.go:374 shutting down chain router
[11-24|14:56:52.681] INFO <X Chain> bootstrap/bootstrapper.go:805 shutting down bootstrapper
[11-24|14:56:52.681] INFO <C Chain> syncer/state_syncer.go:595 shutting down state syncer
[11-24|14:56:52.681] INFO <X Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "35.25µs"}
[11-24|14:56:52.681] INFO <P Chain> bootstrap/bootstrapper.go:805 shutting down bootstrapper
[11-24|14:56:52.681] INFO <P Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "344.041µs"}
[11-24|14:56:52.682] INFO <C Chain> core/txpool/legacypool/legacypool.go:449 Transaction pool stopped
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:969 Closing quit channel
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:972 Stopping Acceptor
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:975 Acceptor queue drained t="4.833µs"
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:978 Shutting down sender cacher
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:982 Closing scope
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:986 Waiting for background processes to complete
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:1005 Shutting down state manager
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:1010 State manager shut down t=166ns
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:1015 Blockchain stopped
[11-24|14:56:52.682] INFO <C Chain> eth/backend.go:404 Stopped shutdownTracker
[11-24|14:56:52.682] INFO <C Chain> eth/backend.go:407 Closed chaindb
[11-24|14:56:52.682] INFO <C Chain> eth/backend.go:409 Stopped EventMux
[11-24|14:56:52.682] INFO <C Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "1.194708ms"}
[11-24|14:56:52.682] INFO network/network.go:1142 shutting down the p2p networking
[11-24|14:56:52.683] INFO nat/nat.go:193 Unmapped all ports
[11-24|14:56:52.684] INFO node/node.go:1888 cleaning up plugin runtimes
[11-24|14:56:52.685] INFO node/node.go:1916 finished node shutdown
~/go/src/github.com/ava-labs/avalanchego/build/avalanchego 459.70s user 75.58s system 121% cpu 7:19.52 total
Affected Code:
snow/engine/avalanche/bootstrap/bootstrapper.go:600-622- CallsExecuteAllfor transactions and verticessnow/engine/avalanche/bootstrap/queue/jobs.go:112-283-ExecuteAllfunction that doesn't check context cancellationsnow/engine/avalanche/bootstrap/tx_job.go:63-99- Transaction executionsnow/engine/avalanche/bootstrap/vertex_job.go:84-115- Vertex execution
Operating System
macOS (darwin 24.6.0)
Proposed Solutions:
-
Add timeout mechanism: Implement a timeout for
ExecuteAlloperations to prevent indefinite hangs. -
Respect context in job execution: Ensure that
job.Execute(ctx)operations respect context cancellation and can be interrupted. -
Improve halter mechanism: Ensure the halter is set immediately when shutdown is initiated, or provide a more reliable shutdown signal mechanism.
To best protect the Avalanche community security bugs should be reported in accordance to our Security Policy
Metadata
Metadata
Assignees
Labels
Type
Projects
Status