Skip to content

X-Chain lacks proper cancellation support on abrupt shutdown request #4542

@powerslider

Description

@powerslider

avalanchego-stacktrace.log

Describe the bug:

This is tested on Fuji Testnet.

NOTE: A stacktrace generated via kill -SIGABRT $(pgrep -f avalanchego) is attached to the issue.

When wanting to abruptly cancel the process (with Ctrl+C for example) on:

[11-24|14:54:43.361] INFO <X Chain> bootstrap/bootstrapper.go:612 executing vertices

or

[11-25|13:23:02.162] INFO <X Chain> bootstrap/bootstrapper.go:600 executing transactions

the graceful shutdown procedure shown at this line:

[11-24|14:49:41.845] INFO chains/manager.go:1523 shutting down chain manager

takes around 8 minutes to complete. There is no preconfigured timeout and no possible way to cancel executing operations.

Full logs:

[11-24|14:49:40.274] INFO <X Chain> bootstrap/bootstrapper.go:600 executing transactions
^C[11-24|14:49:41.844] INFO node/node.go:1830 shutting down node {"exitCode": 0}
[11-24|14:49:41.845] INFO health/worker.go:115 registered new check and initialized its state to failing {"name": "health", "name": "shuttingDown", "tags": ["application"]}
[11-24|14:49:41.845] INFO chains/manager.go:1523 shutting down chain manager
[11-24|14:50:10.274] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 56359, "numToExecute": 504344, "eta": "3m58s"}
[11-24|14:50:40.275] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 117438, "numToExecute": 504344, "eta": "3m18s"}
[11-24|14:51:10.702] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 179305, "numToExecute": 504344, "eta": "2m44s"}
[11-24|14:51:40.693] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 226314, "numToExecute": 504344, "eta": "2m28s"}
[11-24|14:52:10.689] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 282297, "numToExecute": 504344, "eta": "1m58s"}
[11-24|14:52:40.689] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 331026, "numToExecute": 504344, "eta": "1m34s"}
[11-24|14:53:10.695] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 372588, "numToExecute": 504344, "eta": "1m14s"}
[11-24|14:53:40.696] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 412489, "numToExecute": 504344, "eta": "54s"}
[11-24|14:54:10.697] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 457616, "numToExecute": 504344, "eta": "28s"}
[11-24|14:54:40.698] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 498126, "numToExecute": 504344, "eta": "4s"}
[11-24|14:54:43.361] INFO <X Chain> queue/jobs.go:223 executed operations {"numExecuted": 504344}
[11-24|14:54:43.361] INFO <X Chain> bootstrap/bootstrapper.go:612 executing vertices
[11-24|14:55:13.361] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 115235, "numToExecute": 449999, "eta": "1m27s"}
[11-24|14:55:43.362] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 225740, "numToExecute": 449999, "eta": "1m0s"}
[11-24|14:56:13.362] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 333288, "numToExecute": 449999, "eta": "32s"}
[11-24|14:56:43.363] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 420723, "numToExecute": 449999, "eta": "8s"}
[11-24|14:56:52.678] INFO <X Chain> queue/jobs.go:223 executed operations {"numExecuted": 449999}
[11-24|14:56:52.680] INFO <X Chain> proposervm/vm.go:207 initialized proposervm {"state": "before fork"}
[11-24|14:56:52.680] INFO <X Chain> bootstrap/bootstrapper.go:192 starting bootstrapper {"lastAcceptedID": "2tCdy3dRsf6rYG7jtBNSyANQ6StrAEYpCDMLBQyGRLF8wG5KPK", "lastAcceptedHeight": 0}
[11-24|14:56:52.681] INFO router/chain_router.go:374 shutting down chain router
[11-24|14:56:52.681] INFO <X Chain> bootstrap/bootstrapper.go:805 shutting down bootstrapper
[11-24|14:56:52.681] INFO <C Chain> syncer/state_syncer.go:595 shutting down state syncer
[11-24|14:56:52.681] INFO <X Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "35.25µs"}
[11-24|14:56:52.681] INFO <P Chain> bootstrap/bootstrapper.go:805 shutting down bootstrapper
[11-24|14:56:52.681] INFO <P Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "344.041µs"}
[11-24|14:56:52.682] INFO <C Chain> core/txpool/legacypool/legacypool.go:449 Transaction pool stopped
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:969 Closing quit channel
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:972 Stopping Acceptor
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:975 Acceptor queue drained                   t="4.833µs"
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:978 Shutting down sender cacher
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:982 Closing scope
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:986 Waiting for background processes to complete
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:1005 Shutting down state manager
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:1010 State manager shut down                  t=166ns
[11-24|14:56:52.682] INFO <C Chain> core/blockchain.go:1015 Blockchain stopped
[11-24|14:56:52.682] INFO <C Chain> eth/backend.go:404 Stopped shutdownTracker
[11-24|14:56:52.682] INFO <C Chain> eth/backend.go:407 Closed chaindb
[11-24|14:56:52.682] INFO <C Chain> eth/backend.go:409 Stopped EventMux
[11-24|14:56:52.682] INFO <C Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "1.194708ms"}
[11-24|14:56:52.682] INFO network/network.go:1142 shutting down the p2p networking
[11-24|14:56:52.683] INFO nat/nat.go:193 Unmapped all ports
[11-24|14:56:52.684] INFO node/node.go:1888 cleaning up plugin runtimes
[11-24|14:56:52.685] INFO node/node.go:1916 finished node shutdown
~/go/src/github.com/ava-labs/avalanchego/build/avalanchego   459.70s user 75.58s system 121% cpu 7:19.52 total

Affected Code:

  • snow/engine/avalanche/bootstrap/bootstrapper.go:600-622 - Calls ExecuteAll for transactions and vertices
  • snow/engine/avalanche/bootstrap/queue/jobs.go:112-283 - ExecuteAll function that doesn't check context cancellation
  • snow/engine/avalanche/bootstrap/tx_job.go:63-99 - Transaction execution
  • snow/engine/avalanche/bootstrap/vertex_job.go:84-115 - Vertex execution

Operating System

macOS (darwin 24.6.0)

Proposed Solutions:

  1. Add timeout mechanism: Implement a timeout for ExecuteAll operations to prevent indefinite hangs.

  2. Respect context in job execution: Ensure that job.Execute(ctx) operations respect context cancellation and can be interrupted.

  3. Improve halter mechanism: Ensure the halter is set immediately when shutdown is initiated, or provide a more reliable shutdown signal mechanism.

To best protect the Avalanche community security bugs should be reported in accordance to our Security Policy

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    Status

    Backlog 🧊

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions