Skip to content

Conversation

@yacovm
Copy link
Contributor

@yacovm yacovm commented Dec 1, 2025

Why this should be merged

Currently, when the X-chain executes transactions while bootstrapping, it cannot halt until it finishes executing the current batch of transactions, because as seen in the stack trace below, the X-chain's bootstrapping is done from within the chain creation goroutine. In order to stop the bootstrapper of the X-chain, the chain router needs to be stopped which in turn stops the various chains. However, before stopping the chain router, the chain manager's shutdown procedure waits for the chain creation procedure to finish, which cannot finish before the bootstrapping is done.

How this works

Fortunately, it's safe to close the chain creation queue first, which prevents new chains from forming, and then proceed to stop the chain router and only afterwards to wait for the chain creation to finish.

The reason it's safe is because the chain router simply calls Stop() on all chains, and each chain's stop procedure simply calls thread safe methods.

How this was tested

When compiling a bootstrapping Fuji node with race detector and shutting down the chain, it doesn't complain:

[12-01|19:37:36.630] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 4157, "numToExecute": 248432, "eta": "29m23s"}
 ^C[12-01|19:37:42.065] INFO node/node.go:1841 shutting down node {"exitCode": 0} 
[12-01|19:37:42.065] INFO health/worker.go:115 registered new check and initialized its state to failing {"name": "health", "name": "shuttingDown", "tags": ["application"]} 
[12-01|19:37:42.066] INFO chains/manager.go:1526 shutting down chain manager 
[12-01|19:37:42.066] INFO router/chain_router.go:374 shutting down chain router 
[12-01|19:37:42.067] INFO <P Chain> bootstrap/bootstrapper.go:805 shutting down bootstrapper 
[12-01|19:37:42.068] INFO <P Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "1.737125ms"} [12-01|19:37:42.069] INFO <X Chain> queue/jobs.go:137 interrupted execution {"numExecuted": 4956} 
[12-01|19:37:42.069] INFO <X Chain> bootstrap/bootstrapper.go:307 shutting down Bootstrapper 
[12-01|19:37:42.070] INFO <X Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "3.656708ms"} [12-01|19:37:42.070] INFO network/network.go:1142 shutting down the p2p networking 
[12-01|19:37:42.079] INFO nat/nat.go:193 Unmapped all ports 
[12-01|19:37:42.079] INFO node/node.go:1895 cleaning up plugin runtimes [12-01|19:37:42.090] INFO node/node.go:1923 finished node shutdown
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/snow/engine/avalanche/bootstrap/bootstrapper.go:589 +0x4e0
github.com/ava-labs/avalanchego/snow/engine/avalanche/bootstrap.(*Bootstrapper).Start(0x140049bd6c0, {0x104324158, 0x10617cc80}, 0x0)
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/snow/engine/avalanche/bootstrap/bootstrapper.go:365 +0x608
github.com/ava-labs/avalanchego/snow/networking/handler.(*handler).Start(0x140069abce0, {0x104324158, 0x10617cc80}, 0x0)
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/snow/networking/handler/handler.go:257 +0x2b4
github.com/ava-labs/avalanchego/chains.(*manager).createChain(0x14000774008, {{0xab, 0x68, 0xeb, 0x1e, 0xe1, 0x42, 0xa0, 0x5c, 0xfe, ...}, ...})
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/chains/manager.go:471 +0x1504
github.com/ava-labs/avalanchego/chains.(*manager).dispatchChainCreator(0x14000774008)
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/chains/manager.go:1517 +0xcc
created by github.com/ava-labs/avalanchego/chains.(*manager).StartChainCreator in goroutine 1
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/chains/manager.go:1494 +0x108

Need to be documented in RELEASES.md?

…hutting down

Currently, when the X-chain executes transactions while bootstrapping, it cannot halt until it finishes executing the current batch of transactions,
because as seen in the stack trace below, the X-chain's bootstrapping is done from within the chain creation goroutine.
In order to stop the bootstrapper of the X-chain, the chain router needs to be stopped which in turn stops the various chains.
However, before stopping the chain router, the chain manager's shutdown procedure waits for the chain creation procedure to finish, which
cannot finish before the bootstrapping is done.

Fortunately, it's safe to close the chainc creation queue first, which prevents new chains from forming, and then proceed to stop the chain router
and only afterwards to wait for the chain creation to finish.

The reason it's safe is because the chain router simply calls Stop() on all chains, and each chain's stop procedure simply calls thread safe methods.
When compiling the node with race detector and shutting down the chain, it doesn't complain:

[12-01|19:37:36.630] INFO <X Chain> queue/jobs.go:202 executing operations {"numExecuted": 4157, "numToExecute": 248432, "eta": "29m23s"}
^C[12-01|19:37:42.065] INFO node/node.go:1841 shutting down node {"exitCode": 0}
[12-01|19:37:42.065] INFO health/worker.go:115 registered new check and initialized its state to failing {"name": "health", "name": "shuttingDown", "tags": ["application"]}
[12-01|19:37:42.066] INFO chains/manager.go:1526 shutting down chain manager
[12-01|19:37:42.066] INFO router/chain_router.go:374 shutting down chain router
[12-01|19:37:42.067] INFO <P Chain> bootstrap/bootstrapper.go:805 shutting down bootstrapper
[12-01|19:37:42.068] INFO <P Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "1.737125ms"}
[12-01|19:37:42.069] INFO <X Chain> queue/jobs.go:137 interrupted execution {"numExecuted": 4956}
[12-01|19:37:42.069] INFO <X Chain> bootstrap/bootstrapper.go:307 shutting down Bootstrapper
[12-01|19:37:42.070] INFO <X Chain> router/chain_router.go:398 chain shutdown {"shutdownDuration": "3.656708ms"}
[12-01|19:37:42.070] INFO network/network.go:1142 shutting down the p2p networking
[12-01|19:37:42.079] INFO nat/nat.go:193 Unmapped all ports
[12-01|19:37:42.079] INFO node/node.go:1895 cleaning up plugin runtimes
[12-01|19:37:42.090] INFO node/node.go:1923 finished node shutdown

	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/snow/engine/avalanche/bootstrap/bootstrapper.go:589 +0x4e0
github.com/ava-labs/avalanchego/snow/engine/avalanche/bootstrap.(*Bootstrapper).Start(0x140049bd6c0, {0x104324158, 0x10617cc80}, 0x0)
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/snow/engine/avalanche/bootstrap/bootstrapper.go:365 +0x608
github.com/ava-labs/avalanchego/snow/networking/handler.(*handler).Start(0x140069abce0, {0x104324158, 0x10617cc80}, 0x0)
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/snow/networking/handler/handler.go:257 +0x2b4
github.com/ava-labs/avalanchego/chains.(*manager).createChain(0x14000774008, {{0xab, 0x68, 0xeb, 0x1e, 0xe1, 0x42, 0xa0, 0x5c, 0xfe, ...}, ...})
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/chains/manager.go:471 +0x1504
github.com/ava-labs/avalanchego/chains.(*manager).dispatchChainCreator(0x14000774008)
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/chains/manager.go:1517 +0xcc
created by github.com/ava-labs/avalanchego/chains.(*manager).StartChainCreator in goroutine 1
	/Users/avalancheuser/go/src/github.com/ava-labs/avalanchego/chains/manager.go:1494 +0x108

Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Copilot AI review requested due to automatic review settings December 1, 2025 18:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a shutdown race condition where the chain manager would deadlock waiting for chain creation to complete before stopping the chain router, but chain creation could be blocked by bootstrapping chains that needed the router to stop first.

Key changes:

  • Reordered shutdown sequence to stop chain router before waiting for chain creator goroutine to exit
  • This allows bootstrapping chains to be interrupted, preventing the deadlock scenario

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant