Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

X-Chain FATAL on bootstrap #1581

Closed
jpop32 opened this issue Jun 5, 2023 · 7 comments · Fixed by #1598
Closed

X-Chain FATAL on bootstrap #1581

jpop32 opened this issue Jun 5, 2023 · 7 comments · Fixed by #1598
Assignees
Labels
bug Something isn't working lifecycle/stale
Milestone

Comments

@jpop32
Copy link
Contributor

jpop32 commented Jun 5, 2023

Describe the bug
When doing a clean bootstrap (deleted database, restarted node), P and C finished with no issues. X-Chain ended up with a FATAL error and exited. Relevant bit of log:

Jun 05 13:55:27 node-main avalanchego[2309622]: [06-05|13:55:27.297] INFO <X Chain> queue/jobs.go:203 executing operations {"numExecuted": 4092719, "numToExecute": 4098651, "eta": "7s"}
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.402] INFO <X Chain> queue/jobs.go:224 executed operations {"numExecuted": 4098651}
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.405] INFO <X Chain> bootstrap/bootstrapper.go:115 starting bootstrapper
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.428] INFO <X Chain> bootstrap/bootstrapper.go:575 executing transactions
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.428] INFO <X Chain> queue/jobs.go:224 executed operations {"numExecuted": 0}
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.428] INFO <X Chain> bootstrap/bootstrapper.go:592 executing vertices
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.428] INFO <X Chain> queue/jobs.go:224 executed operations {"numExecuted": 0}
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.428] FATAL <X Chain> handler/handler.go:344 shutting down chain {"reason": "received an unexpected error", "error": "duplicated register while processing sync message: {%!s(*message.inboundMessage=&{[129 130 201 232 153 92 130 67 7 139 58 175 15 206 144 51 209 180 85 29] 19 0xc10043d6e0 {0 9223372036854775807 0x34eabe0} 0xc15dc0 10198}) ENGINE_TYPE_AVALANCHE}"}
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.429] INFO <X Chain> bootstrap/bootstrapper.go:297 shutting down bootstrapper
Jun 05 13:55:35 node-main avalanchego[2309622]: [06-05|13:55:35.429] INFO <X Chain> router/chain_router.go:665 chain shutdown {"shutdownDuration": "362.308µs"}

After restart, chain finished the boostrap with no further issues:

Jun 05 13:56:33 node-main avalanchego[3006958]: [06-05|13:56:33.587] INFO <X Chain> common/bootstrapper.go:244 bootstrapping started syncing {"numVerticesInFrontier": 1}
Jun 05 13:56:37 node-main avalanchego[3006958]: [06-05|13:56:37.667] INFO <X Chain> bootstrap/bootstrapper.go:495 fetching blocks {"numFetchedBlocks": 5000, "numTotalBlocks": 31556, "eta": "22s"}
Jun 05 13:56:49 node-main avalanchego[3006958]: [06-05|13:56:49.181] INFO <X Chain> bootstrap/bootstrapper.go:495 fetching blocks {"numFetchedBlocks": 10000, "numTotalBlocks": 31556, "eta": "34s"}
Jun 05 13:56:55 node-main avalanchego[3006958]: [06-05|13:56:55.939] INFO <X Chain> bootstrap/bootstrapper.go:495 fetching blocks {"numFetchedBlocks": 15000, "numTotalBlocks": 31556, "eta": "25s"}
Jun 05 13:57:09 node-main avalanchego[3006958]: [06-05|13:57:09.792] INFO <X Chain> bootstrap/bootstrapper.go:495 fetching blocks {"numFetchedBlocks": 20000, "numTotalBlocks": 31556, "eta": "21s"}
Jun 05 13:57:23 node-main avalanchego[3006958]: [06-05|13:57:23.540] INFO <X Chain> bootstrap/bootstrapper.go:495 fetching blocks {"numFetchedBlocks": 25000, "numTotalBlocks": 31556, "eta": "13s"}
Jun 05 13:57:33 node-main avalanchego[3006958]: [06-05|13:57:33.061] INFO <X Chain> bootstrap/bootstrapper.go:495 fetching blocks {"numFetchedBlocks": 30000, "numTotalBlocks": 31556, "eta": "3s"}
Jun 05 13:57:36 node-main avalanchego[3006958]: [06-05|13:57:36.653] INFO <X Chain> bootstrap/bootstrapper.go:554 executing blocks {"numPendingJobs": 31556}
Jun 05 13:58:06 node-main avalanchego[3006958]: [06-05|13:58:06.653] INFO <X Chain> queue/jobs.go:203 executing operations {"numExecuted": 24902, "numToExecute": 31556, "eta": "8s"}
Jun 05 13:58:14 node-main avalanchego[3006958]: [06-05|13:58:14.959] INFO <X Chain> queue/jobs.go:224 executed operations {"numExecuted": 31556}
Jun 05 13:58:24 node-main avalanchego[3006958]: [06-05|13:58:24.869] INFO <X Chain> snowman/transitive.go:416 consensus starting {"lastAcceptedBlock": "2DqvKhLKMastK3qZLikAmKHUvSxq7t86RAT9qCZxyeKJV7UQS1"}

Node config:

➜  ~ more .avalanchego/configs/node.json
{
  "http-host": "",
  "public-ip-resolution-service": "opendns"
}
➜  ~ more .avalanchego/configs/chains/C/config.json
{
  "state-sync-enabled": true
}

To Reproduce
Not sure. Just deleted the database and restarted the node.

Expected behavior
Bootstrap finishes without a FATAL error.

Logs
logs.zip

Metrics
I have Grafana dashboards set up so I can provide screenshots if necessary.

Operating System
Ubuntu 20.04.

@jpop32 jpop32 added the bug Something isn't working label Jun 5, 2023
@github-project-automation github-project-automation bot moved this to Backlog 🗄 in Ski Patrol Jun 5, 2023
@StephenButtolph StephenButtolph added this to the v1.10.3 milestone Jun 5, 2023
@StephenButtolph StephenButtolph moved this from Backlog 🗄 to Selected for Development ⏳ in Ski Patrol Jun 5, 2023
@anusha-ctrl anusha-ctrl self-assigned this Jun 5, 2023
@anusha-ctrl
Copy link
Contributor

Replicating issue currently

@anusha-ctrl
Copy link
Contributor

Synced a full Fuji node with db wiped, no FATAL

@ceyonur
Copy link
Contributor

ceyonur commented Jun 7, 2023

I could not see the error in those zipped logs. @jpop32 could you check if those logs were from the same node that had this error?

@jpop32
Copy link
Contributor Author

jpop32 commented Jun 7, 2023

Must have messed something up with transfer and archiving, sorry.

Here's the correct zip.
logs.zip

Error is in the X.log file.

@anusha-ctrl anusha-ctrl linked a pull request Jun 8, 2023 that will close this issue
@StephenButtolph StephenButtolph modified the milestones: v1.10.3, v1.10.4 Jun 15, 2023
@StephenButtolph StephenButtolph moved this from Selected for Development ⏳ to Frozen 🧊 in Ski Patrol Jun 17, 2023
@StephenButtolph
Copy link
Contributor

We've been unable to replicate this report after multiple Fuji and Mainnet syncs.

@StephenButtolph
Copy link
Contributor

StephenButtolph commented Jun 27, 2023

Ran into this again:

[06-27|16:16:37.820] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "RRQddCHdBPL3RQ7NRRGZ8hRDR9cQMewzwxJG3ATW6rhfquMgz"}
[06-27|16:16:37.824] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2euCiU6osz8bqvS3skY98RnQdWPoUWZUwStLqEtCFMcCox9qRb"}
[06-27|16:16:37.828] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "6TdxCdwDRkG3qK8ZAjEvvdPZFwrp7WFHw63T2PTKSaQQ5VuwZ"}
[06-27|16:16:37.832] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2bwoge8M5TNUkwRvxPgjPgfJLohyTgguEgVBkE3AgfNVFb2aot"}
[06-27|16:16:37.837] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "fkdqpWh87cFuYRLtwc6S9cW4jXbfYXeogAVkB4EtKuJjA7WWA"}
[06-27|16:16:37.841] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2EB8Xr8v5wPiytHghACBhEFh2pjLWrCbrHCNG2iPXgcsQPtFQw"}
[06-27|16:16:37.846] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "FzAVUBGH8EWapKwKEeuGPF2VxgDGEBcVsKyVzSmp1z5BY6ugr"}
[06-27|16:16:37.862] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "rYL9gnoZCwt1aWtAqaD8nPaK5157LGZ6zJa5svVicJ9iapetR"}
[06-27|16:16:37.875] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2khfV4E8zxkyhBV5zQAnsQk68TV7W1NpZMyZJnkCHydzN1BLcU"}
[06-27|16:16:37.880] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "vge84gDUCa8SNZSocsSkxvjUJ99CKo4hf7GNLYN5rG5PyHr3k"}
[06-27|16:16:37.884] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2eX5x3BCUZEiVzcBJRBYwtrBfsHTFYrFC1GpriWMtDApWzXtF4"}
[06-27|16:16:37.885] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2LjmhuWHQrs7xQktgcw41jCprjL8DUio8Sh44NBWpgUQ4Aph9L"}
[06-27|16:16:37.886] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "iYuaMRNLG9kTTu7BYUGG8YUcEU5EQkmLYiVqii2j4UV2ekaww"}
[06-27|16:16:37.887] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2j7Bf2s2NXvZRRBHH8gWTC6kaSmxmLMwx1QD5QkfjqMkqqatFk"}
[06-27|16:16:37.888] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "WyDh4wUgCVueNkWQinAiBnY2gsMnXBUJ3za1bAe8s6uvSLRCP"}
[06-27|16:16:37.889] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2uoZ8Cc9aXNNTG57YjPiHaPg92oiS3SWudH6TXnGwFBMhYtN9V"}
[06-27|16:16:37.890] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "EbBuy2r3EuToDATY8njf8Bqog3cHsFCbpQn1ksjqrv5zrLJ4u"}
[06-27|16:16:37.891] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "vjWTcmF2xXA3cf4uyCFQfUJSMtba7ZbbrzMgHzv7Pa4ZSkK9p"}
[06-27|16:16:37.892] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "CYJ4tPSoSdWCuTY1ypPE9MPq4ooBnRMXrvCQqztzRdQ4USpCk"}
[06-27|16:16:37.892] TRACE <X Chain> bootstrap/vertex_job.go:110 accepting vertex in bootstrapping {"vtxID": "2D1cmbiG36BqQMRyHt4kFhWarmatA1ighSpND3FeFgz3vFVtCZ"}
[06-27|16:16:37.892] INFO <X Chain> queue/jobs.go:224 executed operations {"numExecuted": 449999}
[06-27|16:16:37.893] INFO <X Chain> bootstrap/bootstrapper.go:115 starting bootstrapper
[06-27|16:16:37.894] INFO <X Chain> states/state.go:646 state already pruned
[06-27|16:16:37.894] INFO <X Chain> bootstrap/bootstrapper.go:575 executing transactions
[06-27|16:16:37.894] INFO <X Chain> queue/jobs.go:224 executed operations {"numExecuted": 0}
[06-27|16:16:37.894] INFO <X Chain> bootstrap/bootstrapper.go:592 executing vertices
[06-27|16:16:37.894] INFO <X Chain> queue/jobs.go:224 executed operations {"numExecuted": 0}
[06-27|16:16:37.895] FATAL <X Chain> handler/handler.go:336 shutting down chain {"reason": "received an unexpected error", "error": "attempt to register existing gatherer; &metrics.multiGatherer{lock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:atomic.Int32{_:atomic.noCopy{}, v:0}, readerWait:atomic.Int32{_:atomic.noCopy{}, v:0}}, gatherers:map[string]prometheus.Gatherer{\"\":(*metrics.optionalGatherer)(0x1400354d830), \"metervm\":(*prometheus.Registry)(0x14034eeff40)}} while processing sync message: NodeID-CZmZ9xpCzkWqjAyS7L4htzh5Lg6kf1k18 Op: get_ancestors_failed Message: ChainID: 2JVSBoinj9C2J33VntvzYtVJNZdN2NKiwwKjcumHUWEb5DbBrm RequestID: 1169 EngineType: ENGINE_TYPE_AVALANCHE"}
[06-27|16:16:37.895] INFO <X Chain> bootstrap/bootstrapper.go:297 shutting down bootstrapper
[06-27|16:16:37.895] INFO <X Chain> router/chain_router.go:665 chain shutdown {"shutdownDuration": "46.791µs"}
[06-27|16:16:37.895] INFO node/node.go:1413 shutting down node {"exitCode": 1}
[06-27|16:16:37.895] INFO health/worker.go:111 registered new check and initialized its state to failing {"namespace": "health", "name": "shuttingDown", "tags": ["application"]}
[06-27|16:16:37.895] INFO ipcs/chainipc.go:123 shutting down chain IPCs
[06-27|16:16:37.895] INFO chains/manager.go:1383 shutting down chain manager
[06-27|16:16:37.895] INFO chains/manager.go:1376 stopping chain creator
[06-27|16:16:37.895] INFO router/chain_router.go:361 shutting down chain router
[06-27|16:16:37.895] INFO <P Chain> bootstrap/bootstrapper.go:297 shutting down bootstrapper
[06-27|16:16:37.895] INFO <C Chain> syncer/state_syncer.go:574 shutting down state syncer
[06-27|16:16:37.895] INFO <P Chain> router/chain_router.go:383 chain shutdown {"shutdownDuration": "328.875µs"}
INFO [06-27|16:16:37.905] <C Chain> github.com/ava-labs/coreth/plugin/evm/syncervm_client.go:316: state sync: sync finished root=eff275..ee1a38 err="context canceled"
INFO [06-27|16:16:37.905] <C Chain> github.com/ava-labs/coreth/plugin/evm/syncervm_client.go:227: stateSync completed, notifying engine err="context canceled"
INFO [06-27|16:16:37.905] <C Chain> github.com/ava-labs/coreth/core/txpool/txpool.go:470: Transaction pool stopped 
INFO [06-27|16:16:37.905] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:926: Closing quit channel 
INFO [06-27|16:16:37.905] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:929: Stopping Acceptor 
INFO [06-27|16:16:37.905] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:932: Acceptor queue drained t="3.75µs"
INFO [06-27|16:16:37.905] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:935: Shutting down sender cacher 
INFO [06-27|16:16:37.905] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:939: Closing scope 
INFO [06-27|16:16:37.909] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:943: Waiting for background processes to complete 
INFO [06-27|16:16:37.909] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:952: Shutting down state manager 
INFO [06-27|16:16:37.909] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:957: State manager shut down t=458ns
INFO [06-27|16:16:37.909] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:963: Blockchain stopped 
[06-27|16:16:37.910] INFO <C Chain> router/chain_router.go:383 chain shutdown {"shutdownDuration": "14.747625ms"}
[06-27|16:16:37.910] INFO network/network.go:1306 shutting down the p2p networking
[06-27|16:16:37.915] INFO node/node.go:1466 cleaning up plugin runtimes
[06-27|16:16:37.932] INFO node/node.go:1488 finished node shutdown
[06-27|16:16:37.932] INFO nat/nat.go:178 Unmapped all ports

This was on Fuji when testing #1662.

@github-actions
Copy link

This issue has become stale because it has been open 60 days with no activity. Adding the lifecycle/frozen label will cause this issue to ignore lifecycle events.

@StephenButtolph StephenButtolph added this to the v1.11.7 milestone May 29, 2024
@StephenButtolph StephenButtolph moved this from Backlog 🗄️ to In Review 👀 in Platform Engineering Group May 29, 2024
@StephenButtolph StephenButtolph moved this from In Review 👀 to In Progress 🏗 in Platform Engineering Group May 29, 2024
@StephenButtolph StephenButtolph removed this from the v1.11.7 milestone Jun 4, 2024
@StephenButtolph StephenButtolph added this to the v1.11.8 milestone Jun 4, 2024
@github-project-automation github-project-automation bot moved this from In Progress 🏗 to Done ✅ in Platform Engineering Group Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working lifecycle/stale
Projects
Archived in project
Status: Frozen 🧊
Development

Successfully merging a pull request may close this issue.

4 participants