
Panicked at 'header is last in set and contains standard change signal; must have justification #13168

Closed
2 tasks done
crystalin opened this issue Jan 18, 2023 · 15 comments · Fixed by #13249

Comments

@crystalin
Contributor

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

We received a report of a collator node (on Moonriver, Substrate v0.9.29) crashing with the error below; it was able to recover after restarting:

Jan 16 21:41:08 glmr1 moonriver[821577]: 2023-01-16 21:41:08.179  INFO tokio-runtime-worker substrate: [🌗] 💤 Idle (42 peers), best: #3422711 (0x49ca…e281), finalized #3422710 (0xa7bb…0eef), ⬇ 11.1kiB/s ⬆ 12.4kiB/s
Jan 16 21:41:09 glmr1 moonbeam[760298]: 2023-01-16 21:41:09.954  INFO tokio-runtime-worker substrate: [🌗] 💤 Idle (59 peers), best: #2747202 (0x55c1…d26c), finalized #2747200 (0x167d…ad90), ⬇ 66.5kiB/s ⬆ 159.8kiB/s
Jan 16 21:41:11 glmr1 moonriver[821577]: 2023-01-16 21:41:11.078  INFO tokio-runtime-worker substrate: [Relaychain] 💤 Idle (50 peers), best: #16225842 (0xf2a6…f5ef), finalized #16225839 (0x0bbd…b851), ⬇ 239.3kiB/s ⬆ 982.5kiB/s
Jan 16 21:41:12 glmr1 moonriver[821577]: 2023-01-16 21:41:12.268  INFO tokio-runtime-worker substrate: [Relaychain] ✨ Imported #16225843 (0x1bed…1e79)
Jan 16 21:41:12 glmr1 moonriver[821577]: 2023-01-16 21:41:12.270  INFO tokio-runtime-worker cumulus-collator: [🌗] Starting collation. relay_parent=0x1bed26ff1b5694e5680daeb8b21ffb27822ed91a14ab1826013c58cfd1e31e79 at=0x605841703347f9b396bf2309c9cc0593366448aab31c964caa9d67dd099c6ec3
Jan 16 21:41:12 glmr1 moonbeam[760298]: 2023-01-16 21:41:12.318  INFO tokio-runtime-worker substrate: [Relaychain] ✨ Imported #13837240 (0xcbbe…469a)
Jan 16 21:41:13 glmr1 moonbeam[760298]: 2023-01-16 21:41:13.001  INFO tokio-runtime-worker substrate: [Relaychain] 💤 Idle (50 peers), best: #13837240 (0xcbbe…469a), finalized #13837237 (0xb14a…1763), ⬇ 218.9kiB/s ⬆ 362.2kiB/s
Jan 16 21:41:13 glmr1 moonriver[821577]: 2023-01-16 21:41:13.179  INFO tokio-runtime-worker substrate: [🌗] 💤 Idle (42 peers), best: #3422712 (0x6058…6ec3), finalized #3422710 (0xa7bb…0eef), ⬇ 0.4kiB/s ⬆ 0.4kiB/s
Jan 16 21:41:14 glmr1 moonriver[821577]: 2023-01-16 21:41:14.164  INFO tokio-runtime-worker substrate: [🌗] ✨ Imported #3422713 (0xd28c…1abd)
Jan 16 21:41:14 glmr1 moonbeam[760298]: 2023-01-16 21:41:14.955  INFO tokio-runtime-worker substrate: [🌗] 💤 Idle (59 peers), best: #2747202 (0x55c1…d26c), finalized #2747201 (0xdc89…2fce), ⬇ 16.4kiB/s ⬆ 13.9kiB/s
Jan 16 21:41:16 glmr1 moonriver[821577]: 2023-01-16 21:41:16.078  INFO tokio-runtime-worker substrate: [Relaychain] 💤 Idle (50 peers), best: #16225843 (0x1bed…1e79), finalized #16225840 (0xc094…5f24), ⬇ 796.6kiB/s ⬆ 1.3MiB/s
Jan 16 21:41:18 glmr1 moonbeam[760298]: 2023-01-16 21:41:18.001  INFO tokio-runtime-worker substrate: [Relaychain] 💤 Idle (50 peers), best: #13837240 (0xcbbe…469a), finalized #13837237 (0xb14a…1763), ⬇ 202.7kiB/s ⬆ 143.5kiB/s
Jan 16 21:41:18 glmr1 moonriver[821577]: 2023-01-16 21:41:18.180  INFO tokio-runtime-worker substrate: [🌗] 💤 Idle (42 peers), best: #3422712 (0x6058…6ec3), finalized #3422710 (0xa7bb…0eef), ⬇ 25.6kiB/s ⬆ 40.5kiB/s
Jan 16 21:41:18 glmr1 moonbeam[760298]: 2023-01-16 21:41:18.248  INFO tokio-runtime-worker substrate: [Relaychain] ✨ Imported #13837241 (0x2e0e…575b)
Jan 16 21:41:18 glmr1 moonbeam[760298]: 2023-01-16 21:41:18.250  INFO tokio-runtime-worker cumulus-collator: [🌗] Starting collation. relay_parent=0x2e0ecdd7672cae3c24eeaa88815bad9a2dfa2d6030be3977c6cde7dcc253575b at=0x4ffde78ea29c68020dc79dfc39855f284b5087cd578a9c79c21bdba96528e347
Jan 16 21:41:18 glmr1 moonriver[821577]: 2023-01-16 21:41:18.260  INFO tokio-runtime-worker substrate: [Relaychain] ✨ Imported #16225844 (0xe742…8489)
Jan 16 21:41:19 glmr1 moonbeam[760298]: 2023-01-16 21:41:19.102  INFO tokio-runtime-worker substrate: [🌗] ✨ Imported #2747204 (0xf3a6…6ccc)
Jan 16 21:41:19 glmr1 moonbeam[760298]: 2023-01-16 21:41:19.955  INFO tokio-runtime-worker substrate: [🌗] 💤 Idle (59 peers), best: #2747203 (0x4ffd…e347), finalized #2747201 (0xdc89…2fce), ⬇ 10.3kiB/s ⬆ 87.0kiB/s
Jan 16 21:41:21 glmr1 moonriver[821577]: 2023-01-16 21:41:21.079  INFO tokio-runtime-worker substrate: [Relaychain] 💤 Idle (50 peers), best: #16225844 (0xe742…8489), finalized #16225841 (0x5ce3…077c), ⬇ 556.1kiB/s ⬆ 800.5kiB/s
Jan 16 21:41:21 glmr1 moonriver[821577]: ====================
Jan 16 21:41:21 glmr1 moonriver[821577]: Version: 0.28.1-a0e935df4e2
Jan 16 21:41:21 glmr1 moonriver[821577]:    0: sp_panic_handler::set::{{closure}}
Jan 16 21:41:21 glmr1 moonriver[821577]:    1: std::panicking::rust_panic_with_hook
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/std/src/panicking.rs:702:17
Jan 16 21:41:21 glmr1 moonriver[821577]:    2: std::panicking::begin_panic_handler::{{closure}}
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/std/src/panicking.rs:588:13
Jan 16 21:41:21 glmr1 moonriver[821577]:    3: std::sys_common::backtrace::__rust_end_short_backtrace
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/std/src/sys_common/backtrace.rs:138:18
Jan 16 21:41:21 glmr1 moonriver[821577]:    4: rust_begin_unwind
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/std/src/panicking.rs:584:5
Jan 16 21:41:21 glmr1 moonriver[821577]:    5: core::panicking::panic_fmt
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/core/src/panicking.rs:142:14
Jan 16 21:41:21 glmr1 moonriver[821577]:    6: core::panicking::panic_display
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/core/src/panicking.rs:72:5
Jan 16 21:41:21 glmr1 moonriver[821577]:    7: core::panicking::panic_str
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/core/src/panicking.rs:56:5
Jan 16 21:41:21 glmr1 moonriver[821577]:    8: core::option::expect_failed
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/core/src/option.rs:1880:5
Jan 16 21:41:21 glmr1 moonriver[821577]:    9: <sc_finality_grandpa::warp_proof::NetworkProvider<Block,Backend> as sc_network_common::sync::warp::WarpSyncProvider<Block>>::generate
Jan 16 21:41:21 glmr1 moonriver[821577]:   10: sc_network_sync::warp_request_handler::RequestHandler<TBlock>::handle_request
Jan 16 21:41:21 glmr1 moonriver[821577]:   11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
Jan 16 21:41:21 glmr1 moonriver[821577]:   12: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
Jan 16 21:41:21 glmr1 moonriver[821577]:   13: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
Jan 16 21:41:21 glmr1 moonriver[821577]:   14: tokio::runtime::task::harness::Harness<T,S>::poll
Jan 16 21:41:21 glmr1 moonriver[821577]:   15: std::thread::local::LocalKey<T>::with
Jan 16 21:41:21 glmr1 moonriver[821577]:   16: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
Jan 16 21:41:21 glmr1 moonriver[821577]:   17: tokio::runtime::scheduler::multi_thread::worker::Context::run
Jan 16 21:41:21 glmr1 moonriver[821577]:   18: tokio::macros::scoped_tls::ScopedKey<T>::set
Jan 16 21:41:21 glmr1 moonriver[821577]:   19: tokio::runtime::scheduler::multi_thread::worker::run
Jan 16 21:41:21 glmr1 moonriver[821577]:   20: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
Jan 16 21:41:21 glmr1 moonriver[821577]:   21: tokio::runtime::task::harness::Harness<T,S>::poll
Jan 16 21:41:21 glmr1 moonriver[821577]:   22: tokio::runtime::blocking::pool::Inner::run
Jan 16 21:41:21 glmr1 moonriver[821577]:   23: std::sys_common::backtrace::__rust_begin_short_backtrace
Jan 16 21:41:21 glmr1 moonriver[821577]:   24: core::ops::function::FnOnce::call_once{{vtable.shim}}
Jan 16 21:41:21 glmr1 moonriver[821577]:   25: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/alloc/src/boxed.rs:1940:9
Jan 16 21:41:21 glmr1 moonriver[821577]:       <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/alloc/src/boxed.rs:1940:9
Jan 16 21:41:21 glmr1 moonriver[821577]:       std::sys::unix::thread::Thread::new::thread_start
Jan 16 21:41:21 glmr1 moonriver[821577]:              at rustc/1d37ed661a6922e7a167609b8cd7eb31e972b19b/library/std/src/sys/unix/thread.rs:108:17
Jan 16 21:41:21 glmr1 moonriver[821577]:   26: start_thread
Jan 16 21:41:21 glmr1 moonriver[821577]:   27: clone
Jan 16 21:41:21 glmr1 moonriver[821577]: Thread 'tokio-runtime-worker' panicked at 'header is last in set and contains standard change signal; must have justification; qed.', /root/.cargo/git/checkouts/substrate-189071a041b0d328/385446f/client/finality-grandpa/src/warp_proof.rs:132
Jan 16 21:41:21 glmr1 moonriver[821577]: This is a bug. Please report it at:
Jan 16 21:41:21 glmr1 moonriver[821577]: #011https://github.com/PureStake/moonbeam/issues/new
Jan 16 21:41:21 glmr1 systemd[1]: moonriver.service: Main process exited, code=exited, status=1/FAILURE
Jan 16 21:41:21 glmr1 systemd[1]: moonriver.service: Failed with result 'exit-code'.

@bkchr do you have an idea whether this is worth investigating?

Steps to reproduce

No response

@andresilva
Contributor

andresilva commented Jan 18, 2023

This error was triggered when the node tried to generate a warp proof (to serve some peer on the network that is doing a warp sync). A warp proof consists of finality justifications for all blocks that change the authority set, from genesis up to the current block (or whichever block we are warping to). The node persists justifications for all of these blocks, since they are required to securely verify the authority set hand-offs. It seems that this node, for some reason, either didn't have the required justification in its database, or the justification was corrupted and failed to decode. Is this something that you've seen happen more than once?
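For context, the failing check is essentially an Option::expect on a justification that the warp proof generator assumes is still in the database. Below is a minimal, self-contained Rust sketch of that pattern; the store and types are made up for illustration and are not the real sc_finality_grandpa API:

use std::collections::HashMap;

// Hypothetical stand-ins for the real Substrate types.
type BlockNumber = u64;
type Justification = Vec<u8>;

// Toy justification store: in the real node this is the backend database, which is
// expected to keep a justification for every block that ends an authority set.
struct JustificationStore {
    by_block: HashMap<BlockNumber, Justification>,
}

impl JustificationStore {
    // Mirrors the failing pattern: the block is the last of its authority set and
    // signals a standard change, so a justification must exist; if it was pruned or
    // never stored, this `expect` panics with the message seen in the report.
    fn justification_for_set_change(&self, block: BlockNumber) -> Justification {
        self.by_block.get(&block).cloned().expect(
            "header is last in set and contains standard change signal; \
             must have justification; qed.",
        )
    }
}

fn main() {
    let mut by_block = HashMap::new();
    by_block.insert(100, vec![0xde, 0xad]); // set change at #100: justification kept
    let store = JustificationStore { by_block };

    // Works: the justification for block #100 is present.
    let _ok = store.justification_for_set_change(100);

    // Panics like the report: block #200 changed the set but its justification is gone.
    let _missing = store.justification_for_set_change(200);
}

If pruning (or corruption) removes one of these justifications, every later attempt to serve a warp sync request hits this path and brings the node down.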

@bkchr
Member

bkchr commented Jan 18, 2023

And do you maybe have the relay chain db?

@perltk45

Here is my moonriver.service file, if that helps.

[Unit]
Description="Moonriver systemd service"
After=network.target
StartLimitIntervalSec=0

[Service]
LimitNOFILE=50000
Type=simple
Restart=on-failure
RestartSec=10
User=moonriver_service
SyslogIdentifier=moonriver
SyslogFacility=local7
KillSignal=SIGHUP
ExecStart=/pruned/pruned-moonriver-data/moonriver
--collator
--no-hardware-benchmarks
--execution=wasm
--wasm-execution compiled
--runtime-cache-size 32
--no-private-ipv4
--port 6060
--rpc-port 6633
--ws-port 6644
--trie-cache-size 32
--state-pruning 1000
--blocks-pruning 1000
--log rpc=info
--log author-filter=debug
--base-path /pruned/pruned-moonriver-data
--chain moonriver
--in-peers 25
--out-peers 25
--in-peers-light 5
--name "TrueStakingAMS"
--db-cache 4096
--no-mdns
--no-prometheus
--
--no-mdns
--no-prometheus
--no-private-ipv4
--execution=wasm
--port 6061
--in-peers 25
--in-peers-light 5
--out-peers 25
--rpc-port 6634
--ws-port 6645
--state-pruning 100
--blocks-pruning 100
--name="TrueStakingAMS (Embedded Relay)"

[Install]
WantedBy=multi-user.target

@bkchr
Member

bkchr commented Jan 18, 2023

@perltk45 can you provide the relay chain database?

@perltk45

perltk45 commented Jan 18, 2023

Do you want all 30gb?

@bkchr
Member

bkchr commented Jan 18, 2023

Sadly there is no better way 🙈

@perltk45

curl --output ksmcc3.tar.zst https://eu-central-1.unitedbloc.com://ksmcc3.tar.zst

It is only 8.9G after compression.

@perltk45

Let me know if you want the parachain db also.

@perltk45

I thought this was due to the archive setting on the parachain, but I went back to the prior server running with archive mode on the parachain and found the same error!

"Jan 10 18:15:35 Moon1 moonriver[453678]: Thread 'tokio-runtime-worker' panicked at 'header is last in set and contains standard change signal; must have justification; qed.', /root/.cargo/git/checkouts/substrate-189071a041b0d328/385446f/client/finality-grandpa/src/warp_proof.rs:132"

@bkchr
Member

bkchr commented Jan 21, 2023

@perltk45 but you reused the same relay chain db? This issue is not related to the parachain db.

@perltk45

Yes, I used the same relay chain db.
Good to know that the problem wasn't the pruned parachain. Storage costs on gen4 NVMe are very high -- so being able to prune the parachain is fantastic!

@stakeworld

@bkchr @perltk45, I'm also seeing these panics on a non-validator ParityDB Polkadot snapshot node. I'm running with --state-pruning 1000 --blocks-pruning 1000, just like @perltk45, and I'm wondering whether the double pruning setting is causing this. Previously there was only --pruning 1000, which I think covered state pruning only; when the command line changed, things had to be set explicitly, so I set state-pruning AND blocks-pruning. Or should only state pruning be used? Looking at the logs, I had these panics every other day, but recently more often. After a restart everything functions normally. I was thinking of rebuilding the snapshot, but maybe it is something with the pruning settings?

@bkchr
Member

bkchr commented Jan 27, 2023

Ahh, good that you mentioned this! The best for now would be to use only state-pruning, because blocks-pruning will indeed prune the justifications right now.
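For reference, applied to the service file posted earlier, this advice would mean dropping --blocks-pruning 100 from the embedded relay chain arguments (everything after the bare --) and keeping --state-pruning 100. A sketch based only on the flags already shown; adjust to your own setup:

--
--no-mdns
--no-prometheus
--no-private-ipv4
--execution=wasm
--port 6061
--in-peers 25
--in-peers-light 5
--out-peers 25
--rpc-port 6634
--ws-port 6645
--state-pruning 100
--name="TrueStakingAMS (Embedded Relay)"

The parachain-side --blocks-pruning 1000 is not what triggers this particular panic; as noted above, the issue is not related to the parachain db.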

@stakeworld

Ahh, good that you mentioned this! The best for now would be to use only state-pruning, because blocks-pruning will indeed prune the justifications right now.

I have removed blocks-pruning from my settings and will report whether the errors go away. So, to be clear, there is no real use for blocks-pruning for now? Or is blocks-pruning without state-pruning also a meaningful option?

@bkchr
Member

bkchr commented Jan 27, 2023

So, to be clear, there is no real use for blocks-pruning for now?

Currently, not really, besides cleaning up the db.
