This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

The substrate node restarts and crashes #11222

Open
2 tasks done
tylerztl opened this issue Apr 14, 2022 · 15 comments
Labels
J2-unconfirmed Issue might be valid, but it’s not yet known.

Comments

@tylerztl

tylerztl commented Apr 14, 2022

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

./target/debug/substrate --dev --base-path=./node-data

2022-04-14 05:42:42 Low open file descriptor limit configured for the process. Current value: 4096, recommended value: 10000.
2022-04-14 05:42:42 Substrate Node
2022-04-14 05:42:42 ✌️  version 3.0.0-dev-32af9fc
2022-04-14 05:42:42 ❤️  by Parity Technologies <admin@parity.io>, 2017-2022
2022-04-14 05:42:42 📋 Chain specification: Development
2022-04-14 05:42:42 🏷  Node name: past-fireman-7121
2022-04-14 05:42:42 👤 Role: AUTHORITY
2022-04-14 05:42:42 💾 Database: RocksDb at ./node-data/chains/dev/db/full
2022-04-14 05:42:42 ⛓  Native runtime: node-268 (substrate-node-0.tx2.au10)

====================

Version: 3.0.0-dev-32af9fc

   0: sp_panic_handler::panic_hook
             at primitives/panic-handler/src/lib.rs:166:18
   1: sp_panic_handler::set::{{closure}}
             at primitives/panic-handler/src/lib.rs:62:12
   2: std::panicking::rust_panic_with_hook
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:610:17
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
   5: rust_begin_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
   6: core::panicking::panic_fmt
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
   7: sp_database::kvdb::handle_err
             at primitives/database/src/kvdb.rs:29:4
   8: <sp_database::kvdb::DbAdapter<D> as sp_database::Database<H>>::get
             at primitives/database/src/kvdb.rs:112:3
   9: sc_client_db::utils::check_database_type
             at client/db/src/utils.rs:360:8
  10: sc_client_db::utils::open_database_at
             at client/db/src/utils.rs:219:2
  11: sc_client_db::utils::open_database
             at client/db/src/utils.rs:196:2
  12: sc_client_db::Backend<Block>::new
             at client/db/src/lib.rs:1015:12
  13: sc_service::builder::new_db_backend
             at client/service/src/builder.rs:331:14
  14: sc_service::builder::new_full_parts
             at client/service/src/builder.rs:269:17
  15: node_cli::service::new_partial
             at bin/node/cli/src/service.rs:168:3
  16: node_cli::service::new_full_base
             at bin/node/cli/src/service.rs:336:6
  17: node_cli::service::new_full
             at bin/node/cli/src/service.rs:560:2
  18: node_cli::command::run::{{closure}}::{{closure}}
             at bin/node/cli/src/command.rs:89:5
  19: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/future/mod.rs:80:19
  20: tokio::park::thread::CachedParkThread::block_on::{{closure}}
             at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/park/thread.rs:263:54
  21: tokio::coop::with_budget::{{closure}}
             at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/coop.rs:102:9
  22: std::thread::local::LocalKey<T>::try_with
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/thread/local.rs:399:16
  23: std::thread::local::LocalKey<T>::with
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/thread/local.rs:375:9
  24: tokio::coop::with_budget
             at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/coop.rs:95:5
      tokio::coop::budget
             at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/coop.rs:72:5
      tokio::park::thread::CachedParkThread::block_on
             at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/park/thread.rs:263:31
  25: tokio::runtime::enter::Enter::block_on
             at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/runtime/enter.rs:151:13
  26: tokio::runtime::thread_pool::ThreadPool::block_on
             at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/runtime/thread_pool/mod.rs:73:9
  27: tokio::runtime::Runtime::block_on
             at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/runtime/mod.rs:477:43
  28: sc_cli::runner::Runner<C>::run_node_until_exit
             at client/cli/src/runner.rs:148:26
  29: node_cli::command::run
             at bin/node/cli/src/command.rs:88:4
  30: substrate::main
             at bin/node/cli/bin/main.rs:24:2
  31: core::ops::function::FnOnce::call_once
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ops/function.rs:227:5
  32: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:123:18
  33: std::rt::lang_start::{{closure}}
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/rt.rs:145:18
  34: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ops/function.rs:259:13
      std::panicking::try::do_call
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:406:40
      std::panicking::try
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:370:19
      std::panic::catch_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panic.rs:133:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/rt.rs:128:48
      std::panicking::try::do_call
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:406:40
      std::panicking::try
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:370:19
      std::panic::catch_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panic.rs:133:14
      std::rt::lang_start_internal
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/rt.rs:128:20
  35: std::rt::lang_start
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/rt.rs:144:17
  36: main
  37: __libc_start_main
  38: <unknown>


Thread 'main' panicked at 'Critical database error: Custom { kind: Other, error: Error { message: "Corruption: Corrupted compressed block contents: Snappy" } }', /root/substrate/primitives/database/src/kvdb.rs:29

This is a bug. Please report it at:

	https://github.com/paritytech/substrate/issues/new

This happens with Substrate v0.9.18; v0.9.17 works normally.

Steps to reproduce

Environment:
Linux version 3.10.0-1127.19.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Tue Aug 25 17:23:54 UTC 2020

  1. git clone https://github.com/paritytech/substrate.git
  2. cargo build
  3. ./target/debug/substrate --dev --base-path=./node-data
  4. Wait 30 seconds and stop the node
  5. Restart the node: ./target/debug/substrate --dev --base-path=./node-data
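
The steps above could be automated roughly as follows. This is only a sketch: the `run_once` and `restart_check` helpers, their parameters, and the 30-second default are illustrative assumptions, not part of Substrate. The command and the panic text are taken from this report.

```python
import subprocess

# Command taken from the reproduction steps; adjust the path to your build.
NODE_CMD = ["./target/debug/substrate", "--dev", "--base-path=./node-data"]
# Text from the panic message quoted in this report.
PANIC_MARKER = "Critical database error"


def run_once(cmd, run_seconds):
    """Start cmd, let it run for up to run_seconds, stop it, and return its output."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so logs and panics end up together
        text=True,
    )
    try:
        # Returns early if the process exits by itself (for example on a panic).
        output, _ = proc.communicate(timeout=run_seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM, giving the node a chance to shut down
        output, _ = proc.communicate()
    return output


def restart_check(cmd=NODE_CMD, run_seconds=30.0):
    """Hypothetical helper: run the node twice, report whether the second run panics."""
    run_once(cmd, run_seconds)
    second_output = run_once(cmd, run_seconds)
    return PANIC_MARKER in second_output
```

Calling `restart_check()` from the repository root would run steps 3 to 5 once and return `True` if the second start prints the database panic.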
@github-actions github-actions bot added the J2-unconfirmed Issue might be valid, but it’s not yet known. label Apr 14, 2022
@bkchr
Member

bkchr commented Apr 14, 2022

So you can reproduce this every time?

@bkchr
Member

bkchr commented Apr 14, 2022

Have you tried to use a different storage device?

@tylerztl
Author

Yes, I tried several new virtual machines and it happens on each of them.

@tylerztl
Author

Have you tried to use a different storage device?

I guess it is caused by the rocksdb version upgrade; the node works normally with paritydb.

@bkchr
Member

bkchr commented Apr 14, 2022

What do you mean by upgrading rocksdb?

@tylerztl
Author

What do you mean by upgrading rocksdb?

#11144

@clearloop
Contributor

clearloop commented Apr 15, 2022

What do you mean by upgrading rocksdb?

Hi @bkchr,

We have compiled the node-template on the polkadot-v0.9.18 branch on both CentOS 7 and Debian 11.

Steps to reproduce:

# centos7

git clone https://github.com/paritytech/substrate.git -b polkadot-v0.9.18
cd substrate 
cargo b --release
./target/release/node-template --dev --base-path node 

# stop the program and restart it 
./target/release/node-template --dev --base-path node 

# it then crashes with the panic log that @tylerztl mentioned above

but, if we go with parity-db

./target/release/node-template --dev --base-path node  --database paritydb

everything works perfectly

NOTE:

  • Debian 11 doesn't have this issue
  • CentOS 7 on branch polkadot-v0.9.17 doesn't have this issue

@ggwpez
Member

ggwpez commented Apr 15, 2022

I cannot reproduce this 🤔 Can you reproduce it with this Python script?

import random
import shutil
import subprocess
import time

CMD = ["target/release/node-template", "--dev", "-d", "node_data"]

while True:
	# Delete the data folder with a probability of 10%
	if random.random() < 0.1:
		print("Removing data folder")
		shutil.rmtree("node_data", ignore_errors=True)

	print("Spawning node")
	p = subprocess.Popen(CMD)

	# Let the node run for a random 1 to 10 seconds
	s = random.uniform(1.0, 10.0)
	print("Sleeping for %.1f seconds" % s)
	time.sleep(s)

	print("Killing node")
	p.terminate()  # sends SIGTERM

	# Stop as soon as the node exits with a non-zero code (for example after a panic)
	if p.wait() != 0:
		print("Node exited with non-zero exit code")
		break

@bkchr
Member

bkchr commented Apr 18, 2022

@ggwpez did you try this on CentOS?

@bkchr
Member

bkchr commented Apr 22, 2022

Someone has experienced a similar crash with rocksdb here: paritytech/cumulus#1194

@tylerztl did you try reverting the rocksdb upgrade?

@bkchr
Member

bkchr commented Apr 22, 2022

Okay, the mentioned rocksdb update only changed some features, so probably not our problem here.

@abhath-labs

abhath-labs commented Apr 22, 2022

Regarding paritytech/cumulus#1194: I tried using --database paritydb and the same issue happens. This is on Ubuntu 16.04.7 LTS.
Update: the issue does not occur on Debian 11 or macOS 12.3.

@bkchr
Member

bkchr commented Apr 25, 2022

What error do you get with paritydb?

@abhath-labs

Same as before, adding the --database paritydb did not make a difference.

2022-04-20 16:47:13 [Relaychain] DB corrupted: Corruption: Corrupted compressed block contents: Snappy. Repair will be triggered on next restart    
2022-04-20 16:47:13 [Relaychain] Block import error: Database    
2022-04-20 16:47:13 [Relaychain] 💔 Error importing block 0xf70389c0893eb2603c86c087d5472bcdf63e401f9cd9d46e8ac4678009c652fb: consensus error: Import failed: Import failed: Database    
2022-04-20 16:47:13 [Relaychain] 💔 Error importing block 0x2474ce5dbf0075dd96a6af41b3d7e846473ce571ecae9b74d268d198294b8801: block has an unknown parent    
2022-04-20 16:47:13 [Relaychain] 💔 Error importing block 0xaf3e7e3e4429cf3dae316fa074d688d660c45eed5d18ad2cfa6a72ac87f4830f: block has an unknown parent    
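
To check whether two logs show the same failure, one could scan them for the corruption message. A minimal sketch, assuming plain-text log lines like the ones above; `CORRUPTION_RE` and `find_corruption` are illustrative names, not part of Substrate or RocksDB:

```python
import re

# Matches the "Corruption: ..." text seen in both logs in this thread.
# The detail stops at the first period or double quote that follows it.
CORRUPTION_RE = re.compile(r'Corruption: (?P<detail>[^."]+)')


def find_corruption(lines):
    """Return (line_number, detail) for every line that reports a corruption error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = CORRUPTION_RE.search(line)
        if match:
            hits.append((number, match.group("detail").strip()))
    return hits
```

Run against the five lines above, this should flag only the first one; the later import errors look like consequences of it rather than separate corruption reports.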

@bkchr
Member

bkchr commented May 4, 2022

So you have resynced with ParityDb?

Can you maybe reproduce this in some docker image?


5 participants