This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key #13864

Closed
2 tasks done
jasl opened this issue Apr 9, 2023 · 38 comments


@jasl
Contributor

jasl commented Apr 9, 2023

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

Hi, I'm from the Phala team. Previously, all of our collators (based on Polkadot 0.9.34) suddenly started panicking in a loop with this error.

here's a log
khala_collator_1.log.zip

At that time I tried many things, but our collators still couldn't get back to work. Luckily I was testing our new node based on Polkadot v0.9.39, so I tried that binary and they finally worked again, so I assumed the bug had been fixed.

But recently a user reported this error with our latest node based on Polkadot v0.9.40, so I believe there is still a bug.

I asked him for a backup of the DB, but it's too big (~1 TB).

Thread 'tokio-runtime-worker' panicked at 'Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key: 0x4806be757d6e6f3be9fed0cdf4d5b03f8563e10beef7cc64994d6ddc3ce19448"', /root/.cargo/git/checkouts/substrate-7e08433d4c370a21/76fed9b/primitives/state-machine/src/ext.rs:192

====================

Version: 0.1.23-dev-ae1f5251e7e

   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/alloc/src/boxed.rs:2002:9
      std::panicking::rust_panic_with_hook
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/std/src/panicking.rs:579:13
   3: std::sys_common::backtrace::rust_end_short_backtrace
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/core/src/panicking.rs:64:14
   6: core::result::unwrap_failed
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/core/src/result.rs:1790:5
   7: <sp_state_machine::ext::Ext<H,B> as sp_externalities::Externalities>::storage
   8: sp_io::storage::get_version_1
   9: sp_io::storage::ExtStorageGetVersion1::call
  10: <F as wasmtime::func::IntoFunc<T,(wasmtime::func::Caller<T>,A1),R>>::into_func::wasm_to_host_shim
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: <unknown>
  20: wasmtime_runtime::traphandlers::catch_traps::call_closure
  21: wasmtime_setjmp
  22: sc_executor_wasmtime::runtime::perform_call
  23: <sc_executor_wasmtime::runtime::WasmtimeInstance as sc_executor_common::wasm_runtime::WasmInstance>::call_with_allocation_stats
  24: sc_executor_common::wasm_runtime::WasmInstance::call_export
  25: sc_executor::native_executor::WasmExecutor<H>::with_instance::{{closure}}
  26: <sc_executor::native_executor::WasmExecutor<H> as sp_core::traits::CodeExecutor>::call
  27: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_aux
  28: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_using_consensus_failure_handler
  29: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
  30: <khala_parachain_runtime::RuntimeApiImpl<__SrApiBlock,RuntimeApiImplCall> as sp_api::Core<SrApiBlock>>::__runtime_api_internal_call_api_at
  31: <&sc_service::client::client::Client<B,E,Block,RA> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  32: <alloc::sync::Arc<T> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  33: <cumulus_client_consensus_common::ParachainBlockImport<Block,BI,BE> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  34: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Error = sp_consensus::error::Error+Transaction = Transaction+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  35: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  36: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
  37: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  38: tokio::runtime::task::raw::poll
  39: std::sys_common::backtrace::__rust_begin_short_backtrace
  40: core::ops::function::FnOnce::call_once{{vtable.shim}}
  41: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/alloc/src/boxed.rs:1988:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/alloc/src/boxed.rs:1988:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/f3126500f25114ba4e0ac3e76694dd45a22de56d/library/std/src/sys/unix/thread.rs:108:17
  42: <unknown>
  43: <unknown>
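
For context on where the message comes from: runtime host functions such as storage reads are not allowed to return an error into the runtime, so when the trie backend fails to find a node in the database, the host side turns that failure into a panic. A rough sketch of the pattern (hypothetical simplified types, not the actual sp-state-machine code):

```rust
// Simplified illustration of how a missing trie node in the database becomes
// the "Externalities not allowed to fail within runtime" panic. The real code
// lives in primitives/state-machine/src/ext.rs; these types are stand-ins.

trait TrieBackend {
    // Err(..) here means a node referenced by the state root could not be
    // found in the underlying key-value database -- the situation in this issue.
    fn storage(&self, key: &[u8]) -> Result<Option<Vec<u8>>, String>;
}

struct Ext<'a, B> {
    backend: &'a B,
}

impl<'a, B: TrieBackend> Ext<'a, B> {
    // Called from the storage host function; it cannot propagate the error
    // back into the runtime, so it panics instead.
    fn storage(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.backend
            .storage(key)
            .expect("Externalities not allowed to fail within runtime")
    }
}
```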

Steps to reproduce

There are no reproduction steps; it occurs randomly and is very rare.

@bkchr
Member

bkchr commented Apr 9, 2023

Do you use paritydb?

@jasl
Contributor Author

jasl commented Apr 9, 2023

Do you use paritydb?

Yes, and our latest node is using the latest ParityDB.

@bkchr
Member

bkchr commented Apr 9, 2023

Latest patch release?

@jasl
Contributor Author

jasl commented Apr 9, 2023

Latest patch release?

Yes, 0.4.6. I forget which version the previous collators used.

For 0.4.6, our users reported another issue, paritytech/parity-db#198, but I think it is unrelated to this one...

@bkchr
Member

bkchr commented Apr 9, 2023

Can you provide the db?

CC @arkpar

@jasl
Contributor Author

jasl commented Apr 9, 2023

I need to ask the user; I've left him a message.

@jasl
Contributor Author

jasl commented Apr 9, 2023

Can you provide the db?

CC @arkpar

Ah sorry, that user says he didn't back up the DB; it's too big to upload.

We now have dozens of nodes using ParityDB, but I'm not sure I'll run into this again.

@bkchr
Member

bkchr commented Apr 9, 2023

Without a DB it will probably not be possible to find out what is wrong.

@jasl
Contributor Author

jasl commented Apr 9, 2023

Without a DB it will probably not be possible to find out what is wrong.

I understand... sorry...

I shall make sure the reporter keeps the broken DB next time.

@bkchr
Member

bkchr commented Apr 9, 2023

Thank you!

@jasl
Contributor Author

jasl commented Apr 10, 2023

@bkchr To your knowledge, does this error only affect ParityDB?

@bkchr
Member

bkchr commented Apr 10, 2023

I can not say for sure, but I think so.

@jasl
Contributor Author

jasl commented Apr 15, 2023

@bkchr @arkpar Hi, could you provide your SSH pub key?

We successfully triggered this again while syncing Polkadot, and my colleague also caught a weird case where finalized > best, plus paritytech/parity-db#198.
We uploaded these bad databases to a GCP server; they're > 100 GB.

@arkpar
Member

arkpar commented Apr 15, 2023

Sure, I'd take a look

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDCZ0p00e9LEqvwLYAqRwDd258JweLKoVP+3dwHIUtO6Dn7eUntlHJ4sVTwuc/0cgCc3ol/ZYpWZEpY2V9Wa7B76+XQThWgOWeUQiD/0GUOWtfVybtBBeGwb7ZoZNxgVpgiyedJQ1VGv3KvmxCCrY2ajDM7Rc++6vVyV2V7+M7gbVivK7/DpwMAdhHPsDQYfwJ0Gb6yCfgzGSY9UAQNhm+rXHQHyZU2/ij44603fNweU2JN90reEYbdOimf0wYjOjmd6PtJWGNYeEcwCDX3sy660sol6RGc22sZB968GhoXR+vHdpK1IRdoSPsx7b8jffR6R6RZV9yoRW4Gy2yQdZyJXajomVdMxXs1rirWzaWUFPt2fyo8acklZjLEdZ+S7f9kZJ+CMUqL/cqL3VXaKYpf5n/SUAsqpxK1AbAPffCfUPzHZ7PpvMW+iuh0zFwvLuabS56uERWVhCOcnH4Ei0KpOOtEQ9EeJ8tX2m2mn5prbpZlb51pgEyA4+BqXHLprtc= arkadiy@localhost

Please also specify how to run a node that uses this database.

@jasl
Contributor Author

jasl commented Apr 15, 2023

Sure, I'd take a look

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDCZ0p00e9LEqvwLYAqRwDd258JweLKoVP+3dwHIUtO6Dn7eUntlHJ4sVTwuc/0cgCc3ol/ZYpWZEpY2V9Wa7B76+XQThWgOWeUQiD/0GUOWtfVybtBBeGwb7ZoZNxgVpgiyedJQ1VGv3KvmxCCrY2ajDM7Rc++6vVyV2V7+M7gbVivK7/DpwMAdhHPsDQYfwJ0Gb6yCfgzGSY9UAQNhm+rXHQHyZU2/ij44603fNweU2JN90reEYbdOimf0wYjOjmd6PtJWGNYeEcwCDX3sy660sol6RGc22sZB968GhoXR+vHdpK1IRdoSPsx7b8jffR6R6RZV9yoRW4Gy2yQdZyJXajomVdMxXs1rirWzaWUFPt2fyo8acklZjLEdZ+S7f9kZJ+CMUqL/cqL3VXaKYpf5n/SUAsqpxK1AbAPffCfUPzHZ7PpvMW+iuh0zFwvLuabS56uERWVhCOcnH4Ei0KpOOtEQ9EeJ8tX2m2mn5prbpZlb51pgEyA4+BqXHLprtc= arkadiy@localhost

Please also specify how to run a node that uses this database.

Sorry for the late response.

SSH key added; please try ssh ubuntu@34.65.46.56. You can do anything you want, the VM is only used for debugging.

I just validated that all the uploaded databases reproduce the problem...

In the home directory /home/ubuntu there are 3 cases:

Case 1:
broken1.sh starts a Polkadot v0.9.41 node with the database polkadot.broken1. This case shows
Syncing, target=#15104383 (12 peers), best: #662528 (0xdc93…5db5), finalized #699904 (0xbb2d…84cd), ⬇ 904.6kiB/s ⬆ 27.0kiB/s
i.e. finalized is higher than best.

Case 2:
broken2.sh starts a Polkadot v0.9.41 node with the database polkadot.broken2. This case shows
Thread 'tokio-runtime-worker' panicked at 'Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key: 0x2ef32d4e98fc876924394ebec5841a0d49d08d236a9e6a65eb21249fa5ba0a03"'
But on the server it got stuck booting for at least an hour, so I couldn't validate it.

Case 3:

cd stuck-on-booting && docker compose up. This case is paritytech/parity-db#198, with the DB provided by our user; the node is based on the polkadot-v0.9.41 branch, commit Phala-Network/khala-parachain@c6a82f2.

I hope these samples can help.

I also backed up the originally uploaded artifacts in ~/tmp, in case the validation process breaks the cases.

@arkpar
Member

arkpar commented Apr 16, 2023

Case 1:

This one is broken indeed. Somehow an older commit was written on top of the new state. I'm going to add some changes to the database that will prevent this.

Case 2:

Opens and syncs fine for me. This is probably the issue that was fixed in parity-db 0.4.6.
It also has about 5 GB of queued commits, so it takes a while to open on a slow disk. The commit queue can only grow this large if the disk is slow to perform fsync. I'd like to know what kind of hardware and OS this was originally running on. Was that an SSD at least?

Case 3:

This looks more like a cumulus issue.

2023-04-16 10:44:16.939 DEBUG main db: [Parachain] Opened blockchain db, fetched best = 0x49d94deb5f43067ba6783ef33102aaa58d42d561412f693a3c559f51d9f9b501 (2290332)
2023-04-16 10:44:16.939 DEBUG main db: [Parachain] Opened blockchain db, fetched final = 0x74a8f4665c44ed0aa9f16352673d778ac989221ae63cbb04b2d479f5da271a39 (1697672)

There are ~600000 blocks on top of the last finalized block, so cumulus tries to query a huge tree route on startup here:
https://github.com/paritytech/cumulus/blame/master/client/consensus/common/src/level_monitor.rs#L114
and gets stuck.

@bkchr I think we've run into similar issues before. tree_route must not be called when the difference in block number between the start and end block of the route is too high.
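
To make the suggestion concrete, a hypothetical sketch of such a guard (the names and threshold are illustrative only, not the actual cumulus or sc-client-api API):

```rust
// Hypothetical guard, as a sketch only: skip (or defer) the expensive
// tree_route computation when the block-number gap between the two endpoints
// is too large. The constant and function names are made up for illustration.

const MAX_TREE_ROUTE_DISTANCE: u64 = 4096; // arbitrary example bound

fn tree_route_is_affordable(start_number: u64, end_number: u64) -> bool {
    end_number.abs_diff(start_number) <= MAX_TREE_ROUTE_DISTANCE
}

fn on_startup(finalized_number: u64, leaf_number: u64) {
    if tree_route_is_affordable(finalized_number, leaf_number) {
        // ... safe to call tree_route(finalized, leaf) and walk the route ...
    } else {
        // ... fall back to a cheaper approximation or defer the work,
        //     instead of walking hundreds of thousands of headers ...
    }
}
```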

@bkchr
Member

bkchr commented Apr 16, 2023

@bkchr I think we've run into similar issues before. tree_route must not be called when the difference in block number between the start and end block of the route is too high.

Yeah we had this issue in the transaction pool.
But yeah, we should handle this, especially as it can happen after syncing a node while the relay chain hasn't finished syncing yet.

@davxy can you please handle this?

@jasl
Contributor Author

jasl commented Apr 16, 2023

Opens and syncs fine for me. This is probably the issue that was fixed in parity-db 0.4.6.
It also has about 5 GB of queued commits, so it takes a while to open on a slow disk. The commit queue can only grow this large if the disk is slow to perform fsync. I'd like to know what kind of hardware and OS this was originally running on. Was that an SSD at least?

This is my colleague's home server.
Hardware:
2x LSI 9300 16-port controllers
32x 1 TB in 4x 8 RAIDZ2
OS: TrueNAS (with 2 VMs on it running the nodes)

so it's ZFS

@jasl
Contributor Author

jasl commented Apr 16, 2023

For case 3, AFAIK they use SATA SSDs too; our users already know that an HDD can't run a node well, whether with RocksDB or ParityDB.

There are ~600000 blocks on top of the last finalized block

I just want to report that we might be facing #9360; not sure if they're the same thing.

@jasl
Contributor Author

jasl commented Apr 16, 2023

My colleague won't give up; he is uploading a new case which he just confirmed has the "Trie lookup error".

We're using the official Polkadot v0.9.41 build, in which we can't change the ParityDB version, so we don't know whether this issue is fixed in the newer version.

But our Khala node is using 0.4.6, and he triggered the error last week; sadly we didn't back up the DB, and it's not easy to reproduce.

Once he has uploaded it, I shall validate it myself to ensure it can reproduce the error.

@arkpar
Member

arkpar commented Apr 16, 2023

IIRC ZFS is configured to use 128k IO pages by default, which is terrible for DB performance. Consider using a much smaller record size or a different FS.

@jasl
Contributor Author

jasl commented Apr 16, 2023

IIRC ZFS is configured to use 128k IO pages by default, which is terrible for DB performance. Consider using a much smaller record size or a different FS.

OK, I'll have him discuss it here; he is an experienced sysop, not a developer.

@jasl
Contributor Author

jasl commented Apr 16, 2023

@bkchr Related to this issue: because we use ink!, our off-chain node uses the same runtime infrastructure as Substrate.
Last week we fixed a bug in our off-chain computing node, which also panicked with

state-machine/src/ext.rs:Externalities not allowed to fail within runtime: "Trie lookup error: Decoding failed for hash 0x2a2e4115bccf59eb99e5b7400da19d57a2e7ec0b8a7ebcf34db228a2bccc8027; err: Decode(Error { cause: None, desc: "out of data" })"', /root/.cargo/git/checkouts/substrate-7e08433d4c370a21/ba87188/primitives/state-machine/src/ext.rs:237:237:61
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

The fix is simple (it's fixed on our side): Phala-Network/phala-blockchain#1246

The trie framework will pass an empty value when the rc is negative. So, the value would be mis-cleared when decreasing the rc.
This PR fixes it by setting the value only when rc > 0.

I'm wondering whether there's a bug inside Substrate that doesn't maintain the rc correctly?
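
For readers who don't want to open the PR, here is a toy illustration of the rc pitfall described above (a made-up reference-counted node store, not Phala's or Substrate's actual code):

```rust
use std::collections::HashMap;

// Toy reference-counted node store illustrating the bug: when a reference is
// removed, the caller may pass an empty value, and blindly storing it would
// wipe the node even though other references still exist.
#[derive(Default)]
struct RefCountedStore {
    nodes: HashMap<[u8; 32], (Vec<u8>, i64)>, // key -> (value, reference count)
}

impl RefCountedStore {
    // `rc_delta` is positive when a reference is added and negative when one
    // is removed; in the latter case `value` may be empty.
    fn apply(&mut self, key: [u8; 32], value: Vec<u8>, rc_delta: i64) {
        let entry = self.nodes.entry(key).or_insert((Vec::new(), 0));
        entry.1 += rc_delta;

        // Buggy version: always overwrite the stored value, which clears the
        // node when an empty value arrives together with a negative rc_delta.
        // entry.0 = value;

        // Fixed version, in the spirit of "set the value only when rc > 0":
        if rc_delta > 0 {
            entry.0 = value;
        }

        // Only drop the node once no references remain at all.
        let remaining = entry.1;
        if remaining <= 0 {
            self.nodes.remove(&key);
        }
    }
}
```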

@PHA-SYSOPS

PHA-SYSOPS commented Apr 16, 2023

For reference: @jasl sent you the issues designated problem1 and problem2. After sending them to him, I removed them and started two new syncs.

So let's recap:

  • I started 2 nodes in a VM (QEMU) (which is ext4, btw), one syncing Polkadot and the other Kusama. The VMs run directly on the TrueNAS system, which yes runs ZFS, but the storage is served to the VMs as block devices, so block sizes shouldn't matter. I also looked through the documentation and did not find any tuning/specs in that area, btw. But let's circle back to this later.
  • They synced for 2-3 days and reached #7202158 and #7676973, respectively.
  • Then 2 different trie errors appeared. Both errors seem to revolve around a missing key, but their behavior is different: one just keeps spitting out errors and stops syncing (designated problem3), and the other crashes and spits out the trie error (designated problem4).

problem3 partial log (as the same error keeps looping):

2023-04-15 21:45:38 ⚙️ Syncing 0.0 bps, target=#15106642 (100 peers), best: #7202158 (0x751b…aa0c), finalized #7201792 (0x01b8…90a5), ⬇ 309.7kiB/s ⬆ 311.7kiB/s
2023-04-15 21:45:39 Failed to write to trie: Database missing expected key: 0xec8604a2b16587a2fe61d76c4dfd079f866b5ff262fec5fef19505442050bae5
2023-04-15 21:45:39 panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/91061a7/frame/executive/src/lib.rs:473:9
2023-04-15 21:45:39 Block prepare storage changes error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed WASM backtrace: error while executing at wasm backtrace: 0: 0x235d - <unknown>!rust_begin_unwind 1: 0x20aa - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b 2: 0x173f - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a 3: 0x1a81ec - <unknown>!Core_execute_block
2023-04-15 21:45:44 💔 Error importing block 0xb845970efd3e40564de3a1bcadd16b209f2a5743372e51784d0e1f50c5045e0e: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed WASM backtrace: error while executing at wasm backtrace: 0: 0x235d - <unknown>!rust_begin_unwind 1: 0x20aa - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b 2: 0x173f - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a 3: 0x1a81ec - <unknown>!Core_execute_block
2023-04-15 21:45:44 💔 Error importing block 0xfbde0a5e93072626a7b0d5f80d80026d6c450b5d34ca4538aba2ede1d51cd7d3: block has an unknown parent
2023-04-15 21:45:44 💔 Error importing block 0x2df9d1bead7bafc3bbe673510202f27e53868faaa689803d3c72c6790859eaa7: block has an unknown parent
2023-04-15 21:45:44 💔 Error importing block 0xa0d56ab5624d36a028d861879178a7cba041ba075e726046c1acd1f2304f0002: block has an unknown parent
2023-04-15 21:45:44 Failed to write to trie: Database missing expected key: 0xec8604a2b16587a2fe61d76c4dfd079f866b5ff262fec5fef19505442050bae5
2023-04-15 21:45:44 panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/91061a7/frame/executive/src/lib.rs:473:9
2023-04-15 21:45:44 Block prepare storage changes error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed WASM backtrace: error while executing at wasm backtrace: 0: 0x235d - <unknown>!rust_begin_unwind 1: 0x20aa - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b 2: 0x173f - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a 3: 0x1a81ec - <unknown>!Core_execute_block
2023-04-15 21:45:44 💔 Error importing block 0xb845970efd3e40564de3a1bcadd16b209f2a5743372e51784d0e1f50c5045e0e: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed WASM backtrace: error while executing at wasm backtrace: 0: 0x235d - <unknown>!rust_begin_unwind 1: 0x20aa - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b 2: 0x173f - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a 3: 0x1a81ec - <unknown>!Core_execute_block
2023-04-15 21:45:44 Failed to write to trie: Database missing expected key: 0xec8604a2b16587a2fe61d76c4dfd079f866b5ff262fec5fef19505442050bae5
2023-04-15 21:45:44 panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/91061a7/frame/executive/src/lib.rs:473:9
2023-04-15 21:45:44 Block prepare storage changes error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed WASM backtrace: error while executing at wasm backtrace: 0: 0x235d - <unknown>!rust_begin_unwind 1: 0x20aa - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b 2: 0x173f - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a 3: 0x1a81ec - <unknown>!Core_execute_block
2023-04-15 21:45:44 💔 Error importing block 0xb845970efd3e40564de3a1bcadd16b209f2a5743372e51784d0e1f50c5045e0e: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed WASM backtrace: error while executing at wasm backtrace: 0: 0x235d - <unknown>!rust_begin_unwind 1: 0x20aa - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b 2: 0x173f - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a 3: 0x1a81ec - <unknown>!Core_execute_block

and for problem4:

2023-04-15 22:02:38 ⚙️ Syncing 0.0 bps, target=#17498449 (75 peers), best: #7676973 (0x726c…2a34), finalized #7676928 (0xf6c1…c683), ⬇ 19.3kiB/s ⬆ 14.7kiB/s

====================

Version: 0.9.41-e203bfb396e
0: sp_panic_handler::set::{{closure}}
1: <alloc::boxed::Box<F,A> as core::ops::function::Fn>::call
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:2002:9
std::panicking::rust_panic_with_hook
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:692:13
2: std::panicking::begin_panic_handler::{{closure}}
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:579:13
3: std::sys_common::backtrace::__rust_end_short_backtrace
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys_common/backtrace.rs:137:18
4: rust_begin_unwind
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:575:5
5: core::panicking::panic_fmt
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:64:14
6: core::result::unwrap_failed
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/result.rs:1790:5
7: <sp_state_machine::ext::Ext<H,B> as sp_externalities::Externalities>::storage
8: sp_io::storage::get_version_1
9: sp_io::storage::ExtStorageGetVersion1::call
10: <F as wasmtime::func::IntoFunc<T,(wasmtime::func::Caller,A1),R>>::into_func::wasm_to_host_shim
11: <unknown>
12: <unknown>
13: <unknown>
14: <unknown>
15: <unknown>
16: <unknown>
17: <unknown>
18: <unknown>
19: wasmtime_runtime::traphandlers::catch_traps::call_closure
20: wasmtime_setjmp
21: sc_executor_wasmtime::runtime::perform_call
22: <sc_executor_wasmtime::runtime::WasmtimeInstance as sc_executor_common::wasm_runtime::WasmInstance>::call_with_allocation_stats
23: sc_executor_common::wasm_runtime::WasmInstance::call_export
24: sc_executor::native_executor::WasmExecutor::with_instance::{{closure}}
25: <sc_executor::native_executor::NativeElseWasmExecutor as sp_core::traits::CodeExecutor>::call
26: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_aux
27: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_using_consensus_failure_handler
28: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt>::call_api_at
29: <kusama_runtime::RuntimeApiImpl<SrApiBlock,RuntimeApiImplCall> as sp_api::Core<SrApiBlock>>::__runtime_api_internal_call_api_at
30: <&sc_service::client::client::Client<B,E,Block,RA> as sc_consensus::block_import::BlockImport>::import_block::{{closure}}
31: <sc_consensus_grandpa::import::GrandpaBlockImport<BE,Block,Client,SC> as sc_consensus::block_import::BlockImport>::import_block::{{closure}}
32: <sc_consensus_beefy::import::BeefyBlockImport<Block,BE,Runtime,I> as sc_consensus::block_import::BlockImport>::import_block::{{closure}}
33: <sc_consensus_babe::BabeBlockImport<Block,Client,Inner> as sc_consensus::block_import::BlockImport>::import_block::{{closure}}
34: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport+Transaction = Transaction+Error = sp_consensus::error::Error+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport>::import_block::{{closure}}
35: sc_consensus::import_queue::basic_queue::BlockImportWorker::new::{{closure}}
36: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
37: <tracing_futures::Instrumented as core::future::future::Future>::poll
38: tokio::runtime::task::raw::poll
39: std::sys_common::backtrace::__rust_begin_short_backtrace
40: core::ops::function::FnOnce::call_once{{vtable.shim}}
41: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:1988:9
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:1988:9
std::sys::unix::thread::Thread::new::thread_start
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys/unix/thread.rs:108:17
42: <unknown>
43: <unknown>

Thread 'tokio-runtime-worker' panicked at 'Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key: 0x8d911c369321efa0dc5d5079d12430281450c42bfbb7658febaf1c761444e71f"', /usr/local/cargo/git/checkouts/substrate-7e08433d4c370a21/980eb16/primitives/state-machine/src/ext.rs:192

This is a bug. Please report it at:

    https://github.com/paritytech/polkadot/issues/new


Until the crash it starts and works just fine, and the speed looks fine too. Although I am seeing extreme I/O, especially when using khala-node (syncing Phala + Polkadot together), which looks like a whole different issue (looks a lot like #9360), but let's skip that for now and focus on the DB issue.

I have 6 systems, 2 ZFS systems and 4 HW-RAID Dells (2x R450 and 2x R730xd), all showing the same issue. And it's not that I am doing weird magic with the startup:

#!/bin/sh

WORK_PATH=$(dirname $(readlink -f "$0"))

NODE_NAME="${NODE_NAME:-"polkadot-node"}"
DATA_PATH="/opt/phala-node/polkadot/"

$WORK_PATH/polkadot \
--chain polkadot \
--base-path $DATA_PATH \
--database paritydb \
--name $NODE_NAME \
--out-peers 50 \
--in-peers 50 \
--no-mdns \
--blocks-pruning archive-canonical \
--state-pruning archive-canonical \
--max-runtime-instances 64 \
--runtime-cache-size 64 \
--no-hardware-benchmarks \
--no-telemetry

I am now retrying the sync with @jasl's original script (max-run 16, cache 8, out-peers 4, in-peers 20), but I doubt this should impact the DB at this level.

I also noticed that node sync attempts sometimes just stop syncing (no errors, nothing, just no block movement) with the finalized block beyond the best block, which is kind of weird as well. But I'm unsure whether that is related to the trie errors.

So for me, I can't get beyond the point where it has synced to 7202158 and 7676973 (let's call this the 7-8M range, because all attempts so far crash in this area), and it is pretty fast about it. If I run benchmarks with e.g. Elastic or MySQL, the system (also under long-term looped querying) is just fast and responsive. If the underlying storage were at fault, it would not sync for such a long time and the issues would start much sooner.

I'll be happy to switch block sizes in ZFS if you feel that helps ... however I just started 2 new attempts at syncing and they are each syncing at about:

polkadot:
2023-04-16 09:57:50 ⚙️ Syncing 439.0 bps, target=#15113639 (24 peers), best: #681423 (0x479b…d39a), finalized #680960 (0x53d1…c240), ⬇ 58.3kiB/s ⬆ 2.7kiB/s

kusama:
2023-04-16 09:56:53 ⚙️ Syncing 550.4 bps, target=#17505252 (24 peers), best: #814400 (0x3ea8…7e1f), finalized #814080 (0x3267…bd12), ⬇ 189.2kiB/s ⬆ 7.2kiB/s


In the stats I do not get any messages that the I/O is full; there are moments where it is busier than others, mainly with a lot of random read I/O. During that time the block sync does go to 0 bps for a few lines, then returns to the original speeds and starts to write at 83 MB/s and 107 MB/s (funny enough, the Kusama one always seems faster), but I would hardly call this I/O slow. We have always known the node software is hard on I/O when syncing (which is why HDDs are no-gos), but if I look at how much I/O it does on SSDs it is so much (e.g. when I sync 200 GB, the writes on the SSD increase by 3839 GB). Even factoring in the ZFS parity, having more than 20x the writes compared to the data seems weird to me, and this does not happen when I copy/rsync or use other applications with heavy writes (e.g. during the MySQL/Elastic testing).

That all said, I am uploading the problem3 and problem4 folders to @jasl at this moment; they are about 200 GB each, so it will take some time to send them over. I'll be happy to help debug this where I can, but I am not a developer.

@davxy
Member

davxy commented Apr 18, 2023

@arkpar I've tried to replicate issue 3 locally.

There are ~600000 blocks on top of the last finalized block, so cumulus tries to query a huge tree route on startup here:
https://github.com/paritytech/cumulus/blame/master/client/consensus/common/src/level_monitor.rs#L114
and gets stuck.

Unfortunately I'm not able to replicate the issue locally. E.g. the level monitor initialization is smooth even with 600_000 blocks between the finalized block and the leaf block (maybe the problem manifests when the db is a bit more populated?).

Before patching some hacky code with a completely arbitrary bound I'd like to:

  1. be sure that it is a performance degradation, and what its order is (and not some kind of bug, because if I got it right you said it gets stuck)
  2. (if 1 is true) find out what the bound is and the reason for it

Is this scenario replicated using paritydb as well?
Can I have access to the db?

@arkpar
Member

arkpar commented Apr 18, 2023

@davxy

I've checked again, and it does get stuck in tree_route. I've put some traces there, and it looks like a single call to tree_route only takes a couple of seconds indeed. But it is called for every leaf, and there are a lot of leaves because finality is long behind: each small fork created since the final block adds a leaf. So it grows to a few hundred thousand leaves, which is an issue of its own, I guess.
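
As a rough back-of-envelope check of why that feels like a hang (illustrative numbers only, not a measurement of level_monitor itself):

```rust
// Rough estimate only: with "a couple of seconds" per tree_route call and a
// few hundred thousand leaves, the startup work adds up to days.
fn main() {
    let leaves: u64 = 300_000;             // "a few hundred thousand leaves"
    let seconds_per_tree_route: f64 = 2.0; // "a couple of seconds" per call

    let total_seconds = leaves as f64 * seconds_per_tree_route;
    println!("~{:.0} hours of tree_route walking at startup", total_seconds / 3600.0);
    // 300_000 * 2 s = 600_000 s ≈ 167 hours -- indistinguishable from "stuck".
}
```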

@davxy
Member

davxy commented Apr 18, 2023

Got the DB. I'm going to try with an optimization I made. Ty

@jasl
Contributor Author

jasl commented Apr 19, 2023

Weird, my colleague uploaded a new case, polkadot.broken3, to that server, and he claims he reproduced Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key: 0x294d588cf096b9a001ef2d5817558b007fb67577c41bf3094b6578e93560a941"

but when I validated his uploaded database, the error I saw was

====================

Version: 0.9.41-e203bfb396e

   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:2002:9
      std::panicking::rust_panic_with_hook
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:577:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:64:14
   6: core::panicking::panic
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:114:5
   7: parity_db::file::TableFile::read_at
   8: parity_db::column::Column::get_value
   9: parity_db::column::HashColumn::get_in_index
  10: parity_db::column::HashColumn::get
  11: parity_db::db::Db::get
  12: <sc_client_db::parity_db::DbAdapter as sp_database::Database<H>>::get
  13: sc_service::builder::new_db_backend
  14: polkadot_service::new_full
  15: polkadot_cli::command::run_node_inner::{{closure}}::{{closure}}
  16: sc_cli::runner::Runner<C>::run_node_until_exit
  17: polkadot_cli::command::run
  18: polkadot::main
  19: std::sys_common::backtrace::__rust_begin_short_backtrace
  20: main
  21: <unknown>
  22: __libc_start_main
  23: _start


Thread 'main' panicked at 'called Option::unwrap() on a None value', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/parity-db-0.4.4/src/file.rs:116

@PHA-SYSOPS

PHA-SYSOPS commented Apr 20, 2023

Hi,

So far the problem keeps popping up across different hardware systems and configurations. I also retested two Dell R450s with DC SSDs in RAID0, which also gave this error. All machines have additionally been tested with CPU burn, memtest, and an I/O tester, as well as a ZFS scan/fsck, but zero errors were found.

So far it has not been possible to sync any of them to the current height. That said, these are all high-end systems; I also have a Dell R730xd which is not low-end, but it only has 10K drives in two RAID6 sets, so I/O-wise it's pretty slow. Interestingly enough, this node just passed block 10M, which is the furthest I have gotten any of the nodes. The only difference is that this machine is syncing very slowly (like barely 1 bps), which started me wondering whether this might be triggered by syncing too fast.

We have always had problems with node syncing, specifically being I/O-bound (and I see some fixes might now be coming in that area from @davxy), which got me thinking last night about old notes I have on this topic. My ideas so far:

  • We know old nodes had some bad blocks; in the past the node would 'just step over' them, and the broken block was even served out to clients (which refused it, obviously), and people switched their node to another one, synced past it, then switched back. This happened on random blocks and a cause was never found.
  • The trie error seems similar to this behavior, but here the error is handled by bricking the node and crashing it. Although that is logical behavior, there is no way (for me?) to recover the database. I would think that up to a certain block the database is still fine, so having some kind of recovery would be a nice-to-have. There is an fsck-like tool that can scan a running node for bad blocks/chains, but it only reports IF there is an issue; it does not fix it.
  • We know that over time not only does I/O affect syncing (it gets worse), but so does virtual memory usage. This stops once the node is synced and returns to 'normal' after the first restart of a synced node (which might be solved by @davxy's fix)... either way, there is a build-up. If @davxy has a build of the patched version, I'd be happy to try it.

During my testing a few DBs got lost and removed before being fully sent to @jasl, which is totally my fault and caused some delays. Currently I have 3 broken DBs with trie errors (2 of the crashing type, 1 of the looping type), of which the first (problem6) is currently being uploaded to @jasl to double-check this. The other two are larger (140 GB problem7 and 180 GB problem8) and will take more time to upload, but I can provide a reverse shell to the system if you want to take a look.

So far I have tested them either with TrueNAS on ZFS directly, or inside a VM with ext4. My next step is to reinstall the machines (after uploading all the data, of course) with Ubuntu 22.04 LTS and MDADM RAID and retry syncing. I am also retrying a sync with RocksDB instead of ParityDB on the same setup where problem6 happened.

Extra:
I started a Polkadot and a Kusama node together in a tmux on 1 system (different DB paths), where the Kusama node crashed with the trie error and the Polkadot one did too, but 2-3 hours later. During the last 4 hours there was only 1.8% I/O wait, which is not bad at all, no FS errors, etc.

@jasl
Contributor Author

jasl commented Apr 20, 2023

[screenshot: the Trie lookup error panic]

FINALLY! The Trie lookup error triggered, and it can be reproduced on that VM.

this is database polkadot.broken6

@jasl
Contributor Author

jasl commented Apr 21, 2023

@bkchr @arkpar My new Polkadot node in a new VM triggered this too; it has the same backtrace as broken6 on that server.

@arkpar
Member

arkpar commented Apr 22, 2023

@jasl Would it be possible to share the whole VM?

@jasl
Contributor Author

jasl commented Apr 22, 2023

@jasl Would it be possible to share the whole VM?

Sure!
Added your SSH pub key; please try ssh ubuntu@141.95.34.133
It's still running Polkadot v0.9.41; start script: ~/bin/start_polkadot_node.sh, data folder: ~/data/polkadot-node

This is a dedicated server in Germany: Xeon E-2386G, 32 GB ECC memory, 2x 2 TB NVMe SSDs in RAID0, newly installed Ubuntu 22.04 without extra configuration (just apt update), so I think it's a perfect sample. Please do whatever you want with it; I hope it helps to identify the issue.

ubuntu@gr-phat-cluster-main:~$ bin/start_polkadot_node.sh
2023-04-22 09:39:04 Parity Polkadot
2023-04-22 09:39:04 ✌️  version 0.9.41-e203bfb396e
2023-04-22 09:39:04 ❤️  by Parity Technologies <admin@parity.io>, 2017-2023
2023-04-22 09:39:04 📋 Chain specification: Polkadot
2023-04-22 09:39:04 🏷  Node name: polkadot-node
2023-04-22 09:39:04 👤 Role: FULL
2023-04-22 09:39:04 💾 Database: ParityDb at //home/ubuntu/data/polkadot-node/chains/polkadot/paritydb/full
2023-04-22 09:39:04 ⛓  Native runtime: polkadot-9401 (parity-polkadot-0.tx22.au0)
2023-04-22 09:39:07 🏷  Local node identity is: 12D3KooWGBLzN3g6VbpcV1fp9RidjWBH7cqUEeJqpmnEP3iPpnub
2023-04-22 09:39:07 💻 Operating system: linux
2023-04-22 09:39:07 💻 CPU architecture: x86_64
2023-04-22 09:39:07 💻 Target environment: gnu
2023-04-22 09:39:07 💻 CPU: Intel(R) Xeon(R) E-2386G CPU @ 3.50GHz
2023-04-22 09:39:07 💻 CPU cores: 6
2023-04-22 09:39:07 💻 Memory: 31492MB
2023-04-22 09:39:07 💻 Kernel: 5.15.0-70-generic
2023-04-22 09:39:07 💻 Linux distribution: Ubuntu 22.04.2 LTS
2023-04-22 09:39:07 💻 Virtual machine: no
2023-04-22 09:39:07 📦 Highest known block at #7347283
2023-04-22 09:39:07 〽️ Prometheus exporter started at 127.0.0.1:9615
2023-04-22 09:39:07 Running JSON-RPC HTTP server: addr=127.0.0.1:9933, allowed origins=["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"]
2023-04-22 09:39:07 Running JSON-RPC WS server: addr=127.0.0.1:9944, allowed origins=["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"]

====================

Version: 0.9.41-e203bfb396e

   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:2002:9
      std::panicking::rust_panic_with_hook
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:579:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:64:14
   6: core::result::unwrap_failed
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/result.rs:1790:5
   7: <sp_state_machine::ext::Ext<H,B> as sp_externalities::Externalities>::storage
   8: sp_io::storage::get_version_1
   9: sp_io::storage::ExtStorageGetVersion1::call
  10: <F as wasmtime::func::IntoFunc<T,(wasmtime::func::Caller<T>,A1),R>>::into_func::wasm_to_host_shim
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: wasmtime_runtime::traphandlers::catch_traps::call_closure
  17: wasmtime_setjmp
  18: sc_executor_wasmtime::runtime::perform_call
  19: <sc_executor_wasmtime::runtime::WasmtimeInstance as sc_executor_common::wasm_runtime::WasmInstance>::call_with_allocation_stats
  20: sc_executor_common::wasm_runtime::WasmInstance::call_export
  21: sc_executor::native_executor::WasmExecutor<H>::with_instance::{{closure}}
  22: <sc_executor::native_executor::NativeElseWasmExecutor<D> as sp_core::traits::CodeExecutor>::call
  23: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_aux
  24: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_using_consensus_failure_handler
  25: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
  26: <polkadot_runtime::RuntimeApiImpl<__SrApiBlock__,RuntimeApiImplCall> as sp_api::Core<__SrApiBlock__>>::__runtime_api_internal_call_api_at
  27: <&sc_service::client::client::Client<B,E,Block,RA> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  28: <sc_consensus_grandpa::import::GrandpaBlockImport<BE,Block,Client,SC> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  29: <sc_consensus_beefy::import::BeefyBlockImport<Block,BE,Runtime,I> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  30: <sc_consensus_babe::BabeBlockImport<Block,Client,Inner> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  31: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Transaction = Transaction+Error = sp_consensus::error::Error+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  32: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  33: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
  34: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  35: tokio::runtime::task::raw::poll
  36: std::sys_common::backtrace::__rust_begin_short_backtrace
  37: core::ops::function::FnOnce::call_once{{vtable.shim}}
  38: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:1988:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:1988:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys/unix/thread.rs:108:17
  39: <unknown>
  40: <unknown>


Thread 'tokio-runtime-worker' panicked at 'Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key: 0x14f3857fd6a01a2bdb49617734950a21e47804875c81d3953d26c0fe69a6a99e"', /usr/local/cargo/git/checkouts/substrate-7e08433d4c370a21/980eb16/primitives/state-machine/src/ext.rs:192

This is a bug. Please report it at:

        https://github.com/paritytech/polkadot/issues/new

@PHA-SYSOPS

I have nearly completed syncing 4 different nodes with RocksDB; 2 of them are nearly in the end zone ... but the DB speed is still terrible after block 8M.

@arkpar
Member

arkpar commented Apr 24, 2023

Sure!
Added your SSH pub key; please try ssh ubuntu@141.95.34.133
It's still running Polkadot v0.9.41; start script: ~/bin/start_polkadot_node.sh, data folder: ~/data/polkadot-node

I've run the latest polkadot master on this machine and it synced with no issues (/home/ubuntu/tempdb).

@jasl
Contributor Author

jasl commented Apr 24, 2023

Sure!
Added your SSH pub key; please try ssh ubuntu@141.95.34.133
It's still running Polkadot v0.9.41; start script: ~/bin/start_polkadot_node.sh, data folder: ~/data/polkadot-node

I've run the latest polkadot master on this machine and it synced with no issues (/home/ubuntu/tempdb).

It's not reliably reproducible, and it's unpredictable which block will fail. @PHA-SYSOPS set up a cluster of 6 machines with different hardware to sync round after round, and managed to reproduce it a few times.

Did you find anything in the broken DB?

@arkpar
Member

arkpar commented Apr 24, 2023

Polkadot master branch can also continue syncing the database in /home/ubuntu/data/polkadot-node. So it is not really broken. As it was mentioned before in this thread, this issue is fixed in parity-db 0.4.6 and polkadot master.

@jasl
Contributor Author

jasl commented Apr 24, 2023

Polkadot master branch can also continue syncing the database in /home/ubuntu/data/polkadot-node. So it is not really broken. As it was mentioned before in this thread, this issue is fixed in parity-db 0.4.6 and polkadot master.

OK, thank you for your time, I'll close this issue now
