Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key" #13864
Comments
Do you use paritydb?
Yes, and our latest node is using the latest paritydb.
Latest patch release?
Yes, 0.4.6; I forgot which version the previous collators used. For 0.4.6, our users reported another issue, paritytech/parity-db#198, but I think it is unrelated to this one...
Can you provide the db? CC @arkpar
I need to ask the user; I've left him a message.
Ah sorry, that user says he doesn't back up the DB; it's too big to upload. I now have dozens of nodes using paritydb, but I'm not sure I'll run into this again.
Without a db it will probably not be possible to find out what is wrong.
I understand... sorry... I shall ensure the reporter keeps the broken DB next time.
Thank you!
@bkchr To your knowledge, does this error only affect ParityDB?
I cannot say for sure, but I think so.
@bkchr @arkpar Hi, could you provide your SSH public keys? We successfully triggered this again while syncing Polkadot, and my colleague caught a weird case where finalized > best, plus paritytech/parity-db#198.
Sure, I'll take a look.
Please also specify how to run a node that uses this database.
Sorry for the late response. SSH key added, please try. I just validated that all the uploaded databases reproduce the problem again... In home: Case 1: Case 2: Case 3:
I hope these samples can help. I also backed up the original uploaded artifacts in
This one is indeed broken. Somehow an older commit was written on top of the new state. I'm going to add some changes to the database that will prevent this.
Opens and syncs fine for me. This is probably the issue that was fixed in parity-db 0.4.6.
This looks more like a cumulus issue.
There are ~600000 blocks on top of the last finalized block, so cumulus tries to query a huge tree route on startup here: @bkchr I think we've run into similar issues before.
Yeah, we had this issue in the transaction pool. @davxy can you please handle this?
This is my colleague's home server, so it's ZFS.
For case 3, AFAIK they use a SATA SSD too; our users already know an HDD can't run a node well, no matter whether it's RocksDB or ParityDB.
I'm just reporting that we might be facing #9360; not sure if they're the same thing.
My colleague won't give up; he is uploading a new case which he just confirmed has the "Trie lookup error". We're using the official Polkadot v0.9.41 build, so we can't change the ParityDB version and don't know whether this issue is fixed in a newer one. Our Khala node is using 0.4.6, though, and he triggered the error last week; sadly we didn't back up that DB, and it's not easy to reproduce. Once he has uploaded the new case, I shall validate it myself to ensure it reproduces the error.
IIRC ZFS is configured to use 128k IO pages by default, which is terrible for DB performance. Consider using a much smaller record size or a different FS.
Ok, I'm letting him join the discussion here; he is an experienced sysop, not a developer.
@bkchr For this issue: because we use ink!, our off-chain node uses the same runtime infra as Substrate.
The fix is simple (it's fixed on our side): Phala-Network/phala-blockchain#1246
I'm wondering if there's a bug inside Substrate that doesn't maintain the RC properly?
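(Purely to make the question concrete, and assuming "RC" here means reference counting of shared state entries: the sketch below is hypothetical code, not Substrate's or ParityDB's actual logic. It only illustrates how a single missed reference could lead to an early prune and a later "missing expected key" read.)

```rust
// Hypothetical illustration only: state entries shared between blocks are kept
// alive by a reference count. If one code path forgets to add a reference,
// the entry is pruned while something still needs it, and a later read fails
// with a "missing expected key" style error.
use std::collections::HashMap;

#[derive(Default)]
struct RefCountedStore {
    /// key -> (value, reference count)
    entries: HashMap<Vec<u8>, (Vec<u8>, u64)>,
}

impl RefCountedStore {
    /// Add (or re-reference) an entry.
    fn reference(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.entries.entry(key).or_insert((value, 0)).1 += 1;
    }

    /// Drop one reference; prune the entry when nothing references it anymore.
    fn release(&mut self, key: &[u8]) {
        if let Some(entry) = self.entries.get_mut(key) {
            entry.1 = entry.1.saturating_sub(1);
            if entry.1 == 0 {
                // If some block's reference was never counted, that block will
                // now hit a missing key when it tries to read this entry.
                self.entries.remove(key);
            }
        }
    }

    fn get(&self, key: &[u8]) -> Result<&[u8], String> {
        self.entries
            .get(key)
            .map(|(value, _)| value.as_slice())
            .ok_or_else(|| "Database missing expected key".to_string())
    }
}
```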
For reference: @jasl sent you the issues designated as problem1 and problem2. After sending them to him, I removed them and started two new syncs. So let's recap:
problem3 partial log (as the same error keeps looping):
and for problem4:

```
2023-04-15 22:02:38 ⚙️ Syncing 0.0 bps, target=#17498449 (75 peers), best: #7676973 (0x726c…2a34), finalized #7676928 (0xf6c1…c683), ⬇ 19.3kiB/s ⬆ 14.7kiB/s

====================

Version: 0.9.41-e203bfb396e

Thread 'tokio-runtime-worker' panicked at 'Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key: 0x8d911c369321efa0dc5d5079d12430281450c42bfbb7658febaf1c761444e71f"', /usr/local/cargo/git/checkouts/substrate-7e08433d4c370a21/980eb16/primitives/state-machine/src/ext.rs:192

This is a bug. Please report it at:
```

Until the crash it starts and works just fine, and the speed looks fine too. Although I am seeing extreme I/O, especially when using khala-node (syncing Phala + Polkadot together), which looks like a whole different issue (looks a lot like #9360), but let's skip that for now and focus on the DB issue. I have 6 systems, 2 ZFS systems and 4 HW-RAID Dells (2x R450 and 2x R730xd), all showing the same issue. And it's not that I am doing weird magic with the startup script:

```sh
#!/bin/sh
WORK_PATH=$(dirname $(readlink -f "$0"))
NODE_NAME="${NODE_NAME:-"polkadot-node"}"
$WORK_PATH/polkadot
```

I am now retrying the sync with @jasl's original script (max-run 16, cache 8, out-peers 4, in-peers 20), but I doubt this should impact the DB at this level. I also noticed that sync attempts sometimes just stopped syncing (no errors, nothing, just no block movement) with the finalized block beyond the best block, which would be kinda weird as well, but I'm unsure if that is related to the trie errors.

So for me, I can't get beyond the point where it would sync to 7202158 and 7676973 (let's call this the 7-8M range, because all attempts so far crash in this area), and it is pretty fast about it. If I run benchmarks with e.g. Elastic or MySQL, the system (also with long-term looped querying) is just fast and responsive. If the underlying storage were at fault, it would not sync for such a long time and the issues would start much sooner. I'll be happy to switch block sizes in ZFS if you feel that helps... however, I just started 2 new attempts of syncing and they are each syncing at about:

Kusama:

In the stats I do not get any indication that the I/O is full; there are moments where it is more busy than others, mainly with a lot of random read I/O. During that time the block sync does go to 0 bps for a few lines, then returns to the original speeds and starts to write at 83 MB/s and 107 MB/s (funnily enough, the Kusama one always seems faster), but I would hardly say this I/O is slow. We have always known the node software is hard on I/O when syncing (which is why HDDs are no-gos), but if I look at how much I/O it does on SSDs, it is a lot (e.g. when I sync 200 GB, the writes on the SSD increase by 3839 GB); even factoring in the ZFS parity, having more than 20x the writes compared to the data seems weird to me, and this does not happen when I copy/rsync or use other applications with heavy writes (e.g. during the MySQL/Elastic testing).

That all said, I am uploading the problem3 and problem4 folders to @jasl at this moment, which are about 200 GB each, so they will take some time to be sent over. I'll be happy to see how I can aid in debugging this, but I am not a developer.
@arkpar I've tried to locally replicate issue 3.
Unfortunately I'm not able to replicate the issue locally. E.g. the level monitor initialization is smooth even with 600_000 blocks between the finalized and the leaf block (maybe the problem manifests when the db is a bit more populated?). Before patching some hacky code with a completely arbitrary bound I'd like to:
Is this scenario replicated using paritydb as well?
I've checked again, and it does get stuck in tree_route. I've put some traces there, and it looks like a single call to tree_route indeed only takes a couple of seconds. But it is called for every leaf, and there are a lot of leaves because finality is long behind: each small fork created since the final block adds a leaf, so it goes up to a few hundred thousand leaves. Which is an issue of its own, I guess.
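To illustrate the cost described above — purely a hypothetical sketch with made-up names, not the actual Substrate/cumulus code — the startup work amounts to one route walk per leaf, each walk proportional to the unfinalized depth:

```rust
// Hypothetical illustration only: with hundreds of thousands of leaves and
// ~600_000 unfinalized blocks, the total number of parent lookups explodes.
use std::collections::HashMap;

/// Map from block hash to parent hash (a stand-in for the real block metadata).
type ParentMap = HashMap<u64, u64>;

/// Number of hops from `leaf` down to `finalized`.
fn route_len(parents: &ParentMap, mut leaf: u64, finalized: u64) -> usize {
    let mut hops = 0;
    while leaf != finalized {
        leaf = parents[&leaf]; // one block-metadata lookup per hop
        hops += 1;
    }
    hops
}

/// Total lookups performed at startup: one route walk per leaf,
/// i.e. roughly O(number_of_leaves * unfinalized_depth).
fn startup_lookups(parents: &ParentMap, leaves: &[u64], finalized: u64) -> usize {
    leaves.iter().map(|&l| route_len(parents, l, finalized)).sum()
}
```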
Got the DB. I'm going to try with an optimization I made. Ty
Weird, my colleague uploaded a new case, but when I validated his uploaded database, I saw the error is
|
Hi. So far the problem keeps popping up across different hardware systems and configurations. I also retested two Dell R450s with DC SSDs in RAID0, which also gave this error. All machines have additionally been tested with CPU burn, memtest, and an I/O tester, as well as a ZFS scan/fsck, but zero errors were found. So far it has not been possible to sync any of them to the current height.

That said, these are all high-end systems. I also have a Dell R730XS which is not low-end, but it only has 10K drives in two RAID6 sets, so I/O-wise it is pretty slow. Interestingly enough, this node just passed block 10M, which is the furthest I have gotten any of the nodes. The only difference is that this machine is syncing very slowly (barely 1 bps). That got me thinking that this might be triggered by syncing too fast. We have always had problems with node syncing, specifically being I/O bound (and I see some fixes might be coming in that area now by @davxy), which got me thinking last night about old notes I have on this topic. My idea so far:
During my testing a few DBs got lost and removed before being fully sent to @jasl, which is totally my fault and caused some delays. Currently I have 3 broken DBs with trie errors (2 of the crashing type, 1 of the looping type), of which the first (problem6) is currently being uploaded to @jasl to double-check this. The other two are larger (140 GB problem7 and 180 GB problem8) and take more time to upload, but I can provide a reverse shell to the system if you want to take a look.

So far I have tested them either with TrueNAS on ZFS directly, or inside a VM with ext4. My next step is to reinstall the machines (after uploading all the data, of course) with Ubuntu 22.04 LTS and MDADM RAID and retry syncing. I am also retrying a sync with RocksDB instead of ParityDB on the same setup where problem6 happened.

extra:
@jasl Would it be possible to share the whole VM? |
Sure! This is a dedicated server in Germany: Xeon E-2386G, 32 GB ECC memory, 2x 2 TB NVMe SSDs in RAID0, freshly installed Ubuntu 22.04 with no extra configuration (just apt update), so I think it's a perfect sample. Please do what you want with it; I hope it helps to identify the issue.
|
I have nearly completed syncing 4 different nodes with RocksDB; 2 of them are nearly in the end zone... but the DB speed is still terrible after block 8M.
I've run the latest polkadot master on this machine and it synced with no issues (
It's not reliably reproducible, and it's unpredictable which block will fail. @PHA-SYSOPS set up a cluster of 6 machines with different hardware to sync round after round and could reproduce it a few times. Did you find anything in the broken DB?
The Polkadot master branch can also continue syncing the database in
OK, thank you for your time. I'll close this issue now.
Is there an existing issue?
Experiencing problems? Have you tried our Stack Exchange first?
Description of bug
Hi, I'm from the Phala team. Previously, all our collators (based on Polkadot 0.9.34) suddenly started panicking in a loop with this error.
Here's a log:
khala_collator_1.log.zip
At that time, I tried many things but our collators still couldn't get back to work. Luckily, I was testing our new node based on Polkadot v0.9.39, so I tried that binary and they finally worked again, so I assumed the bug had been fixed.
But recently, a user reported this error with our latest node based on Polkadot v0.9.40, so I believe there is still a bug.
I asked him for a backup of the DB, but it's too big (~1 TB).
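For context on where the panic itself comes from, here is a simplified sketch (not the actual `ext.rs` implementation): a storage read from the runtime cannot surface backend errors, so when the trie backend fails because a node is missing from the database, the host side panics with exactly this message.

```rust
use std::collections::HashMap;

/// Stand-in for the trie-backed state: `Ok(None)` means "key has no value",
/// while `Err(..)` means the trie could not be walked at all because a node
/// is missing from the database (an inconsistent/corrupted DB).
fn trie_lookup(
    db: &HashMap<Vec<u8>, Vec<u8>>,
    key: &[u8],
    node_present: bool,
) -> Result<Option<Vec<u8>>, String> {
    if !node_present {
        return Err("Trie lookup error: Database missing expected key".into());
    }
    Ok(db.get(key).cloned())
}

/// The host-side storage accessor: runtime calls cannot propagate backend
/// errors, so any `Err` from the backend becomes this panic.
fn storage(db: &HashMap<Vec<u8>, Vec<u8>>, key: &[u8], node_present: bool) -> Option<Vec<u8>> {
    trie_lookup(db, key, node_present)
        .expect("Externalities not allowed to fail within runtime")
}
```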
Steps to reproduce
No reproduction steps; it occurs randomly and is very rare.