sync issue for big databases (archive nodes) #215

Open
kogeler opened this issue Jul 19, 2023 · 14 comments
kogeler commented Jul 19, 2023

role: full(archive)
binary: docker pull parity/polkadot:v0.9.42
instance: GCP - t2d-standard-4
disk: GCP - SSD persistent disk
OS: Container-Optimized OS from Google
kernel: 5.10.162+
CLI flags:

--name=${POD_NAME} \
--base-path=/chain-data \
--keystore-path=/keystore \
--chain=${CHAIN} \
--database=paritydb \
--pruning=archive \
--prometheus-external \
--prometheus-port 9615 \
--unsafe-rpc-external \
--unsafe-ws-external \
--rpc-cors=all \
--in-peers 75 \
--out-peers 25 \
--public-addr=/ip4/${EXTERNAL_IP}/tcp/${RELAY_CHAIN_P2P_PORT} \
--listen-addr=/ip4/0.0.0.0/tcp/30333

I'm trying to sync backup nodes from scratch. I have 8 nodes, one for each combination of chain (Kusama, Polkadot), pruning mode (archive, pruned), and database backend (rocksdb, paritydb).

I use the same instances, regions, and CLI flags.
All nodes have 100 peers (in 75/out 25).

The 2 archive rocksdb nodes (Kusama, Polkadot) synced in a couple of days.
But the 2 archive paritydb nodes (Kusama, Polkadot) have been syncing for 1.5 weeks. At some point (around 15M blocks), the sync rate dropped sharply. Now it is less than 1 block/second. Restarts don't help.
It looks like a paritydb issue.

The current state is:
Kusama - target=#18852634 (100 peers), best: #15387848 (0x117e…c8a0), finalized #15387648 (0x065b…4684), ⬇ 705.8kiB/s ⬆ 461.9kiB/s
Polkadot - target=#16463661 (100 peers), best: #15045441 (0x9fc3…0db0), finalized #15045402 (0xfef9…401b), ⬇ 134.9kiB/s ⬆ 125.2kiB/s
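As a rough sketch, the remaining gap can be derived from the two status lines above. The regular expressions assume the log format shown here, and the `sync_gap` helper and `STATUS` dictionary are hypothetical names used only for illustration.

```python
import re

# Status lines copied from the report above (bandwidth figures omitted).
STATUS = {
    "Kusama": "target=#18852634 (100 peers), best: #15387848 (0x117e…c8a0), finalized #15387648 (0x065b…4684)",
    "Polkadot": "target=#16463661 (100 peers), best: #15045441 (0x9fc3…0db0), finalized #15045402 (0xfef9…401b)",
}


def sync_gap(line: str) -> dict:
    """Extract the target/best/finalized heights and derive how far behind the node is."""
    target = int(re.search(r"target=#(\d+)", line).group(1))
    best = int(re.search(r"best: #(\d+)", line).group(1))
    finalized = int(re.search(r"finalized #(\d+)", line).group(1))
    return {
        "target": target,
        "best": best,
        "finalized": finalized,
        "blocks_behind": target - best,
    }


for chain, line in STATUS.items():
    gap = sync_gap(line)
    print(f"{chain}: {gap['blocks_behind']} blocks behind target")
```

With the figures reported here, Kusama is 3,464,786 blocks behind its target and Polkadot is 1,418,220 blocks behind.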

image

The disk subsystem is overloaded: about 15k IOPS and 100 MB/s, all from reads.
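A back-of-the-envelope check on these two figures gives the average size of a read. This is only a sketch: it assumes both numbers describe the same read workload and that "MB" means 10^6 bytes; `avg_read_size_bytes` is a hypothetical helper.

```python
def avg_read_size_bytes(throughput_mb_s: float, iops: float, mb: int = 10**6) -> float:
    """Average bytes per read operation, given throughput (MB/s) and operations per second."""
    return throughput_mb_s * mb / iops


size = avg_read_size_bytes(100, 15_000)
print(f"{size / 1024:.1f} KiB per read")  # prints "6.5 KiB per read"
```

An average of only a few KiB per read suggests many small random reads rather than large sequential ones.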

Author

kogeler commented Jul 19, 2023

ref: #212

Member

arkpar commented Jul 19, 2023

@kogeler Would it be possible to get SSH access to the machine as well?

@arkpar arkpar self-assigned this Jul 20, 2023
Author

kogeler commented Jul 20, 2023

@arkpar I don't think that's possible, because it's just a pod in a k8s cluster. It does use a dedicated k8s node, but connecting to the runtime environment isn't trivial. I shut down the pod and took a snapshot of the GCP disk so that I can upload a copy of the DB.

Member

arkpar commented Jul 21, 2023

Once the index file stops fitting in memory, hash index search becomes a major bottleneck. There will be some improvements in the next parity-db release, but ultimately this will be resolved by #199, which will remove index lookups for trie nodes.
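To illustrate why lookups slow down once the index outgrows memory, here is a deliberately simplified model. It is not how parity-db works internally: it assumes lookups hit the index uniformly at random, that the cached fraction of the index equals the hit rate, and that every miss costs exactly one disk read. The index and cache sizes in the example are hypothetical; only the 15k IOPS figure comes from the report above.

```python
def max_lookups_per_second(index_gib: float, cache_gib: float, read_iops: float) -> float:
    """Upper bound on index lookups per second under a uniform-random access model.

    Assumption: the hit rate equals the fraction of the index held in memory,
    and each miss needs exactly one disk read.
    """
    hit_rate = min(1.0, cache_gib / index_gib)
    miss_rate = 1.0 - hit_rate
    if miss_rate == 0.0:
        # The whole index is cached, so the disk no longer limits lookups.
        return float("inf")
    return read_iops / miss_rate


# Hypothetical sizes: a 64 GiB index with 16 GiB available for caching.
print(max_lookups_per_second(index_gib=64, cache_gib=16, read_iops=15_000))  # 20000.0
# The same index when it fits entirely in memory.
print(max_lookups_per_second(index_gib=8, cache_gib=16, read_iops=15_000))   # inf
```

Under this model, throughput falls from being memory-bound to being capped near the disk's IOPS limit as soon as a meaningful share of lookups miss the cache, which is consistent with the sharp slowdown reported around 15M blocks.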

Member

arkpar commented Jul 21, 2023

@kogeler Could you upload a copy of the rocksdb database as well, for reference testing?

Author

kogeler commented Jul 24, 2023

@arkpar Do you need a fully synced archive copy of rocksdb?

Member

arkpar commented Jul 24, 2023

@kogeler Either fully synced or at around the same block as the parity-db snapshot (15M-ish). If it is too much trouble, I can probably sync one myself, even though it will take a few days.

Author

kogeler commented Aug 2, 2023

@arkpar You can download our public periodic DB snapshots using the manual

Author

kogeler commented Aug 29, 2023

@arkpar Are there any updates?

Member

arkpar commented Aug 29, 2023

We are working on a major feature (#199) that will resolve this. It will take a few weeks to land in substrate/polkadot.

Author

kogeler commented Mar 25, 2024

@arkpar Are there any updates about this issue?

Member

arkpar commented Mar 25, 2024

Substrate integration is a work in progress that can be tracked here:
paritytech/polkadot-sdk#3386

@BulatSaif

I tested the Rococo chain.

Syncing a Rococo paritydb archive node took 13 days, whereas it took just 40 hours on rocksdb with the same hardware.
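Put as a ratio, using only the two durations quoted above (the `slowdown` helper is a hypothetical name):

```python
def slowdown(slow_hours: float, fast_hours: float) -> float:
    """How many times longer the slow run took than the fast one."""
    return slow_hours / fast_hours


paritydb_hours = 13 * 24  # 13 days
rocksdb_hours = 40
print(f"paritydb sync took {slowdown(paritydb_hours, rocksdb_hours):.1f}x as long as rocksdb")
```

That works out to 312 hours against 40 hours, or roughly 7.8 times as long.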

Block height paritydb:
image

Block height rocksdb:
image
