Performance degrading a lot with high number of keys #212

Open
crystalin opened this issue Jun 21, 2023 · 24 comments

@crystalin

crystalin commented Jun 21, 2023

Running the storage benchmark on 3 different networks with significant state size/content gives inconsistent results.
We have been using Moonbeam v0.32.1, which is based on Substrate 0.9.40.
The Alphanet and Moonriver networks have similar state/usage overall, but Moonbeam had a project that generated a huge amount of storage entries (all of the same size, 42 bytes IIRC).

As you can see, the Moonbeam read and write weights using paritydb are way off from the expected results that we see on Alphanet and Moonriver.

The disk configuration is AWS gp3 | 1000 GiB | 3000 IOPS, and each network/db has its own disk (a total of 6 disks).
The blocks and state are pruned to avoid using a huge amount of disk space.

Running the storage benchmark (on c6i.4xlarge AWS):

/home/ubuntu/projects/moonbeam/target/release/moonbeam \
   benchmark \
   storage \
   --db=${DB} \
   --state-version=0 \
   --mul=1.1 \
   --weight-path /home/ubuntu/projects/moonbeam/weights-${DB}-${NETWORK}.rs \
   --chain ${NETWORK} \
   --base-path /var/lib/${DB}-${NETWORK}-data

for each chain

Alphanet (~20M keys):

pub const RocksDbWeight: RuntimeDbWeight = RuntimeDbWeight {
  read: 65_167 * constants::WEIGHT_REF_TIME_PER_NANOS,
  write: 114_721 * constants::WEIGHT_REF_TIME_PER_NANOS,
};
pub const ParityDbWeight: RuntimeDbWeight = RuntimeDbWeight {
  read: 16_290 * constants::WEIGHT_REF_TIME_PER_NANOS,
  write: 65_374 * constants::WEIGHT_REF_TIME_PER_NANOS,
};

Moonriver (~30M keys state):

pub const RocksDbWeight: RuntimeDbWeight = RuntimeDbWeight {
  read: 66_865 * constants::WEIGHT_REF_TIME_PER_NANOS,
  write: 114_947 * constants::WEIGHT_REF_TIME_PER_NANOS,
};
pub const ParityDbWeight: RuntimeDbWeight = RuntimeDbWeight {
  read: 14_483 * constants::WEIGHT_REF_TIME_PER_NANOS,
  write: 64_545 * constants::WEIGHT_REF_TIME_PER_NANOS,
};

Moonbeam (~110M keys state):

pub const RocksDbWeight: RuntimeDbWeight = RuntimeDbWeight {
  read: 33_439 * constants::WEIGHT_REF_TIME_PER_NANOS,
  write: 86_828 * constants::WEIGHT_REF_TIME_PER_NANOS,
};
pub const ParityDbWeight: RuntimeDbWeight = RuntimeDbWeight {
  read: 177_320 * constants::WEIGHT_REF_TIME_PER_NANOS,
  write: 69_450 * constants::WEIGHT_REF_TIME_PER_NANOS,
};

In addition to the paritydb numbers, we can also see that the RocksDB average read is about 50% lower on Moonbeam (110M keys) than on Moonriver (30M keys), which might be related to the data on Moonbeam being smaller on average than on Moonriver.

Details about the benchmark output can be found here:
https://gist.github.com/crystalin/8e790a554b246e077c83ad04c04f330c
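For context, a minimal standalone sketch of how such per-operation costs feed into extrinsic weights (this mirrors, rather than imports, Substrate's RuntimeDbWeight from frame_support; the example numbers are the Moonbeam ParityDb results above):

// Minimal standalone sketch, assuming ref-time is counted in picoseconds
// (hence 1_000 per nanosecond). Not the actual frame_support implementation.
const WEIGHT_REF_TIME_PER_NANOS: u64 = 1_000;

#[derive(Clone, Copy)]
struct RuntimeDbWeight {
    read: u64,
    write: u64,
}

impl RuntimeDbWeight {
    // Ref-time charged for `r` storage reads and `w` storage writes.
    fn reads_writes(&self, r: u64, w: u64) -> u64 {
        self.read * r + self.write * w
    }
}

fn main() {
    // The Moonbeam ParityDb numbers reported above.
    let paritydb_weight = RuntimeDbWeight {
        read: 177_320 * WEIGHT_REF_TIME_PER_NANOS,
        write: 69_450 * WEIGHT_REF_TIME_PER_NANOS,
    };
    // A call doing 3 reads and 1 write is charged this much ref-time, so a 10x
    // difference in the benchmarked read cost shows up directly in fees.
    println!("{} ref-time", paritydb_weight.reads_writes(3, 1));
}

This is why the benchmarked read/write times matter beyond raw node performance: they are baked into the runtime's fee and block-fullness accounting.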

@crystalin
Author

Additionally, it took something like 40 hours to generate the Moonbeam storage benchmark.

@ggwpez
Member

ggwpez commented Jun 22, 2023

@cheme do you have an idea why the ParityDB time for read on the 110M keys DB is so much slower than Rocks when it is normally faster?

@crystalin
Author

You can download a recent Moonbeam state (10 GB) if you want to check it: https://s3.console.aws.amazon.com/s3/object/alan-stuff?region=us-east-1&prefix=moonbeam-state-3631095.json.lz4

@cheme
Collaborator

cheme commented Jun 22, 2023

That is definitely not expected. I could imagine worse access on a big mmap'd memory region, but nothing in these proportions. I can also think of the data not being correctly built (there is a reindexing process running in the background every N values, but it is flushed on exit/start).

" based on substrate 0.9.40." : is it substrate version (looks old)?
Would be interesting to have the parity-db version listed in the Cargo.lock (a version from a few month ago did have an issue that could explan some bad behavior cc\ @arkpar ).

Edit: https://github.com/PureStake/moonbeam/blob/6ed87ceeb65db27a9b2ce7ff32b90d062540bd67/Cargo.lock#L8942 shows the parity-db version is 0.4.6, which does not include #206, but I don't expect that to be related.

@crystalin
Author

I'm happy to cherry-pick some changes on top of it if you want to test a few things. You can also probably reproduce it using the snapshot I provided.

@cheme
Collaborator

cheme commented Jun 22, 2023

I'm happy to cherry-pick some changes on top of it if you want to test few things. You can also probably reproduce by using the snapshot I provided

I would use the latest version of parity-db (cargo update -p parity-db), but then it would only really make sense when syncing the snapshot from scratch.

Something I am thinking of right now: did the memory consumption stay reasonable during the process (looking at the bench code, I suspect it could put many items in memory)?

Edit: I just realized the snapshot is in JSON format, so there is no need to resync.

@cheme
Collaborator

cheme commented Jun 22, 2023

Actually, it would be better to use a patched parity-db master that includes #211.

@crystalin
Author

Ok, I'll try that if I find time (also be aware that the benchmark took 40 hours, so I won't get results quickly).

@ggwpez
Member

ggwpez commented Jun 22, 2023

I can't even import the snapshot on a 64 GB server… do you use 128 GB?

@arkpar
Member

arkpar commented Jun 27, 2023

I've tried using warp sync on Moonbeam. The sync went fine, although peak RAM usage was over 130 GB. However, the parachain is not finalizing blocks; the final block is still at zero. Is this a known issue? Unfinalized blocks are stored differently in the DB, and this may affect performance.

@arkpar
Member

arkpar commented Jun 27, 2023

As for possible performance issues, it could be affected by how the benchmark is implemented. RocksDB uses its own caching, while ParityDb relies on the OS cache. IIRC the benchmark warmup touches a few of the keys, and for RocksDB this causes a lot more data to be pre-cached.
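To make the caching asymmetry concrete, here is a rough illustrative sketch (not the actual substrate storage-benchmark code) of what a warmup pass over a key-value backend does, namely reading a subset of keys once before timing starts:

use std::collections::HashMap;

// Illustrative stand-ins only: the trait and in-memory backend are not substrate APIs.
trait KeyValueDb {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

struct InMemoryDb(HashMap<Vec<u8>, Vec<u8>>);

impl KeyValueDb for InMemoryDb {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

// Read every `step`-th key once before the timed run. With RocksDB these reads
// also fill its internal block cache; with ParityDb only the OS page cache is
// warmed, so the timed reads that follow can hit disk far more often.
fn warmup<D: KeyValueDb>(db: &D, keys: &[Vec<u8>], step: usize) {
    for key in keys.iter().step_by(step.max(1)) {
        let _ = db.get(key);
    }
}

fn main() {
    let mut data = HashMap::new();
    data.insert(b"example-key".to_vec(), b"example-value".to_vec());
    let db = InMemoryDb(data);
    warmup(&db, &[b"example-key".to_vec()], 1);
}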

@crystalin
Author

@arkpar warp sync is not fully supported yet; we are still working on it.
I also suspect the benchmark implementation is the reason for those unexpected values, but it is hard and time-consuming to verify.

@crystalin
Author

@arkpar were you able to reproduce? Let me know if I can help otherwise

@arkpar
Member

arkpar commented Jul 5, 2023

I could not access the snapshot linked above: it requires AWS registration and asks for my credit card number. I've started a regular sync instead, and it looks like it will take 3-4 days.

@arkpar
Member

arkpar commented Jul 21, 2023

@crystalin Could you give it a test with parity-db 0.4.10?
cargo update -p parity-db should do it

@crystalin
Author

I'm running it now.
This time I looked at the CPU load and IO load, and during the benchmark:

  • IOPS: ~1300 (max is 3000)
  • CPU: 2.5%
  • Memory: 10% (max: 32 GB)

@ggwpez
Member

ggwpez commented Jul 21, 2023

If the DB benchmark time is a major problem then we could add a flag to only read 10% or 1% of the total keys (randomly selected). That way you would have some preliminary results for faster iterating. Do you think that would help?
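A rough sketch of what such a sampling flag could do is below; the percentage parameter and the xorshift-based selection are hypothetical, the point is only that reading a fixed fraction of randomly chosen keys shrinks a 110M-key run proportionally while still touching the whole key space:

// Hypothetical key sub-sampling, sketched with a tiny xorshift generator to
// avoid external crates; this is not an existing substrate CLI option.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Keep roughly `percent`% of `keys`, chosen pseudo-randomly.
fn sample_keys(keys: &[Vec<u8>], percent: u64, seed: u64) -> Vec<Vec<u8>> {
    let mut rng = XorShift64(seed.max(1));
    keys.iter()
        .filter(|_| rng.next() % 100 < percent)
        .cloned()
        .collect()
}

fn main() {
    let keys: Vec<Vec<u8>> = (0u32..1_000).map(|i| i.to_le_bytes().to_vec()).collect();
    // A hypothetical 10% sample would reduce a 110M-key run to roughly 11M reads.
    let sampled = sample_keys(&keys, 10, 42);
    println!("benchmarking {} of {} keys", sampled.len(), keys.len());
}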

@crystalin
Author

That could make sense, yes, a percentage flag would help.

@crystalin
Author

crystalin commented Jul 21, 2023

The warmup round just finished; I might get results this weekend.
(Also, memory jumped to 95%.)

@crystalin
Author

crystalin commented Jul 24, 2023

I was able to run it (with substrate 0.9.43 and paritydb 0.4.10). It took 3 days to finish:

pub const ParityDbWeight: RuntimeDbWeight = RuntimeDbWeight {
  read: 182_722 * constants::WEIGHT_REF_TIME_PER_NANOS,
  write: 60_176 * constants::WEIGHT_REF_TIME_PER_NANOS,
};

(No improvement at all)

@cheme
Collaborator

cheme commented Jul 31, 2023

I looked a bit more into switching the chainspec loading to something that does not load the entire state in memory, but it is more work than I expected (it breaks the genesis build API quite a bit, since we need to do multiple commits while using a streaming JSON parser), so I am postponing doing this myself for now.
Still, I got a better understanding of the benchmarking process: it just uses the standard chainspec loading, which means the full state is sent to parity-db, but the bench then runs on a db that has just had a lot of keys injected.
So the db may still be doing one or two levels of table reindexing while the benchmark runs, which would explain the performance issue.

This can be checked by running "ls" on the db directory and looking at the files for the state column:
if it is still reindexing the state, there will be multiple files named paritydb/full/index_01_xx, with xx being the index sizes (a small programmatic version of this check is sketched below).
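A standalone sketch of the same check done programmatically (the column number 01 for the state column and the paritydb/full layout follow the description above and may differ on other setups):

use std::fs;
use std::path::Path;

// Count the index files of column 01 (state): more than one index_01_xx file
// means reindexing of that column has not finished yet.
fn state_column_still_reindexing(db_dir: &Path) -> std::io::Result<bool> {
    let mut index_files = 0;
    for entry in fs::read_dir(db_dir)? {
        if entry?.file_name().to_string_lossy().starts_with("index_01_") {
            index_files += 1;
        }
    }
    Ok(index_files > 1)
}

fn main() -> std::io::Result<()> {
    // Pass the node's .../paritydb/full directory as the first argument.
    let dir = std::env::args().nth(1).unwrap_or_else(|| "./paritydb/full".to_string());
    println!("state column still reindexing: {}", state_column_still_reindexing(Path::new(&dir))?);
    Ok(())
}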

If this is the case, I do not have a simple way of ensuring the reindexing finishes (changing the default index size in parity-db could be a hacky solution).

The following change in substrate would allow flushing the logs but would not force all reindexing to finish.

--- a/bin/node/cli/src/command.rs
+++ b/bin/node/cli/src/command.rs
@@ -127,6 +127,8 @@ pub fn run() -> Result<()> {
                                        ),
                                        #[cfg(feature = "runtime-benchmarks")]
                                        BenchmarkCmd::Storage(cmd) => {
+                                               // load once first to ensure db is flushed.
+                                               new_partial(&config)?;
                                                // ensure that we keep the task manager alive
                                                let partial = new_partial(&config)?;
                                                let db = partial.backend.expose_db();

but it would also need to keep the db open for a while until everything is reindexed.

Maybe simply do the bench in two steps:

  • Step 1: load the chainspec (e.g. just start the binary with no connections so that only the chainspec loading progresses, or add a specific command to do so), then wait until there is no more reindexing in parity-db (a single index_01_xx file) before exiting; a polling sketch of this wait follows the list.
  • Step 2: run the benchmark on the existing db.
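For the wait in step 1, a hedged sketch of a polling loop over the same index files (run alongside the node so the background reindexing can actually progress; path layout as described above):

use std::fs;
use std::path::Path;
use std::thread;
use std::time::Duration;

// Same check as in the earlier sketch: more than one index_01_xx file means
// the state column is still being reindexed.
fn state_index_files(db_dir: &Path) -> std::io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(db_dir)? {
        if entry?.file_name().to_string_lossy().starts_with("index_01_") {
            count += 1;
        }
    }
    Ok(count)
}

// Block until only one index file is left, polling every 30 seconds.
fn wait_for_reindexing(db_dir: &Path) -> std::io::Result<()> {
    while state_index_files(db_dir)? > 1 {
        thread::sleep(Duration::from_secs(30));
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Pass the node's .../paritydb/full directory as the first argument.
    let dir = std::env::args().nth(1).unwrap_or_else(|| "./paritydb/full".to_string());
    wait_for_reindexing(Path::new(&dir))
}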

Or implement a primitive that ensures all reindexing is finished in parity-db and use it before calling new_partial a second time (but that would not be very elegant, as the code at this level does not assume a specific db).

@crystalin
Author

Thank you,

I think we did run the node with no connections (we often do for other profiling) before running the benchmark, but I can try again to see if that helps.

I think having substrate support the storage benchmark on a subset of the state would probably be more effective in that case.

@ggwpez
Member

ggwpez commented Aug 21, 2023

Yes, I hope to get paritytech/polkadot-sdk#146 to some newcomer to solve; I have forwarded it to a PBA student now.

@crystalin
Author

Outside of the storage benchmark, the performance of paritydb is also generally worse than rocksdb when the state is large (100M+ keys) and the node is running as an archive (I don't know how to measure the total number of keys in the db itself):

ParityDb:

2023-09-12T15:34:57.659Z utils:storage-query Queried 55384 keys @ 2769 keys/sec, 34 MB heap used
2023-09-12T15:35:02.659Z utils:storage-query Queried 82671 keys @ 3307 keys/sec, 46 MB heap used
2023-09-12T15:35:07.659Z utils:storage-query Queried 103743 keys @ 3458 keys/sec, 27 MB heap used
2023-09-12T15:35:12.659Z utils:storage-query Queried 130776 keys @ 3736 keys/sec, 21 MB heap used
2023-09-12T15:35:17.659Z utils:storage-query Queried 159459 keys @ 3986 keys/sec, 33 MB heap used
2023-09-12T15:35:22.659Z utils:storage-query Queried 184760 keys @ 4106 keys/sec, 18 MB heap used

RocksDb:

2023-09-12T15:36:44.978Z utils:storage-query Queried 520850 keys @ 17358 keys/sec, 30 MB heap used
2023-09-12T15:36:49.978Z utils:storage-query Queried 638850 keys @ 18249 keys/sec, 17 MB heap used
2023-09-12T15:36:54.979Z utils:storage-query Queried 784850 keys @ 19618 keys/sec, 15 MB heap used
2023-09-12T15:36:59.979Z utils:storage-query Queried 894850 keys @ 19882 keys/sec, 20 MB heap used
2023-09-12T15:37:04.981Z utils:storage-query Queried 975850 keys @ 19514 keys/sec, 24 MB heap used
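For reference, the keys/sec figures in these logs are just a running count divided by elapsed time, reported on a fixed interval; a rough standalone sketch of that arithmetic (the storage-query utility itself is separate tooling, this only reproduces the reported metric):

use std::time::{Duration, Instant};

// Count keys as they stream back and print a progress line every five seconds,
// in the same shape as the logs above.
struct ThroughputMeter {
    start: Instant,
    last_report: Instant,
    queried: u64,
}

impl ThroughputMeter {
    fn new() -> Self {
        let now = Instant::now();
        Self { start: now, last_report: now, queried: 0 }
    }

    // Call once per key (or per page of keys) received from the node.
    fn record(&mut self, keys: u64) {
        self.queried += keys;
        if self.last_report.elapsed() >= Duration::from_secs(5) {
            let rate = self.queried as f64 / self.start.elapsed().as_secs_f64();
            println!("Queried {} keys @ {:.0} keys/sec", self.queried, rate);
            self.last_report = Instant::now();
        }
    }
}

fn main() {
    let mut meter = ThroughputMeter::new();
    for _ in 0..1_000 {
        meter.record(1);
    }
}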
