This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Make reading json genesis file faster #11868

Merged


Contributor

@crystalin crystalin commented Jul 20, 2022

Optimize the time to load the genesis from json file. Fixes #11867

It loads a 5 GB genesis file in 5 minutes instead of 1h30.

@ggwpez ggwpez added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Jul 20, 2022
@ggwpez
Member

ggwpez commented Jul 20, 2022

Thanks!
There is also a json::from_reader, which looks less hacky than the mmap. Have you considered using that?

Out of interest: how are you creating the forked chain specs of your network? Do you have your own tooling?

@bkchr bkchr requested a review from koute July 20, 2022 11:44
Contributor

@koute koute left a comment

LGTM

@vstakhov
Contributor

There is also a json::from_reader, which looks less hacky than the mmap. Have you considered using that?

Maybe using a BufReader would be a decent option: with a large buffer it shouldn't be much different from mmap, and it seems less hacky.

@koute
Contributor

koute commented Jul 20, 2022

Thanks! There is also a json::from_reader, which looks less hacky than the mmap. Have you considered using that?

Isn't the json::from_reader what the code used originally here? Is there a different one somewhere?

Anyway, from what I can see, the main reason this was so slow is that the file was read completely unbuffered, which meant that every call to read resulted in a syscall; simply wrapping it in a BufReader would most likely have made it significantly faster.

However, just doing an mmap like it was done here is in theory even better since the kernel can just read and buffer the file asynchronously for us. In practice it depends case-by-case - in general using mmap should never be slower (unless you're doing a lot of random accesses, but this doesn't apply here), but it's not always going to be significantly faster.

I'm fine with leaving it to just use mmap; we can consider using BufReader instead, but in that case I'd like to see some numbers as to what the performance difference is in this particular case.
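
To make the unbuffered-read point concrete, here is a minimal sketch using only the standard library. It is not the code from this PR: `CountingReader`, `consume_bytewise` and `compare_read_calls` are hypothetical names, an in-memory slice stands in for the genesis file, and it assumes the parser pulls its input one byte at a time. Each counted call would be one `read` syscall on a real `File`.

```rust
use std::io::{self, BufReader, Read};

/// Wraps any reader and counts how many times `read` is called on it.
/// On a real `File`, each of these calls would be one `read` syscall.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Consumes `reader` one byte at a time (an assumption about how a
/// streaming parser behaves) and returns the number of bytes seen.
fn consume_bytewise<R: Read>(reader: R) -> io::Result<usize> {
    let mut n = 0;
    for byte in reader.bytes() {
        byte?;
        n += 1;
    }
    Ok(n)
}

/// Returns (read calls without buffering, read calls with an 8 KiB `BufReader`).
fn compare_read_calls(data: &[u8]) -> io::Result<(usize, usize)> {
    // Unbuffered: every requested byte reaches the underlying reader.
    let mut unbuffered = CountingReader { inner: data, calls: 0 };
    consume_bytewise(&mut unbuffered)?;

    // Buffered: the underlying reader is only hit when the buffer runs dry.
    let mut counted = CountingReader { inner: data, calls: 0 };
    consume_bytewise(BufReader::with_capacity(8 * 1024, &mut counted))?;

    Ok((unbuffered.calls, counted.calls))
}
```

Calling `compare_read_calls` on 64 KiB of data should report at least one underlying call per byte in the unbuffered case and only a handful with the `BufReader`, which is the gap described above.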

@vstakhov
Contributor

However, just doing an mmap like it was done here is in theory even better

The only concern that always bothers me when using mmap is if there is even a small/theoretical chance that a file might be changed (truncated) during reading. BufReader is always on the safe side in this case.

@koute
Contributor

koute commented Jul 20, 2022

However, just doing an mmap like it was done here is in theory even better

The only concern that always bothers me when using mmap is if there is even a small/theoretical chance that a file might be changed (truncated) during reading. BufReader is always on the safe side in this case.

Yeah, but in this case we're doing this essentially only at startup, and if the file is changed underneath then it'll most likely either result in a read error (because the JSON will be garbage) or in a SIGBUS being sent and the process will be killed (if the file's truncated enough to cross a page boundary).

We're already using mmap in e.g. paritydb for indexes, and there any shenanigans could be considerably more catastrophic, so I wouldn't worry about it in this particular case.
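
For comparison, the BufReader side of the truncation concern can be sketched with the standard library alone. This is an illustration under stated assumptions, not code from the PR: `read_while_truncating` is a hypothetical helper, the file contents are dummy bytes, and the truncation is done from the same process to simulate an external writer. The point is that a truncated file shows up as an early end of input (which a JSON parser would then reject as incomplete) rather than as a signal.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufReader, Read, Write};
use std::path::Path;

/// Writes `original_len` bytes to `path`, starts reading through a `BufReader`,
/// truncates the file to `truncated_len` mid-read, then reads to the end.
/// Returns the total number of bytes the reader delivered.
fn read_while_truncating(
    path: &Path,
    original_len: usize,
    truncated_len: u64,
) -> io::Result<usize> {
    File::create(path)?.write_all(&vec![b'x'; original_len])?;

    let mut reader = BufReader::new(File::open(path)?);
    let mut first = [0u8; 1024];
    reader.read_exact(&mut first)?;

    // Simulate another process shrinking the file underneath the reader.
    OpenOptions::new().write(true).open(path)?.set_len(truncated_len)?;

    // The reader hands out whatever it had already buffered, then hits EOF.
    let mut rest = Vec::new();
    reader.read_to_end(&mut rest)?;

    fs::remove_file(path)?;
    Ok(first.len() + rest.len())
}
```

With a 1 MiB file truncated to 4 KiB after the first read, the call returns `Ok` with far fewer bytes than were originally written; the caller sees a short input instead of being killed.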

@crystalin
Contributor Author

crystalin commented Jul 20, 2022

Benchmark: (~5.5GB file, 5950X/64GB, SSD)

from_reader (default): ~42:30 min

└--╼ time ./target/release/moonbeam --tmp --log=info --chain moonriver-state.json
2022-07-20 10:12:35 Moonbeam Parachain Collator
2022-07-20 10:12:35 ✌️  version 0.25.0-dc9d72b668a
2022-07-20 10:12:35 ❤️  by PureStake, 2019-2022
2022-07-20 10:12:35 📋 Chain specification: Moonriver
2022-07-20 10:12:35 🏷  Node name: tightfisted-slave-2527
2022-07-20 10:12:35 👤 Role: FULL
2022-07-20 10:12:35 💾 Database: RocksDb at /tmp/substratevKzGJB/chains/moonriver/db/full
2022-07-20 10:12:35 ⛓  Native runtime: moonriver-1700 (moonriver-0.tx2.au3)
2022-07-20 10:31:28 Parachain id: Id(2023)
2022-07-20 10:31:28 Parachain Account: 5Ec4AhPZYgv9Q1KUajtv2RieJJmhPdn9cnvKxjpxuJHVoGFt
2022-07-20 10:31:28 Parachain genesis state: 0x0000000000000000000000000000000000000000000000000000000000000000001bc6099aa912157433f6b269dc7ff2159ae83cbe556a9504ebd20b33a907e09303170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
2022-07-20 10:50:32 [🌗] 🔨 Initializing Genesis block/state (state: 0x1bc6…e093, header-hash: 0x1cc6…7dcb)
2022-07-20 10:55:00 [Relaychain] 🔨 Initializing Genesis block/state (state: 0xb000…ef6b, header-hash: 0xb0a8…dafe)
2022-07-20 10:55:00 [Relaychain] 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.

mmap : ~6:30 min

2022-07-20 09:29:36 Moonbeam Parachain Collator    
2022-07-20 09:29:36 ✌️  version 0.25.0-654e47ea208    
2022-07-20 09:29:36 ❤️  by PureStake, 2019-2022    
2022-07-20 09:29:36 📋 Chain specification: Moonriver    
2022-07-20 09:29:36 🏷  Node name: sour-skin-8115    
2022-07-20 09:29:36 👤 Role: FULL    
2022-07-20 09:29:36 💾 Database: RocksDb at /tmp/substratenLO8Im/chains/moonriver/db/full    
2022-07-20 09:29:36 ⛓  Native runtime: moonriver-1700 (moonriver-0.tx2.au3)    
2022-07-20 09:30:26 Parachain id: Id(2023)    
2022-07-20 09:30:26 Parachain Account: 5Ec4AhPZYgv9Q1KUajtv2RieJJmhPdn9cnvKxjpxuJHVoGFt    
2022-07-20 09:30:26 Parachain genesis state: 0x0000000000000000000000000000000000000000000000000000000000000000001bc6099aa912157433f6b269dc7ff2159ae83cbe556a9504ebd20b33a907e09303170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400    
2022-07-20 09:32:07 [🌗] 🔨 Initializing Genesis block/state (state: 0x1bc6…e093, header-hash: 0x1cc6…7dcb)    
2022-07-20 09:36:03 [Relaychain] 🔨 Initializing Genesis block/state (state: 0xb000…ef6b, header-hash: 0xb0a8…dafe)    
2022-07-20 09:36:03 [Relaychain] 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.

BufReader (8kB): ~8:00 min

2022-07-20 10:57:52 Moonbeam Parachain Collator
2022-07-20 10:57:52 ✌️  version 0.25.0-dc9d72b668a
2022-07-20 10:57:52 ❤️  by PureStake, 2019-2022
2022-07-20 10:57:52 📋 Chain specification: Moonriver
2022-07-20 10:57:52 🏷  Node name: poised-baseball-2523
2022-07-20 10:57:52 👤 Role: FULL
2022-07-20 10:57:52 💾 Database: RocksDb at /tmp/substratenHRYuD/chains/moonriver/db/full
2022-07-20 10:57:52 ⛓  Native runtime: moonriver-1700 (moonriver-0.tx2.au3)
2022-07-20 10:59:25 Parachain id: Id(2023)
2022-07-20 10:59:25 Parachain Account: 5Ec4AhPZYgv9Q1KUajtv2RieJJmhPdn9cnvKxjpxuJHVoGFt
2022-07-20 10:59:25 Parachain genesis state: 0x0000000000000000000000000000000000000000000000000000000000000000001bc6099aa912157433f6b269dc7ff2159ae83cbe556a9504ebd20b33a907e09303170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
2022-07-20 11:01:54 [🌗] 🔨 Initializing Genesis block/state (state: 0x1bc6…e093, header-hash: 0x1cc6…7dcb)
2022-07-20 11:06:02 [Relaychain] 🔨 Initializing Genesis block/state (state: 0xb000…ef6b, header-hash: 0xb0a8…dafe)
2022-07-20 11:06:02 [Relaychain] 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.

BufReader (2MB): ~7:50 min

2022-07-20 17:02:33 Moonbeam Parachain Collator
2022-07-20 17:02:33 ✌️  version 0.25.0-dc9d72b668a
2022-07-20 17:02:33 ❤️  by PureStake, 2019-2022
2022-07-20 17:02:33 📋 Chain specification: Moonriver
2022-07-20 17:02:33 🏷  Node name: animated-knot-6831
2022-07-20 17:02:33 👤 Role: FULL
2022-07-20 17:02:33 💾 Database: RocksDb at /tmp/substratedRIMs9/chains/moonriver/db/full
2022-07-20 17:02:33 ⛓  Native runtime: moonriver-1700 (moonriver-0.tx2.au3)
2022-07-20 17:04:04 Parachain id: Id(2023)
2022-07-20 17:04:04 Parachain Account: 5Ec4AhPZYgv9Q1KUajtv2RieJJmhPdn9cnvKxjpxuJHVoGFt
2022-07-20 17:04:04 Parachain genesis state: 0x0000000000000000000000000000000000000000000000000000000000000000001bc6099aa912157433f6b269dc7ff2159ae83cbe556a9504ebd20b33a907e09303170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
2022-07-20 17:06:30 [🌗] 🔨 Initializing Genesis block/state (state: 0x1bc6…e093, header-hash: 0x1cc6…7dcb)
2022-07-20 17:10:19 [Relaychain] 🔨 Initializing Genesis block/state (state: 0xb000…ef6b, header-hash: 0xb0a8…dafe)
2022-07-20 17:10:19 [Relaychain] 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.
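
The headline figures can be checked against the logs by subtracting the first timestamp of each run from the last. A small sketch of that arithmetic follows; `parse_hms` and `elapsed` are hypothetical helper names, and it assumes both timestamps fall on the same day, as they do in every run above.

```rust
/// Parses the "HH:MM:SS" part of a log timestamp into seconds since midnight.
fn parse_hms(hms: &str) -> Option<u32> {
    let mut parts = hms.split(':').map(|p| p.parse::<u32>().ok());
    let (h, m, s) = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three fields: not an HH:MM:SS timestamp
    }
    Some(h * 3600 + m * 60 + s)
}

/// Elapsed time between two same-day timestamps, formatted as "M:SS".
/// Returns `None` if either timestamp is malformed or `end` precedes `start`.
fn elapsed(start: &str, end: &str) -> Option<String> {
    let secs = parse_hms(end)?.checked_sub(parse_hms(start)?)?;
    Some(format!("{}:{:02}", secs / 60, secs % 60))
}
```

Applied to the first and last log lines of each run: `elapsed("10:12:35", "10:55:00")` gives 42:25 for from_reader, `elapsed("09:29:36", "09:36:03")` gives 6:27 for mmap, `elapsed("10:57:52", "11:06:02")` gives 8:10 for BufReader (8kB), and `elapsed("17:02:33", "17:10:19")` gives 7:46 for BufReader (2MB).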

Member

@ggwpez ggwpez left a comment

Thanks for the test!
The CI still needs a format.

@vstakhov
Contributor

BufReader: ~8:00 min

Just curious, what buffer size did you test? I would suggest using about 1-2 MB.

@bkchr
Member

bkchr commented Jul 20, 2022

There is also a json::from_reader, which looks less hacky than the mmap. Have you considered using that?

Maybe using a BufReader would be a decent option: with a large buffer it shouldn't be much different from mmap, and it seems less hacky.

See the discussion in the old PR: #10137

@crystalin
Contributor Author

Just curious, what buffer size did you test? I would suggest using about 1-2 MB.

I've updated the results to include BufReader with a 2 MB capacity. It doesn't change much.

Crystalin added 2 commits July 20, 2022 17:16
@crystalin
Contributor Author

Thanks for the test! The CI still needs a format.

should be good now

@bkchr bkchr merged commit 5b3d404 into paritytech:master Jul 20, 2022
DaviRain-Su pushed a commit to octopus-network/substrate that referenced this pull request Aug 23, 2022
* Make reading json genesis file faster

* Formatting

* fmt
ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023
* Make reading json genesis file faster

* Formatting

* fmt
Development

Successfully merging this pull request may close these issues.

Deserializing big JSON genesis state is very slow
5 participants