Speed up big chainspec json(~1.5 GB) load #10137

icodezjb · 2021-11-01T03:38:33Z

We export our chainspec to fork.json using Substrate's export-state subcommand.
The resulting fork.json is a 1.5 GB JSON file.
ChainSpec::from_json_file currently parses the file with serde_json::from_reader,
and loading fork.json this way takes ~15 minutes.
With serde_json::from_slice, it takes only ~2 s.

See serde-rs/json#160
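As a minimal sketch of the pattern described above, the snippet below reads the whole file into memory and then hands a byte slice to the parser. It uses only the standard library: `count_objects` and `load_via_slice` are hypothetical names, and `count_objects` is merely a stand-in for a slice-based parser such as `serde_json::from_slice`, so this illustrates the shape of the change rather than the code in this PR.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical stand-in for a slice-based parser such as `serde_json::from_slice`.
/// It only counts `{` bytes, but like a real slice parser it sees the whole
/// input as one contiguous `&[u8]`.
fn count_objects(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'{').count()
}

/// Read the entire file into memory first, then parse from the in-memory slice.
fn load_via_slice(path: &Path) -> Result<usize, String> {
    let bytes = fs::read(path)
        .map_err(|e| format!("Error reading spec file `{}`: {}", path.display(), e))?;
    Ok(count_objects(&bytes))
}
```

The trade-off is memory: the whole file (1.5 GB in the case reported here) is resident while it is being parsed.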

cla-bot-2021 · 2021-11-01T03:38:39Z

User @icodezjb, please sign the CLA here.

koute · 2021-11-01T10:15:52Z

client/chain-spec/src/chain_spec.rs

-			.map_err(|e| format!("Error opening spec file `{}`: {}", path.display(), e))?;
+		// We read the entire file into memory first, as this is *a lot* faster than using
+		// `serde_json::from_reader`. See https://github.com/serde-rs/json/issues/160
+		let bytes = std::fs::read(&path).map_err(|e| format!("Error reading spec file: {}", e))?;


It'd most likely be better to mmap the file than to read it whole into memory (especially if it can be a few gigs in size). Could you try mmapping it through the memmap2 crate instead? (We already pull it in transitively through parity-db anyway.) It should be roughly as fast, but with significantly lower memory usage.

bkchr · 2021-11-01

I mean, having multiple gigabytes in this file is rather rare; however, I'm fine with making this change if we are going to change this code anyway. @koute could you maybe push the required changes to this PR?

koute · 2021-11-01

Yep, I know. Still, apparently some people do it, as evidenced by this PR. (:

Sure; done!

bkchr · 2021-11-01

Ty, if you now approve your own changes @koute we should be ready with this PR :D

tomaka · 2021-11-01T12:01:26Z

Does wrapping the File around a std::io::BufReader not solve the problem, rather than using unsafe code?

bkchr · 2021-11-01T12:21:02Z

From this thread: serde-rs/json#160 it seems that BufReader is 10x slower than reading the entire file into memory.
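For comparison, here is a minimal sketch of the `BufReader` variant tomaka asks about, again using only the standard library. `count_objects_from_reader` and `load_via_buf_reader` are hypothetical names; the byte-at-a-time loop stands in for a streaming parser, under the assumption that such a parser pulls its input from the reader one byte at a time.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Hypothetical stand-in for a streaming parser: it consumes the reader
/// one byte at a time and counts `{` bytes.
fn count_objects_from_reader<R: Read>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for byte in reader.bytes() {
        if byte? == b'{' {
            count += 1;
        }
    }
    Ok(count)
}

/// Wrap the file in a `BufReader` so that per-byte reads are served from an
/// in-memory buffer instead of hitting the operating system each time.
fn load_via_buf_reader(path: &Path) -> io::Result<usize> {
    let file = File::open(path)?;
    count_objects_from_reader(BufReader::new(file))
}
```

Buffering removes the per-byte system calls and needs no unsafe code, but the parser still goes through the `Read` interface for every byte, which is consistent with the slowdown reported in serde-rs/json#160.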

With Mmap I would assume we should be between BufReader and reading everything into memory before (highly depends on how the operating system reads the file)

koute · 2021-11-01T12:53:52Z

The unsafe is indeed unfortunate, but that's just how mmap fundamentally works, and AFAIK there is no way around it if you want to use mmap. (Even if you used mandatory locking, races would still be possible.) But again, it should not be a big deal in practice, as 1) something would have to be actively modifying the chainspec file while we're trying to read it for anything unsavory to happen, and 2) at most this should result in a crash if it does happen.

> With Mmap I would assume we should be between BufReader and reading everything into memory before (highly depends on how the operating system reads the file)

It could even be faster in certain cases, since loading the file can then run in parallel with parsing it, but yeah, it highly depends on the situation.

koute · 2021-11-01T12:58:09Z

bot merge

* Speed up chainspec json load
* Update client/chain-spec/src/chain_spec.rs
* Update client/chain-spec/src/chain_spec.rs
* Update client/chain-spec/src/chain_spec.rs
* Load the chainspec through `mmap`

Co-authored-by: icodezjb <icodezjb@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Jan Bujak <jan@parity.io>

Speed up chainspec json load

3457a6b

bkchr approved these changes Nov 1, 2021

bkchr added 2 commits November 1, 2021 10:40

Update client/chain-spec/src/chain_spec.rs

4b7db48

Update client/chain-spec/src/chain_spec.rs

3b53ac7

bkchr added the A0-please_review, B0-silent, and C1-low labels Nov 1, 2021

bkchr reviewed Nov 1, 2021

Update client/chain-spec/src/chain_spec.rs

a2c7abd

koute suggested changes Nov 1, 2021

Load the chainspec through mmap

91d2a48

bkchr approved these changes Nov 1, 2021

koute approved these changes Nov 1, 2021

paritytech-processbot bot merged commit 5fa8dc0 into paritytech:master Nov 1, 2021

bkchr mentioned this pull request Jul 20, 2022

Make reading json genesis file faster #11868

Merged
