Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Speed up big chainspec json(~1.5 GB) load #10137

Merged

Conversation

icodezjb
Copy link
Contributor

@icodezjb icodezjb commented Nov 1, 2021

We export our chainspec to fork.json by export-state of substrate sub cmd,
The fork.json is a 1.5GB json file.
In ChainSpec::from_json_file,
First use serde_json::from_file, load fork.json, takes ~15 minutes.
While using serde_json::from_slice, only takes ~2 s,

See serde-rs/json#160

@cla-bot-2021
Copy link

cla-bot-2021 bot commented Nov 1, 2021

User @icodezjb, please sign the CLA here.

client/chain-spec/src/chain_spec.rs Outdated Show resolved Hide resolved
client/chain-spec/src/chain_spec.rs Outdated Show resolved Hide resolved
@bkchr bkchr added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Nov 1, 2021
.map_err(|e| format!("Error opening spec file `{}`: {}", path.display(), e))?;
// We read the entire file into memory first, as this is *a lot* faster than using
// `serde_json::from_reader`. See https://github.com/serde-rs/json/issues/160
let bytes = std::fs::read(&path).map_err(|e| format!("Error reading spec file: {}", e))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd most likely be better to mmap the file than to read it whole into memory (especially if it can be a few gigs in size). Could you try mmaping it through the memmap2 crate (we already use it transitively as a dependency through parity-db anyway) instead? (It should be roughly as fast, but at a significantly lower memory usage.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean having multiple gigabytes in this file is rather rare, however I'm fine with doing these change if we are going to change this here anyway. @koute could you maybe push the required changes to this pr?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I know. Still, apparently some people do it, as evidenced by this PR. (:

Sure; done!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ty, if you now approve your own changes @koute we should be ready with this pr :D

@tomaka
Copy link
Contributor

tomaka commented Nov 1, 2021

Does wrapping the File around a std::io::BufReader not solve the problem, rather than using unsafe code?

@bkchr
Copy link
Member

bkchr commented Nov 1, 2021

From this thread: serde-rs/json#160 it seems that BufReader is 10x slower than reading the entire file into memory.

With Mmap I would assume we should be between BufReader and reading everything into memory before (highly depends on how the operating system reads the file)

@koute
Copy link
Contributor

koute commented Nov 1, 2021

The unsafe is indeed unfortunate, but that's just how mmap fundamentally works, and AFAIK there is no way around it if you want to use mmap. (Even if you'd use mandatory locking races are still possible.) But again, it should not be a big deal in practice as 1) something would have to be actively modifying the chainspec file while we're trying to read it for anything unsavory to happen, and 2) at most this should result only in a crash if it happens.

With Mmap I would assume we should be between BufReader and reading everything into memory before (highly depends on how the operating system reads the file)

It could even be faster in certain cases, since loading the file can then run in parallel with parsing it, but yeah, it highly depends on the situation.

@koute
Copy link
Contributor

koute commented Nov 1, 2021

bot merge

@paritytech-processbot paritytech-processbot bot merged commit 5fa8dc0 into paritytech:master Nov 1, 2021
grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022
* Speed up chainspec json load

* Update client/chain-spec/src/chain_spec.rs

* Update client/chain-spec/src/chain_spec.rs

* Update client/chain-spec/src/chain_spec.rs

* Load the chainspec through `mmap`

Co-authored-by: icodezjb <icodezjb@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Jan Bujak <jan@parity.io>
ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023
* Speed up chainspec json load

* Update client/chain-spec/src/chain_spec.rs

* Update client/chain-spec/src/chain_spec.rs

* Update client/chain-spec/src/chain_spec.rs

* Load the chainspec through `mmap`

Co-authored-by: icodezjb <icodezjb@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Jan Bujak <jan@parity.io>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants