fix(gatsby): Chunk nodes when serializing redux to prevent OOM #21555
Conversation
(force-pushed from 1957eca to d5ac02b)
I like this approach a lot. Great job Peter 🚢
(force-pushed from 5cbb5f4 to fc19973)
Added the mechanism for transactional state. This required a bit of refactoring, please have a look. In general it works as explained before: a tmp folder is created, the new cache is written to that folder, the old cache folder is renamed to a backup location, the tmp folder is moved into the cache folder position, and then an attempt is made to drop the old folder. Legacy files are properly read and deleted when a new cache is created. Upgrading should work transparently. Downgrading will not work, because an older version will not know to find the new redux cache folder, so it will just miss the cache. I think that's fine, both for us and our users. Updated some tests to take the new behavior into account. Feels like I've been mocking half the fs-extra lib by now :p
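To make that flow concrete, here is a minimal sketch of the swap, assuming `fs-extra` and hypothetical folder names (the actual PR derives its paths and error handling differently):

```ts
import * as path from "path"
import * as fs from "fs-extra"

// Hypothetical locations; illustration only.
const cacheRoot = path.join(process.cwd(), ".cache")
const reduxCacheFolder = path.join(cacheRoot, "redux")
const tmpFolder = path.join(cacheRoot, "redux-tmp")
const bakFolder = path.join(cacheRoot, "redux-bak")

function writeCacheTransactionally(writeFiles: (dir: string) => void): void {
  // 1. Write the new cache into a fresh tmp folder.
  fs.emptyDirSync(tmpFolder)
  writeFiles(tmpFolder)

  // 2. Move the current cache out of the way; it becomes the backup.
  if (fs.existsSync(reduxCacheFolder)) {
    fs.moveSync(reduxCacheFolder, bakFolder, { overwrite: true })
  }

  // 3. Promote the tmp folder to the real cache location.
  fs.moveSync(tmpFolder, reduxCacheFolder)

  // 4. Best-effort cleanup of the backup; failure here is not a blocker,
  //    a later run can try again.
  try {
    fs.removeSync(bakFolder)
  } catch {
    // A stale backup folder is tolerated rather than fatal.
  }
}
```

The only non-atomic window is between the two moves; a crash there leaves the backup folder behind, which is why leftover bak folders have to be tolerated (see the `safelyRenameToBak` discussion further down).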
(force-pushed from fc19973 to 6a9a48d)
(Shuffled the functions around to make the TS lint happy. It does not seem to like calls to sibling functions at the same level unless the called function is declared before it ... weird in the JS world.)
(force-pushed from 6a9a48d to 1ccd87f)
Ignore the next few pushes. I'm adding some logging to debug the Windows run through CircleCI 🤡
(force-pushed from a32a3bb to 938eca5)
Ok. All tests are passing. Ready to go |
The code looks good.
It would be nice to have some tests that assert behaviour when some fs operation fails, but I'm leaving the decision about the feasibility and long-term maintenance of those to you. Additionally, I left one inline comment about a potential low-effort extra sanity check that could be made.
(force-pushed from 938eca5 to 4f14793)
Ok, I've added the sanity check.
The return type is incorrect, but quickly trying to fix that and propagate the type led into a deep rabbit hole, so for now I'm doing the same as the parent calling code.
(force-pushed from 4f14793 to 9948996)
Looks good!
Ah, that …
Ehh, didn't think about loki ... but it's also great that there were tests that caught it (uff). I don't know if it's worth trying to keep that code path then. We could make it not show a warning / return empty state for loki, but it gets convoluted.
A different and maybe more robust (still not super robust) test could be to just store the number of nodes in …
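For illustration, a sketch of that kind of count check - where the expected count would be stored and all names here are assumptions:

```ts
// Hypothetical sanity check: the main redux file records how many nodes
// were written out; after merging all shard files back into one Map, the
// counts must match or the cache should be treated as invalid.
function assertNodeCount(expected: number, merged: Map<string, unknown>): void {
  if (merged.size !== expected) {
    throw new Error(
      `Cache looks corrupt: expected ${expected} nodes, read ${merged.size}`
    )
  }
}
```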
The breakage is artificial; it's because the test removes the … props. I can just add a loki if-branch for the test and have the redux branch not remove those props. I think that'd be fine?
I only have nitpicks and questions; no need to block merging this PR on them.
```ts
function safelyRenameToBak(reduxCacheFolder: string): string {
  // Basically try to work around the potential of previous renamed caches
  // not being removed for whatever reason. _That_ should not be a blocker.
```
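For context, a hedged sketch of how such a helper could look - the suffix scheme is an assumption, not the PR's actual code:

```ts
import * as fs from "fs-extra"

function safelyRenameToBak(reduxCacheFolder: string): string {
  // Basically try to work around the potential of previous renamed caches
  // not being removed for whatever reason. _That_ should not be a blocker.
  let bakName = `${reduxCacheFolder}-bak`
  let i = 0
  // If an old backup is still lying around, pick a fresh suffix instead
  // of failing the rename (hypothetical naming scheme).
  while (fs.existsSync(bakName)) {
    bakName = `${reduxCacheFolder}-bak-${++i}`
  }
  fs.moveSync(reduxCacheFolder, bakName)
  return bakName
}
```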
Why not try to remove the bak file? Why would we want to keep other bak files around if they didn't get deleted? (Permission issues would be weird because we're creating them.)
We do try to remove it later on (check the `// Now try to yolorimraf the old cache folder` section). If something goes wrong while writing out the new cache, we could potentially try to restore the previous one (instead of blowing the entire cache away). Though I don't think we try to restore the `bak` cache in that case right now (and I also don't know if it's safe to do - it should be).

It's not only permission issues here - `fs` can be finicky (I found multiple `fs` problems, mostly on Windows, and they were intermittent - just restarting something was "solving" the problem).
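Purely as illustration of that idea (the PR does not do this, as noted), a restore path could look roughly like:

```ts
import * as fs from "fs-extra"

// Hypothetical recovery: if writing the new cache throws, move the backup
// folder back into place so the next build still sees the previous cache.
function writeWithRestore(
  reduxCacheFolder: string,
  bakFolder: string,
  writeNewCache: () => void
): void {
  try {
    writeNewCache()
  } catch (err) {
    if (!fs.existsSync(reduxCacheFolder) && fs.existsSync(bakFolder)) {
      fs.moveSync(bakFolder, reduxCacheFolder)
    }
    throw err
  }
}
```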
Isn't that why we use `fs-extra` - to handle the retries on Windows?
As mentioned above, the only reason is to try to do the least amount of dangerous work until the new cache is persisted.
I'm not entirely convinced this makes much of a difference, because if a fatal error happens the cache is in a questionable state anyway, but I think it helps?
The drawback is temporarily requiring double the disk space. That much is true.
I guess nothing was depending on this yet, but the `nodes` property is definitely a `Map`, not an array. Also exporting the Node type because we'll need it a few times later.
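A rough sketch of the shape being described - the interface name and field set here are assumed for illustration:

```ts
// The redux state keeps nodes in a Map keyed by node id, not in an array.
export interface IGatsbyNode {
  id: string
  internal: { type: string; contentDigest: string }
  [key: string]: unknown
}

export interface IReduxState {
  nodes: Map<string, IGatsbyNode>
  // ...other slices of state elided
}
```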
We are using `v8.serialize` to write and read the redux state. This is faster than going through `JSON.stringify`/`JSON.parse`. Unfortunately, as reported in #17233, this can lead to a fatal error when the contents of the redux state are too big to be serialized to a single Buffer (which has a hard max of 2GB). We also hit this problem on large sites, for example one with a million small md pages.

The solution is to shard the `nodes` property, which holds all the page data. In this change I've added a simple heuristic to determine the max chunk size (mind you, currently that's basically `Infinity`). It will serialize about 11 individual nodes, measure their size, and based on the biggest node determine how many nodes would fit in 1.5GB.

The serialization process is updated to no longer put the `nodes` in the main redux file, but rather to shard them over a few dedicated files. When reading the state from cache, these files are all read and their contents are put together in a single Map again. If there were no nodes files this part does nothing, so it's even backwards compatible.

Because the write is no longer atomic, the process will now write the redux cache to its own `redux` folder. When writing a new cache it will prepare the new cache in a tmp folder first, then move the existing `redux` folder to a temp location, move the new folder into the `redux` position, and then try to drop the old folder. This is about as transactional as you can get and should leave the cache in either a stale, empty, or updated state - but never a partial one.

Also fixes the TS type for a page node in redux and exports it so we can reference it explicitly.
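A minimal sketch of that sizing heuristic - the sample size of 11 and the 1.5GB budget come from the description above, but every name here is illustrative rather than the PR's actual code:

```ts
import * as v8 from "v8"

// Stay well under v8's ~2GB Buffer ceiling (illustrative constant).
const MAX_SHARD_BYTES = 1.5 * 1024 * 1024 * 1024

// Hypothetical helper: serialize a small sample of nodes, measure the
// biggest one, and derive how many nodes safely fit in one shard file.
function guessNodesPerShard(nodes: Map<string, unknown>): number {
  const sample = [...nodes.values()].slice(0, 11)
  if (sample.length === 0) return Infinity
  const biggest = Math.max(...sample.map(node => v8.serialize(node).length))
  return Math.max(1, Math.floor(MAX_SHARD_BYTES / biggest))
}

// Split the nodes Map into chunks of that size; each chunk would then be
// v8.serialize'd into its own file instead of the single main redux file.
function chunkNodes<T>(
  nodes: Map<string, T>,
  chunkSize: number
): Array<Array<[string, T]>> {
  const entries = [...nodes.entries()]
  const chunks: Array<Array<[string, T]>> = []
  for (let i = 0; i < entries.length; i += chunkSize) {
    chunks.push(entries.slice(i, i + chunkSize))
  }
  return chunks
}
```

Reading is the inverse: each shard file is deserialized and its entries are merged back into one Map; when no shard files exist the merge step is a no-op, which is what keeps the change backwards compatible.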
(force-pushed from 2b79805 to cbc1b77)
Verified by testing on .org. Made 3 builds (https://github.com/gatsbyjs/gatsby/compare/test-fix-v8-serialize-on-dot-org):

1. Cold cache
2. A change to a React component (so the existing cache is reused)
3. A change to gatsby-node.js (invalidating the cache)