fix(gatsby): Chunk nodes when serializing redux to prevent OOM #21555
Conversation
(force-pushed from 1957eca to d5ac02b)
I like this approach a lot. Great job Peter 🚢
(force-pushed from 5cbb5f4 to fc19973)
Added the mechanism for transactional state. This required a bit of refactoring, please have a look. In general it works as explained before: a tmp folder is created, the new cache is written to that folder, the old cache folder is renamed to a backup location, the tmp folder is moved into the cache folder position, and then an attempt is made to drop the old folder. Legacy files are properly read and deleted when a new cache is created. Upgrading should work transparently. Downgrading will not work, because an older version will not know to find the new redux cache folder, so it will just miss the cache. I think that's fine, both for us and our users. Updated some tests to take the new behavior into account. Feels like I've been mocking half the fs-extra lib by now :p
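To make that flow concrete, here is a minimal sketch of the swap, assuming `fs-extra` and hypothetical folder names (the actual PR derives its paths and error handling differently):

```ts
import * as path from "path"
import * as fs from "fs-extra"

// Hypothetical locations; illustration only.
const cacheRoot = path.join(process.cwd(), ".cache")
const reduxCacheFolder = path.join(cacheRoot, "redux")
const tmpFolder = path.join(cacheRoot, "redux-tmp")
const bakFolder = path.join(cacheRoot, "redux-bak")

function writeCacheTransactionally(writeFiles: (dir: string) => void): void {
  // 1. Write the new cache into a fresh tmp folder.
  fs.emptyDirSync(tmpFolder)
  writeFiles(tmpFolder)

  // 2. Move the current cache out of the way; it becomes the backup.
  if (fs.existsSync(reduxCacheFolder)) {
    fs.moveSync(reduxCacheFolder, bakFolder, { overwrite: true })
  }

  // 3. Promote the tmp folder to the real cache location.
  fs.moveSync(tmpFolder, reduxCacheFolder)

  // 4. Best-effort cleanup of the backup; failure here is not a blocker,
  //    a later run can try again.
  try {
    fs.removeSync(bakFolder)
  } catch {
    // A stale backup folder is tolerated rather than fatal.
  }
}
```

The only non-atomic window is between the two moves; a crash there leaves the backup folder behind, which is why leftover bak folders have to be tolerated (see the `safelyRenameToBak` discussion further down).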
(force-pushed from fc19973 to 6a9a48d)
(Shuffled the functions around to make the TS lint happy. It does not seem to like calls to sibling functions at the same level unless the called function is declared before it ... weird in the JS world.)
(force-pushed from 6a9a48d to 1ccd87f)
Ignore the next few pushes. I'm adding some logging to debug the Windows run through CircleCI 🤡
(force-pushed from a32a3bb to 938eca5)
Ok. All tests are passing. Ready to go |
The code looks good.
It would be nice to have some tests that assert behaviour when some fs operation fails, but I'm leaving the decision about the feasibility and long-term maintenance of those to you. Additionally, I left one inline comment about a potential low-effort extra sanity check that could be made.
(force-pushed from 938eca5 to 4f14793)
Ok, I've added the sanity check.
The return type is incorrect, but quickly trying to fix that and propagate the type led into a deep rabbit hole, so for now I'm doing the same as the parent calling code.
(force-pushed from 4f14793 to 9948996)
Looks good!
Ah, that …
Ehh, didn't think about loki ... but it's also great that there were tests that caught it (uff). I don't know if it's worth trying to keep that code path then. We could make it not show a warning / return empty state for loki, but it gets convoluted.
A different and maybe more robust (still not super robust) test could be to just store the number of nodes in …
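For illustration, a sketch of that kind of count check - where the expected count would be stored and all names here are assumptions:

```ts
// Hypothetical sanity check: the main redux file records how many nodes
// were written out; after merging all shard files back into one Map, the
// counts must match or the cache should be treated as invalid.
function assertNodeCount(expected: number, merged: Map<string, unknown>): void {
  if (merged.size !== expected) {
    throw new Error(
      `Cache looks corrupt: expected ${expected} nodes, read ${merged.size}`
    )
  }
}
```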
The breakage is artificial; it's because the test removes the … props. I can just add a loki if-branch for the test and have the redux branch not remove those props. I think that'd be fine?
I only have nitpicks and questions; no need to block merging this PR on them.
```ts
function safelyRenameToBak(reduxCacheFolder: string): string {
  // Basically try to work around the potential of previous renamed caches
  // not being removed for whatever reason. _That_ should not be a blocker.
```
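For context, a hedged sketch of how such a helper could look - the suffix scheme is an assumption, not the PR's actual code:

```ts
import * as fs from "fs-extra"

function safelyRenameToBak(reduxCacheFolder: string): string {
  // Basically try to work around the potential of previous renamed caches
  // not being removed for whatever reason. _That_ should not be a blocker.
  let bakName = `${reduxCacheFolder}-bak`
  let i = 0
  // If an old backup is still lying around, pick a fresh suffix instead
  // of failing the rename (hypothetical naming scheme).
  while (fs.existsSync(bakName)) {
    bakName = `${reduxCacheFolder}-bak-${++i}`
  }
  fs.moveSync(reduxCacheFolder, bakName)
  return bakName
}
```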
Why not try to remove the bak file? Why would we want to keep other bak files around if they didn't get deleted? (Permission issues would be weird because we're creating them.)
We do try to remove it later on (check the `// Now try to yolorimraf the old cache folder` section). If something goes wrong while writing out the new cache, we could potentially try to restore the previous one (instead of blowing the entire cache away). Though I don't think we try to restore the `bak` cache in that case right now (and I also don't know if it's safe to do - it should be).

It's not only permission issues here - `fs` can be finicky (I found multiple `fs` problems, mostly on Windows, and they were intermittent - just restarting something was "solving" the problem).
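Purely as illustration of that idea (the PR does not do this, as noted), a restore path could look roughly like:

```ts
import * as fs from "fs-extra"

// Hypothetical recovery: if writing the new cache throws, move the backup
// folder back into place so the next build still sees the previous cache.
function writeWithRestore(
  reduxCacheFolder: string,
  bakFolder: string,
  writeNewCache: () => void
): void {
  try {
    writeNewCache()
  } catch (err) {
    if (!fs.existsSync(reduxCacheFolder) && fs.existsSync(bakFolder)) {
      fs.moveSync(bakFolder, reduxCacheFolder)
    }
    throw err
  }
}
```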
Isn't that why we use `fs-extra` - to handle the retries on Windows?
As mentioned above, the only reason is to try to do the least amount of dangerous work until the new cache is persisted.
I'm not entirely convinced this makes much of a difference, because if a fatal error happens the cache is in a questionable state anyway, but I think it helps?
The drawback is temporarily requiring double the disk space. That much is true.
I guess nothing was depending on this yet, but the `nodes` property is definitely a `Map`, not an array. Also exporting the Node type because we'll need it a few times later.
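A rough sketch of the shape being described - the interface name and field set here are assumed for illustration:

```ts
// The redux state keeps nodes in a Map keyed by node id, not in an array.
export interface IGatsbyNode {
  id: string
  internal: { type: string; contentDigest: string }
  [key: string]: unknown
}

export interface IReduxState {
  nodes: Map<string, IGatsbyNode>
  // ...other slices of state elided
}
```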
We are using `v8.serialize` to write and read the redux state. This is faster than going through `JSON.stringify`/`JSON.parse`. Unfortunately, as reported in #17233, this can lead to a fatal error when the contents of the redux state are too big to be serialized to a single Buffer (which has a hard max of 2GB). We also hit this problem on large sites, for example one with a million small md pages.

The solution is to shard the `nodes` property, which holds all the page data. In this change I've added a simple heuristic to determine the max chunk size (mind you, currently that's basically `Infinity`). It will serialize about 11 individual nodes, measure their size, and based on the biggest node determine how many nodes would fit in 1.5GB.

The serialization process is updated to no longer put the `nodes` in the main redux file, but rather to shard them over a few dedicated files. When reading the state from cache, these files are all read and their contents are put together in a single Map again. If there were no nodes files this part does nothing, so it's even backwards compatible.

Because the write is no longer atomic, the process will now write the redux cache to its own `redux` folder. When writing a new cache it will prepare the new cache in a tmp folder first, then move the existing `redux` folder to a temp location, move the new folder into the `redux` position, and then try to drop the old folder. This is about as transactional as you can get and should leave the cache in either a stale, empty, or updated state - but never a partial one.

Also fixes the TS type for a page node in redux and exports it so we can reference it explicitly.
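A minimal sketch of that sizing heuristic - the sample size of 11 and the 1.5GB budget come from the description above, but every name here is illustrative rather than the PR's actual code:

```ts
import * as v8 from "v8"

// Stay well under v8's ~2GB Buffer ceiling (illustrative constant).
const MAX_SHARD_BYTES = 1.5 * 1024 * 1024 * 1024

// Hypothetical helper: serialize a small sample of nodes, measure the
// biggest one, and derive how many nodes safely fit in one shard file.
function guessNodesPerShard(nodes: Map<string, unknown>): number {
  const sample = [...nodes.values()].slice(0, 11)
  if (sample.length === 0) return Infinity
  const biggest = Math.max(...sample.map(node => v8.serialize(node).length))
  return Math.max(1, Math.floor(MAX_SHARD_BYTES / biggest))
}

// Split the nodes Map into chunks of that size; each chunk would then be
// v8.serialize'd into its own file instead of the single main redux file.
function chunkNodes<T>(
  nodes: Map<string, T>,
  chunkSize: number
): Array<Array<[string, T]>> {
  const entries = [...nodes.entries()]
  const chunks: Array<Array<[string, T]>> = []
  for (let i = 0; i < entries.length; i += chunkSize) {
    chunks.push(entries.slice(i, i + chunkSize))
  }
  return chunks
}
```

Reading is the inverse: each shard file is deserialized and its entries are merged back into one Map; when no shard files exist the merge step is a no-op, which is what keeps the change backwards compatible.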
(force-pushed from 2b79805 to cbc1b77)
Verified by testing on .org. Made 3 builds (https://github.com/gatsbyjs/gatsby/compare/test-fix-v8-serialize-on-dot-org):

1. Cold cache
2. A change to a React component (so the existing cache is reused)
3. A change to gatsby-node.js (invalidating the cache)