Tracking issue: compile cache #52696

Open
2 of 9 tasks
joyeecheung opened this issue Apr 25, 2024 · 5 comments
Labels
esm (Issues and PRs related to the ECMAScript Modules implementation), module (Issues and PRs related to the module subsystem)

Comments

@joyeecheung
Member

joyeecheung commented Apr 25, 2024

Follow up to #47472 . Some items that can be investigated:

  • Exposing an API for user code to control the caching (see the linked comment on #52535, "module: implement NODE_COMPILE_CACHE for automatic on-disk code caching")
  • An API for flushing the cache: module: implement flushCompileCache()  #54971
  • Idle-time cache serialization like what Blink does, to avoid penalizing the first load
  • Other hashing algorithms (CRC32 may be good enough for our use case. In the initial implementation, it was chosen because it can be used on no-crypto builds and is fast enough. For reference, ccache has used MD4 and later BLAKE2b -> BLAKE3)
  • Other directory layouts (splitting the cache per file and reading it on the fly seems to be fast enough, and I don't really see I/O showing up in the profile anyway) or using a db (if/when we implement Web Storage?)
  • Embedder API for configuring the storage
  • Inode caching like ccache/ccache#577 ("Inode cache for file hashes") (note that CRC32 also barely shows up in the profile, so it may not be worth the complexity).
  • Avoid UTF8 transcoding by directly reading the source code as buffer from disk (this needs to dance with CJS loader monkey patching)
  • Move the flushing operations off-thread so that they can be done as soon as the code cache is ready and can be done concurrently https://github.com/joyeecheung/node/tree/parallel-compile-cache
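
To make the hashing item above concrete, here is a minimal sketch of a table-driven CRC32 used to derive a cache key from a file name and its source bytes. The `cacheKey` helper and its `name-content` key layout are assumptions made for illustration; they are not taken from the Node.js implementation.

```javascript
'use strict';

// Lookup table for the standard reflected CRC-32 polynomial (0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c; // Uint32Array stores the value as unsigned 32-bit
}

// Computes the CRC-32 checksum of a Buffer or Uint8Array.
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical cache key: hash of the file name plus hash of the content,
// so that editing a file produces a different key.
function cacheKey(filename, source) {
  const hex = (n) => n.toString(16).padStart(8, '0');
  const nameHash = crc32(Buffer.from(filename, 'utf8'));
  const contentHash = crc32(Buffer.isBuffer(source) ? source : Buffer.from(source, 'utf8'));
  return `${hex(nameHash)}-${hex(contentHash)}`;
}
```

CRC32 is not collision resistant, so a sketch like this only detects accidental changes to a file; it offers no protection against deliberately crafted input.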
joyeecheung added the module and esm labels Apr 25, 2024
@benjamingr
Member

> Avoid UTF8 transcoding by directly reading the source code as buffer from disk (this needs to dance with CJS loader monkey patching)

FWIW I think loaders/require hooks are probably very common in dev and pretty rare in production (where compile cache has the most value) but that's just intuition.

@joyeecheung
Member Author

> and pretty rare in production

I would think it's the opposite for tracing agents - although they usually don't care about the source code (except the current loaders built on top of the off-thread hooks like import-in-the-middle that are forced to do a hacky analysis of the source code, which is why I am proposing an in-thread link() hook for them in nodejs/loaders#198 so they don't have to do this).

@joyeecheung
Member Author

Also, speaking of loader hooks, I think we need to convert the CJS loader to pass buffers around regardless for future binary file loading support (for example if the custom loader wants to support loading wasm, or zip, or anything that's not stored as uh, bytes encoded in UTF8 on disk).
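
As a rough illustration of why passing buffers around helps, the sketch below keeps the raw bytes for binary formats and only decodes to a UTF-8 string when the content looks like text. The `classifySource` helper and its return shape are hypothetical and are not part of the CJS loader.

```javascript
'use strict';

// Magic numbers for the two binary formats mentioned above.
const WASM_MAGIC = Buffer.from([0x00, 0x61, 0x73, 0x6d]); // "\0asm"
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]);  // "PK\x03\x04"

// Hypothetical helper: decide how a loader should treat the raw bytes of a file.
// Binary formats keep their Buffer untouched; everything else is decoded as UTF-8.
function classifySource(raw) {
  const head = raw.subarray(0, 4);
  if (head.equals(WASM_MAGIC)) return { format: 'wasm', source: raw };
  if (head.equals(ZIP_MAGIC)) return { format: 'zip', source: raw };
  return { format: 'text', source: raw.toString('utf8') };
}
```

If the loader only ever received strings, the wasm and zip branches could not exist, because decoding arbitrary bytes as UTF-8 is lossy.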

@joyeecheung
Member Author

(Now I am spamming this tracking issue but) after looking into existing monkey patching usages in popular packages (well, I did a GitHub code search) I think the highest-priority item should be an API for packages to turn this on programmatically. I don't have a great idea about what this API should look like though, so ideas are welcome. (Maybe process.enableCompileCache(dir) with some re-entrancy guards would be good enough, or maybe it's a terrible idea to make it per-thread because packages can step on each other's toes?).
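
One possible shape for the re-entrancy guard floated above is sketched here, purely as a starting point for discussion. Everything in it is hypothetical: the `Status` values, the `createCompileCacheGuard` factory, and the injected `enable(dir)` callback, which stands in for whatever would actually turn the cache on.

```javascript
'use strict';

// Hypothetical status values, made up for this sketch.
const Status = Object.freeze({
  ENABLED: 'enabled',
  ALREADY_ENABLED: 'already-enabled',
  FAILED: 'failed',
});

// Wraps a lower-level `enable(dir)` function so that only the first successful
// caller picks the directory. Later callers get a status back instead of
// silently redirecting the cache or throwing.
function createCompileCacheGuard(enable) {
  let activeDir = null;
  return function enableCompileCache(dir) {
    if (activeDir !== null) {
      return { status: Status.ALREADY_ENABLED, directory: activeDir };
    }
    try {
      enable(dir);
    } catch (err) {
      // A failure leaves the guard open so another caller can retry.
      return { status: Status.FAILED, directory: null, message: String(err && err.message) };
    }
    activeDir = dir;
    return { status: Status.ENABLED, directory: activeDir };
  };
}
```

Returning a status object rather than throwing lets two packages that both try to enable the cache coexist: the second one learns which directory is already in use and can carry on.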

@tannal
Contributor

tannal commented Sep 6, 2024

> Other hashing algorithm

xxHash, by the creator of zstd, seems like a good non-cryptographic hash algorithm.

3 participants