Disk caching in transpile-only via new project "typescript-cached-transpile" #908

Closed
cspotcode opened this issue Nov 11, 2019 · 26 comments · May be fixed by #1364

Comments

@cspotcode
Collaborator

cspotcode commented Nov 11, 2019

I'm sharing this here in case anyone else is having speed issues even with transpile-only.

I wrote a new npm module, typescript-cached-transpile, which monkey-patches typescript's transpileModule function to use a disk cache.

Here's the relevant code: https://github.com/cspotcode/personal-monorepo/blob/master/packages/typescript-cached-transpile/src/create.ts#L21-L88

It's very conservative. It doesn't look at filesystem atimes or mtimes. The cache key is computed from sha1 sums of both the configuration (typescript version, typescript-cached-transpile version, and all compiler flags) and the filename and source text. It doesn't work with transformers, and it doesn't cache when the compiler returns diagnostics, because I don't cache the diagnostics. (theoretically we could cache them as JSON pretty easily)
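For readers who want a concrete picture of that cache key, here is a minimal sketch. It is not the code from typescript-cached-transpile; the `CacheConfig` shape and the helper names `stableStringify` and `cacheKey` are hypothetical, and the sorted-key serialization is an assumption made so that property order in the compiler flags does not change the hash.

```typescript
import { createHash } from "crypto";

// Hypothetical shape: everything about the setup that can change the emit.
interface CacheConfig {
  typescriptVersion: string;
  cacheLibraryVersion: string;
  compilerOptions: Record<string, unknown>;
}

function sha1(text: string): string {
  return createHash("sha1").update(text, "utf8").digest("hex");
}

// Serialize JSON-like values with sorted object keys (assumption: the
// options contain only JSON-serializable values).
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]));
    return "{" + body.join(",") + "}";
  }
  return JSON.stringify(value);
}

// One hash for the configuration, one for the file name plus source text.
function cacheKey(config: CacheConfig, fileName: string, sourceText: string): string {
  const configHash = sha1(stableStringify(config));
  const fileHash = sha1(stableStringify([fileName, sourceText]));
  return `${configHash}/${fileHash}`;
}
```

Because no timestamps are involved, any change to the source text, the file name, or a compiler flag simply produces a different key, and a stale entry is never read.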

I tested on my team's codebase, where ts-node --transpile-only was adding a whopping 13 seconds(!) to startup. Even with the overhead of computing sha1 sums, caching reduced the overhead to about a second.

I understand the reasons for not introducing an incomplete caching solution to ts-node. However, in my case, I need things to be fast, which means I already need transpile-only mode, so it makes perfect sense to use a caching solution that requires it.

If you use this, and it works well or you find bugs, please let me know.

@blakeembrey
Member

@cspotcode For --transpile-only mode, this makes sense 👍 I'd be happy to add this back as a feature to ts-node, it was only unreliable for type checking.

@cspotcode
Collaborator Author

cspotcode commented Nov 11, 2019

@blakeembrey Oh, excellent! Are there any other features needed to get this merged?

  • Where should cache live by default? In cwd somewhere? As a peer to wherever .ts-node is located?

  • Should it cache diagnostics? I'm actually not sure how common it is for the compiler to return diagnostics in transpile-only mode.

  • Is it ok that caching will never work with transformers? I can't think of a way to make that work, since I can't serialize transformers to JSON so I can't hash them.

  • To avoid issues with concurrent processes stepping on each other in the cache, I append a sequence of 10 0xff bytes to each cache entry. If we don't find that when reading the cache, we act like cache doesn't exist. I figure that byte sequence will never exist in a utf8 string. (?) Does that behavior seem sensible to you?
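The trailing-marker idea in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the actual typescript-cached-transpile code: `writeEntry` and `readEntry` are hypothetical names. On the "(?)": the byte 0xFF never occurs in well-formed UTF-8, so the marker cannot be confused with the end of the cached text.

```typescript
import * as fs from "fs";

// Ten 0xFF bytes appended after the UTF-8 payload to mark a complete write.
const SENTINEL = Buffer.alloc(10, 0xff);

// Hypothetical helper: store the emitted text followed by the marker.
function writeEntry(file: string, output: string): void {
  fs.writeFileSync(file, Buffer.concat([Buffer.from(output, "utf8"), SENTINEL]));
}

// Hypothetical helper: return the cached text, or undefined when the entry
// is missing or does not end with the marker (for example, a partial write).
function readEntry(file: string): string | undefined {
  let data: Buffer;
  try {
    data = fs.readFileSync(file);
  } catch {
    return undefined;
  }
  const bodyLength = data.length - SENTINEL.length;
  if (bodyLength < 0 || !data.subarray(bodyLength).equals(SENTINEL)) {
    return undefined;
  }
  return data.subarray(0, bodyLength).toString("utf8");
}
```

A reader that sees an entry without the marker treats it as a cache miss and recompiles, so a half-written file costs time but never produces wrong output.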

@blakeembrey
Member

@cspotcode So the code was removed in https://github.com/TypeStrong/ts-node/releases/tag/v8.0.0, and I didn't have any issues with race conditions at the time either. Specifically, here's the commit that removed it in case you want to work backward: b61c745.

Where should cache live by default?

It used to live in tmp, probably the best place for it again.

Should it cache diagnostics?

No, it should just error on bad diagnostics and we don't need to worry about anything else.

Is it ok that caching will never work with transformers?

Sounds reasonable. We can just throw an exception when --cache is used with type checking or transformers. It's a bit unfortunate, maybe we can accept paths to transformers that would work as we could get the dependency versions or something. Happy to omit for now, but let me know if you have other ideas.

Does that behavior seem sensible to you?

It's reasonable, what I had before was checking the Base64 output looked valid. This should do the same job so feel free to revive that code.

@cspotcode
Collaborator Author

cspotcode commented Dec 15, 2019

@blakeembrey

Re caching with transformers

I wonder if we can embrace how this works with ttypescript. It lets users specify transformers in their tsconfig file's "plugins" array.

Users who want transformers will set compiler: "ttypescript" and add the transformers to their tsconfig file. When we figure out the cache directory, we'll check for a plugins array. If found, and if we can locate the package.json for the transformers, we'll include their version numbers in the cache key.

https://github.com/cevek/ttypescript#tsconfigjson

Adding disk caching to the existing in-memory cache

I see there's already an in-memory cache. Can I extend the API of MemoryCache so it writes to disk as well? The original implementation does not do that, so I'm wondering if you hit problems with that approach.

@cspotcode
Collaborator Author

Should caching obey the readFile and fileExists overrides? I'm still not sure how people use readFile and fileExists in the wild.

@cspotcode
Collaborator Author

I'd like to see if we can support the use-case where ts-node is the only compile step a team needs. So you can use ts-node for local execution, testing, etc. And somehow ts-node also emits the result of compilation to disk, so you know that the ts-node-ed code used in testing is the same code used at runtime. You get sourcemap support that "just works", etc.

It could be that the cache is portable, so you can bundle it into an npm module or docker image, and it'll be loaded on another machine at another filesystem path.

Or it could be that ts-node emits compiled code. Technically this compiled code would have different paths and you'd need to enable source-map support manually so I kinda like the former better.

I know there's a performance hit to load ts-node and the compiler at startup, but I think teams might be ok with that.

The former solution -- portable cache -- is pretty simple. Make sure that all paths in the cache key are relative paths, relative from the cache directory to the tsconfig. That's what I implemented in typescript-cached-transpile. We could have a "--portable-cache" flag.
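As a rough illustration of the relative-path idea, the sketch below derives the path component of a cache key relative to the cache directory. The function name `portableFileKey` and the example directory layout are hypothetical. Normalizing separators to "/" is an extra assumption, added so that keys also match across operating systems.

```typescript
import * as path from "path";

// Express a source file's location relative to the cache directory, so the
// same project checked out at a different absolute path yields the same key.
function portableFileKey(cacheDir: string, fileName: string): string {
  const relative = path.relative(cacheDir, fileName);
  // Use forward slashes regardless of platform.
  return relative.split(path.sep).join("/");
}

// Two checkouts of the same project at different absolute locations.
const keyOnDevMachine = portableFileKey("/home/dev/project/.cache", "/home/dev/project/src/index.ts");
const keyInContainer = portableFileKey("/srv/app/.cache", "/srv/app/src/index.ts");
```

Both calls yield the same relative path, which is what lets a cache built on one machine be reused inside an npm package or docker image.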

@despairblue

I'd be interested in this feature too, as we're currently running ts-node in production while gradually migrating our JS codebase to TS. Starting the server with transpile-only currently takes 3 seconds, but eventually, once everything is transpiled, it would take 20 seconds.

The cache would greatly reduce that time. I'd also be interested if it's possible to have the cache inside cwd so we can put a warm cache into a docker image for deployment. I know I could compile the project before putting it into a docker image, but I'd like to keep production and dev environments as close as possible.

@blakeembrey @cspotcode Can I help you here in any way?

@cspotcode
Collaborator Author

@despairblue Portable, pre-computed cache is one of the design goals of typescript-cached-transpile

Specifically: https://www.npmjs.com/package/typescript-cached-transpile#portable--pre-compiled-cache

Please let me know if it does or doesn't work for you.

@blakeembrey
Member

We could have a "--portable-cache" flag.

Happy to support this, I think it's like how it used to work with --cache-dir.

I see there's already an in-memory cache. Can I extend the API of MemoryCache so it writes to disk as well? The original implementation does not do that, so I'm wondering if you hit problems with that approach.

Possibly, the in-memory cache was mostly scoped to support source maps and the cache to read from disk. I'm not sure it'll cover what you want today.

Should caching obey the readFile and fileExists overrides?

The way this previously worked would support this. It used hashes of the content as cache keys (instead of paths). We could do that, but if we'd prefer to skip reading anything from the file system I think you're correct - we should ignore this feature and just use the cache directly.

@cspotcode
Collaborator Author

We could have a "--portable-cache" flag.

Happy to support this, I think it's like how it used to work with --cache-dir.

Ok, I'll have to familiarize myself with the code. Do you know if a .ts file's path/filename ever affects the emitted .js or sourcemap? I wasn't sure, so I was including the paths in the cache key. This meant a tiny bit extra work to make them relative to the location of the cache.

Should caching obey the readFile and fileExists overrides?

The way this previously worked would support this. It used hashes of the content as cache keys (instead of paths). We could do that, but if we'd prefer to skip reading anything from the file system I think you're correct - we should ignore this feature and just use the cache directly.

I think we'll have to read the source .ts from disk every time. Otherwise we don't know if the contents changed, so we don't know if the cache is valid.

@blakeembrey
Member

Do you know if a .ts file's path/filename ever affects the emitted .js or sourcemap?

It would, good point. It would have the path in the source map for debugging.

Otherwise we don't know if the contents changed, so we don't know if the cache is valid.

Good point 👍

@cspotcode
Collaborator Author

I've been toying around with a way to avoid loading the typescript compiler until it's needed, since this adds at least 100ms to startup time. The solution may be more complex than we want in ts-node, but I think it's do-able. The upshot is a fully populated cache completely avoids loading typescript.

Typescript only uses fileExists and readFile to discover and load tsconfig.json. Discovering source files uses readDirectory, but not if you set files: [], include: [], compilerOptions: {types: []}. It's relatively straightforward to keep a list of fileExists and readFile operations in the cache. We can later load the cache, re-perform the list of operations, and verify the results are all identical, all before we require("typescript").

When compiling individual .ts files, it's much simpler: the cache key is a hash of the file's path and contents.

The only issue is where to store the cache. We need to discover and load it before discovering the tsconfig file. Ideally we want to support a local cache (not in $TMP) so it can be pre-built.

We could use package.json as an anchor for the local cache. Search upwards through directories till we find a package.json, then save / load cache as a peer to that. The benefit is that package.json is a node convention and must always be plain JSON, so we can very quickly parse it. However, this means we can't put flags controlling the location of the cache in tsconfig.json, because we need to load the cache before reading tsconfig.json.


Roughly, a project would look like this:

/ < project root>
  package.json
  .ts-node-cache <-- a single binary file for efficiency.  Same as v8-compile-cache
  .gitignore <-- to ignore the .ts-node-cache
  tsconfig.json <-- also has ts-node flags, sets `transpileOnly: true`
  src/
    index.ts
    other.ts

When you run ts-node-script ./src/index.ts with a fully populated cache, it will:

a) search upwards and find package.json
b) load cache from .ts-node-cache
c) According to cache, perform fileExists("tsconfig.json") and readFile("tsconfig.json"); confirm they are identical to cache
d) pull compiler configuration from cache, in case we need to instantiate a compiler
e) Read contents of ./src/index.ts from disk; check if it's in the cache
f) Execute cached emit for ./src/index.ts

The question is whether all of the above will be appreciably faster than the 100ms burned on require("typescript").
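The record-and-replay check from step (c) could look roughly like the sketch below. Everything here is an assumption for illustration: the `RecordedOp` shape, the helper names, and the choice to store a sha1 of each file's contents rather than the contents themselves. Nothing in it touches the typescript module, which is the point of the approach.

```typescript
import { createHash } from "crypto";
import * as fs from "fs";

// Hypothetical record of one filesystem query made while loading tsconfig.
type RecordedOp =
  | { kind: "fileExists"; path: string; result: boolean }
  | { kind: "readFile"; path: string; sha1: string | null };

// Hash a file's bytes, or return null if it cannot be read.
function hashFile(p: string): string | null {
  try {
    return createHash("sha1").update(fs.readFileSync(p)).digest("hex");
  } catch {
    return null;
  }
}

// Called while populating the cache.
function recordFileExists(p: string): RecordedOp {
  return { kind: "fileExists", path: p, result: fs.existsSync(p) };
}

function recordReadFile(p: string): RecordedOp {
  return { kind: "readFile", path: p, sha1: hashFile(p) };
}

// Called on startup: re-perform every recorded operation and confirm that
// each one still gives the recorded answer.
function stillValid(ops: RecordedOp[]): boolean {
  return ops.every((op) =>
    op.kind === "fileExists"
      ? fs.existsSync(op.path) === op.result
      : hashFile(op.path) === op.sha1
  );
}
```

If `stillValid` returns false, the cached compiler configuration is discarded and the normal path (load typescript, read tsconfig.json) runs instead.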

@thetutlage
Contributor

I was looking at the code here and it seems like checksum of the file is used for generating the cache file path.

Which means, if I run ts-node with this caching (assuming it is part of ts-node) using nodemon and make 60 changes to a single file, the cache logic will output 60 compiled files to the disk, since each time the checksum will be different. Is that a correct observation?

@cspotcode
Collaborator Author

@thetutlage I haven't looked at the code in a while, but I think you're correct. In development it would probably take a long time before this becomes an issue, and then you'd need to delete the cache directory. I suppose you could do something like this to auto-clean the cache every 10 minutes:

watch -n 600 rm -r $TS_CACHED_TRANSPILE_CACHE

If we receive a pull request which re-adds caching to ts-node natively, then we'll be able to merge and release pretty quickly. Just depends on if/when someone does the work.

@thetutlage
Contributor

Yeah, I have been thinking about it lately and there are many things to consider. Lemme just share them here for brainstorming mainly

Process dies while writing the file

The behavior of nodemon or many other file watchers is to restart the node process on every change. If there are too many quick changes, I suspect the cache will get corrupted. Consider the following example:

  • You change and save foo.ts
  • Nodemon triggers a reload of the process that was started using ts-node with the cache
  • On start, ts-node loads 10-20 different modules to boot the application.
  • While it is still processing those modules and writing them to disk, nodemon triggers another restart, which interrupts the existing process. This can leave partially written files on disk (unless fs.writeFileSync guarantees that it will never corrupt files on a forced exit)
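One common mitigation for this kind of interrupted write, separate from the trailing-marker check discussed earlier in the thread and not something proposed here, is to write to a temporary file and then rename it over the target. The sketch below assumes the temporary file lives in the same directory (and therefore the same filesystem) as the target, where a POSIX rename replaces the target in one step. `writeFileAtomic` is a hypothetical helper name.

```typescript
import * as fs from "fs";

// Write to a uniquely named temporary sibling, then rename it into place.
// A process killed during writeFileSync leaves only a stray .tmp file;
// the target is either the old complete version or the new complete one.
function writeFileAtomic(target: string, contents: string): void {
  const tmp = `${target}.${process.pid}.${Date.now()}.tmp`;
  fs.writeFileSync(tmp, contents, "utf8");
  fs.renameSync(tmp, target);
}
```

Stray temporary files from killed processes would still need occasional cleanup, which fits with the periodic cache clearing mentioned above.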

Clear cache every x mins

Yes, this can be one solution. But it will lead to inconsistent load times, and this may or may not be a big deal. Have to think more...

@cspotcode
Collaborator Author

@sgtpep

sgtpep commented Aug 24, 2020

@cspotcode Is it possible to make caching with typescript-cached-transpile work with node --loader=ts-node/esm script.ts?

@cspotcode
Collaborator Author

@sgtpep yes, it should work. You can pass ts-node options in various ways, but probably the cleanest is to specify them all in your tsconfig file. Once that's set up, it should work.

@sgtpep

sgtpep commented Aug 24, 2020

@cspotcode Unfortunately, it doesn't seem to work when spawning this way: TS_NODE_COMPILER=typescript-cached-transpile node --experimental-specifier-resolution=node --loader=ts-node/esm foo.ts. Any insights on where to look in the source code of typescript-cached-transpile or ts-node to understand why it's not working?

@cspotcode
Collaborator Author

@sgtpep are you running in transpileOnly mode? It'll only work in transpileOnly mode. The function being called on the compiler is ts.transpileModule. That is the function that ts-node calls when in transpileOnly mode, and that's the function that's being wrapped by typescript-cached-transpile in order to add caching.
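To make the wrapping concrete, here is a simplified sketch of how a function with the same shape as ts.transpileModule can be wrapped with a cache. It is not the typescript-cached-transpile source. The `TranspileResult` and `Transpile` types are local stand-ins for the compiler's own types, `withCache` is a hypothetical name, and an in-memory Map stands in for the disk cache. A real key would also cover the compiler version and flags.

```typescript
import { createHash } from "crypto";

// Local stand-ins for the relevant parts of the compiler's types.
interface TranspileResult {
  outputText: string;
  sourceMapText?: string;
  diagnostics?: unknown[];
}
type Transpile = (input: string, options: { fileName?: string }) => TranspileResult;

// Return a function with the same signature that consults the store first.
function withCache(transpile: Transpile, store: Map<string, TranspileResult>): Transpile {
  return (input, options) => {
    const key = createHash("sha1")
      .update(JSON.stringify([options.fileName ?? "", input]))
      .digest("hex");
    const hit = store.get(key);
    if (hit !== undefined) {
      return hit;
    }
    const result = transpile(input, options);
    // Results that carry diagnostics are not cached, as described at the
    // top of this thread.
    if (!result.diagnostics || result.diagnostics.length === 0) {
      store.set(key, result);
    }
    return result;
  };
}
```

Since only this one function is wrapped, a run that type checks goes through other compiler APIs and never reaches the cache, which is why transpileOnly mode is required.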

@majo44

majo44 commented Feb 25, 2021

Hi, I created a package similar to ts-node, postcss-node, and have also started thinking about disk caching. What do you think about a generic disk caching mechanism for require.extensions?

Eg

const org = require.extensions['ext'];
require.extensions['ext'] = (req) => {
    return inCache(req) ? cache(req) : setCacheAndReturn(req, org(req));
}

Then the simple usage will be like

node -r ts-node/register -r postcss-node/register -r require-cache/register src/index.ts

What do you think?

@cspotcode
Collaborator Author

Closing, since --swc is so fast that we can avoid any disk caching complexity.

@CMCDragonkai

I believe typescript-cached-transpile no longer works on the latest ts-node. We had to roll back.

@cspotcode
Collaborator Author

cspotcode commented Jun 17, 2022 via email

@CMCDragonkai

@cspotcode, if I start to use swc (but only for ts-node), do I have to create a whole other .swcrc configuration? Or does ts-node still use the options in tsconfig.json even though swc is being used internally?

I don't really want to maintain both a tsconfig.json AND a .swcrc at the same time.

@cspotcode
Collaborator Author

No new config required.


7 participants