Move yarn and npm5 cache directories #439

Closed
jmorrell opened this issue Jun 29, 2017 · 15 comments


@jmorrell
Contributor

Yarn and npm5 both default their cache directories to locations under $HOME on Linux: ~/.npm for npm5 and ~/.cache/yarn for Yarn.

This creates an issue for Heroku since $HOME=/app, which is the same directory where your application runs. So far this hasn't been a problem because apps are built in a temporary directory, but builds will move to /app as well in the future. If that happened today, we'd see a big increase in slug size because the cache directory would be included along with your application.

It's a bigger issue in Heroku CI in the case of running tests with Jest. Jest will automatically pick up tests in the cache directory, causing your CI tests to fail in hard-to-debug ways when they run fine locally. Context: jestjs/jest#3935

The solution in both cases is to take control of the cache directories and move them into /tmp by default. If users want to persist these cache directories between builds, though, we need to provide a mechanism for them to do so.

@jmorrell
Contributor Author

jmorrell commented Jun 29, 2017

For anyone running into the CI issue before I can get this change in, you can fix your CI tests by adding the following config to your package.json:

"jest": {
  "testPathIgnorePatterns": [
    "<rootDir>[/\\\\](\\.cache|node_modules)[/\\\\]"
  ]
}
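
To see which paths this pattern excludes, here is a minimal shell sketch that applies the same regular expression with grep -E. It assumes <rootDir> resolves to /app (as on Heroku CI), and is_ignored is a hypothetical helper name. Jest does its own matching in JavaScript, so this only illustrates the regex, not Jest's implementation. The /app/src/app.test.js path is made up for contrast.

```shell
# The pattern from testPathIgnorePatterns after JSON unescaping,
# with <rootDir> assumed to be /app.
ignore_regex='/app[/\\](\.cache|node_modules)[/\\]'

# Hypothetical helper: succeeds when the path matches the ignore pattern.
is_ignored() {
  printf '%s\n' "$1" | grep -Eq "$ignore_regex"
}

for path in \
  "/app/.cache/yarn/v1/npm-protobufjs-5.0.2-59748d7dcf03d2db22c13da9feb024e16ab80c91/package.json" \
  "/app/node_modules/jest/package.json" \
  "/app/src/app.test.js"
do
  if is_ignored "$path"; then
    echo "ignored: $path"
  else
    echo "scanned: $path"
  fi
done
```

Only the last path is reported as scanned; anything under .cache or node_modules directly below the root is skipped.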

@bensalilijames

I ran into a similar issue today when setting up Heroku CI with jest:

jest-haste-map: @providesModule naming collision:
  Duplicate module name: protobufjs
  Paths: /app/.cache/yarn/v1/npm-protobufjs-5.0.2-59748d7dcf03d2db22c13da9feb024e16ab80c91/package.json collides with /app/.cache/yarn/v1/npm-protobufjs-6.8.0-04e85493c4e1653878ec283f18bc78b1e7c5d5a2/package.json
This warning is caused by a @providesModule declaration with the same name across two different files.

This warning was repeated a few hundred times, once for each duplicate package!

Adding <rootDir>[/\\\\](\\.cache|node_modules)[/\\\\] to my testPathIgnorePatterns as mentioned above fixed the issue.

I suppose I was slightly confused that the Yarn cache directory wasn't a tmp dir as this issue suggests! Has this fix been released, or do I perhaps need to explicitly specify the latest buildpack version (it's currently not specified in app.json)?

@jmorrell
Contributor Author

@benhjames The cache directory should now be in a temp directory by default with the main buildpack release. Might you be pinned to an old version?

You might be hitting #494, though I'm still confused about how .cache might be getting cached.

If you open a support ticket at help.heroku.com I can dig in and figure out what's going on :)

@bensalilijames

Thanks @jmorrell, I've opened a support ticket. :)

AFAIK I've not specified the buildpack version anywhere, so I'm guessing it should be pulling the latest release.

@jmorrell
Contributor Author

jmorrell commented Dec 4, 2017

Re-opening this after looking into @benhjames's issue some more.

@jmorrell jmorrell reopened this Dec 4, 2017
@larixer

larixer commented Jan 20, 2018

@jmorrell As I understand it, the package manager cache dirs are now created in /tmp by this code:

[ ! "$YARN_CACHE_FOLDER" ] && YARN_CACHE_FOLDER=$(mktemp -d -t yarncache.XXXXX)
[ ! "$NPM_CONFIG_CACHE" ] && NPM_CONFIG_CACHE=$(mktemp -d -t npmcache.XXXXX)

Caching them would speed up builds. Why are these dirs not cached?

@jmorrell
Contributor Author

@Vlasenko The short answer is that I haven't yet had the time to investigate and measure the trade-offs vs caching node_modules.

In the meantime, if you want to configure your app to cache these, you can set NPM_CONFIG_CACHE or YARN_CACHE_FOLDER yourself, and then add that directory to your list of cacheDirectories in package.json: https://devcenter.heroku.com/articles/nodejs-support#custom-caching

If you do this, I believe you will eventually need to clean up the cache directory, since over time it will hold copies of modules that you no longer use in your deployment (old minor versions, etc.), which will bloat your slug. npm provides npm cache verify, which should remove these older versions if I understand the docs. It's not clear from yarn's cache docs that it has an equivalent, so you might need to periodically blow it away.
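
As a rough sketch of "periodically blow it away", the shell function below empties a cache directory once it grows past a size budget. The name prune_cache_if_large and the 200 MB figure are hypothetical; this is not something the buildpack provides, and the threshold would need tuning per app.

```shell
# Hypothetical helper: empty a cache directory once it exceeds a budget in KB.
prune_cache_if_large() {
  cache_dir="$1"
  max_kb="$2"
  # Nothing to do if the directory does not exist (also guards an empty argument).
  [ -d "$cache_dir" ] || return 0
  # du -sk prints "<size-in-KB><TAB><path>"; keep only the size.
  size_kb=$(du -sk "$cache_dir" | cut -f1)
  if [ "$size_kb" -gt "$max_kb" ]; then
    rm -rf "$cache_dir"
    mkdir -p "$cache_dir"
  fi
}

# Example: keep the Yarn cache under roughly 200 MB (assumed budget).
if [ -n "${YARN_CACHE_FOLDER:-}" ]; then
  prune_cache_if_large "$YARN_CACHE_FOLDER" 204800
fi
```

For npm, running npm cache verify against the configured cache may be preferable to wiping it, since it keeps entries that are still valid.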

@larixer

larixer commented Jan 21, 2018

@jmorrell Thank you for sharing your concerns. I have created a feature request for a yarn cache verify command in Yarn: yarnpkg/yarn#5261

A couple more points:

  1. Caching node_modules might help for npm right now, but for Yarn it only hurts: on yarn install, Yarn downloads a dependency whenever it cannot find it in its cache, regardless of whether it is already present in node_modules. So caching node_modules only slows down Yarn builds.
  2. In theory I can set YARN_CACHE_FOLDER together with cacheDirectories, but in practice I run into problems that ruin the idea completely. If I point YARN_CACHE_FOLDER at a dir inside the slug, I will hit the 500 MB slug limit very easily, because the Yarn cache adds a lot to the slug size, and the final slug deployment will also be slowed down considerably by the larger size. If I point YARN_CACHE_FOLDER at something like /tmp/yarn_cache, I cannot make cacheDirectories point to it, because the buildpack uses buildDir + cacheDirectories as the path to cache, and something like ../tmp/yarn_cache or ../var/tmp/yarn_cache does not work either, due to how these paths are handled later in the buildpack.
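
A small sketch of the second point, assuming the buildpack forms each cache path by plain concatenation of the build directory and the cacheDirectories entry. resolve_cache_path is a hypothetical stand-in for that logic, not the buildpack's actual function, and the build directory name is only an example.

```shell
# Assumption: each cache path is built as "<build dir>/<cacheDirectories entry>".
resolve_cache_path() {
  printf '%s/%s\n' "$1" "$2"
}

build_dir="/tmp/build_b6c9d5b5d2a5e1e3415c6015213e9467"

# An entry inside the app resolves as expected.
resolve_cache_path "$build_dir" ".yarn-cache"
# prints /tmp/build_b6c9d5b5d2a5e1e3415c6015213e9467/.yarn-cache

# An absolute entry still lands under the build dir, not in the real /tmp.
resolve_cache_path "$build_dir" "/tmp/yarn_cache"
# prints /tmp/build_b6c9d5b5d2a5e1e3415c6015213e9467//tmp/yarn_cache
```

Under that assumption, a cache folder outside the build directory cannot be named by an absolute cacheDirectories entry.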

@larixer

larixer commented Jan 22, 2018

@jmorrell Another point to consider about caching is that starting the build in a unique folder each time, e.g. /tmp/build_b6c9d5b5d2a5e1e3415c6015213e9467, is not friendly to the caches of Node tools. For example, babel-loader in Webpack calculates its hashes from the Webpack request object, which happens to contain absolute file paths. Because the build folder is different each time, babel-loader cannot reuse its cache across builds of the same app on Heroku, even if you configure persisting and restoring the babel-loader cache correctly. One could argue that babel-loader should not rely on absolute paths for its cache, but that is largely beyond its control, since the request object is generated by Webpack. Webpack cannot use relative paths there either, because other things would break.
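
The effect can be illustrated with a toy cache key in shell. This is not babel-loader's real hashing; cache_key is a hypothetical helper that checksums a path with cksum, and the build directory names and src/index.js are made up. It only shows that any key derived from an absolute path changes when the build directory changes.

```shell
# Hypothetical cache key: a checksum of the path string.
cache_key() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# Same source file, two builds in different temporary directories.
key_a=$(cache_key "/tmp/build_aaa/src/index.js")
key_b=$(cache_key "/tmp/build_bbb/src/index.js")

if [ "$key_a" = "$key_b" ]; then
  echo "cache hit"
else
  echo "cache miss: key changed with the build dir"
fi

# With a stable path, the key is identical on every build.
key_c=$(cache_key "/app/src/index.js")
key_d=$(cache_key "/app/src/index.js")
[ "$key_c" = "$key_d" ] && echo "cache hit with a stable build dir"
```

The second case is what a stable build directory path would allow.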

@jmorrell
Contributor Author

jmorrell commented Feb 1, 2018

Closing since this issue is fixed, but I will be looking into the points raised by @Vlasenko

@jmorrell jmorrell closed this as completed Feb 1, 2018
@loganfsmyth

@jmorrell Out of curiosity, did this conversation around babel-loader and Webpack caching ever go anywhere? I work on Babel and it has been raised several times by folks using Heroku, so I'm curious.

@jmorrell
Contributor Author

jmorrell commented Sep 7, 2018

@loganfsmyth I'm afraid I got distracted by a different project and completely forgot about that.

At some point I'd like to improve our caching story around frontend builds. We cache node_modules by default, but I'd love to be able to have a babel cache, a webpack cache, etc., that get set up with no intervention by the user. I think the ecosystem is pretty far from making that possible today.

@loganfsmyth

loganfsmyth commented Sep 7, 2018

@jmorrell Totally fair. The main point that I've heard is the general concern about the build dir being unique per build, because then any caches built using absolute paths are automatically invalidated between builds, so that's the main thing I was curious about. It seems like having a stable build directory path would go a long way toward allowing stable caching.

@jmorrell
Contributor Author

jmorrell commented Sep 7, 2018

@loganfsmyth I agree. Unfortunately that has been in our backlog forever, and my efforts at getting it prioritized haven't been successful.

@loganfsmyth

Cool, no worries, I know how that goes. At least now I can decide if it's worth tackling on our side. Thanks for getting back to me!
