Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repo:purge_cache doesn't purge all files #70

Open
edmorley opened this issue Jul 21, 2017 · 7 comments
Open

repo:purge_cache doesn't purge all files #70

edmorley opened this issue Jul 21, 2017 · 7 comments

Comments

@edmorley
Copy link
Member

edmorley commented Jul 21, 2017

The repo:purge_cache command doesn't actually remove all files from the cache, but tries to preserve anything under vendor/heroku/:

METADATA="vendor/heroku"
if [ -d "$METADATA" ]; then
TMPDIR=\`mktemp -d\`
cp -rf $METADATA $TMPDIR
fi
cd ..
rm -rf unpack
mkdir unpack
cd unpack
TMPDATA="$TMPDIR/heroku"
VENDOR="vendor"
if [ -d "$TMPDATA" ]; then
mkdir $VENDOR
cp -rf $TMPDATA $VENDOR
rm -rf $TMPDIR
fi

This was added in:
16b17f3

I believe the intention was to preserve the implicit previous Ruby version, for apps that don't define a specific required version.

However this seems flawed/fragile for a few reasons:

  1. There isn't a command to unconditionally delete all files for users who really want to.
  2. It's likely confusing to the user if there's still hidden state preserved after clearing the cache (eg think of the use case of "why doesn't my new stage app or review apps work, my prod app still works after clearing the cache")
  3. It only helps Ruby apps, and even then only so long as the path for the Ruby metadata doesn't change from vendor/heroku/. (Each buildpack does it's own thing, for example Python uses the .heroku/python directory instead)
  4. Even if the language runtime version is preserved, it doesn't help the user if they haven't pinned eg package deps properly etc.

As such I would propose either:

  1. Scrapping the partial cache clearing version of the command entirely, and just clearing the whole cache.
  2. Offering two commands - one that clears the whole cache, and one that tries to preserve certain files. The latter command should be improved to also save the Python/other buildpacks' metadata too.
@tt
Copy link
Member

tt commented Jul 27, 2017

I strongly agree with this.

The partial purge is odd. It's there so customers who don't specify a Ruby version will get the same after purging their cache but we should really warn them that this is a consequence of purging the cache and not specifying a Ruby version instead of trying to magically save them.

Using the cache as a store also breaks apart when used with review apps and pipelines where each app has a different version of the cache.

@dmathieu
Copy link
Contributor

Definitely 💯

@edmorley
Copy link
Member Author

edmorley commented Aug 3, 2017

we should really warn them that this is a consequence of purging the cache and not specifying a Ruby version instead of trying to magically save them

I've filed heroku/heroku-buildpack-ruby#611 for this.

Not that it need block this issue as filed, but I guess all buildpacks should really be doing similar too (I've filed heroku/heroku-buildpack-python#440 for the Python buildpack, but others may need a warning).

@schneems
Copy link

schneems commented Aug 9, 2017

To get rid of this codon needs to provide us a durable store.

@dmathieu
Copy link
Contributor

What do you mean "a durable store"? You want a second, separate and unpurgeable cache?
Not being able to fully clear a cache really looks like a hack, and should be removed. Introducing a second cache system to mitigate that hack would really be a bad solution.

@schneems
Copy link

What do you mean "a durable store"

A cache is not persistent, it can be lossy. A cached value implies a value that can be re-generated, it's not the canonical representation of the data.

By "durable" I mean something that endures. It is is not lossy. A safe place to put canonical data.

In this case, the only representation of this data that we have is in the build cache. I didn't like it when we implemented it, I don't like it now, but it's what we've got.

Here's the history:

About 4-5 years ago Ruby came to build and said "we need a durable store". They talked about it and said "no" because it was too much engineering overhead. But it was decided to "use the cache anyway and treat it like a durable store", because it effectively is.

So we did. It was one of those "temporary hacks" that was supposed to get replaced when build "had time" in the future to build something if it "was needed".

Using the cache has worked well. To my knowledge, in the last 5 years there's been no cases of losing data unexpectedly, or having data that's not "cleared" in the cache cause any kind of deploy problems. Basically there's not ever been a reason to move it other than it seems bad to keep canonical data in a cache.

If we get rid of this functionality it will break a lot about how the buildpack works. We heavily rely on this tactic not just for storing ruby versions, but for other things as well. For example we need a way to determine if a deploy is the first deploy or not, we use the cache for this. We need to store prior versions of bundler used. We need to store prior stack name so we can migrate data on stack change.

We rely on this functionality. If it goes away, we need to replace it with something compatible.

@dmathieu
Copy link
Contributor

@jbyrum ^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants