@cameel
Collaborator

@cameel cameel commented Oct 7, 2021

Recently the b_osx job has been failing in some of our PRs, specifically in external PRs that have been inactive for some time.

Failed to unarchive cache

Error untarring cache: Error extracting tarball /var/folders/1b/gl7yt7ds26vcyr1pkgld6l040000gn/T/cache3415993642 : usr/local/Homebrew/Library/Homebrew/shims/scm: Can't remove already-existing dir tar: Error exit delayed from previous errors. : exit status 1

The article "Error untarring cache" when running a macOS job suggests including {{ arch }} in the cache key. The failures are probably caused by an OS update that happened after these PRs were last updated, which means the cache is simply no longer valid.

This PR adds {{ arch }} to cache keys on macOS and also on Windows just in case.
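
As a rough illustration of the idea, a cache key that includes {{ arch }} could look like the sketch below. The key prefix, the `install_deps.sh` script and the cached path are hypothetical placeholders, not the names used in this PR. Only the `{{ arch }}` and `{{ checksum }}` templates and the `restore_cache`/`save_cache` steps are standard CircleCI features.

```yaml
steps:
  - restore_cache:
      keys:
        # {{ arch }} ties the key to the OS and CPU of the executor, so a cache
        # saved on a machine with a different OS or CPU should not be matched.
        # The "deps-osx" prefix and the checksummed file are hypothetical.
        - deps-osx-{{ arch }}-{{ checksum "install_deps.sh" }}
  - run: ./install_deps.sh  # hypothetical install script
  - save_cache:
      # Must use the same template as restore_cache for the lookup to hit.
      key: deps-osx-{{ arch }}-{{ checksum "install_deps.sh" }}
      paths:
        - /usr/local/Homebrew  # assumed cached path
```

The same template has to appear in both the save and the restore step. Otherwise the restored key never matches what was saved.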

chriseth
chriseth previously approved these changes Oct 7, 2021
@chriseth
Contributor

chriseth commented Oct 7, 2021

Looks good for now.

@cameel
Collaborator Author

cameel commented Oct 7, 2021

Looks like {{ arch }} did not help. It must be something else.

@chriseth
Contributor

chriseth commented Oct 7, 2021

Wait, but the build worked. Did you forget to add arch to the test run?

@cameel
Collaborator Author

cameel commented Oct 7, 2021

Hmm... I'm guessing that the build went fine because the key was not present in the cache. The t_ jobs restore it and that's what went wrong. Not sure why, though. I'm trying to SSH into the machine but for some reason I get "permission denied".

@cameel
Collaborator Author

cameel commented Oct 7, 2021

The b_osx rerun is failing too, which confirms that there's nothing special about the t_ jobs here.

@cameel
Collaborator Author

cameel commented Oct 11, 2021

Back to draft. I need to experiment with it a bit.

@cameel cameel marked this pull request as draft October 11, 2021 17:18
@cameel cameel force-pushed the fix-circleci-macos-cache-key branch from 9e3c1aa to cd22574 Compare October 11, 2021 18:13
@cameel
Collaborator Author

cameel commented Oct 11, 2021

We'll have to wait for CircleCI for a proper fix (I think this is a problem in restore_cache and not in our config) but I have a workaround. Removing /usr/local/Homebrew/Library/Homebrew/shims/scm/ lets restore_cache finish without errors.

The dir contains only two files: git (some wrapper script over git) and svn (symlink to git). Removing them should be safe since they will be restored from cache anyway.

I'm leaving the {{ arch }} workaround in too. We probably do not need it but it does not break anything and this way we won't run into the problem described in the article.
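
In config terms, the workaround amounts to deleting the directory right before the cache is restored. A minimal sketch, assuming a hypothetical cache key (the real key and step names in this PR may differ):

```yaml
steps:
  - run:
      name: Remove scm/ so that restore_cache can unpack over it
      # Plain `rm -r` as described above. It fails if the directory is
      # missing, which is fine as long as the image ships it.
      command: rm -r /usr/local/Homebrew/Library/Homebrew/shims/scm
  - restore_cache:
      keys:
        # Hypothetical key, shown only to illustrate the step order.
        - deps-osx-{{ arch }}-{{ checksum "install_deps.sh" }}
```

The order matters: the removal has to run before restore_cache, because the failure happens while the cache is being unpacked.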

@cameel cameel marked this pull request as ready for review October 11, 2021 18:18
@cameel cameel requested a review from chriseth October 11, 2021 18:19
@cameel
Collaborator Author

cameel commented Oct 11, 2021

After digging deeper, I think that this is a bug in restore_cache that got triggered by a recent change in Homebrew. Homebrew/brew#12170 replaced the /usr/local/Homebrew/Library/Homebrew/shims/scm/ directory with a symlink to shared/ in the same parent dir. CircleCI has an older version of Homebrew already installed in the image but we install (and cache) a newer one. restore_cache runs tar to unpack the cache, which refuses to overwrite an existing dir with a symlink during unpacking. In this case the overwrite is legit (we want everything replaced with the cached content) but restore_cache does not take this scenario into account and just lets tar fail.

- steps_restore_cache_homebrew_workaround: &steps_restore_cache_homebrew_workaround
    steps:
      - run:
          # FIXME: For some reason restore_cache fails saying that it cannot remove the scm/ dir.
Contributor

Do you mean the rm -r below? If so, what about rm -rf?

Collaborator Author

No, the rm is the workaround :) Works without --force.

It's restore_cache that can't remove the files and it's provided by CircleCI so we have to wait for a fix from them.

@chriseth chriseth merged commit 5911fdf into develop Oct 12, 2021
@chriseth chriseth deleted the fix-circleci-macos-cache-key branch October 12, 2021 14:22