Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: extract tarball to temp directory on Windows #2846

Merged

Conversation

dsanders11
Copy link
Contributor

@dsanders11 dsanders11 commented May 11, 2023

Checklist
  • npm install && npm test passes
  • tests are included
  • documentation is changed or added
  • commit message follows commit guidelines
Description of change

Fixes #2482.
Fixes #2484.
Fixes #2584.
Fixes #2683.
Fixes #2751.
Fixes #2832.

NOTE: This PR contains a temporary initial commit which reverts a PR which causes CI workflow failures on main, so that the tests for this PR can run clean. It should be removed before merging this PR. #2837 is currently trying to fix that issue.

node-tar unlinks file paths before writing to them on Windows, which causes errors when there are parallel installs going on which are extracting the headers tarball to devDir. This currently leads to random corruption of the cache at devDir with missing files and hanging files with <filename>.DELETE.<random> names. See more in-depth discussion about the underlying issue here.

This PR fixes the situation by:

  • Fixing usage of node-tar to watch for errors during extraction
  • Extracting the headers tarball on Windows to a temp directory and then copies the files over
  • Using the exponential-backoff package in case of collision (EBUSY) when copying the files to devDir
  • Bumps installVersion to force clearing of existing cached versions which might be corrupt on Windows

This is set up in commits which were pushed one at a time to this PR so that the CI workflows for each commit can be seen, confirming that the added tests fail until the final fix is applied.

Unfortunately the new tests, despite being simple, are a bit flaky, due to network requests to https://nodejs.org/download/release/. The tests also take significant longer (greater than 2x) on Windows with Node.js 18.x and 20.x - it's not clear why, but since it repros consistently with those versions, it seems likely to be some upstream regression in Node.js, possibly something to do with the fs module.

throw new Error(k + ' local checksum ' + contentShasums[k] + ' not match remote ' + expectShasums[k])
// copy over the files from the temp tarball extract directory to devDir
if (tarExtractDir !== devDir) {
await copyDirectory(tarExtractDir, devDir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we wouldn't want to rename / move here instead of copy? This could still race condition if it's not atomic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we wouldn't want to rename / move here instead of copy?

Since there are parallel installs happening, there's no way to atomically rename to devDir. That directory is created (if it doesn't already exist) at the start of the install process to confirm permissions, so a direct rename of tarExtractDir to devDir would fail. Any rename process would need to first rename devDir, which opens a race condition where devDir disappears momentarily during an ongoing build, causing a failure. That was my original approach for fixing this issue, but I switched to the current implementation to close that race condition.

This could still race condition if it's not atomic?

The copyFile approach stays similar to the existing implementation, with node-tar overwriting the existing files when there are parallel installs going on. The main difference is that on Windows node-tar always unlinks the destination file path before writing to it to avoid certain conditions (same path multiple times in an archive) which don't exist for this usage.

I've stress tested this change (on CI and locally with a real package) with 35 parallel installs and didn't trip over any race conditions in practice.

lib/install.js Show resolved Hide resolved
test/test-install.js Outdated Show resolved Hide resolved
Copy link
Contributor

@StefanStojanovic StefanStojanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Since there are already 2 PRs addressing GitHub Action issues, we should make sure 6bb7cd6 is removed from here before landing it. Ideally, we'd first land a GitHub Actions fix, then rebase this PR, remove the mentioned commit and push changes.

lib/install.js Outdated Show resolved Hide resolved
@StefanStojanovic
Copy link
Contributor

@rvagg @cclauss I'm changing tests from tap to mocha in #2851. Since this PR adds some tests and it's already approved, what are your thoughts on landing it (without 6bb7cd6) so I can change those tests in my PR as well?

@cclauss
Copy link
Contributor

cclauss commented May 23, 2023

I have no objection to these changes but I am not a Windows user so I lack the skills to be a reviewer.

@StefanStojanovic
Copy link
Contributor

Hey, @dsanders11 can you prepare your branch for merging by removing 6bb7cd6 and potentially addressing #2846 (comment) and squashing some commits together (especially the ones addressing PR comments)?

Signed-off-by: David Sanders <dsanders11@ucsbalum.com>
Signed-off-by: David Sanders <dsanders11@ucsbalum.com>
Signed-off-by: David Sanders <dsanders11@ucsbalum.com>
@dsanders11 dsanders11 force-pushed the windows-parallel-installs-fix branch from 2bb914b to 9f9d7fb Compare May 24, 2023 23:38
@dsanders11
Copy link
Contributor Author

@StefanStojanovic, done! Removed 6bb7cd6 and squashed it back down to the original three commits. For CI reference, 2bb914b was the last commit before removing 6bb7cd6, and it went green.

@StefanStojanovic StefanStojanovic merged commit aaa117c into nodejs:main May 25, 2023
@cclauss cclauss added the tarball label Jun 1, 2023
@dsanders11 dsanders11 deleted the windows-parallel-installs-fix branch August 3, 2023 23:00
lukekarrys pushed a commit that referenced this pull request Dec 2, 2024
* fix: check for errors while extracting downloaded tarball

Signed-off-by: David Sanders <dsanders11@ucsbalum.com>

* test: parallel installs

Signed-off-by: David Sanders <dsanders11@ucsbalum.com>

* fix: extract tarball to temp directory on Windows

Signed-off-by: David Sanders <dsanders11@ucsbalum.com>

---------

Signed-off-by: David Sanders <dsanders11@ucsbalum.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants