Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete marker file before fetching an external repository #14302

Closed

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Nov 19, 2021

Fetching a repository is a long-running operation that can easily be
interrupted. If it is and the marker file exists on disk, a new
evaluation of the RepositoryDelegatorFunction may treat this repository
as valid even though it is in an inconsistent state.

Clearing the marker file before initiating the fetch and only recreating
it after the fetch is complete prevents this scenario.

Fixes #8993.

Fetching a repository is a long-running operation that can easily be
interrupted. If it is and the marker file exists on disk, a new
evaluation of the RepositoryDelegatorFunction may treat this repository
as valid even though it is in an inconsistent state.

Clearing the marker file before initiating the fetch and only recreating
it after the fetch is complete prevents this scenario.

Fixes bazelbuild#8993.
@fmeum fmeum requested a review from lberki as a code owner November 19, 2021 17:19
@google-cla google-cla bot added the cla: yes label Nov 19, 2021
@gregestren gregestren added the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Nov 23, 2021
@philwo
Copy link
Member

philwo commented Nov 24, 2021

LGTM, but let's ask Yun, too! :D

@meteorcloudy
Copy link
Member

Hmm, I don't think this is necessary. Because:

  • If fetchRepository is executed, then that means the existing marker file isn't consistent anymore.
  • When fetchRepository is interrupted, the marker file shouldn't be rewritten because we do that after fetchRepository is done.
  • Therefore, the next time you execute this SkyFunction, the marker data should still be inconsistent and fetchRepository should be re-executed.

Do you have a minimal reproducible case that we can look into?

@fmeum
Copy link
Collaborator Author

fmeum commented Nov 24, 2021

Hmm, I don't think this is necessary. Because:

  • If fetchRepository is executed, then that means the existing marker file isn't consistent anymore.
  • When fetchRepository is interrupted, the marker file shouldn't be rewritten because we do that after fetchRepository is done.

This is correct.

  • Therefore, the next time you execute this SkyFunction, the marker data should still be inconsistent and fetchRepository should be re-executed.

As far as I can tell, this is not true when dealing with a managed directory: The marker file only tracks the existence of the managed directory, not its contents. If the SkyFunction is interrupted at a point where the managed directory has been created but its contents aren't consistent with what the ruleset expects, it will not rerun fetchRepository, but the build likely fails.

Every project using rules_nodejs's yarn_install rule should serve as a reproducer. For example, try the following:

  1. git clone https://github.com/bazelbuild/rules_nodejs
  2. cd examples/react_webpack
  3. bazel build //...
  4. rm -rf node_modules
  5. bazel build //..., interrupting the build soon after it shows Fetching @npm; Running yarn install on //:package.json. If your internet connection or local yarn cache is too fast, this might require adding new and large dependencies to package.json so that you have a window to interrupt the execution of the repository rule.
  6. bazel build //... should fail with an error such as
ERROR: error loading package '': Every .bzl file must have a corresponding package, but '@npm//webpack-cli:index.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.

At that point, the build failure is persistent and can only be resolved by deleting node_modules (or running bazel clean --expunge).

@meteorcloudy
Copy link
Member

I see, that makes sense now, basically unfinished fetchRepository could accidentally cause the previous marker data to be considered consistent due to the use of managed directories.

Copy link
Member

@meteorcloudy meteorcloudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix and explaination!

@bazel-io bazel-io closed this in 3069ac4 Nov 24, 2021
@brentleyjones
Copy link
Contributor

This is a nice little bug fix. Think we can cherry-pick it into 5.0?

fmeum added a commit to fmeum/bazel that referenced this pull request Nov 24, 2021
Fetching a repository is a long-running operation that can easily be
interrupted. If it is and the marker file exists on disk, a new
evaluation of the RepositoryDelegatorFunction may treat this repository
as valid even though it is in an inconsistent state.

Clearing the marker file before initiating the fetch and only recreating
it after the fetch is complete prevents this scenario.

Fixes bazelbuild#8993.

Closes bazelbuild#14302.

PiperOrigin-RevId: 412101756
fmeum added a commit to fmeum/bazel that referenced this pull request Nov 24, 2021
Fetching a repository is a long-running operation that can easily be
interrupted. If it is and the marker file exists on disk, a new
evaluation of the RepositoryDelegatorFunction may treat this repository
as valid even though it is in an inconsistent state.

Clearing the marker file before initiating the fetch and only recreating
it after the fetch is complete prevents this scenario.

Fixes bazelbuild#8993.

Closes bazelbuild#14302.

PiperOrigin-RevId: 412101756
@fmeum
Copy link
Collaborator Author

fmeum commented Nov 24, 2021

This is a nice little bug fix. Think we can cherry-pick it into 5.0?

I submitted it for both 5.0.0 and 4.2.2.

@fmeum fmeum deleted the 8993-refetch-inconsistent-repos branch November 24, 2021 20:40
meteorcloudy pushed a commit that referenced this pull request Nov 25, 2021
Fetching a repository is a long-running operation that can easily be
interrupted. If it is and the marker file exists on disk, a new
evaluation of the RepositoryDelegatorFunction may treat this repository
as valid even though it is in an inconsistent state.

Clearing the marker file before initiating the fetch and only recreating
it after the fetch is complete prevents this scenario.

Fixes #8993.

Closes #14302.

PiperOrigin-RevId: 412101756
meteorcloudy pushed a commit that referenced this pull request Nov 25, 2021
Fetching a repository is a long-running operation that can easily be
interrupted. If it is and the marker file exists on disk, a new
evaluation of the RepositoryDelegatorFunction may treat this repository
as valid even though it is in an inconsistent state.

Clearing the marker file before initiating the fetch and only recreating
it after the fetch is complete prevents this scenario.

Fixes #8993.

Closes #14302.

PiperOrigin-RevId: 412101756
Bencodes pushed a commit to Bencodes/bazel that referenced this pull request Jan 10, 2022
Fetching a repository is a long-running operation that can easily be
interrupted. If it is and the marker file exists on disk, a new
evaluation of the RepositoryDelegatorFunction may treat this repository
as valid even though it is in an inconsistent state.

Clearing the marker file before initiating the fetch and only recreating
it after the fetch is complete prevents this scenario.

Fixes bazelbuild#8993.

Closes bazelbuild#14302.

PiperOrigin-RevId: 412101756
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate behavior of interrupted execution of long-to-execute repository rules
5 participants