Vendor mode: move the external repo instead of copying #22668

Closed
wants to merge 4 commits into from

@meteorcloudy (Member) commented Jun 7, 2024

This drastically improves the speed of vendoring external repositories.

Related: #19563
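Why a move is so much faster than a copy: when the source and destination are on the same file store, moving a directory is a single rename, no matter how many files it contains, whereas a copy reads and writes every file. The following is a minimal sketch in plain java.nio, not Bazel's `FileSystemUtils.moveTreesBelow`. The class and method names are hypothetical, and the fallback needed when moving across file stores is only noted in a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper illustrating a move-based vendor step; not Bazel's FileSystemUtils. */
final class MoveBelowSketch {
  /**
   * Moves every direct child of fromDir into toDir. On the same file store,
   * Files.move of a directory is a rename, so the files inside it are never
   * read or copied; symlinks are moved as links, without following them.
   */
  static void moveChildren(Path fromDir, Path toDir) throws IOException {
    Files.createDirectories(toDir);
    List<Path> children;
    // Snapshot the listing first so we never modify a directory while iterating it.
    try (Stream<Path> listing = Files.list(fromDir)) {
      children = listing.collect(Collectors.toList());
    }
    for (Path child : children) {
      // Throws IOException if a non-empty directory would have to be copied
      // (e.g. toDir is on another file store); a robust version would fall
      // back to copy-and-delete in that case.
      Files.move(child, toDir.resolve(child.getFileName().toString()));
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("vendor-demo");
    Path external = root.resolve("external").resolve("_main~ext~foo");
    Files.createDirectories(external.resolve("sub"));
    Files.writeString(external.resolve("sub").resolve("data"), "hello");
    Path vendor = root.resolve("vendor_src").resolve("_main~ext~foo");
    moveChildren(external, vendor);
    System.out.println(Files.readString(vendor.resolve("sub").resolve("data")));
  }
}
```

Running `main` prints `hello`, read back from the new location under `vendor_src`.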

@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Jun 7, 2024
```java
FileSystemUtils.moveFile(markerUnderExternal, tMarker);
// 3. Move the external repo to vendor dir. It's fine if this step fails or is interrupted, because the marker
// file under external is gone anyway.
FileSystemUtils.moveTreesBelow(repoUnderExternal, repoUnderVendor);
```
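The ordering in this excerpt is what makes the move safe to interrupt. Below is a minimal sketch of the idea, assuming (as the code comment implies) that a repo under external is only trusted while its marker file exists. `isFetched`, `vendor`, and the file names in `main` are hypothetical; the earlier steps of the real code are not shown in the excerpt and are left out, and the tree move is simplified to a single directory rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the "move the marker first" ordering; names are illustrative. */
final class MarkerFirstSketch {
  /** Assumption: a repo directory is trusted only while its marker file exists. */
  static boolean isFetched(Path marker) {
    return Files.exists(marker);
  }

  static void vendor(Path markerUnderExternal, Path tMarker,
                     Path repoUnderExternal, Path repoUnderVendor) throws IOException {
    // Move the marker first: from now on the copy under external counts as
    // not fetched, so a half-moved tree is never trusted.
    Files.createDirectories(tMarker.getParent());
    Files.move(markerUnderExternal, tMarker);
    // Then move the repo itself (simplified here to one directory rename).
    // If this fails or is interrupted, the repo is simply fetched again later.
    Files.createDirectories(repoUnderVendor.getParent());
    Files.move(repoUnderExternal, repoUnderVendor);
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("marker-demo");
    Path repo = Files.createDirectories(root.resolve("external").resolve("foo"));
    Files.writeString(repo.resolve("BUILD"), "");
    Path marker = Files.writeString(root.resolve("external").resolve("foo.marker"), "hash");
    Path vendorDir = root.resolve("vendor_src");
    vendor(marker, vendorDir.resolve("foo.marker.tmp"), repo, vendorDir.resolve("foo"));
    // The external copy is no longer trusted once its marker has been moved away.
    System.out.println(isFetched(marker)); // false
  }
}
```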
@fmeum (Collaborator) commented

How does this behave if a repo symlinks files from another repo and one is vendored while the other is not? It looks like it may be necessary to follow relative symlinks but not absolute symlinks.
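One way to make this question concrete: a relative symlink survives a move only if it resolves to somewhere inside the same repo directory. The sketch below classifies a stored symlink target purely by path arithmetic, without touching the filesystem. `escapesRepo` and the `/ob/external` output base are hypothetical, and `normalize()` is lexical, so it does not account for symlinked parent directories.

```java
import java.nio.file.Path;

/** Hypothetical helper: does a symlink's stored target point outside its repo? */
final class SymlinkEscapeSketch {
  /**
   * Returns true if a symlink located at `link` with stored target `target`
   * resolves outside `repoRoot`. Absolute targets are compared as-is; relative
   * targets are resolved against the directory containing the link.
   */
  static boolean escapesRepo(Path repoRoot, Path link, Path target) {
    Path resolved = target.isAbsolute()
        ? target.normalize()
        : link.getParent().resolve(target).normalize();
    // Path.startsWith compares whole name elements, not string prefixes.
    return !resolved.startsWith(repoRoot.normalize());
  }

  public static void main(String[] args) {
    Path repo = Path.of("/ob/external/_main~ext~foo");
    Path link = repo.resolve("path_bar_2");
    System.out.println(escapesRepo(repo, link, Path.of("data")));                   // false
    System.out.println(escapesRepo(repo, link, Path.of("../_main~ext~bar~/data"))); // true
    System.out.println(escapesRepo(repo, link, Path.of("/tmp/foo")));               // true
  }
}
```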

@meteorcloudy (Member, Author) commented Jun 10, 2024

`moveTreesBelow` doesn't follow any symlinks. Judging from the code, it's actually impossible to create a relative symlink with the `ctx.symlink` API: relative targets are resolved and stored as absolute paths.

I tested with

```starlark
ctx.symlink("/tmp/foo", "path_abs")
ctx.symlink("data", "path_rel")
ctx.symlink(ctx.path(Label("@bar//:data")), "path_bar")
ctx.symlink("../_main~ext~bar~/data", "path_bar_2")
```

and it resulted in

```text
path_abs@ -> /tmp/foo
path_bar@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~bar/data
path_bar_2@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~bar~/data
path_rel@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~foo/data
```

in both the external and the vendor directories.
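Identical links in both directories are what a plain move produces: the link itself is moved and its stored target text is carried over verbatim, never followed or rewritten. A small java.nio demo of that behavior; the class name is hypothetical, and creating symlinks may need extra privileges on Windows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo: moving a symlink keeps its stored target unchanged. */
final class MoveKeepsLinkSketch {
  /** Creates `name` -> `target` in fromDir, moves it into toDir, and returns the moved link's target. */
  static Path moveLinkAndReadTarget(Path fromDir, Path toDir, String name, Path target)
      throws IOException {
    Path link = Files.createSymbolicLink(fromDir.resolve(name), target);
    // Files.move moves the link itself; its target is never followed.
    Path moved = Files.move(link, toDir.resolve(name));
    // readSymbolicLink returns the target exactly as it was stored.
    return Files.readSymbolicLink(moved);
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("link-demo");
    Path from = Files.createDirectories(root.resolve("external"));
    Path to = Files.createDirectories(root.resolve("vendor"));
    System.out.println(moveLinkAndReadTarget(from, to, "path_abs", Path.of("/tmp/foo"))); // /tmp/foo
    System.out.println(moveLinkAndReadTarget(from, to, "path_rel", Path.of("data")));     // data
  }
}
```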

This is fine if only foo is vendored, since eventually `<output_base>/external/_main~ext~bar` would exist and point to the right location. However, I noticed there is a problem if the output base changes after vendoring.
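When the output base changes, the absolute targets recorded under the old `<output_base>/external` no longer exist, so the vendored links become dangling. A hypothetical check (not something Bazel provides) that lists such links under a vendored repo:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical check: list symlinks under a directory whose targets no longer exist. */
final class DanglingLinkSketch {
  static List<Path> danglingLinks(Path vendorDir) throws IOException {
    // Files.walk does not follow symlinks unless FOLLOW_LINKS is passed,
    // so each link is visited as an entry of its own.
    try (Stream<Path> paths = Files.walk(vendorDir)) {
      return paths
          .filter(Files::isSymbolicLink)
          // Files.exists follows the link, so false means its target is missing.
          .filter(p -> !Files.exists(p))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("dangling-demo");
    Path oldExternal = Files.createDirectories(root.resolve("old_output_base/external/_main~ext~bar"));
    Path data = Files.writeString(oldExternal.resolve("data"), "x");
    Path repo = Files.createDirectories(root.resolve("vendor_src/_main~ext~foo"));
    Files.createSymbolicLink(repo.resolve("path_bar"), data);
    System.out.println(danglingLinks(repo)); // []
    // Simulate the old output base going away.
    Files.delete(data);
    System.out.println(danglingLinks(repo).size()); // 1
  }
}
```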

@fmeum (Collaborator) commented Jun 10, 2024

While this is the current behavior, wouldn't we have to change it so that symlinks in vendored repos do not contain absolute paths? I think there was another issue about this filed recently.

@meteorcloudy (Member, Author) commented Jun 10, 2024

To deal with potential output base changes, maybe we could:

  1. Create a symlink under the vendor dir pointing to the external repo root.
  2. Rewrite every symlink that points to a path under the external repo root into a relative path through the symlink created in step 1.
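Both steps reduce to path arithmetic. A minimal sketch with java.nio, assuming the special symlink is named `_bazel-external` and sits directly in the vendor directory (as in the listing below). `rewriteTarget` is a hypothetical name, not the code from the experimental commit.

```java
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical sketch of rewriting absolute symlink targets into relative ones. */
final class RelativizeSketch {
  /**
   * For a symlink at `link` (inside `vendorDir`) whose absolute target lies under
   * `externalRoot`, returns an equivalent relative target that goes through
   * vendorDir/_bazel-external. Other targets are left alone (empty result).
   */
  static Optional<Path> rewriteTarget(Path vendorDir, Path externalRoot, Path link, Path target) {
    if (!target.isAbsolute() || !target.startsWith(externalRoot)) {
      return Optional.empty(); // e.g. /tmp/foo: not under the external root, keep as-is
    }
    // Step 1's symlink: vendorDir/_bazel-external -> externalRoot.
    Path viaSymlink = vendorDir.resolve("_bazel-external").resolve(externalRoot.relativize(target));
    // Step 2: express that path relative to the directory containing the link.
    return Optional.of(link.getParent().relativize(viaSymlink));
  }

  public static void main(String[] args) {
    Path external = Path.of("/ob/external");
    Path vendor = Path.of("/ws/vendor_src");
    Path link = vendor.resolve("_main~ext~foo/path_bar");
    // ../_bazel-external/_main~ext~bar/data
    System.out.println(rewriteTarget(vendor, external, link, external.resolve("_main~ext~bar/data")).get());
    // Optional.empty
    System.out.println(rewriteTarget(vendor, external, link, Path.of("/tmp/foo")));
  }
}
```

For `path_bar` this yields `../_bazel-external/_main~ext~bar/data`, the same shape as in the listing, while absolute targets outside the external root, such as `/tmp/foo`, are kept unchanged.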

I have an experimental implementation in meteorcloudy@bf0ec69, which results in:

```console
$ ll vendor_src/_bazel-external
lrwxr-xr-x  1 pcloudy  primarygroup  73 Jun 10 15:17 vendor_src/_bazel-external@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external
pcloudy@pcloudy-macbookpro2:~/workspace/my_tests/simple_cpp_test (master)
$ ll vendor_src/_main~ext~foo/
total 8
drwxr-xr-x  9 pcloudy  primarygroup  288 Jun 10 15:17 ./
drwxr-xr-x  7 pcloudy  primarygroup  224 Jun 10 15:17 ../
-rwxr-xr-x  1 pcloudy  wheel           0 Jun 10 15:17 BUILD*
-rwxr-xr-x  1 pcloudy  wheel           0 Jun 10 15:17 REPO.bazel*
-rwxr-xr-x  1 pcloudy  wheel          15 Jun 10 15:17 data*
lrwxr-xr-x  1 pcloudy  wheel           8 Jun 10 15:17 path_abs@ -> /tmp/foo
lrwxr-xr-x  1 pcloudy  primarygroup   37 Jun 10 15:17 path_bar@ -> ../_bazel-external/_main~ext~bar/data
lrwxr-xr-x  1 pcloudy  primarygroup   38 Jun 10 15:17 path_bar_2@ -> ../_bazel-external/_main~ext~bar/data2
lrwxr-xr-x  1 pcloudy  primarygroup   37 Jun 10 15:17 path_rel@ -> ../_bazel-external/_main~ext~foo/data
```

Please let me know what you think; I'd prefer to do this in a separate PR.
/cc @Wyverald @fmeum

@Wyverald (Member) commented

> While this is the current behavior, wouldn't we have to change it so that symlinks in vendored repos do not contain absolute paths? I think there was another issue about this filed recently.

#22303, probably

@Wyverald (Member) commented

> To deal with potential output base, maybe we could
>
> 1. create a symlink pointing to the external repo root under the vendor dir
> 2. Rewrite all symlinks pointing some path under external repo root to a relative path to the symlink created in 1.

This is quite clever! But what will version-control systems do with this special symlink? People usually add the bazel-* symlinks in the workspace root to .gitignore, so presumably this new special symlink will also need to be ignored? And the symlink would be generated on demand if it's not there, etc.? (I agree that this should be done in a separate PR.)

Either way, some sort of symlink rewriting will need to happen, and we'll probably need to do something similar for the true repo cache.

@meteorcloudy (Member, Author) commented Jun 11, 2024

> so presumably this new special symlink will also need to be ignored? And the symlink is generated on demand if it's not there, etc.? (I agree that this should be done in a separate PR)

Yes, I also think it should be gitignored since it's machine-specific. To keep the code simple, we can just always re-create the symlink; it's quite cheap.
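Always re-creating the symlink avoids having to decide whether an existing one is stale. A minimal sketch, assuming java.nio and the `_bazel-external` name from the listing earlier in this thread; `recreateLink` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: unconditionally re-create vendorDir/_bazel-external. */
final class RecreateLinkSketch {
  static Path recreateLink(Path vendorDir, Path externalRoot) throws IOException {
    Path link = vendorDir.resolve("_bazel-external");
    // Deletes the link itself, never its target; this also works when the old
    // target (e.g. a previous output base) no longer exists.
    Files.deleteIfExists(link);
    return Files.createSymbolicLink(link, externalRoot);
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("recreate-demo");
    Path vendor = Files.createDirectories(root.resolve("vendor_src"));
    recreateLink(vendor, root.resolve("old_output_base/external"));
    Path link = recreateLink(vendor, root.resolve("new_output_base/external"));
    System.out.println(Files.readSymbolicLink(link).endsWith("new_output_base/external")); // true
  }
}
```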

@Wyverald (Member) left a comment

mostly LGTM, just nits!


@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Jun 11, 2024
meteorcloudy added a commit to meteorcloudy/bazel that referenced this pull request Jun 18, 2024
This drastically improves the speed of vendoring external repositories.

Related: bazelbuild#19563

Closes bazelbuild#22668.

PiperOrigin-RevId: 642338030
Change-Id: Idcba16c491711cf8fa6637d1e9c42cfc65e87599
3 participants