
perf: use darwin's clonefile syscall #893

Merged: 2 commits, merged Aug 8, 2024

@plobsing (Contributor) commented Aug 7, 2024

This can save time and disk space when copying files around on the same device. File clones (aka reflinks) share backing disk blocks. Unlike hardlinks, they do not share an inode, and their contents are copy-on-write.

The Go standard library (as of v1.22) already arranges to do something similar for file copies on Linux (see: https://cs.opensource.google/go/go/+/refs/tags/go1.22.6:src/os/zero_copy_linux.go;l=53). Unfortunately, macOS's more limited API is less amenable to that kind of transparent wrapping.


Changes are visible to end-users: no

Test plan

Manually confirmed, with a custom script (gist), that cloned files have a different inode from their source but share its backing blocks.

Before

❯ bazel test lib/tests/copy_to_directory:all
[...]
hardlink bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/b/b2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/b/b2
copy lib/tests/copy_to_directory/f/f2/f2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f2
hardlink bazel-out/darwin_arm64-fastbuild/bin/external/external_test_repo/test_a/test_a => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_a/test_a
hardlink bazel-out/darwin_arm64-fastbuild/bin/external/external_test_repo/test_a/test_a2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_a/test_a2
hardlink bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/a/a2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/a/a2
hardlink bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/a/a => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/a/a
copy lib/tests/copy_to_directory/e/e1 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/e/e1
copy lib/tests/copy_to_directory/c => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/c
copy external/external_test_repo/test_c => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_c
copy external/external_test_repo/test_d => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_d
copy lib/tests/copy_to_directory/d => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/d
copy lib/tests/copy_to_directory/f/f2/f1 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f1
copy lib/tests/copy_to_directory/e/e2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/e/e2
hardlink bazel-out/darwin_arm64-fastbuild/bin/external/external_test_repo/test_b/test_b2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_b/test_b2
hardlink bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/b/b => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/b/b
hardlink bazel-out/darwin_arm64-fastbuild/bin/external/external_test_repo/test_b/test_b => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_b/test_b
[...]
❯ log2phys lib/tests/copy_to_directory/f/f2/f2 bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f2
==== lib/tests/copy_to_directory/f/f2/f2 ====
st_dev:              16777232    st_ino:            940040459
logaddr:                    0    flags: 00000000    len:                    6    physaddr:         329977204736

==== bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f2 ====
st_dev:              16777232    st_ino:            940386312
logaddr:                    0    flags: 00000000    len:                    6    physaddr:         418410364928

After

❯ bazel test lib/tests/copy_to_directory:all
[...]
hardlink bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/a/a => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/a/a
hardlink bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/b/b2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/b/b2
clonefile lib/tests/copy_to_directory/f/f2/f2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f2
hardlink bazel-out/darwin_arm64-fastbuild/bin/external/external_test_repo/test_a/test_a2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_a/test_a2
clonefile lib/tests/copy_to_directory/e/e1 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/e/e1
hardlink bazel-out/darwin_arm64-fastbuild/bin/external/external_test_repo/test_b/test_b => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_b/test_b
hardlink bazel-out/darwin_arm64-fastbuild/bin/external/external_test_repo/test_b/test_b2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_b/test_b2
clonefile external/external_test_repo/test_c => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_c
hardlink bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/b/b => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/b/b
clonefile external/external_test_repo/test_d => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_d
hardlink bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/a/a2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/a/a2
hardlink bazel-out/darwin_arm64-fastbuild/bin/external/external_test_repo/test_a/test_a => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/test_a/test_a
clonefile lib/tests/copy_to_directory/e/e2 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/e/e2
clonefile lib/tests/copy_to_directory/c => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/c
clonefile lib/tests/copy_to_directory/f/f2/f1 => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f1
clonefile lib/tests/copy_to_directory/d => bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/d
[...]
❯ log2phys lib/tests/copy_to_directory/f/f2/f2 bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f2
==== lib/tests/copy_to_directory/f/f2/f2 ====
st_dev:              16777232    st_ino:            940040459
logaddr:                    0    flags: 00000000    len:                    6    physaddr:         329977204736

==== bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f2 ====
st_dev:              16777232    st_ino:            940389079
logaddr:                    0    flags: 00000000    len:                    6    physaddr:         329977204736

Note: the source and cloned files have different inodes, but their contents share the same location on disk (identical physaddr). Editing the source file breaks this sharing, exercising transparent copy-on-write:

❯ echo quux >> lib/tests/copy_to_directory/f/f2/f2
❯ log2phys lib/tests/copy_to_directory/f/f2/f2 bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f2
==== lib/tests/copy_to_directory/f/f2/f2 ====
st_dev:              16777232    st_ino:            940040459
logaddr:                    0    flags: 00000000    len:                   11    physaddr:         418404380672

==== bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/case_3/lib/tests/copy_to_directory/f/f2/f2 ====
st_dev:              16777232    st_ino:            940389079
logaddr:                    0    flags: 00000000    len:                    6    physaddr:         329977204736
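Judging only from the log lines above (not from the project's source), files already under `bazel-out/` are hardlinked, while source files from the workspace or an external repository were copied before this change and are cloned after it. Hardlinking a source file would let an edit like the `echo quux >>` above leak into the output tree, which the clone's copy-on-write avoids. A minimal sketch of that decision, with hypothetical names `chooseMethod` and `canClone`:

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// chooseMethod is a hypothetical reconstruction of the pattern in the test
// logs: outputs already in bazel-out/ are hardlinked; source files are cloned
// when the platform supports it and copied otherwise.
func chooseMethod(src string, canClone bool) string {
	if strings.HasPrefix(src, "bazel-out/") {
		return "hardlink"
	}
	if canClone {
		return "clonefile"
	}
	return "copy"
}

func main() {
	// Assumption: only darwin takes the clone path, as in this PR.
	canClone := runtime.GOOS == "darwin"
	for _, src := range []string{
		"bazel-out/darwin_arm64-fastbuild/bin/lib/tests/copy_to_directory/a/a",
		"lib/tests/copy_to_directory/f/f2/f2",
		"external/external_test_repo/test_c",
	} {
		fmt.Println(chooseMethod(src, canClone), src)
	}
}
```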

@gregmagolan (Collaborator) left a comment

Very nice!

@gregmagolan merged commit 3c121a9 into bazel-contrib:main Aug 8, 2024
24 checks passed
@plobsing deleted the darwin-clonefile branch August 9, 2024 00:24