Implement Hermetic sandbox with support for hardlinks #13279

Closed

frazze-jobb
Contributor

Adds a linux-sandbox flag:
--experimental_use_hermetic_linux_sandbox - Configures linux-sandbox
to run in a chroot environment. This prevents access to files that are
not mentioned in the Bazel rules, unless they can be reached through
directories explicitly whitelisted with --sandbox_add_mount_pair.
The sandbox creates hardlinks instead of symlinks, and falls back to copying.
If an action writes to an input file, the build is aborted.

@frazze-jobb
Contributor Author

Related to issue #7313, "Make the sandboxed file system more strict".

@google-cla

google-cla bot commented Mar 30, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no label Mar 30, 2021
@frazze-jobb frazze-jobb marked this pull request as draft March 30, 2021 07:53
@frazze-jobb frazze-jobb marked this pull request as ready for review March 30, 2021 08:57
@aiuto
Contributor

aiuto commented Mar 31, 2021

It is not clear we had agreement on answer to #7313, so this PR might be premature.

@google-cla google-cla bot added cla: yes and removed cla: no labels Mar 31, 2021
@frazze-jobb
Contributor Author

These are the results of a benchmark measuring how long one of our builds takes with each sandbox.

| Sandbox type | Average build time |
| --- | --- |
| Process-wrapper | 268.93 |
| Linux-sandbox | 287.26 |
| Hermetic Linux-sandbox | 271.37 |

@justinhorvitz
Contributor

I will let philwo handle this review, since I'm not that familiar with sandboxes.

@justinhorvitz justinhorvitz removed their request for review March 31, 2021 21:05
Contributor

@larsrc-google larsrc-google left a comment

Not quite done with the C++/Bash parts yet, but here's a couple of notes.

src/main/tools/linux-sandbox-pid1.cc (outdated, resolved)
src/main/tools/linux-sandbox-pid1.cc (outdated, resolved)
@larsrc-google
Contributor

Could you fix the tests?

@frazze-jobb
Contributor Author

frazze-jobb commented Apr 12, 2021

Thanks for your review comments. Regarding the tests, I thought I knew what to fix, but it turns out I was wrong.
Can I get access to the test environment somehow, to try to reproduce the problem?

@larsrc-google
Contributor

Click the Details link at the failing test, it'll show you the log. But that's the log for the entire shard. Click "Artifacts" to see the output from the failing tests. Once you know what tests are failing, run them locally (possibly with ibazel).

@frazze-jobb
Contributor Author

I seem to have fixed most of the tests, but how do I prevent "buildkite/bazel-bazel-github-presubmit/darwin-openjdk-8-shard-2" from running my hermetic "linux-sandbox" test? I assume linux-sandbox is not supported on that platform, so the test will fail there.

@larsrc-google
Contributor

Looking at the logs, one test is failing because the error message is different, one is passing, and one is failing for actually not being hermetic. The last one is the interesting one. For that, I would suggest doing a platform check like this.

@frazze-jobb
Contributor Author

The tests are passing now, and I have addressed @larsrc-google's initial review comments. I would appreciate some more review comments, @philwo.

Contributor

@larsrc-google larsrc-google left a comment

Here are a few fixes that make it easier for me to test further revisions, and a couple of clarification questions. All tests pass, though.

src/main/tools/linux-sandbox-pid1.cc (outdated, resolved)

# For the test to work we need to bind mount a couple of folders to
# get access to bash, ls, python etc. Depending on linux distribution
# these folders may vary. Mount all folders in the root directory '/'
Contributor

Nit: Remove empty trailing whitespace. If your editor has an option to remove trailing whitespace, I suggest turning that on.

src/main/tools/linux-sandbox-options.cc (outdated, resolved)
@frazze-jobb
Contributor Author

The failing test

public void testUnreadableFileWithNoFastDigest() throws Exception

expects that setting a new modification time does not affect this assertion:

p.setLastModifiedTime(10L);
assertThat(valueForPath(p)).isEqualTo(value);

But I just added mtime to the hash/equal/fingerprint and prettyprint, so the failure is expected. What should I do about this test?

@larsrc-google
Contributor

@janakdr is probably the one who would know about the mtime handling.

Contributor

@janakdr janakdr left a comment

@michajlo wrote that test, he may have a better idea. On its face, I don't see that the test is enforcing an important property.

/**
* Visible for serialization / deserialization. Do not use this method, but call {@link #create}
* instead.
*/
public FileContentsProxy(long ctime, long nodeId) {
this.ctime = ctime;
this.mtime = ctime;
Contributor

Cleaned this class up in e4cc0b9: would you mind syncing?

@michajlo
Contributor

That was a while ago... Skimming history I believe I was showing that mtime was significant to equality (the version I introduced was actually asserting not-equal). At some point it got flipped to assert equal when mtime was removed / deemed unnecessary?

Member

@philwo philwo left a comment

In general LGTM, added a few comments.

Comment on lines +180 to +181
// Make sure that the sandbox_root path has no trailing slash.
if (sandbox_root.back() == '/') {
Member

What happens when the path has two trailing slashes, e.g. /dev/shm/mysandbox//? (I'm fine with exiting with an error on invalid input like that, but it looks like this wouldn't catch it at all?)

Contributor Author

will fix
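
One way to handle the double-slash case raised above is to strip every trailing slash in a loop rather than checking only the last character. This is a sketch under my own assumptions, not the fix that landed in the PR; `StripTrailingSlashes` is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: remove every trailing '/' from `path`, but keep a
// lone "/" intact so that the filesystem root stays a valid path.
std::string StripTrailingSlashes(std::string path) {
  while (path.size() > 1 && path.back() == '/') {
    path.pop_back();
  }
  return path;
}
```

With this, an input such as /dev/shm/mysandbox// is reduced to /dev/shm/mysandbox. The alternative mentioned in the review, exiting with an error on such input, would be equally simple.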

if (handle < 0) {
DIE("open");
}
if (close(handle)) {
Member

Could you add < 0 to the condition?

Contributor Author

sure

Comment on lines 368 to 374
// Create a bind mount to /proc
if (CreateTarget("proc", true) < 0) {
DIE("CreateTarget");
}
if (mount("/proc", "proc", NULL, MS_REC | MS_BIND, NULL) < 0) {
DIE("mount");
}
Member

Previously we had to mount a new proc on top of the old one, because the old one still refers to our parent PID namespace. What's the reason that we can just use a bind mount in this case?

Contributor Author

I could have a look. This code is taken from the old linux-sandbox implementation, but maybe it's not needed.

Member

Thanks! I remember this being tricky, but in the end we found a way that worked. 🤔 A good way to check whether /proc is working correctly: manually run linux-sandbox, e.g. with a shell, and look inside /proc. You shouldn't see any processes from outside the sandbox.

Contributor Author

Yes it works

src/main/tools/linux-sandbox-pid1.cc (outdated, resolved)
@@ -349,8 +413,13 @@ static void SetupNetworking() {
}

static void EnterSandbox() {
Member

WDYT, should we rename this to EnterWorkingDirectory?

Contributor Author

I think it makes sense, done


static void CreateEmptyFile() {
// This is used as the base for bind mounting.
CreateTarget("tmp", true);
Member

Should we check the return value of CreateTarget for errors?

Contributor Author

Yes

@larsrc-google
Contributor

@philwo I was actually the one who suggested using the methods from blaze_util instead of adding yet another reimplementation. Seems like that's what a util class is for. Is this util particularly heavy? Should it maybe be split?

@philwo
Member

philwo commented Jun 8, 2021

@larsrc-google But the one method was already there and worked fine and the other two are single lines of code. I think this is taking "DRY" a bit too far.

The sandbox binary is time-tested, performance- and correctness-critical code. I can't even tell for sure whether this PR will break something subtle that we will only discover months down the line. This is nothing against this particular code, but a general cautious approach I take with this code. For example, we still found bugs in the signal handling of the existing code, I think just last year, even though it had "worked" fine for years, until it finally didn't under some environmental circumstances.

I would really like to keep it as simple as possible, so that you can read it in one go and reason about it, and to remove any abstractions or dependencies on "generic" implementations of functions.

@larsrc-google
Contributor

@philwo My original comment (https://github.com/bazelbuild/bazel/pull/13279/files/0a9c8f7dc80255e925189c5ac94099fca5a172f3#r608589967) was not intended to apply to existing code so much as to the new implementations being added. Adding new implementations where time-tested ones exist makes it more likely that something will break.

Contributor

@larsrc-google larsrc-google left a comment

You seem to have a merge conflict pending.

Set<Path> writableDirs,
TreeDeleter treeDeleter,
@Nullable Path statisticsPath,
Boolean sandboxDebug) {
Contributor

Pass in a plain boolean instead of a Boolean object, since you're using it that way anyway.

Contributor Author

done

hardLinkRecursive(source, target);
}

private void hardLinkRecursive(Path source, Path target) throws IOException {
Contributor

Please add JavaDoc. In particular explain the failure modes.

Contributor Author

done

throw new IllegalArgumentException(source + " is a subdirectory of " + target);
}
target.createDirectory();
Collection<Path> entries = source.getDirectoryEntries();
Contributor

Since you're only using the basename, you can avoid a bunch of string manipulation by using readdir() instead of getDirectoryEntries(), similar to WorkerExecRoot.cleanExisting(). And if you use that, you could have the directory loop be the main part of this function and use the file/symlink/dir info in Dirent instead of the possibly expensive is*() calls.

Contributor Author

This did not seem trivial, so I decided not to change it.
I will take another look if you are really concerned about the current implementation.

src/main/tools/linux-sandbox-pid1.cc (outdated, resolved)
" -h if set, create a chroot in the sandbox directory (-s) and only "
" mount whats been specified with -M/-m for improved hermeticity\n"
" -s The sandbox root directory where the chroot will be created, -W"
" should be a folder inside this sandbox root directory\n"
Contributor

Could you add a check that this holds, probably at the end of ParseOptions? You depend on it further down for a substr operation.

Contributor Author

Added it in ParseCommandLine
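
The kind of check requested here, that the working directory (-W) lies inside the sandbox root (-s) before any substr on it, might be sketched as follows. `IsInsideSandboxRoot` is a hypothetical name, and the sketch assumes both paths are absolute, normalized, and free of trailing slashes; it is not the check that was added in ParseCommandLine.

```cpp
#include <string>

// Hypothetical check: true if `working_dir` is located strictly inside
// `sandbox_root`. Both arguments are assumed to be absolute, normalized
// paths without trailing slashes.
bool IsInsideSandboxRoot(const std::string& sandbox_root,
                         const std::string& working_dir) {
  // There must be room for the root, a '/', and at least one more character.
  if (working_dir.size() <= sandbox_root.size() + 1) {
    return false;
  }
  // The working directory must start with the root...
  if (working_dir.compare(0, sandbox_root.size(), sandbox_root) != 0) {
    return false;
  }
  // ...followed by a path separator, so "/a" does not match "/ab/c".
  return working_dir[sandbox_root.size()] == '/';
}
```

Once this holds, `working_dir.substr(sandbox_root.size())` is guaranteed to be a non-empty path starting with '/'.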

}

// Recursively creates the file or directory specified in "path" and its parent
// directories.
Contributor

Please add that this sets errno and returns -1 for errors.

Contributor Author

done


// Create the parent directory.
if (CreateTarget(dirname(strdupa(path)), true) < 0) {
DIE("CreateTarget");
Contributor

Adding the name of the directory we failed to create would be helpful here (but easier if CreateTarget dies in all errors).

Contributor Author

done

// certain filesystems (e.g. XFS).
static void LinkFile(const char *path) {
if (link("tmp/empty_file", path) < 0) {
DIE("link");
Contributor

Please add the file we were trying to link to in the message.

Contributor Author

done

@google-cla

google-cla bot commented Jun 14, 2021

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@google-cla

google-cla bot commented Aug 10, 2021

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added cla: no and removed cla: yes labels Aug 10, 2021
@frazze-jobb
Contributor Author

Back from vacation now.
I had to reset the branch, sync it, and squash all the changes, because the branch had become really messy, with my and other people's commits appearing multiple times. Hopefully that doesn't disrupt the review too much. At least I no longer have failing tests.

I have acted on all your comments, @philwo @larsrc-google.
Is it good enough for a merge?

Member

@philwo philwo left a comment

Thank you. This looks fine to me.

@larsrc-google WDYT?

@larsrc-google
Contributor

I agree. I have imported it and all tests pass. There are a few style things to fix, I'll take care of that.

if (CreateTarget("dev", true) < 0) {
DIE("CreateTarget /dev");
}
const char *devs[] = {"/dev/null", "/dev/random", "/dev/urandom", "/dev/zero", NULL};
Contributor

The sandbox would be less hermetic if /dev/shm is bind mounted, since then actions could communicate with each other via /dev/shm.

Would it make sense to require an explicit --sandbox_add_mount_pair for use cases where /dev/shm is desired to be bind mounted?

jhance pushed a commit to dropbox/dbx_build_tools that referenced this pull request Mar 23, 2022
Summary:
This diff updates our Bazel version to 5.0.0.

I've re-worked our patches on top of the upstream 5.0.0 tag.

- `linux-sandbox` has changed a bunch (mostly in `linux-sandbox-pid1.cc`) so I had to rewrite a bunch of the patch. The main change is that upstream has added a bunch of logic from our own patch in order to support the new hermetic sandbox flag (see bazelbuild/bazel#13279). So I've cleaned things up, removed some of our code and instead called their new code. One change is that they hardlink an empty file outside of the sandbox rather than creating new files, which sounds ok. Note that we might be able to remove even more of our own patch in favor of their hermetic support but we can do that later.
- The Merkle tree computation moved from `RemoteSpawnCache` to the `RemoteExecutionService`. We should be able to rewrite the patch fairly easily but they've also added an (in-process) cache for those trees (see bazelbuild/bazel#13879) so it might be helping with the slowness that we were seeing before. I'm inclined to not apply the patch to start with and we can add it back if things get much slower.

The changes are on the `dbx-2022-02-25-5.0.0` branch in the bazel repo.

Here's the list of our own commits on top of upstream:
- [[ https://sourcegraph.pp.dropbox.com/bazel/-/commit/5a121a34b1a2a39530bf6cecc3892fc4509a1735?visible=2 | DBX: Helper scripts to build dbx bazel ]]
- [[ https://sourcegraph.pp.dropbox.com/bazel/-/commit/c3707dea392806b81f2892d46ede5bf54ef02527?visible=1 | DBX: Point remotejdk URLs to magic mirror ]]
- [[ https://sourcegraph.pp.dropbox.com/bazel/-/commit/dc5a85b9a1b710230f2c786fd2cede3adb29370d?visible=2 | DBX: Make sure that the java8 toolchain uses the right options ]]
- [[ https://sourcegraph.pp.dropbox.com/bazel/-/commit/497532f9878b3b68582c12766bf034a4de6cc44a?visible=6 | DBX: rootfs patch for the linux-sandbox ]]

Also see https://blog.bazel.build/2022/01/19/bazel-5.0.html

DTOOLS-1748

Test Plan:
Will run the main projects and CI and make sure that things still work.

Ran `bzl tool //dropbox/devtools/bazel_metrics/benchmarks --target //services/metaserver edit-refresh` on both this diff and master.

On 4.1.0 on master:

```
Running no-op reload 5 times...
Finished running no-op reload! The results were:
  min: 3.01s
  avg: 3.08s
  p50: 3.08s
  max: 3.21s
Running modify metaserver/static/js/modules/core/uri.ts 5 times...
Finished running modify metaserver/static/js/modules/core/uri.ts! The results were:
  min: 5.30s
  avg: 5.78s
  p50: 5.77s
  max: 6.59s
Running modify metaserver/static/css/legacy_browse.scss 5 times...
Finished running modify metaserver/static/css/legacy_browse.scss! The results were:
  min: 4.46s
  avg: 4.83s
  p50: 4.69s
  max: 5.26s
Running add file at metaserver/static/js/modules/core/devbox-benchmark-file-{}.ts 5 times...
Finished running add file at metaserver/static/js/modules/core/devbox-benchmark-file-{}.ts! The results were:
  min: 25.69s
  avg: 26.21s
  p50: 26.22s
  max: 26.89s
Running modify metaserver/static/error/maintenance.html 5 times...
Finished running modify metaserver/static/error/maintenance.html! The results were:
  min: 4.75s
  avg: 4.85s
  p50: 4.75s
  max: 5.01s
```

On 5.0.0

```
Running no-op reload 5 times...
Finished running no-op reload! The results were:
  min: 3.48s
  avg: 3.69s
  p50: 3.48s
  max: 3.90s
Running modify metaserver/static/js/modules/core/uri.ts 5 times...
Finished running modify metaserver/static/js/modules/core/uri.ts! The results were:
  min: 5.54s
  avg: 6.34s
  p50: 5.54s
  max: 8.59s
Running modify metaserver/static/css/legacy_browse.scss 5 times...
Finished running modify metaserver/static/css/legacy_browse.scss! The results were:
  min: 4.34s
  avg: 4.75s
  p50: 5.05s
  max: 5.46s
Running add file at metaserver/static/js/modules/core/devbox-benchmark-file-{}.ts 5 times...
Finished running add file at metaserver/static/js/modules/core/devbox-benchmark-file-{}.ts! The results were:
  min: 25.55s
  avg: 25.96s
  p50: 25.64s
  max: 26.71s
Running modify metaserver/static/error/maintenance.html 5 times...
Finished running modify metaserver/static/error/maintenance.html! The results were:
  min: 4.79s
  avg: 5.33s
  p50: 5.15s
  max: 5.84s
```

GitOrigin-RevId: 0f466c5a3bde9ed1157ea936bb70826b58f2fbec
10 participants