Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

immutable cache design #12716

Closed
tdyas opened this issue Aug 31, 2021 · 13 comments
Closed

immutable cache design #12716

tdyas opened this issue Aug 31, 2021 · 13 comments
Assignees

Comments

@tdyas
Copy link
Contributor

tdyas commented Aug 31, 2021

This issue proposes adding an "immutable caches" feature to Process invocations to support cases where large archives (e.g, the Go distribution) would need to be unpacked in an input root. See the following design doc for the details: https://docs.google.com/document/d/1e4djdm2D9L6IeeJ8Q1M0_yJU-kF4tTHopBomwzdDRpE/edit#heading=h.n4lrkjmz3x2s

@benjyw

This comment has been minimized.

@tdyas

This comment has been minimized.

@tdyas
Copy link
Contributor Author

tdyas commented Sep 2, 2021

Also, #12719 (comment) provides data on just how bad the sandbox setup cost is (60-70% of execution time on sandbox setup).

@benjyw
Copy link
Contributor

benjyw commented Sep 3, 2021

Yeah, if we have an option that is both less work and higher performance, then why not do that? Granted it's a mild concession of hermeticity, but I think one we can easily live with for now.

Eric-Arellano added a commit that referenced this issue Oct 27, 2021
…13352)

Turns out that #13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:

1. determining `GoModInfo` (once per `go.mod`)
2. `AllDownloadedModules` (once per `go.mod`)
3.  Determining metadata for each third-party package (once per third-party module)
4. Determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per `go.mod`.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into its downloaded directory and running `go list` directly in it, using its own `go.mod` and `go.sum`. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like #13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need.

We also extract the relevant `.go` files from all of the downloaded `GOPATH`, i.e. all the downloaded modules. For compilation, all we need is the `.go` files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.

For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like #12716 to fix this.)

## Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs
```

Fixing only third-party analysis:

```
❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs
```

Fixing everything:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs
```

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this issue Oct 27, 2021
…antsbuild#13352)

Turns out that pantsbuild#13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:

1. determining `GoModInfo` (once per `go.mod`)
2. `AllDownloadedModules` (once per `go.mod`)
3.  Determining metadata for each third-party package (once per third-party module)
4. Determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per `go.mod`.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into its downloaded directory and running `go list` directly in it, using its own `go.mod` and `go.sum`. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like pantsbuild#13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need.

We also extract the relevant `.go` files from all of the downloaded `GOPATH`, i.e. all the downloaded modules. For compilation, all we need is the `.go` files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.

For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like pantsbuild#12716 to fix this.)

## Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs
```

Fixing only third-party analysis:

```
❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs
```

Fixing everything:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs
```

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Oct 27, 2021
…(Cherry-pick of #13352) (#13378)

Turns out that #13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:

1. determining `GoModInfo` (once per `go.mod`)
2. `AllDownloadedModules` (once per `go.mod`)
3.  Determining metadata for each third-party package (once per third-party module)
4. Determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per `go.mod`.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into its downloaded directory and running `go list` directly in it, using its own `go.mod` and `go.sum`. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like #13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need.

We also extract the relevant `.go` files from all of the downloaded `GOPATH`, i.e. all the downloaded modules. For compilation, all we need is the `.go` files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.

For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like #12716 to fix this.)

## Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs
```

Fixing only third-party analysis:

```
❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs
```

Fixing everything:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs
```

[ci skip-rust]
[ci skip-build-wheels]
@stuhood
Copy link
Member

stuhood commented Nov 1, 2021

We will need to weigh implementing either this or FUSE support for #13435, depending on user's constraints around enabling FUSE.

EDIT: Fixed issue link.

@jsirois
Copy link
Contributor

jsirois commented Nov 1, 2021

Isn't FUSE a bit of a no-go with the current status of Apple trying hard to kick kexts to the curb and the maintenance situation of the fuse project for Mac?

@stuhood
Copy link
Member

stuhood commented Nov 1, 2021

Isn't FUSE a bit of a no-go with the current status of Apple trying hard to kick kexts to the curb and the maintenance situation of the fuse project for Mac?

My understanding is that although it is closed source (seemingly in order to prevent rebranded forks of the FUSE project), it is still actively maintained. We definitely would not be expecting to fork and rebrand, so I'm not sure whether that is a blocker or not.

Would need to see how comfortable companies with large monorepos are with using it as is, and what platforms they primarily are on.

@tdyas
Copy link
Contributor Author

tdyas commented Nov 1, 2021

Isn't FUSE a bit of a no-go with the current status of Apple trying hard to kick kexts to the curb and the maintenance situation of the fuse project for Mac?

kext's got replaced with "system extensions" which run in user space (and are probably leveraging Mach to some degree). macOS FUSE appears to use system extensions which is implied by the "system extension" showing up in some error message: osxfuse/osxfuse#828

@stuhood
Copy link
Member

stuhood commented Nov 19, 2021

I spent an hour looking at FUSE last night, and the experience was vaguely disconcerting on macOS: I updated to a modern fork of the fuse crate (#13679), and the brfs binary was fine, but running the tests resulted in hangs... and the need to restart my machine.

It's very likely that the tests are not tearing down the library correctly, because no amount of using the brfs binary caused any issues.

But I wasn't quite able to benchmark brfs (invoked with RUST_LOG=debug ./brfs --local-store-path=store fused), because although running ls on visible inodes worked, find executed unsupported/unimplemented lookups which triggered warnings in brfs... I didn't make it further than that though due to the testing issue.

@stuhood
Copy link
Member

stuhood commented Nov 19, 2021

Correction. I was able to get some very unscientific benchmarks by executing find against AFS, while running shasum against brfs (so very apples to oranges, but):

  • Capturing 1683024000 bytes in 3753 files with fs_util:
    • real 0m4.232s
  • Materializing to a $dest directory with fs_util:
    • real 0m1.448s
  • Parallel shasum against materialized files (time find $dest -type f | xargs -P8 shasum | wc -l):
    • Cold:
      real    0m9.124s
      user    0m4.117s
      sys 0m1.014s
      
    • Warm:
      real    0m4.816s
      user    0m4.289s
      sys 0m0.550s
      
  • Parallel shasum against brfs with the listing supplied externally (time find $dest -type f | sed -e 's|^dest/|fused/directory/../|' | xargs -P8 shasum | wc -l):
    • Cold:
      real    0m15.633s
      user    0m4.672s
      sys 0m2.973s
      
    • Warm:
      real    0m6.140s
      user    0m5.032s
      sys 0m0.844s
      

Useful also would be measuring the total amount of IO: materialization involves reading+writing+reading, while brfs only involves reading.

The immutable cache approach would involve the materialization step (once), followed by the creation of a much smaller number of symlinks (proportional to the total number of Directorys that were configured to be supplied as immutable)?

@stuhood
Copy link
Member

stuhood commented Nov 29, 2021

One other general thought here: to avoid implementing unionfs support in brfs (which might be an option in the medium term, but which would likely complicate things, and would definitely slow things down, since all IO access would go through FUSE, rather than only for inputs), it's likely that we will want to implement a symlinking strategy regardless of whether brfs or temporarily materialized digests in the native filesystem are the symlink destination.

Given that, and given the relative instability of FUSE, it seems like the first step forward should be a symlinked strategy with the destination initially being a lightly defended temp dir, as first proposed in this ticket. We can then later have the symlink destination be brfs (perhaps even conditionally per platform).

stuhood added a commit that referenced this issue Dec 6, 2021
As discussed in #13787, the `input_files` digest must currently include the `use_nailgun` digest, and that means that the input files for the server are materialized for every run.
* Add an `InputDigests` struct to encapsulate managing the collection of input digests for a `Process` (which will soon also include the immutable digests from #12716), and split the `use_nailgun` digest from the `input_files` digest.
* Move nailgun spawning (which uses `std::process` and `std::fs`, and so is synchronous) onto the `Executor`.
* Adjust the (still hardcoded) nailgun pool size to keep idle servers beyond the number that are currently active (e.g. to allow for idle `javac` processes while `scalac` processes are active).

Collectively, these changes make compilation 40% faster.

Fixes #13787.
@stuhood stuhood self-assigned this Dec 8, 2021
stuhood added a commit that referenced this issue Dec 10, 2021
As designed by @tdyas in #12716, this change adds `immutable_inputs` to `Process`, and adapts the JVM `@rules` to use them: for now, only for the tools themselves. A followup change will move all JVM classpath inputs to `immutable_inputs`.

Nailgun support is now enabled by specifying the subset of the `immutable_inputs` which should be used to start the server: all others will be used to start the client.

[ci skip-build-wheels]
@stuhood
Copy link
Member

stuhood commented Jan 4, 2022

Resolved by #13848. Thanks a lot @tdyas!

@stuhood
Copy link
Member

stuhood commented Sep 12, 2022

It looks like there might now be a more stable FUSE backend for macOS: https://www.fuse-t.org/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants