Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel remote-caching does not properly treat volatile-status as if it never changes #10075

Closed
Panfilwk opened this issue Oct 21, 2019 · 29 comments
Labels
more data needed team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged

Comments

@Panfilwk
Copy link
Contributor

Description of the problem / feature request:

When a target depends on the file volatile-status.txt as an input, that target will not be rebuilt if volatile-status.txt changes and the target is cached locally. However, the remote cache does not behave in the same way, and a change in the hash of volatile-status.txt will cause the target not to read from the remote cache.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Clone the repo test_volatile_status and set up a bazel remote cache. Then run the following commands at the root of the repo:

bazel build --remote_cache=<your_remote_cache> //:artifact
bazel build --remote_cache=<your_remote_cache> //:artifact 
# the above will read from the local cache, as expected

bazel clean
bazel build --remote_cache=<your_remote_cache> //:artifact
# this will not read from the remote cache

What operating system are you running Bazel on?

Ubuntu 16.04.6 LTS

What's the output of bazel info release?

release 0.29.0

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

https://github.com/Panfilwk/test_volatile_status.git
da002118b6c79bd7e33b017221c772fbfba4f009
da002118b6c79bd7e33b017221c772fbfba4f009

Have you found anything relevant by searching the web?

This issue touches on this concept for local caching, but nothing about how the behavior of this file works with remote caching.

Any other information, logs, or outputs that you want to share?

exec1.log.txt and exec2.log.txt are the resulting parsed experimental execution logs from the first and third build commands following the debug instructions here

@irengrig irengrig added team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged labels Oct 23, 2019
@buchgr
Copy link
Contributor

buchgr commented Oct 28, 2019

The behavior you are seeing is the expected behavior. Bazel's local cache will not rerun an action if only volatile-status.txt changed but will rerun it otherwise. The remote cache does not know about volatile-status.txt and it shouldn't. volatile-status.txt is mostly used to for timestamping (=this binary was built at XX.XX.XXXX XX:XX).

What exactly is the use case you are trying to make work?

@Panfilwk
Copy link
Contributor Author

For our CI builds that use Bazel, we have an issue that docker containers spun up for tests sometimes linger after the CI job is finished. We want to tag these containers with the CI build information (using volatile-status.txt) to determine which jobs are causing containers to linger and hopefully discover the root cause, or to allow each job to clean up it's own containers without interfering with other jobs.

We don't want the tests using these containers to rerun if only the build information has changed and we want to cache across multiple workers that could potentially run a job, hence the use of remote-caching.

@buchgr
Copy link
Contributor

buchgr commented Oct 29, 2019

For our CI builds that use Bazel, we have an issue that docker containers spun up for tests sometimes linger after the CI job is finished.

That sounds like your machines aren't recycled after each run? Would it be an option to keep the Bazel server running between builds? This would give you the volatile-status.txt behavior you are looking for.

@buchgr
Copy link
Contributor

buchgr commented Oct 29, 2019

To clarify: I haven't thought deeply about this problem nor do I have any experience with such a use case. So I am neither for nor against having this feature. However, implementing this in Bazel would require protocol support and so would need discussions with the API community.

@ulfjack @lberki @ericfelly might have more insights on how we handle this internally.

@lberki
Copy link
Contributor

lberki commented Oct 29, 2019

I vote for "Will not fix; too much effort" here.

What you are saying is certainly true, but bazel clean; bazel build --remote_cache=<stuff> is a rare use case and it's arguably not a "cached" build anyway (depending on your definition of "cached").

This, coupled with the fact that fixing this would require changes to the cache protocol makes the cost outweigh the benefits IMHO.

One thing you could do is to read the identifier (non-hermetically) from a local file as opposed to through volatile-status.txt. That way, you wouldn't get cache hit but still be able to tag the test invocation with the ID.

@ulfjack
Copy link
Contributor

ulfjack commented Oct 29, 2019

We're very reluctant to change the behavior of volatile-status wrt. remote caching or execution, because doing so can potentially cause incorrect cross-user caching. Better to have remote caching / execution be strict and loose a bit of performance there.

Now, if you need to tag actions in order to figure out where they're coming from, I think your best bet is to use a side channel, instead of trying to do it with action inputs / env variables. That is, whatever spins up those docker containers should add metadata to the container in a way that the action can't inspect, so it doesn't affect caching, but that you can inspect if you find a hanging container after the fact. I don't have much experience with containers, but I'm sure there's a way to do that. If you're using Bazel to spin up the container, we could add some code to Bazel to do that by default.

@Panfilwk
Copy link
Contributor Author

That sounds like your machines aren't recycled after each run? Would it be an option to keep the Bazel server running between builds? This would give you the volatile-status.txt behavior you are looking for.

Machines are recycled, but there's no guarantee about which of several worker machines a CI job will run on, so we want the cache to be shared between all of them.

What you are saying is certainly true, but bazel clean; bazel build --remote_cache= is a rare use case and it's arguably not a "cached" build anyway (depending on your definition of "cached").

This is how official bazel documentation recommends debugging remote-cache hits

If this would require large changes to the structure of remote caching though, I think reading from a local file or other side channel as @ulfjack and @lberki suggested would work for our use case. I'll close this, but would appreciate any suggestions on how best to use a side channel in this way; current ideas are reading from an expected absolute path on CI workers, or maybe using the --action_env flag?

@buchgr
Copy link
Contributor

buchgr commented Oct 30, 2019

@Panfilwk --action_env is taken into account for remote cache key computation. An absolute path from where your action happens to read from should work.

@lberki
Copy link
Contributor

lberki commented Nov 4, 2019

+1 for what @ulfjack said; suboptimal performance is not great, but cache poisoning is even worse.

As for the side channel to use, as long as you can tag a container with some sort of unique ID you can use to find the container again, the easiest thing that comes to mind is exactly what you suggested: reading the ID from a file at some fixed path, then printing it on the stdout of the test.

@mariusgrigoriu
Copy link

Not sure I understand something. Wouldn't the stamped output files be stored in the remote cache, and couldn't bazel just ignore volatile-status.txt and just download the previously built output file?

@mariusgrigoriu
Copy link

Or at the very least update the docs to state that bazel only pretends that the volatile file doesn't change in the context of a local incremental build.

@joeljeske
Copy link
Contributor

joeljeske commented Mar 7, 2020

Yea I agree, it seems like unless the volatile status can actually be omitted from the actions inputs during action digest for the remote cache, then volatile keys is very misleading; it appears to work during local development of stamping rules, but then fails in a primary way people expect to leverage it.

I also ended up side loading a status file for my stamping needs, but it seems like extra work that is prone to error that really should be handled by bazel.

@alanfalloon
Copy link
Contributor

alanfalloon commented Mar 19, 2020

@Panfilwk Can we re-open this please? Once you add remote-execution, there is literally no way to "sneak" the version information to the remote builders in a way that avoids the action hash. This was the one and only way to have a remotely built target that embeds version info that doesn't cause it to rebuild every time.

@mariusgrigoriu
Copy link

I submitted some PR / issues to various rules (k8s, docker, etc.) that are affected by this to disable volatile status stamping but feels wrong to me. This really should be addressed here. I understand the concerns about correctness higher in the thread, but Bazel claims both Fast and Correct. This issue seriously impairs "Fast."

@alanfalloon
Copy link
Contributor

@ulfjack @lberki Can you please reconsider fixing this. The listed work-around for this issue is to use no-remote-exec on the stamped action so that it can sneak access to volatile-status.txt without listing it as an input. But in combination with #9412 the no-remote-exec action will not get cached and will always re-run anyway.
So the combination of this issue and #9412 means that you have to choose between remote execution and stamping. The work-around was already a problem and killing build performance, but now it doesn't even work.

@ulfjack
Copy link
Contributor

ulfjack commented Mar 26, 2020

I don't work at Google anymore. That said, there is no technical way to fix this with a dumb remote cache, or with the current remote execution protocol.

Avoid stamping unless you absolutely positively must have stamped binaries - ideally, never enable stamping for developer builds, and require that everything that is deployed to production (or shipped to an app store or whatever) is built by an automated system that explicitly enables stamping.

@joeljeske
Copy link
Contributor

Are you suggesting that stamping should occur in an artifact pipeline after bazel? How does google perform their stamping?

@lberki
Copy link
Contributor

lberki commented Mar 27, 2020

(Oh hello, @ulfjack !)

At Google we:

  1. Try to inject build stamping information into the build as late as possible (so that changing build stamp information invalidates as few actions as possible)
  2. Only used stamped builds for outputs that need it (e.g. ones that are to be released). This is reflected by the value of the --stamp command line flag being false by default.

siddharthab pushed a commit to grailbio/rules_r that referenced this issue Apr 13, 2021
bazel remote cache does not respect the intent of volatile status file,
so let's ensure that the status variables are constant in our tests.

bazelbuild/bazel#10075
@siddharthab
Copy link
Contributor

For all who are still struggling with this issue, consider just making your volatile status files constant through your workspace status command. You can use an environment variable to control this behavior, which can be set or unset depending on whether you are testing or deploying. If the status files themselves are constant, you don't have to worry about how they are handled by bazel and any language rules.

You can override bazel's default values for status vars as in this example here.

@dieend
Copy link

dieend commented Jun 23, 2021

Hi, we use bazel and use the volatile-status to generate the Kubernetes YAML, and add the commit info to the Kubernetes object metadata. If the generated container image doesn't change on a newer commit, we don't expect the YAML info to change. It's important to have volatile-status work as documented when using remote-cache for us.

Even if we stamp it very late, it'll trigger unnecessary rollout if a change in the hash of volatile-status.txt will cause the target not to read from the remote cache.

How do we work around this?

@pauldraper
Copy link
Contributor

pauldraper commented Feb 2, 2022

The behavior you are seeing is the expected behavior.

Why do people want different caching behavior local vs remote? What the case that is trying to solve?

Seems like they should work similarly, no?

irfansharif added a commit to irfansharif/cockroach that referenced this issue Feb 3, 2022
When used with a remote build cache, these stamps (despite being marked
as volatile), induce an unnecessary golink step when switching back and
forth between branches. See
bazelbuild/bazel#10075. Opting out here shaves
a few precious seconds if the previously built binary is already
available.

Release note: None
@burdiyan
Copy link

I think this issue definitely needs to be reopened, unless there's a dupe already. All the workarounds to pass information to build actions that would not affect the action key are very annoying, and error-prone.

irfansharif added a commit to irfansharif/cockroach that referenced this issue Apr 4, 2022
- Sandboxing on MacOS have measurable overhead. See
  bazelbuild/bazel#8230; it comes from
  symlinks being much slower to create. This commit disables sandboxing
  on local builds (still present in CI) to hopefully speed up builds.
  - If this proves to be buggy for whatever reason, we could opt out of
    sandboxing for go package compiles specifically using
    --strategy=GoCompilePkg=local.
  - In the future if we want local sandboxing, we could explore
    sandboxfs: https://bazel.build/docs/sandboxing
- When used with a remote build cache, these stamps (despite being marked
  as volatile), induce an unnecessary golink step when switching back
  and forth between branches. See bazelbuild/bazel#10075. Opting out
  here shaves a few precious seconds if the previously built binary is
  already available.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Apr 6, 2022
- Sandboxing on MacOS have measurable overhead. See
  bazelbuild/bazel#8230; it comes from
  symlinks being much slower to create. This commit disables sandboxing
  on local builds (still present in CI) to hopefully speed up builds.
  - If this proves to be buggy for whatever reason, we could opt out of
    sandboxing for go package compiles specifically using
    --strategy=GoCompilePkg=local.
  - In the future if we want local sandboxing, we could explore
    sandboxfs: https://bazel.build/docs/sandboxing
- When used with a remote build cache, these stamps (despite being marked
  as volatile), induce an unnecessary golink step when switching back
  and forth between branches. See bazelbuild/bazel#10075. Opting out
  here shaves a few precious seconds if the previously built binary is
  already available.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Apr 7, 2022
- Sandboxing on MacOS have measurable overhead. See
  bazelbuild/bazel#8230; it comes from
  symlinks being much slower to create. This commit disables sandboxing
  on local builds (still present in CI) to hopefully speed up builds.
  - If this proves to be buggy for whatever reason, we could opt out of
    sandboxing for go package compiles specifically using
    --strategy=GoCompilePkg=local.
  - In the future if we want local sandboxing, we could explore
    sandboxfs: https://bazel.build/docs/sandboxing
- When used with a remote build cache, these stamps (despite being marked
  as volatile), induce an unnecessary golink step when switching back
  and forth between branches. See bazelbuild/bazel#10075. Opting out
  here shaves a few precious seconds if the previously built binary is
  already available.

Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Apr 7, 2022
79360: bazel: disable sandboxing and avoid stamping r=irfansharif a=irfansharif

- Sandboxing on MacOS have measurable overhead. See
  bazelbuild/bazel#8230; it comes from
  symlinks being much slower to create. This commit disables sandboxing
  on local builds (still present in CI) to hopefully speed up builds.
  - If this proves to be buggy for whatever reason, we could opt out of
    sandboxing for go package compiles specifically using
    --strategy=GoCompilePkg=local.
  - In the future if we want local sandboxing, we could explore
    sandboxfs: https://bazel.build/docs/sandboxing
- When used with a remote build cache, these stamps (despite being marked
  as volatile), induce an unnecessary golink step when switching back
  and forth between branches. See bazelbuild/bazel#10075. Opting out
  here shaves a few precious seconds if the previously built binary is
  already available.

Release note: None

Jira issue: CRDB-14748

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@ghost
Copy link

ghost commented Jun 5, 2022

We are also facing this issue, which impact the efficiency of our continuous delivery and load and fill, unnecessarily, our remote cache. I have just spent a little time to find the origin of the problem, and if it can help, here a minimal example to reproduce the behaviour.

Example

WORKSPACE

BUILD.bazel

load(":myrule.bzl", "myrule")

myrule(name = "dummy")

myrule.bzl

def _myrule_impl(ctx):
    output_file = ctx.actions.declare_file(ctx.attr.name)

    ctx.actions.run_shell(
        outputs = [output_file],
        inputs = [ctx.version_file],
        command = "cp -f $1 $2",
        arguments = [
            ctx.version_file.path,
            output_file.path,
        ],
    )

    return DefaultInfo(
        files = depset([output_file]),
    )

myrule = rule(
    implementation = _myrule_impl,
)

A simple rule which copy the file volatile-status.txt to bazel-bin/dummy. Exactly the same example than test_volatile_status.

Scenarios

First build

We start from an empty environment. The Bazel local cache and the remote cache are fully cleaned.

bazel build \
    --stamp \
    --remote_cache=grpc://localhost:9092 \
    --experimental_remote_grpc_log=/tmp/grpc.log \
    //:dummy

bazel-bin/dummy:

BUILD_TIMESTAMP 1654449958

bazel-out/volatile-status.txt:

BUILD_TIMESTAMP 1654449958

As expected, the file, not present in the local cache and in the remote cache, is built and the remote cache is filled.
We can see all executed gRPC functions from the log file:

[05 Jun 2022 19:25:58.229] build.bazel.remote.execution.v2.ActionCache::GetActionResult - NotFound (2ms)
    Message: 1e8a3a53556eb21863ddd36e4e6dc6b4b40e1e5308c489130334dca96e77f399 not found in AC
    -->
    	|- ActionDigest: 1e8a3a53556eb21863ddd36e4e6dc6b4b40e1e5308c489130334dca96e77f399/143
    <--

[05 Jun 2022 19:25:58.245] build.bazel.remote.execution.v2.ContentAddressableStorage::FindMissingBlobs - OK (2ms)
    -->
    	|- BlobDigests
    	   - 405ee2251a1467910dd9163e6991d0a6df935920484e9759e6ecc728b980bd7b/27
    	   - e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/0
    	   - d543d2b229eec2c9cfff4ef2a4795f50d53944628193efb2b27cec40da964ddb/131
    	   - 1e8a3a53556eb21863ddd36e4e6dc6b4b40e1e5308c489130334dca96e77f399/143
    <--
    	|- MissingBlobDigests
    	   - 405ee2251a1467910dd9163e6991d0a6df935920484e9759e6ecc728b980bd7b/27
    	   - d543d2b229eec2c9cfff4ef2a4795f50d53944628193efb2b27cec40da964ddb/131
    	   - 1e8a3a53556eb21863ddd36e4e6dc6b4b40e1e5308c489130334dca96e77f399/143

[05 Jun 2022 19:25:58.253] google.bytestream.ByteStream::Write - OK (20ms)
    -->
    	|- ResourceNames: [uploads/935da4f7-4095-4492-a43b-7bab3b2b147d/blobs/1e8a3a53556eb21863ddd36e4e6dc6b4b40e1e5308c489130334dca96e77f399/143]

[05 Jun 2022 19:25:58.252] google.bytestream.ByteStream::Write - OK (22ms)
    -->
    	|- ResourceNames: [uploads/25a22748-a5e1-4294-a15d-03766431a14a/blobs/405ee2251a1467910dd9163e6991d0a6df935920484e9759e6ecc728b980bd7b/27]

[05 Jun 2022 19:25:58.253] google.bytestream.ByteStream::Write - OK (28ms)
    -->
    	|- ResourceNames: [uploads/26a4e52c-eac7-4839-a0b7-abb5994bbc38/blobs/d543d2b229eec2c9cfff4ef2a4795f50d53944628193efb2b27cec40da964ddb/131]

[05 Jun 2022 19:25:58.283] build.bazel.remote.execution.v2.ActionCache::UpdateActionResult - OK (10ms)
    -->
    	|- ActionDigest: 1e8a3a53556eb21863ddd36e4e6dc6b4b40e1e5308c489130334dca96e77f399/143
    	|- ActionResult:
    		|- OutputFiles:
    		|-   - x bazel-out/k8-fastbuild/bin/dummy
    		|-     |- 405ee2251a1467910dd9163e6991d0a6df935920484e9759e6ecc728b980bd7b/27
    <--

Second build without clean cache

bazel build \
    --stamp \
    --remote_cache=grpc://localhost:9092 \
    --experimental_remote_grpc_log=/tmp/grpc.log \
    //:dummy

bazel-bin/dummy:

BUILD_TIMESTAMP 1654449958

bazel-out/volatile-status.txt:

BUILD_TIMESTAMP 1654450956

Not surprisingly, the file, already in the local cache is not rebuilt even if the volatile stamp file has changed.
No gRPC requests are executed.
This behaviour is conform with the documentation.

Third build with a clean of the local cache

bazel clean --expunge
bazel build \
    --stamp \
    --remote_cache=grpc://localhost:9092 \
    --experimental_remote_grpc_log=/tmp/grpc.log \
    //:dummy

bazel-bin/dummy:

BUILD_TIMESTAMP 1654451141

bazel-out/volatile-status.txt:

BUILD_TIMESTAMP 1654451141

The file, with a different digestKey than the first build, is searched from the remote cache and is not found.
The file is then rebuilt and pushed again in the remote cache. We can see all similary executed gRPC requests than the first build.

The computation of this digestKey should not take into account the updated content of the volatile-status.txt, as is the case when we remove or manually update the volatile stamp file between two builds.

It's the definition of the volatile file:

... In order to avoid rerunning stamped actions all the time though, Bazel pretends that the volatile file never changes. In other words, if the volatile status file is the only file whose contents has changed, Bazel will not invalidate actions that depend on it. If other inputs of the actions have changed, then Bazel reruns that action, and the action will see the updated volatile status, but just the volatile status changing alone will not invalidate the action.

(from the user-manual 5.1.1)

This behaviour should be the same even if we use a remote cache.

Workaround

To avoid this issue, we are going to remove the use of the --stamp option and:

  • Find a dirty and not hermetic way to inject our volatile values.
    or
  • Use an another tool or another build system to finish the build of our final artifacts.

If we can reopen this issue that would be really great and appreciated ❤️ !

kormide added a commit to aspect-forks/angular that referenced this issue Dec 25, 2022
Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)
kormide added a commit to aspect-forks/angular that referenced this issue Dec 25, 2022
Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)
kormide added a commit to aspect-forks/angular that referenced this issue Dec 25, 2022
Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)
kormide added a commit to aspect-forks/angular that referenced this issue Dec 25, 2022
Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)
kormide added a commit to aspect-forks/angular that referenced this issue Dec 25, 2022
Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)
kormide added a commit to aspect-forks/angular that referenced this issue Dec 25, 2022
Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)
devversion pushed a commit to angular/angular that referenced this issue Jan 2, 2023
Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)

PR Close #48579
devversion pushed a commit to angular/angular that referenced this issue Jan 2, 2023
Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)

PR Close #48579
trekladyone pushed a commit to trekladyone/angular that referenced this issue Feb 1, 2023
…r#48579)

Fix non-hermetic zipping of example zips by fixing the zip entry timestamps.

I also hardcoded stamp values in stable-status.txt and volatile-status.txt using the workspace status command for the aio_local_deps config to improve cache performance. The Bazel remote cache appears to not ignore volatile-status.txt when there are no other changes, unlike the local Bazel cache:

bazelbuild/bazel#10075 (comment)

PR Close angular#48579
gitlab-dfinity pushed a commit to dfinity/ic that referenced this issue Mar 30, 2023
' into 'master'

[IDX-2760] Pass volatile status over a side channel.

Passing it as dependency invalidates remote cache:
bazelbuild/bazel#10075 

See merge request dfinity-lab/public/ic!11580
ParkMyCar added a commit to MaterializeInc/materialize that referenced this issue Sep 5, 2024
~~This PR improves our remote caching by changing the way we use
[`--workspace_status_command`](https://bazel.build/reference/command-line-reference#flag--workspace_status_command)
which enables us to fetch 100% of actions from the remote cache when
applicable.~~

~~Previously I noticed CI jobs would always appear to rebuild the leafs
of our repo (e.g. `environmentd` and `clusterd`), which unfortunately
take the longest because of linking. This was because our workspace
status command required generating some Rust code which was breaking our
remote caching. Now with this change I've observed the remote caching
working exactly as a local cache, e.g. a change in the `sql` crate only
requires re-building `environmentd` whereas it used to also cause a
recompile of `clusterd` and some other crates.~~

Edit: Unfortunately there is no free lunch and my testing was flawed.
The original approach from this PR of tweaking the way we use
`workspace_status_command` did not work, and we were actually just
providing a constant value for the git hash instead of the updated
value. After investigating further it turns out that Bazel treats local
cache artifacts differently than remote cache artifacts when it comes to
the workspace status and specificallly the `volatile-status.txt` file we
rely on here, see bazelbuild/bazel#10075.

The only way to prevent `build_info!()` from causing unnecessary
rebuilds _and_ update the git hash for anything that does get rebuilt,
is to side channel the necessary information like we're doing here. The
approach is we write the current git hash to a known location, in this
case `/tmp/mz_git_hash.txt` and then read it via Rust. `bin/bazel` has
been updated to write this file, and I'll make the same change to
`mzbuild`. When running with `--config=release-stamp` we fallback to the
original behavior with `workspace_status_command` and formally "stamp"
the builds which causes everything to get rebuilt.

This is definitely a hack but from what I can tell there is no other way
to get the same behavior. An alternative is we always just return a
constant git hash when not running with `--config=release-stamped`, not
sure how important stamping development builds is.

### Motivation

Progress towards
https://github.com/MaterializeInc/materialize/issues/26796

* Improve remote caching speed

### Checklist

- [x] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [x] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [x] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [x] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [x] If this PR includes major [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note),
I have pinged the relevant PM to schedule a changelog post.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
more data needed team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged
Projects
None yet
Development

No branches or pull requests