Option --check_tests_up_to_date doesn't work with Remote cache server #3978

PoncinMatthieu · 2017-10-27T11:24:20Z

Description of the bug:

Bazel option --check_tests_up_to_date doesn't work with the remote cache server.
Also there is no easy way to know if the tests were retrieved from the cache server or if they actually run.
Is this a bug or an intended behavior? Is there any alternative?

Background to the question / context

I have been trying to setup a remote cache server for setting up bazel with jenkins. We have a mono repo with multiple BUILD projects/libs and what I am trying to do is to setup a main jenkins pipeline which would run on every open pull requests and every commits to master. This main pipeline is checking if tests are up to date, if they are not, then we execute a downstream pipeline specific to the project requiring tests. The downstream pipeline run the tests and automatically deploy to staging/dev environments depending on the branch and project.
We need to use the remote cache server because we want the ability to run this pipeline in parallel for every PR. So that we can open 2 PRs at the same time and one doesn't need to wait for the other to run tests. Eventually this will be done from multiple jenkins nodes but for now we simply checkout the code in a different directory for each branches.
This setup works really well with a single server and single branch/workspace since we only need the local cache and the --check_tests_up_to_date works as expected. However this doesn't work when using the remote cache server.

Minimal example to reproduce the problem:

Consider the following repository: https://github.com/dfabulich/hazelcast-junit-test
First start the hazelcast server:

java -jar hazelcast-3.8.jar &

Then run the tests for the first time

$> bazel clean && bazel test :apptest --hazelcast_node=127.0.0.1:5701 --spawn_strategy=remote
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Analysed target //:apptest (19 packages loaded).
INFO: Found 1 test target...
Target //:apptest up-to-date:
  bazel-bin/apptest.jar
  bazel-bin/apptest
INFO: Elapsed time: 14.457s, Critical Path: 12.08s
INFO: Build completed successfully, 12 total actions
//:apptest                                                               PASSED in 10.4s

Executed 1 out of 1 test: 1 test passes.

All good: We can see that the tests take 10s seconds to execute (this is due to a sleep in the tests)

Then we run the tests for a second time, but do a bazel clean to remove the local cache. Checking out the repo on a different path would do the same:

$> bazel clean && bazel test :apptest --hazelcast_node=127.0.0.1:5701 --spawn_strategy=remote
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Analysed target //:apptest (19 packages loaded).
INFO: Found 1 test target...
Target //:apptest up-to-date:
  bazel-bin/apptest.jar
  bazel-bin/apptest
INFO: Elapsed time: 4.531s, Critical Path: 2.44s
INFO: Build completed successfully, 12 total actions
//:apptest                                                               PASSED in 0.0s

Executed 1 out of 1 test: 1 test passes.

Good but could be better: We can see that the tests passed but didn't actually run from the fact that tests took 0 seconds. However it doesn't show the usual string (cached) PASSED

Then doing the same with the argument --check_tests_up_to_date:

$> bazel clean && bazel test :apptest --hazelcast_node=127.0.0.1:5701 --spawn_strategy=remote --check_tests_up_to_date
INFO: Reading 'startup' options from /Users/ovelincompany/Projects/test/hazelcast-junit-test/tools/bazel.rc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Reading 'startup' options from /Users/ovelincompany/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Found 1 test target...
ERROR: action 'Expanding template apptest' is not up-to-date.
ERROR: action 'Creating source manifest for //:apptest' is not up-to-date.
ERROR: action 'BazelWorkspaceStatusAction stable-status.txt' is not up-to-date.
Target //:apptest failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.986s, Critical Path: 0.00s
//:apptest                                                            NO STATUS

Finished with 0 passing and 0 failing tests up to date, 1 out of date.

Not good: bazel doesn't use the remote cache server and shows NO STATUS .. forcing us to rebuild/retest all of our apps.

Environment info

Operating System:
Linux ubuntu
Bazel version (output of bazel info release):
release 0.7.0

Have you found anything relevant by searching the web?

This issue was useful for me to understand how the remote cache works: #1413

Small note:

Our actual project is mostly python code, but the behaviour is the same as for this java unit test.

The text was updated successfully, but these errors were encountered:

PoncinMatthieu · 2017-10-29T12:31:42Z

Not sure if it can help solve the issue, but I took latest master and tracked this back to the method result.getData().getRemotelyCached() returning false here:

bazel/src/main/java/com/google/devtools/build/lib/runtime/TestResultAnalyzer.java

Line 212 in 8601d4a

if (!(result.isCached() || result.getData().getRemotelyCached())) {

I might be wrong but I believe the expected behaviour is for it to return true in the case the test was
taken from the remote cache.

I have zero knowledge on protobuf.. but my wild guess is that this method returns the value from this:

bazel/src/main/protobuf/test_status.proto

Line 90 in bdb4fe4

optional bool remotely_cached = 7;

Which is missing from the test result data and probably ought to be there.

ulfjack · 2017-10-31T22:03:48Z

I think marking this as feature request is correct. The current design intentionally targets the local cache use case, so this is an intentional change in design.
I'm not sure this is a P2 from our point of view. It certainly seems nice to have, but it's overall more of a performance optimization, isn't it? If you run this in order to decide whether to run the tests, why not just run the tests and get the cache hits for some and non-cache hits for others?
This isn't going to be easy to implement. The current implementation is at an action level and requires local action cache cache hits. Applying it to the remote spawn cache will require a completely different implementation. (Note the difference between "actions" and "spawns".) In some cases, you can't even ask the remote cache for an entry without doing some local work first, e.g., in order to run the Java compiler, Bazel first may have to write a parameter file, and that happens locally (at this time).

Overall, I'm not sure the additional complexity is warranted. On the other hand, we've been considering removing the local action cache in favor of a local spawn cache (for which there is an implementation now), and we probably can't just remove the check_up_to_date and check_tests_up_to_date flags, so we'd need to find a way to implement these flags on top of spawn caches, in which case we might as well make it work in all cases.

PoncinMatthieu · 2017-11-01T18:51:02Z

Hi @ulfjack ,
Thank you for looking into the issue!

From my point of view, as a user, it is hard not to see it as a bug, so maybe there should be a mention on this in the documentation or command line reference. As a first time naive user, I have spent a good fair amount of time reading the doc and I expected this option to work on both local and remote cache. And again as a user, I can't see a good reason for such option to work only on the local cache.
I understand this comes off as a performance optimisation, but from our use case I would argue that this feature is very important to us. Though I am quite surprised we seem to be the only ones trying to use this option with the remote cache on CI.. so maybe we are trying to use bazel with jenkins the wrong way? Maybe there are better ways? I guess Github is not the best place for discussing/explaining our use case so I will create a new thread to talk about this on bazel-discuss.
That’s a fair point, I don’t understand very well the difference between spawn and action but it does sound like it would require a lot of changes.

I think anyone using those flags on the local cache will eventually want to be using them on the remote cache.. so either this should be implemented or the flags simply shouldn't exist to avoid this usage pattern?

I'm not sure if I could but with some guidance I would be very happy to try to contribute to implementing those flags for the remote cache.

PoncinMatthieu · 2017-11-23T04:01:15Z

In case anyone else is trying to do something similar and is blocked by this, I would recommend having a look at querying the graph instead of using the cache.
With the following script showing how to check for code changed between your branch and master and run only required tests: https://github.com/bazelbuild/bazel/blob/master/scripts/ci/ci.sh

For more info, read this discussion: https://groups.google.com/forum/#!starred/bazel-discuss/FVLLarcC5hA

buchgr · 2019-02-05T10:03:04Z

I wasn't aware that we had this flag. The general idea seems useful for remote caching / execution to check which actions are not cached remotely (and maybe why). fyi @agoulti

sgowroji · 2023-02-17T08:27:21Z

Hi there! We're doing a clean up of old issues and will be closing this one. Please reopen (or ping me to reopen) if you’d like to discuss anything further. We’ll respond as soon as we have the bandwidth/resources to do so.

Bazel lacks a method to verify if test results are current in the cache. Additional details can be found at: bazelbuild/bazel#3978. To address this limitation, let's change the test runner so that it doesn't run tests and exits with a non-zero code immediately if the $TEST_CHECK_UP_TO_DATE_FILE file exists. To check test results in the cache, we need to create the $TEST_CHECK_UP_TO_DATE_FILE and run tests. Successful test execution indicates that the results are up-to-date in the cache. Signed-off-by: Andrei Vagin <avagin@google.com>

agoulti self-assigned this Oct 30, 2017

katre assigned ulfjack Oct 31, 2017

katre added platform: other P2 We'll consider working on this in future. (Assignee optional) type: feature request labels Oct 31, 2017

ulfjack added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed P2 We'll consider working on this in future. (Assignee optional) labels Oct 31, 2017

agoulti removed their assignment Nov 1, 2017

hlopko added the category: remote execution / caching label Mar 21, 2018

ulfjack removed their assignment Nov 19, 2018

buchgr added team-Remote-Exec Issues and PRs for the Execution (Remote) team and removed category: remote execution / caching platform: other labels Feb 5, 2019

sgowroji added the stale Issues or PRs that are stale (no activity for 30 days) label Feb 17, 2023

sgowroji closed this as not planned Won't fix, can't repro, duplicate, stale Feb 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option --check_tests_up_to_date doesn't work with Remote cache server #3978

Option --check_tests_up_to_date doesn't work with Remote cache server #3978

PoncinMatthieu commented Oct 27, 2017

PoncinMatthieu commented Oct 29, 2017

ulfjack commented Oct 31, 2017

PoncinMatthieu commented Nov 1, 2017

PoncinMatthieu commented Nov 23, 2017

buchgr commented Feb 5, 2019

sgowroji commented Feb 17, 2023

Option --check_tests_up_to_date doesn't work with Remote cache server #3978

Option --check_tests_up_to_date doesn't work with Remote cache server #3978

Comments

PoncinMatthieu commented Oct 27, 2017

Description of the bug:

Background to the question / context

Minimal example to reproduce the problem:

Environment info

Have you found anything relevant by searching the web?

Small note:

PoncinMatthieu commented Oct 29, 2017

ulfjack commented Oct 31, 2017

PoncinMatthieu commented Nov 1, 2017

PoncinMatthieu commented Nov 23, 2017

buchgr commented Feb 5, 2019

sgowroji commented Feb 17, 2023