Windows remote test error: failed: java.io.IOException: test.err (Permission denied) #20741

Closed
jayconrod opened this issue Jan 4, 2024 · 4 comments
Labels: area-Windows, P1, team-Remote-Exec, type: bug

Comments

@jayconrod
Contributor

Description of the bug:

I apologize in advance, I've found another weird Windows remote execution bug.

When using Windows remote execution, I'm seeing bazel test invocations fail with messages like:

ERROR: C:/actionsrunner/_work/engflow/engflow/test/java/com/engflow/re/integration/auth/BUILD:1:10: Testing //test/java/com/engflow/re/integration/auth:auth failed: java.io.IOException: C:/_b/bpc5uood/execroot/engflow/bazel-out/x64_windows-opt/testlogs/test/java/com/engflow/re/integration/auth/auth/test.err (Permission denied)

The test is actually failing remotely, but Bazel shouldn't crash. It also fails to complete its BES stream, which is frustrating but not the main issue.

From the message, I'd guess Bazel is opening the file, then attempting to delete it.

The test.err file exists, is owned by BUILTIN/Administrators, and has no special ACLs. No other processes have it open after bazel test exits, and it can be deleted manually. I think the path comes from TestRunnerAction.java. Its purpose seems to be to hold stderr from tests; however, that's not what's happening here.

The thing that triggers this is a new remote execution feature where we populate the ExecuteResponse.message field. The contents of the test.err file match the string that the server puts in that field.

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  1. Write a test that fails. I've mainly seen this in Java tests, but I don't believe it's language specific.
  2. Run the test from a Windows client on a Windows remote execution service that populates the ExecuteResponse.message field.

(Sorry I realize this isn't going to be easy for anyone else. It reproduces 100% for me though, so if you have code pointers or debugging suggestions, I'll help as much as I can).

Which operating system are you running Bazel on?

Windows Server 2022 / x86_64

What is the output of bazel info release?

release 8.0.0-pre.20231030.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

n/a

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

n/a

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No

Have you found anything relevant by searching the web?

No

Any other information, logs, or outputs that you want to share?

n/a

@coeuvre
Member

coeuvre commented Jan 4, 2024

ExecuteResponse.message is written to the action's stderr, which is backed by the test.err file in the case of a test action.

If this test is marked as flaky, or you have set --flaky_test_attempts, Bazel will retry the test if it fails. Before a retry, Bazel needs to rename the test outputs generated by the previous run, including test.err.

Can you check whether the IOException comes from there?

@sgowroji sgowroji added area-Windows Windows-specific issues and feature requests team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Jan 4, 2024
@jayconrod
Contributor Author

The target isn't flaky (well.. it's not marked as flaky). I tried running with --flaky_test_attempts=1 and still saw the error.

I dumped the server log and found the stack below. It looks like the error is coming from StandaloneTestStrategy.writeOutFile. It copies data from test.err to test.log, then attempts to delete test.err. I believe test.err is still open at this point, so the deletion fails.

com.google.devtools.build.lib.skyframe.ActionExecutionFunction$ActionExecutionFunctionException: com.google.devtools.build.lib.actions.AlreadyReportedActionExecutionException: java.io.IOException: C:/_b/dho4ew2i/execroot/engflow/bazel-out/x64_windows-fastbuild/testlogs/test/java/com/engflow/instancemetrics/instancemetrics/test.err (Permission denied)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:357)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:171)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:461)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:414)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: com.google.devtools.build.lib.actions.AlreadyReportedActionExecutionException: java.io.IOException: C:/_b/dho4ew2i/execroot/engflow/bazel-out/x64_windows-fastbuild/testlogs/test/java/com/engflow/instancemetrics/instancemetrics/test.err (Permission denied)
	... 10 more
Caused by: com.google.devtools.build.lib.actions.EnvironmentalExecException: java.io.IOException: C:/_b/dho4ew2i/execroot/engflow/bazel-out/x64_windows-fastbuild/testlogs/test/java/com/engflow/instancemetrics/instancemetrics/test.err (Permission denied)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy.runTestAttempt(StandaloneTestStrategy.java:812)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy.beginTestAttempt(StandaloneTestStrategy.java:315)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy$StandaloneTestRunnerSpawn.execute(StandaloneTestStrategy.java:581)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.executeAllAttempts(TestRunnerAction.java:1163)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.execute(TestRunnerAction.java:975)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.execute(TestRunnerAction.java:952)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.executeAction(SkyframeActionExecutor.java:1148)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1065)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:165)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:94)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:562)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:859)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:333)
	... 9 more
Caused by: java.io.IOException: C:/_b/dho4ew2i/execroot/engflow/bazel-out/x64_windows-fastbuild/testlogs/test/java/com/engflow/instancemetrics/instancemetrics/test.err (Permission denied)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.delete(WindowsFileSystem.java:60)
	at com.google.devtools.build.lib.vfs.Path.delete(Path.java:587)
	at com.google.devtools.build.lib.remote.RemoteActionFileSystem.delete(RemoteActionFileSystem.java:298)
	at com.google.devtools.build.lib.vfs.Path.delete(Path.java:587)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy.writeOutFile(StandaloneTestStrategy.java:346)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy.runTestAttempt(StandaloneTestStrategy.java:806)
	... 21 more
Caused by: java.nio.file.AccessDeniedException: C:/_b/dho4ew2i/execroot/engflow/bazel-out/x64_windows-fastbuild/testlogs/test/java/com/engflow/instancemetrics/instancemetrics/test.err
	at com.google.devtools.build.lib.windows.WindowsFileOperations.deletePath(WindowsFileOperations.java:307)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.delete(WindowsFileSystem.java:56)
	... 26 more
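
For illustration, here is a minimal sketch of the copy-then-delete pattern described above, written so that the source file is closed before it is deleted. This is not Bazel's code and not the actual fix: the class name `AppendThenDelete`, the method `appendThenDelete`, and the file contents are hypothetical, and it uses plain `java.nio.file` calls rather than Bazel's own file system layer. The point is only that the input stream is closed by try-with-resources before the delete is attempted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendThenDelete {
  /**
   * Appends the contents of src to dst, then deletes src.
   * Hypothetical helper, not part of Bazel.
   */
  static void appendThenDelete(Path src, Path dst) throws IOException {
    // Both streams are closed when this block exits, even on failure.
    try (InputStream in = Files.newInputStream(src);
        OutputStream out =
            Files.newOutputStream(dst, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
      in.transferTo(out);
    }
    // Delete only after the handle on src has been released.
    Files.delete(src);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("testlogs");
    Path err = dir.resolve("test.err");
    Path log = dir.resolve("test.log");
    Files.writeString(err, "message from server\n");
    Files.writeString(log, "test output\n");

    appendThenDelete(err, log);

    System.out.println("test.err exists: " + Files.exists(err));
    System.out.print(Files.readString(log));
  }
}
```

If the delete were moved inside the try block, it would run while the stream on test.err is still open, which is the situation I suspect is happening here.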

@coeuvre
Member

coeuvre commented Jan 5, 2024

Thanks for the stacktrace! I can now reproduce this issue on a Windows machine. Working on the fix.

@coeuvre coeuvre self-assigned this Jan 8, 2024
@coeuvre coeuvre added P1 I'll work on this now. (Assignee required) and removed untriaged labels Jan 8, 2024
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 24, 2024
Unfortunately, there isn't an easy way to add an integration test for this specific edge case on Windows.

Fixes bazelbuild#20741.

Closes bazelbuild#20752.

PiperOrigin-RevId: 596547046
Change-Id: I4f517b161c03793329d5a4e21ec8ac4a5b53abb0
github-merge-queue bot pushed a commit that referenced this issue Jan 29, 2024
Unfortunately, there isn't an easy way to add an integration test for this
specific edge case on Windows.

Fixes #20741.

Closes #20752.

Commit
958e0c4

PiperOrigin-RevId: 596547046
Change-Id: I4f517b161c03793329d5a4e21ec8ac4a5b53abb0

Co-authored-by: Chi Wang <coeuvre@gmail.com>
@iancha1992
Member

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!
