Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SingleFile app fails after failed extraction #1255

Closed
jls5177 opened this issue Jan 2, 2020 · 5 comments
Closed

SingleFile app fails after failed extraction #1255

jls5177 opened this issue Jan 2, 2020 · 5 comments
Assignees
Milestone

Comments

@jls5177
Copy link

jls5177 commented Jan 2, 2020

Problem: I have a dotnet core 3 app that is published as a SingleFile which is run concurrently as part of our continuous build/test platform. About once a day one of the build nodes fails to find one of the assemblies. At this point all successive runs of the app fail with the same error.

C:\>C:\dotnet\bin\Release\netcoreapp3.1\win-x64\publish\dotnet.exe
Error:
  An assembly specified in the application dependencies manifest (dotnet.deps.json) was not found:
    package: 'runtimepack.Microsoft.NETCore.App.Runtime.win-x64', version: '3.1.0'
    path: 'Microsoft.VisualBasic.dll'

There are 2 issues here:

  1. Concurrent extracts cause some files to be missing from the extraction directory. Possible due to the directory rename step?

  2. Subsequent runs after a failed extraction do not correct the missing assemblies.

The first issue might be harder to chase down as this is a race condition. In addition, fixing the second issue would at least minimize the exposure to a single run. I provide reproduction steps for the second issue below.

Expected Results: The app should recover from a failed extraction on the next run. Re-extracting any missing assemblies from the binary.

Workaround: I can recover the build node by deleting the extraction directory which forces the app to extract again on the next run.

Reproduction Steps: I am able to recreate the failure by performing the following steps:

  1. Build a new console app and publishing it as a SingleFile
C:\dotnet>dotnet new console

Welcome to .NET Core 3.1!
---------------------
SDK Version: 3.1.100

Getting ready...
The template "Console Application" was created successfully.

Processing post-creation actions...
Running 'dotnet restore' on C:\dotnet\dotnet.csproj...
  Restore completed in 184.05 ms for C:\dotnet\dotnet.csproj.

Restore succeeded.

C:\dotnet>dotnet publish -r win-x64 -c Release /p:PublishSingleFile=true
Microsoft (R) Build Engine version 16.4.0+e901037fe for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

  Restore completed in 9.74 sec for C:\dotnet\dotnet.csproj.
  dotnet -> C:\dotnet\bin\Release\netcoreapp3.1\win-x64\dotnet.dll
  dotnet -> C:\dotnet\bin\Release\netcoreapp3.1\win-x64\publish\
  1. Running the app one time to extract the contents
  2. Delete any of the assemblies in the extracted directory. Need to leave the directory otherwise the app will be extracted again
  3. Run the app again to see the failure listed above

Comments

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-Host untriaged New issue has not been triaged by the area owner labels Jan 2, 2020
@jls5177
Copy link
Author

jls5177 commented Jan 2, 2020

Attaching the trace logs to help debug the issue. One is the output of the first run and the next one is the output after deleting some of the assemblies.
trace_firstrun.txt
trace_with_missing_assembly.txt

@swaroop-sridhar swaroop-sridhar added area-Single-File and removed untriaged New issue has not been triaged by the area owner labels Jan 3, 2020
@swaroop-sridhar swaroop-sridhar self-assigned this Jan 3, 2020
@swaroop-sridhar
Copy link
Contributor

Part (2) of the problem is a dup of https://github.com/dotnet/core-setup/issues/8314.
We understand the problem, and working on a fix.

Part(1) of the problem is interesting -- rename happening before all files are extracted.
I've never encountered this failure mode before. I tried running multiple versions of a fairly big app (Greenshot) several times, but couldn't repro the failure.

We may need to add a check/wait before the rename step to make sure that all files are actually in place.

@MarkusKuntner
Copy link

We also received these error messages (2) from beta users, which means that we will postpone our release.
It is not reasonable for a user to set an environment variable. So this is already a serious bug. We hope that the problem can be solved quickly, a slightly longer startup time (when checking if all *.dll's are present) is reasonable, if the program then launches safely.
Starting multiple instances of an app concurrently (1) should be taken into consideration when fixing the bug (2). "jls5177" pointed out a possible flaw.

@jls5177
Copy link
Author

jls5177 commented Jan 8, 2020

Thanks @swaroop-sridhar for linking the duplicate ticket for issue #2. I agree this is a duplicate of that issue and work the issue there.

Sorry to mislead everyone but upon further testing I was able to reproduce the issue with a single executor per build node (corehost trace log attached here, failed_single_executor_corehost_trace.txt)

Moving to a single executor just increased the time to failure. The core host trace log looks identical to just removing a file from the extracted directory. So this does not look to be a concurrency issue.

Some additional data points:

  1. When a build node fails it always fails with either 0 or 58 files in the extracted directory. The folder should have 240 files.
  2. The build system has the single app pre-installed using a MSI installer and has not changed since deployment. Meaning the extraction directory has been consistent since we migrated to the SingleEXE deployment and started seeing this issue, the folder name is "aeidsv02.4h1" in the logs.
  3. All build nodes are running Windows 2019 with latest patches.
  4. When running multiple build executors per build node we would see the failure at least 3 to 5 times a week.
  5. Around the time I opened this bug we switched to a single executor per build node and have seen the issue only 2 times since then.

Are there any other logs that I can capture to help root cause the issue?

Unfortunately, I will have to revert back to a normal dotnet published application until this issue is resolved.

@jeffschwMSFT jeffschwMSFT added this to the 5.0 milestone Jan 8, 2020
@swaroop-sridhar
Copy link
Contributor

Thanks for confirming @jls5177
I'll resolve this issue as a dup of dotnet/core-setup#8314.
Sorry for the inconvenience. I expect that this will be fixed in the next servicing release.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants