dotnet publish / run for linux-arm fails with NetCore5 targets in preview 5+ #1099

Closed
pgrawehr opened this issue Jun 7, 2020 · 25 comments
Labels
bug Something isn't working

Comments

@pgrawehr
Contributor

pgrawehr commented Jun 7, 2020

Describe the bug

Running dotnet publish for a project with target netcore5 fails with the current runtime

Global.json at the time of this writing:

{
  "tools": {
    "dotnet": "5.0.100-preview.5.20251.2",
    "runtimes": {
      "dotnet/x64": [
        "2.1.11"
      ]
    }
  },
  "sdk": {
    "version": "5.0.100-preview.5.20251.2"
  },
  "msbuild-sdks": {
    "Microsoft.DotNet.Arcade.Sdk": "5.0.0-beta.20261.9",
    "Microsoft.DotNet.Helix.Sdk": "5.0.0-beta.20261.9"
  }
}

Steps to reproduce

On the desktop (Windows) run:

cd projects\iot
build
# passes as expected
cd src\devices\Ads1115\samples
dotnet publish Ads1115.Samples.csproj -r linux-arm -c debug --self-contained false
# passes as well - all fine (the path \src\devices\Ads1115\samples\bin\Debug\netcoreapp2.1\linux-arm\publish now contains binaries ready to run on the pi)

# now change in Ads1115.Samples.csproj the <TargetFramework> entry to <TargetFramework>netcoreapp5.0</TargetFramework>
dotnet publish Ads1115.Samples.csproj -r linux-arm -c debug --self-contained false

An error is now reported:

C:\projects\iot2\src\devices\Ads1115\samples>dotnet publish Ads1115.Samples.csproj -r linux-arm -c debug --self-contained false
Microsoft (R) Build Engine version 16.7.0-preview-20229-03+2cee6d020 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
C:\projects\iot2\src\devices\Ads1115\samples\Ads1115.Samples.csproj : error NU1102: Unable to find package Microsoft.NETCore.App.Host.linux-arm with version (= 5.0.0-preview.5.20230.9)
C:\projects\iot2\src\devices\Ads1115\samples\Ads1115.Samples.csproj : error NU1102:   - Found 2034 version(s) in https://dotnetfeed.blob.core.windows.net/dotnet-core/index.json [ Nearest version: 5.0.0-preview.5.20224.13 ]
C:\projects\iot2\src\devices\Ads1115\samples\Ads1115.Samples.csproj : error NU1102:   - Found 22 version(s) in nuget.org [ Nearest version: 5.0.0-preview.4.20251.6 ]
C:\projects\iot2\src\devices\Ads1115\samples\Ads1115.Samples.csproj : error NU1102:   - Found 0 version(s) in https://dotnet.myget.org/F/dotnet-core/api/v3/index.json
C:\projects\iot2\src\devices\Ads1115\samples\Ads1115.Samples.csproj : error NU1102:   - Found 0 version(s) in https://dotnetfeed.blob.core.windows.net/dotnet-iot/index.json
C:\projects\iot2\src\devices\Ads1115\samples\Ads1115.Samples.csproj : error NU1102:   - Found 0 version(s) in https://dotnetfeed.blob.core.windows.net/dotnet-tools-internal/index.json
  Failed to restore C:\projects\iot2\src\devices\Ads1115\samples\Ads1115.Samples.csproj (in 1.11 sec).
  2 of 3 projects are up-to-date for restore.

Expected behavior

Build passes

Actual behavior

See above. It looks as if the SDK build that's currently referenced is incomplete. The problem does not occur after going back to the "official" beta version 5.0.100-preview.4.20258.7

Versions used
SDK:
5.0.100-preview.5.20251.2 [C:\Program Files\dotnet\sdk]
Runtime:
Microsoft.NETCore.App 5.0.0-preview.5.20230.9 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]

Edit
See below: The problem got even worse with previews 6 and 7. I can't get them to work at all on the Raspi when targeting NetCore 5.0.

@pgrawehr pgrawehr added the bug Something isn't working label Jun 7, 2020
@joperezr
Member

joperezr commented Jun 7, 2020

Thanks for logging this, @pgrawehr. This is very likely because the .NET 5 TFM changed in the new SDK and is no longer netcoreapp5.0. Can you try using net5.0 instead?

@pgrawehr
Contributor Author

pgrawehr commented Jun 7, 2020

@joperezr : Nope, exactly the same result.

@pgrawehr
Contributor Author

pgrawehr commented Jun 7, 2020

When using -r win-x64 everything is fine.

@joperezr
Member

joperezr commented Jun 7, 2020

Oh, I see the other issue: you don't have the dotnet 5 NuGet feed configured, since we don't really depend on it yet, and preview 5 builds haven't been published to nuget.org. That explains why preview 4 works fine (it has already been published).

For dotnet 5, add the feed:
https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet5/nuget/v3/index.json
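
One possible way to register that feed from the command line is sketched below. It assumes a dotnet CLI recent enough to have the `dotnet nuget add source` command; the source name `dotnet5` is only a label chosen for this example, and editing NuGet.config by hand works just as well.

```shell
# Feed URL quoted from the comment above; the name "dotnet5" is an arbitrary label (assumption).
FEED_URL="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet5/nuget/v3/index.json"
FEED_NAME="dotnet5"

if command -v dotnet >/dev/null 2>&1; then
    # Register the feed with NuGet; this fails harmlessly if the name is already taken.
    dotnet nuget add source "$FEED_URL" --name "$FEED_NAME" \
        || echo "could not add source (it may already be registered)"
    # Show the sources NuGet will consult during restore.
    dotnet nuget list source
else
    echo "dotnet CLI not found; add $FEED_URL to NuGet.config manually"
fi
```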

@joperezr
Member

joperezr commented Jun 7, 2020

win-x64 most likely works because that runtime ships with the SDK itself and matches the SDK you installed, but the SDK doesn't ship the runtimes for other RIDs like linux-arm.

@pgrawehr
Contributor Author

pgrawehr commented Jun 7, 2020

Nope, still same.

Output of nuget sources:

Registered Sources:

  1.  dotnet-eng [Enabled]
      https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-eng/nuget/v3/index.json
  2.  dotnet-tools [Enabled]
      https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-tools/nuget/v3/index.json
  3.  dotnet5 [Enabled]
      https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet5/nuget/v3/index.json
  4.  dotnet3.1 [Enabled]
      https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet3.1/nuget/v3/index.json
  5.  nuget.org [Enabled]
      https://api.nuget.org/v3/index.json

@pgrawehr
Contributor Author

@joperezr : The problem just got worse when trying to update to preview 6. I can't run any application built for netcoreapp5.0 (or net5.0) on the Pi. When trying to do so, all I get is

Failed to create CoreCLR, HRESULT: 0x80070057

Steps to reproduce are as above, except now with SDK version "5.0.100-preview.6.20310.4".

The version installed on the Pi is https://download.visualstudio.microsoft.com/download/pr/fc54f62e-c7bd-43a3-a27b-4afb08bc4d6f/b01ccacf3d94efc0bbe26f64f7fde9b7/dotnet-sdk-5.0.100-preview.6.20318.15-linux-arm.tar.gz

I've tried both global and local installations of the SDK, with no improvement. This is serious, because it basically means Net5 is not usable on the Pi, and I have to revert to 3.1 for my application, which shows other issues.

@pgrawehr
Contributor Author

Note: It works again after downgrading to 5.0-preview.4. This does not look good at all...

@joperezr joperezr added this to the vNext milestone Aug 10, 2020
@pgrawehr pgrawehr changed the title from "dotnet publish for linux-arm fails with latest sources against netcore5.0" to "dotnet publish / run for linux-arm fails with NetCore5 targets in preview 5+" Aug 14, 2020
@pgrawehr
Contributor Author

I did some further (actually very basic) tests, but no luck:

  1. Get the latest preview (I used preview 8).
  2. Open a shell on the Raspi and run:
mkdir dotnetpreview
tar xzf Downloads/dotnet-sdk-5.0.100-preview.8.20326.6-linux-arm.tar.gz -C dotnetpreview
  3. Have an app built for net5.0 (I used the ADS1115 sample, as described above). I had it built against preview 4 on my PC, but the version it was built with doesn't seem to be the issue.
  4. Cd to the bin directory (i.e. iot/src/devices/Ads1115/samples/bin/Debug/linux-arm/publish). Then run
dotnet ./Ads1115.Samples.dll

(this uses the globally installed preview 4) -> Starts the app and works fine
  5. Run

~/dotnetpreview/dotnet ./Ads1115.Samples.dll

Result is

Failed to create CoreCLR, HRESULT: 0x80070057

Unfortunately, I have no clue how to debug this further, since the problem is very unlikely to be within the app itself. The dotnet executable itself doesn't seem to have debug information, so running gdb doesn't tell me much either.
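
For reference, the HRESULT above can be split into its standard fields with plain shell arithmetic. This is only a small sketch of the generic HRESULT layout (severity bit, 11-bit facility, 16-bit code); it says nothing about where in the runtime the error is raised.

```shell
# Decode the HRESULT reported by the host into its standard bit fields.
hr=$((0x80070057))

severity=$(( (hr >> 31) & 1 ))       # 1 means failure
facility=$(( (hr >> 16) & 0x7FF ))   # 7 is FACILITY_WIN32
code=$(( hr & 0xFFFF ))              # 87 is ERROR_INVALID_PARAMETER

echo "severity=$severity facility=$facility code=$code"
```

Facility 7 with code 87 is the Win32 "invalid parameter" error, which is the value commonly named E_INVALIDARG. So the runtime is rejecting something it was handed at startup rather than failing to find a file.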

@sdmaclea

The dotnet executable itself doesn't seem to have debug information, so running gdb doesn't tell me much either.

The dotnet-symbol tool can be used to get the symbols for released .NET binaries. It can be installed with dotnet tool install -g dotnet-symbol. You can then use it to fetch the symbols for individual binaries, with something like dotnet-symbol <path>/libcoreclr.so

@pgrawehr
Contributor Author

Ok, that gets one step further. Had to do a bit of trickery:

sudo chmod 777 /root
sudo ln -s /home/pi/projects/runtime /root/runtime
dotnet-symbol <path to dotnet executable> (being ~/projects/iot/.dotnet/dotnet for the project-level installation here)

to get the source lookup to work (gdb wouldn't allow selecting an alternate location to the source path given in the debug information, it seems).
Now ddd <path to dotnet executable> does what it should. (Use "Program -> Run..." to provide the dll to execute).

@pgrawehr
Contributor Author

@sdmaclea Can't get much further, after trying for hours :-(

I can't step into coreclr_initialize (file runtime\src\coreclr\src\dlls\mscoree\unixinterface.cpp), where the error likely happens. The file mscoree.dll, which should contain this function, doesn't exist. I was also unsuccessful in trying to build the runtime on the Pi so that I could do some change-and-rebuild style debugging (but that was mainly because my network connection is a bit slow here, and somebody found that hardcoded, short timeouts for package downloads are a good idea...)

Any further advice on how to debug this problem would be greatly appreciated. In the end, it's probably something stupid like an unexpected value of an environment variable or so. But I'm meanwhile quite sure that it is something that will show up for others as well, once 5.0 ships.

@sdmaclea

If I were debugging this given a working and non-working version, I would look at strace. Specifically at files that were opened or not found. If I remember correctly it is strace -e trace=file <command to execute>. I would be looking for files which are not found which would indicate a missing dependency.
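
A rough sketch of that comparison follows. The helper names `missing_paths` and `only_missing_in_second` are made up for this example, and the log file names are just placeholders; the strace invocations are shown as comments because they need the actual hosts from this thread.

```shell
# Capture file-related syscalls for the working and the failing host, e.g.:
#   strace -f -e trace=file -o preview4.txt dotnet ./Ads1115.Samples.dll
#   strace -f -e trace=file -o preview8.txt ~/dotnetpreview/dotnet ./Ads1115.Samples.dll

# Print the distinct paths that failed with ENOENT in one strace log.
missing_paths() {
    grep 'ENOENT' "$1" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}

# Print the paths that were not found in the second log only.
only_missing_in_second() {
    first=$(mktemp)
    second=$(mktemp)
    missing_paths "$1" > "$first"
    missing_paths "$2" > "$second"
    # comm -13 keeps the lines unique to the second (sorted) input.
    comm -13 "$first" "$second"
    rm -f "$first" "$second"
}

# Usage: only_missing_in_second preview4.txt preview8.txt
```

Since the two hosts live in different directories, many of the reported paths will differ only by their install prefix, so the output still needs to be read by eye.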

I am on vacation today, but I can look tomorrow. I think we have a few Pi 3's running Raspberry PI, one probably is running linux-arm.

/cc @dotnet/dotnet-diag

@pgrawehr
Contributor Author

Can't see any obvious problem yet, but I'll have another look later. I've attached the two trace files: preview4.txt runs my app against preview 4 (works fine - I've aborted the program right after it starts) and preview8.txt contains the exact same binary run against preview 8 (which fails to start). Note that it is not a forward-compatibility problem. If I rebuild the app against preview 8, this doesn't change anything (except, of course, that I can't run it against preview 4, which is expected).

preview4.txt
preview8.txt

@pgrawehr
Contributor Author

@sdmaclea After a lot of playing around and looong builds, I was finally able to build a debug version of the CLR (I'll eventually post a PR for improving the documentation). That now showed that the problem is the removal of the winmd support, which is used here. There's a ticket to update that, #1103, but I don't exactly understand what that means yet.

Two questions in this context:

  1. Is it expected that existing applications suddenly fail to start, just because they reference .winmd files? Note that they're not even being used in this scenario, as we're executing on Linux.
  2. If support is removed from the CLR, why does building the library work without any problem?

Cc: @joperezr

@sdmaclea

@pgrawehr Thanks for digging to root cause on this. This sounds like the current design issue with the loading rules for the app model. My recollection is that all assemblies mentioned in the manifest must be present to start the app. With these winmd files being removed, perhaps there could be an exception (especially since these cannot be used on linux...)

/cc @vitek-karas

@vitek-karas
Member

Host (dotnet.exe, hostfxr, hostpolicy, dealing with .deps.json, finding all assemblies):

  • If there would be a file listed in .deps.json which is not on disk, the app will fail to start - but with a different error.
  • You can debug this with host tracing: set COREHOST_TRACE=1 and COREHOST_TRACEFILE=host.txt and it will write detailed log into the specified text file - among other things it will tell you exactly where it tries to load files from.
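
On Linux, the host tracing described above might be enabled like this. The two variable names are the ones given in the bullet; the app name is the sample from this thread, and the final grep for "winmd" is only a guess at what may be worth searching for in the log.

```shell
# Enable host tracing and send the output to a file instead of stderr.
export COREHOST_TRACE=1
export COREHOST_TRACEFILE=host.txt

# Run the app only if the host and the sample binary are actually present.
if command -v dotnet >/dev/null 2>&1 && [ -f ./Ads1115.Samples.dll ]; then
    dotnet ./Ads1115.Samples.dll
    # Look for any trace lines mentioning .winmd files (hypothetical search term).
    grep -n -i 'winmd' "$COREHOST_TRACEFILE" || echo "no winmd entries in $COREHOST_TRACEFILE"
fi
```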

Runtime (coreclr.dll, the source of this error):

  • WinRT support has been removed in .NET 5 - so it is likely trying to load or otherwise deal with the .winmd and now fails because it no longer knows what to do with it.
  • As far as I can tell it is expected that WinRT related code will simply not work anymore.
    @jkoritzinsky would know the exact details.

SDK (how the app gets built):

  • I too would expect the SDK to fail to build the app if it contained .winmd files - not sure why it doesn't. @jkoritzinsky should know more as well.

@jkoritzinsky
Member

Yes, the runtime fails to load winmd files now, which is expected, although the experience can be improved: dotnet/sdk#12087.

The SDK has a target to block referencing WinMD files, but there might be some holes: dotnet/sdk#13233 (there's a number of discussions going on offline on how to best handle this)

@sdmaclea

@jkoritzinsky I wonder if this is more problematic than we expect. I would expect 3.1 apps should just run w/o recompilation on 5.0 if it is the only runtime installed. Sounds like that is no longer true for a class of apps. How big a class of apps is this?

@jkoritzinsky
Member

Any apps that reference WinMD files (transitive included) won't run on 5.0. There's not a lot of them, and most every usage that I know of (including dotnet/iot) has a plan to move off them for 5.0.

@vitek-karas
Member

Also 3.1 app built with default settings won't run on 5.0 - roll forward won't allow that. The developer has to explicitly opt into major version roll forward (or rebuild the app as 5.0).
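
As a sketch of what that opt-in can look like at run time: the host reads the `DOTNET_ROLL_FORWARD` environment variable and also accepts a `--roll-forward` option. The app name below is the sample from this thread and is only used for illustration.

```shell
# Option 1: opt into major-version roll forward for every app started from this shell.
export DOTNET_ROLL_FORWARD=Major

# Option 2: opt in for a single launch via the host option.
if command -v dotnet >/dev/null 2>&1 && [ -f ./Ads1115.Samples.dll ]; then
    dotnet --roll-forward Major ./Ads1115.Samples.dll
fi
```

Either setting only lifts the roll-forward restriction; an app that references WinMD files would still fail to start on 5.0 for the reason discussed above.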

@pgrawehr
Contributor Author

pgrawehr commented Sep 1, 2020

The primary error happens in applicationcontext.cpp. Due to the change there, any reference to winmd files (direct or transitive) will be considered invalid and the mentioned error happens. This is independent of whether the file actually exists or not (in my test scenario it actually was there), since the error happens before even trying to access the file.

@pgrawehr
Contributor Author

@joperezr Can you give me a hint on how these .winmd references need to be replaced? What alternative approach shall be used?

@joperezr
Member

Sure, I have the changes ready since they are very simple, but we are currently blocked on committing them (check #1091 for more info). Basically, the change we need to make is to remove the winmd references from the project as well as the dependency on the package System.Runtime.WindowsRuntime, and instead add a package dependency on Microsoft.Windows.SDK.NET. That's it; that should make it work, except that we can't make that change yet because there is a bug in that SDK package: it has assemblies that are not strong-name signed, and because our assembly is, we won't be able to compile. I have pinged the team that manages that package so they are aware we are broken and will work on fixing it so that we are unblocked.
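
Purely as an illustration of the package swap described above (not the actual change that was merged), the dotnet CLI steps might look as follows. `PROJECT` is a hypothetical path, `list_winmd_refs` is a helper invented for this sketch, and a specific package version may have to be passed with `--version` depending on what is published.

```shell
# Hypothetical project path; replace with the project that references the .winmd files.
PROJECT="path/to/YourProject.csproj"

# Print any lines of a project file that still mention .winmd files;
# those references have to be removed by hand.
list_winmd_refs() {
    grep -n -i '\.winmd' "$1"
}

if command -v dotnet >/dev/null 2>&1 && [ -f "$PROJECT" ]; then
    # Drop the old WinRT interop package and add the replacement named above.
    dotnet remove "$PROJECT" package System.Runtime.WindowsRuntime
    dotnet add "$PROJECT" package Microsoft.Windows.SDK.NET
    list_winmd_refs "$PROJECT" || echo "no .winmd references left"
fi
```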

@joperezr
Member

Forgot to close this one too but it was also fixed by #1217

@ghost ghost locked as resolved and limited conversation to collaborators Nov 22, 2020

5 participants