dbgshim fails to decode libcoreclr.so headers from Snap install #3510

Closed
gregg-miskelly opened this issue Nov 16, 2022 · 3 comments
Labels
bug Something isn't working

@gregg-miskelly

Description

dbgshim fails to read the build id from a libcoreclr.so installed from Snap. This is a regression from the version of dbgshim that shipped with .NET 6.

Repro steps

  1. Install the .NET SDK or .NET Runtime via Snap on Linux. Other packaging technologies may trigger this problem as well.
  2. Create a hello world project
  3. Try to debug using VS Code or Visual Studio

Result

libdbgshim.so will attempt to obtain the build id from libcoreclr.so with the following stack. This will return false.

libdbgshim.so!ElfReader::GetBuildId(ElfReader * this, BYTE * buffer, ULONG bufferSize, PULONG pBuildSize) Line 538
	at \__w\1\s\src\shared\dbgutil\elfreader.cpp(538)
libdbgshim.so!TryGetBuildIdFromFile(const WCHAR * modulePath, BYTE * buffer, ULONG bufferSize, PULONG pBuildSize) Line 151
	at \__w\1\s\src\shared\dbgutil\elfreader.cpp(151)
libdbgshim.so!GetTargetCLRMetrics(LPCWSTR wszModulePath, CLR_ENGINE_METRICS * pEngineMetricsOut, ClrInfo * pClrInfoOut, DWORD * pdwRVAContinueStartupEvent) Line 1297
	at \__w\1\s\src\dbgshim\dbgshim.cpp(1297)
libdbgshim.so!RuntimeStartupHelper::InvokeStartupCallback(RuntimeStartupHelper * this, const char * pszModulePath, HMODULE hModule) Line 354
	at \__w\1\s\src\dbgshim\dbgshim.cpp(354)
libdbgshim.so!PAL_RuntimeStartupHelper::InvokeStartupCallback(PAL_RuntimeStartupHelper * this) Line 1504
	at \__w\1\s\src\shared\pal\src\thread\process.cpp(1504)
libdbgshim.so!PAL_RuntimeStartupHelper::StartupHelperThread(PAL_RuntimeStartupHelper * this) Line 1559
	at \__w\1\s\src\shared\pal\src\thread\process.cpp(1559)
libdbgshim.so!StartupHelperThread(LPVOID p) Line 1579
	at \__w\1\s\src\shared\pal\src\thread\process.cpp(1579)
libdbgshim.so!CorUnix::CPalThread::ThreadEntry(void * pvParam) Line 739
	at \__w\1\s\src\shared\pal\src\thread\thread.cpp(739)
libpthread.so.0!start_thread(void * arg) Line 477
	at \build\glibc-SzIz7B\glibc-2.31\nptl\pthread_create.c(477)
libc.so.6!clone() Line 95
	at \build\glibc-SzIz7B\glibc-2.31\sysdeps\unix\sysv\linux\x86_64\clone.S(95)

This bug also exposes a second, error-handling bug: if GetTargetCLRMetrics fails, this code should do a goto exit instead of return false:

// From dbgshim.cpp, line 345
            hr = GetTargetCLRMetrics(clrInfo.RuntimeModulePath, NULL, &clrInfo, NULL);
            if (FAILED(hr))
            { 
                return false;
            }

At least on my machine, the runtime path is /snap/dotnet-sdk/187/shared/Microsoft.NETCore.App/6.0.11/libcoreclr.so which is correct. Using objdump -h /snap/dotnet-sdk/187/shared/Microsoft.NETCore.App/6.0.11/libcoreclr.so I can see that there is a build id:

 35 .note.gnu.build-id 00000024  000000000073b358  000000000073b358  00707358  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
@mikem8361
Member

Are you sure libcoreclr.so is the module for which getting the build id fails? dbgshim loops through all the modules in the process (until InvokeStartupCallback returns true). The libcoreclr.so installed by snap has a valid build id, as you said. The difference you pointed out isn't significant because the dbgshim ELF reader uses the program header NOTE to find the build id instead of the section headers, and the snap runtime has the proper program header NOTE entry for it. Why TryGetBuildIdFromFile is failing needs more investigation.

The GetTargetCLRMetrics failure path you pointed out is correct (should return false). A GetTargetCLRMetrics failure normally means the module isn't a runtime module (not libcoreclr.so or the single-file module), and InvokeStartupCallback needs to return false to continue to the next module. One subtle thing does need to be fixed: when TryGetBuildIdFromFile fails, GetTargetCLRMetrics should not return failure. It should instead return an invalid/uninitialized ClrInfo struct, which will cause the call to ProvideLibraries to fail and the callback to be invoked with the error.
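
The control flow described above can be sketched roughly as follows. This is not the dbgshim code: `ProbeModules`, `StartupOutcome`, `BuildIdReader`, and `LooksLikeRuntimeModule` are hypothetical names invented for illustration, and the runtime check is simplified to a file-name suffix test. The sketch only shows the intended distinction between "not a runtime module, keep looking" and "runtime found but its build id is unreadable, report an error".

```cpp
#include <functional>
#include <string>
#include <vector>

using BuildId = std::vector<unsigned char>;

// Hypothetical reader: returns true and fills `out` when the build id can be read.
using BuildIdReader = std::function<bool(const std::string& path, BuildId& out)>;

enum class StartupOutcome { NoRuntimeFound, RuntimeReady, ErrorReported };

// Simplified assumption: a runtime module is any path ending in libcoreclr.so.
// (The issue notes that the single-file module also counts; that is omitted here.)
inline bool LooksLikeRuntimeModule(const std::string& path) {
    const std::string name = "libcoreclr.so";
    return path.size() >= name.size() &&
           path.compare(path.size() - name.size(), name.size(), name) == 0;
}

StartupOutcome ProbeModules(const std::vector<std::string>& modules,
                            const BuildIdReader& readBuildId) {
    for (const std::string& path : modules) {
        if (!LooksLikeRuntimeModule(path)) {
            continue;  // like the callback returning false: try the next module
        }
        BuildId id;
        if (!readBuildId(path, id) || id.empty()) {
            // Runtime found, but no index info: surface an error to the
            // debugger instead of silently moving on to the next module.
            return StartupOutcome::ErrorReported;
        }
        return StartupOutcome::RuntimeReady;
    }
    return StartupOutcome::NoRuntimeFound;
}
```

With a reader that always fails, a module list containing libcoreclr.so yields `ErrorReported` rather than `NoRuntimeFound`, which is the behavior change being proposed.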

How do I get VSCode to use the snap installed SDK/runtime (at /snap/dotnet-sdk/187) instead of the normally installed ones (in /usr/share/dotnet)?

mikem8361 added a commit to mikem8361/diagnostics that referenced this issue Nov 29, 2022
@gregg-miskelly
Author

Are you sure libcoreclr.so is the module for which getting the build id fails?

Yes

The GetTargetCLRMetrics failure path you pointed out is correct (should return false).

The result is that coreclr signals the event and the target process does not continue, but because we failed to find a CLR, no error is sent to the debugger.

How do I get VSCode to use the snap installed SDK/runtime (at /snap/dotnet-sdk/187) instead of the normally installed ones (in /usr/share/dotnet)?

I set up a new VM with the .NET runtime installed only via snap. I assume one could also remove the dotnet symlink and accomplish the same thing.

@mikem8361
Member

I tracked the problem down to the snap-installed runtimes (built, I think, by Canonical) having a trashed NOTE program header. It almost looks like they run some post-processing tool on the binary that adds some more NOTEs, but they don't update the program header entry. There are two ways to get to the build id NOTE: one through the program headers and the other through the section headers. The ELF reader that dbgshim uses only uses the program headers because the section headers are not valid when inspecting an in-memory module remotely. The ELF reader can read files directly (live dbgshim sessions) or read ELF modules from memory (createdump does this).
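
For readers unfamiliar with how a build id NOTE is located once a note region has been found, here is a minimal sketch of walking that region. It is not the dbgshim ELF reader: `NoteHeader`, `FindGnuBuildId`, and `Align4` are names made up for this example, and it assumes the common layout in which the name and descriptor fields are padded to 4 bytes. If the range handed to a walker like this is wrong (for example, a stale program header entry), the search can fail even though the note exists in the file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Header that precedes every ELF note (same field layout as Elf64_Nhdr).
struct NoteHeader {
    uint32_t namesz;  // size of the name, including the trailing NUL
    uint32_t descsz;  // size of the descriptor (the payload)
    uint32_t type;    // note type; NT_GNU_BUILD_ID is 3
};

constexpr uint32_t kNtGnuBuildId = 3;

// Assumption: name and descriptor are padded to 4-byte boundaries.
inline size_t Align4(size_t n) { return (n + 3) & ~static_cast<size_t>(3); }

// Walks the notes in [data, data + size) and returns the GNU build id bytes,
// or std::nullopt if no such note is present or the region is malformed.
std::optional<std::vector<uint8_t>> FindGnuBuildId(const uint8_t* data, size_t size) {
    size_t offset = 0;
    while (size - offset >= sizeof(NoteHeader)) {
        NoteHeader hdr;
        std::memcpy(&hdr, data + offset, sizeof(hdr));
        offset += sizeof(hdr);

        const size_t remaining = size - offset;
        if (hdr.namesz > remaining) return std::nullopt;
        const size_t nameLen = Align4(hdr.namesz);
        if (nameLen > remaining) return std::nullopt;
        const size_t descLen = hdr.descsz;
        if (descLen > remaining - nameLen) return std::nullopt;

        const uint8_t* name = data + offset;
        const uint8_t* desc = name + nameLen;
        if (hdr.type == kNtGnuBuildId && hdr.namesz == 4 &&
            std::memcmp(name, "GNU", 4) == 0) {
            return std::vector<uint8_t>(desc, desc + descLen);
        }

        // Skip to the next note; the final note's padding may be truncated.
        const size_t advance = nameLen + Align4(descLen);
        if (advance > remaining) break;
        offset += advance;
    }
    return std::nullopt;
}
```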

The fix for live sessions is to add code to read the build id via the section headers for the file case (TryGetBuildIdFromFile), but this doesn't fix the dump case via OpenVirtualProcess, which will fail without calling the library provider. I'm not sure what to do about that yet.
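
The shape of that fix can be sketched as a simple fallback. This is an illustration under assumptions, not the actual change: `GetBuildIdWithFallback`, `ElfSource`, and `BuildIdSource` are hypothetical names, and the two lookups are passed in as callables so the sketch stays self-contained. The second lookup is attempted only when the ELF image comes from a file, since it is described above as unusable for a module inspected remotely in memory.

```cpp
#include <functional>
#include <optional>
#include <vector>

using BuildId = std::vector<unsigned char>;

// Hypothetical lookup: returns the build id, or std::nullopt on failure.
using BuildIdSource = std::function<std::optional<BuildId>()>;

enum class ElfSource { File, RemoteMemory };

std::optional<BuildId> GetBuildIdWithFallback(ElfSource source,
                                              const BuildIdSource& fromProgramHeaders,
                                              const BuildIdSource& fromFileOnlyLookup) {
    // Preferred path: works for both files and in-memory modules.
    if (auto id = fromProgramHeaders()) {
        return id;
    }
    // Fallback path: only meaningful when reading the file directly.
    if (source == ElfSource::File) {
        return fromFileOnlyLookup();
    }
    return std::nullopt;  // the dump / remote-memory case stays unresolved
}
```

The last branch makes the open question explicit: for dumps there is no second source to fall back on, so the caller still has to decide how to report the missing build id.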

One thing I was thinking about was to change to always calling the new library provider (3) even if there isn't any "index" info (size/timestamp on Windows, build id on Linux/macOS). Currently it returns CORDBG_E_INCOMPATIBLE_PROTOCOL in that case. This would leave it up to VS to decide how to get the DAC/DBI if we can't get this info. For live sessions, VS would load the DBI/DAC next to the runtime. For dumps via OpenVirtualProcess, I'm not sure this helps.

mikem8361 added a commit to mikem8361/diagnostics that referenced this issue Nov 30, 2022
Issue: dotnet#3510

Add GetBuildIdFromSectionHeader, which is used when getting the build id note
from the program headers fails.
mikem8361 added a commit that referenced this issue Dec 1, 2022
…3530)

Issue: #3510

Add GetBuildIdFromSectionHeader, which is used when getting the build id note
from the program headers fails.
@ghost ghost locked as resolved and limited conversation to collaborators Jun 26, 2023