XUnitLogChecker: Unable to find dumps in Linux #96818

Closed
carlossanlop opened this issue Jan 10, 2024 · 23 comments

@carlossanlop
Member

I created a test PR to see the status of XUnitLogChecker. #96806

I discovered that XUnitLogChecker is unable to find dumps in the expected directory in Unix. Did anything change recently? This was working in my last PR.

Examples:

@ivdiazsa @hoyosjs @JulieLeeMSFT

@ghost

ghost commented Jan 10, 2024

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: carlossanlop
Assignees: -
Labels:

area-Infrastructure

Milestone: -

@ghost added the untriaged label (New issue has not been triaged by the area owner) on Jan 10, 2024
@ericstj
Member

ericstj commented Jan 11, 2024

@carlossanlop are you setting DOTNET variables to trigger dump creation? I think we were supposed to set DOTNET_DbgMiniDumpName and maybe some others. cc @hoyosjs

@carlossanlop
Member Author

Yes, they're being set:

export DOTNET_DbgEnableMiniDump=1
export DOTNET_EnableCrashReport=1
export DOTNET_DbgMiniDumpName=$HELIX_DUMP_FOLDER/coredump.%d.dmp
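A quick pre-flight check can confirm that these three variables actually reach the test process. The sketch below is only an illustration: `check_dump_env` is a hypothetical helper name, not something in RunnerTemplate.sh, and it only tests that each variable is non-empty.

```shell
# Hypothetical helper: report which of the dump-related variables from the
# comment above are unset or empty. Returns 0 only if all three are set.
check_dump_env() {
    missing=0
    for name in DOTNET_DbgEnableMiniDump DOTNET_EnableCrashReport DOTNET_DbgMiniDumpName; do
        # Indirect lookup of the variable named in $name (POSIX sh compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_dump_env` right before the `dotnet exec` line would print one `not set:` line per missing variable.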

@ericstj
Member

ericstj commented Jan 11, 2024

I see - so given the docs of those variables they should produce a dump, but they didn't. Looks like all those logs have this:

[createdump] Gathering state for process 56 dotnet
[createdump] Crashing thread 0047 signal 11 (000b)
[createdump] Writing crash report to file /home/helixbot/dotnetbuild/dumps/coredump.56.dmp.crashreport.json
[createdump] Crash report successfully written
waitpid() returned successfully (wstatus 0000008b) WEXITSTATUS 0 WTERMSIG b

@hoyosjs wouldn't that indicate a problem with CoreCLR's dump writing logic? Looks like it's only writing .crashreport.json and not the .dmp
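The symptom described here (a `.crashreport.json` with no `.dmp` beside it) can be checked mechanically. This is a minimal sketch that assumes reports are always named `<dump file>.crashreport.json`, as in the log above; `list_reports_without_dump` is a hypothetical helper name.

```shell
# List crash reports that have no matching dump file next to them.
# Assumes the report name is the dump name plus ".crashreport.json".
list_reports_without_dump() {
    dir=$1
    for report in "$dir"/*.crashreport.json; do
        [ -e "$report" ] || continue          # glob matched nothing
        dump=${report%.crashreport.json}      # e.g. coredump.56.dmp
        [ -e "$dump" ] || echo "$report"
    done
}
```

For example, `list_reports_without_dump "$HELIX_DUMP_FOLDER"` would print each report whose dump is missing and print nothing when every report has one.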

@ericstj
Member

ericstj commented Jan 12, 2024

Have a look at https://dev.azure.com/dnceng-public/public/_build/results?buildId=522611&view=ms.vss-test-web.build-test-results-tab&runId=12281022&paneView=debug&resultId=207997
There are dumps captured and uploaded - they were probably configured as system dumps. @hoyosjs - could that be what's going on? Some system setting is taking precedence over the environment variables?

@hoyosjs
Member

hoyosjs commented Jan 12, 2024

Dumps configured through the environment variables are written before process teardown. That means that dump, which is declared successful, happens before the system dump.

@ericstj
Member

ericstj commented Jan 12, 2024

But there was no dump created -- it only created a report. Could it be that createdump has a bug here?

// Gather all the useful memory regions from the DAC
if (!crashInfo->EnumerateMemoryRegionsWithDAC(options.DumpType))
{
    goto exit;
}

Or do you think someone set the following?

- `DOTNET_EnableCrashReportOnly`: In .NET 7.0 or greater, same as DOTNET_EnableCrashReport except the core dump is not generated.

I couldn't find any evidence of the latter - so this really looks like a problem with createdump.

@ghost

ghost commented Jan 12, 2024

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: carlossanlop
Assignees: -
Labels:

area-Diagnostics-coreclr, untriaged

Milestone: -

@ivdiazsa added this to the 9.0.0 milestone on Jan 12, 2024
@ghost removed the untriaged label (New issue has not been triaged by the area owner) on Jan 12, 2024
@tommcdon
Member

> But there was no dump created -- it only created a report. Could it be that createdump has a bug here?

@mikem8361

@ericstj
Member

ericstj commented Jan 17, 2024

Maybe the test scripts could clear the DOTNET_EnableCrashReportOnly variable just to rule that out. I didn't see anyone setting this in any dotnet repo - but I can't say I trust my search to catch every bit of code that runs on the helix machines.
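One way to rule this out is to clear the variable in the test script and log whether it had been set. The sketch below is an assumption-laden illustration: `clear_crash_report_only` is a hypothetical helper, and clearing the older `COMPlus_` prefix as well is a precaution, not something the thread confirms is needed.

```shell
# Clear the report-only switch before launching the tests, and say so in the
# console log if it had been set, so the log records the evidence.
clear_crash_report_only() {
    for name in DOTNET_EnableCrashReportOnly COMPlus_EnableCrashReportOnly; do
        # Indirect lookup of the variable named in $name (POSIX sh compatible).
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            echo "$name was set to '$value'; clearing it"
        fi
        unset "$name"
    done
}
```

If the function prints nothing on the Helix machines, the variable was not set in the script's environment and the report-only explanation becomes less likely.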

@mikem8361
Member

Is there any way to turn on createdump logging with DOTNET_CreateDumpVerboseDiagnostics? That should tell us a lot more about what is going on.

@ericstj
Member

ericstj commented Jan 18, 2024

I think those can just be set in https://github.com/carlossanlop/runtime/blob/TestXULC/eng/testing/RunnerTemplate.sh#L125-L127 and it should give info.

I went ahead and gave this a try by making the edit directly in @carlossanlop's branch. carlossanlop@8088cb3

We can observe how that impacts the PR validation: #96806
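For anyone repeating the experiment, the setting can be wrapped in a small toggle so it is easy to switch on and off between runs. This is a sketch, not the actual edit made in the branch: `set_createdump_verbose` is a hypothetical helper, and the value `1` is assumed to be what enables the logging.

```shell
# Toggle createdump's verbose diagnostics for processes started afterwards.
set_createdump_verbose() {
    case "$1" in
        on)  export DOTNET_CreateDumpVerboseDiagnostics=1 ;;
        off) unset DOTNET_CreateDumpVerboseDiagnostics ;;
        *)   echo "usage: set_createdump_verbose on|off" >&2
             return 2 ;;
    esac
}
```

The call has to come before the `dotnet exec` line, since createdump reads the environment of the crashing process.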

@mikem8361
Member

It looks like after the createdump logging was enabled, the dumps in the various runs I looked at were successful. The logging did expose some sign extension bugs on arm32 which I'm fixing, but the rest of the platforms looked ok.

@ericstj
Member

ericstj commented Jan 31, 2024

I went ahead and removed the DOTNET_CreateDumpVerboseDiagnostics setting. We can see if the problem returns. If it does then perhaps we'll need to look into other ways to understand why the dumps aren't created.

You can look for the results of this PR to see the state: #96806

@mikem8361
Member

I've also made some fixes to createdump for arm32, and they are now in.

@tommcdon
Member

tommcdon commented Feb 6, 2024

Hi @ericstj, just checking in - please let us know if there is any further action for the diagnostics team or if this issue can be closed.

@tommcdon added the needs-author-action label (An issue or pull request that requires more info or actions from the author.) on Feb 6, 2024
@ghost

ghost commented Feb 6, 2024

This issue has been marked needs-author-action and may be missing some important information.

@ericstj
Member

ericstj commented Feb 7, 2024

Tagging @carlossanlop to help as well.

You should be able to look at the linked PR to see if it's producing dumps or not. That's effectively testing createdump on the CI machines. I just had a look and it seems to me it's not producing the dumps:
https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-96806-merge-a9ff7e2f92b842e080/System.IO.Compression.Brotli.Tests/1/console.8faf5d93.log?helixlogtype=result

As you can see, createdump is doing something:

[createdump] Gathering state for process 25502 dotnet
[createdump] Crashing thread 63ae signal 11 (000b)
./RunTests.sh: line 180: 25502 Segmentation fault      (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Compression.Brotli.Tests.runtimeconfig.json --depsfile System.IO.Compression.Brotli.Tests.deps.json xunit.console.dll System.IO.Compression.Brotli.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE

But still no .dmp file produced.

No dumps found in /home/helixbot/dotnetbuild/dumps.

I think the next steps might be to iterate in that PR to test that createdump will work correctly in the CI machines.

@ericstj removed the needs-author-action label (An issue or pull request that requires more info or actions from the author.) on Feb 7, 2024
@ericstj
Member

ericstj commented Feb 7, 2024

I wonder if the system dump being produced interferes with createdump, and enabling more verbose logging slows down createdump enough so that the system dump process no longer interferes? Just guessing as I don't know why createdump might not produce a dump at all even when running.

@JulieLeeMSFT
Member

JulieLeeMSFT commented Feb 7, 2024

@carlossanlop, can you link the example PR that shows it does not create dump?
Eric already shared the PR above.

@ericstj
Member

ericstj commented Feb 7, 2024

Yeah, it's the same PR as before. Check the Linux CoreCLR legs (the above is from an x64 one).

What I do to check is examine all failing tests, then look for the .dmp and the console log for the stack.
https://dev.azure.com/dnceng-public/public/_build/results?buildId=548264&view=ms.vss-test-web.build-test-results-tab
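The manual check described above can be partly scripted once a console log has been downloaded. The sketch below counts only message strings that appear in the logs quoted in this thread; `summarize_console_log` is a hypothetical helper, and it does not look for any dump-success message because none is quoted here.

```shell
# Summarize what createdump did according to a downloaded Helix console log.
# The search strings are copied from the log excerpts quoted in this thread.
summarize_console_log() {
    log=$1
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    started=$(grep -c -F '[createdump] Gathering state for process' "$log" || true)
    reports=$(grep -c -F '[createdump] Crash report successfully written' "$log" || true)
    no_dumps=$(grep -c -F 'No dumps found in' "$log" || true)
    echo "createdump runs: $started, crash reports written: $reports, 'No dumps found' messages: $no_dumps"
}
```

A log where createdump runs are non-zero and a 'No dumps found' message is present matches the failure pattern reported in this issue.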

@mikem8361
Member

It looks like the test PR is successfully creating dumps. Juan rebased it on the latest which had a remote stack unwinder fix.

@ericstj
Member

ericstj commented Feb 8, 2024

Confirmed, https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-96806-merge-48852c8dc9bb4f22bc/System.IO.Compression.Brotli.Tests/1/console.47cae588.log?helixlogtype=result

@mikem8361 @hoyosjs - could one of you link the PR that fixed this? Are you putting tests in place to make sure it doesn't regress again?

@github-actions bot locked and limited conversation to collaborators on Mar 9, 2024