Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GCStress timeouts in JIT/jit64 #85040

Merged
merged 13 commits into from
May 18, 2023
Merged

Fix GCStress timeouts in JIT/jit64 #85040

merged 13 commits into from
May 18, 2023

Conversation

markples
Copy link
Member

@markples markples commented Apr 19, 2023

This includes several changes that seem to help with the timeouts. It might be overkill but seems like a good direction as this has been broken for a while.

  • Change the test wrapper logic to only put one test in a TestExecutor so that the callstacks are much simpler.
  • Factor the test wrapper logic into some helpers to simplify the main method. I also tried to make the "Full" and "XHarness" code generation very similar but didn't try to factor/unify them.
  • Mark several tests as RequiresProcessIsolation so that their gcstress is kept separate from the rest of the tests. Disables a large test under gcstress.
  • Add gcstress striping to some merged groups.

Should fix #85590

@ghost ghost assigned markples Apr 19, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 19, 2023
@ghost
Copy link

ghost commented Apr 19, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: markples
Assignees: markples
Labels:

area-CodeGen-coreclr

Milestone: -

@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

PTAL @kunalspathak (and this should help with the weekend gcstress failure)

kunalspathak
kunalspathak previously approved these changes Apr 19, 2023
@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples markples changed the title Add GCStress striping to jit64_do Fix GCStress timeouts in JIT/jit64 Apr 20, 2023
@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

@trylek @davidwrighton We've been having gcstress timeouts occur every time we add merged test groups. The behavior has indicated some degradation over time within a gcstress process (probably the original motivation for striping). However, we've also seen individual tests take much longer, even when first or early in a merged test group run. My new theory is that the extra stack frames have a prohibitively high cost (and like it's just the test executor methods with the N try/catch blocks).

The current iteration of this PR is (overly) aggressive at simplifying the stack. It also still marks several tests as RequiresProcessIsolation as leftover from my initial experiments. Before I go further, I was hoping to get some feedback on the area. My thought is to just go to one test per TestExecutor (and therefore simplify the logic there), make XHarnessTestRunner match it for consistency, and keep the RPIs in order to get gcstress testing unblocked. They can be removed in the future, though this is low priority since individual tests don't hurt test throughput too much.

@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

fyi - I'm now looking at using BuildAsStandalone in gcstress builds to completely avoid merged test groups for now. See #85284 though it will probably take a few rounds for me to get the yaml right.

@trylek
Copy link
Member

trylek commented May 4, 2023

@markples - Do you think we might be able to reduce some of these costs by emitting calls to the individual test entrypoints through helper methods so that each such helper method would have just the one try-catch block?

@markples
Copy link
Member Author

markples commented May 4, 2023

@trylek This PR currently does that (it was easy by setting the grouping value to 1). I think that it helped but still hit a problem (though it's been long enough that I don't remember the details), which is why I had shelved this and was trying the BuildAsStandalone thing. However, that has hit an issue that (at least) one of the HardwareIntrinsics projects is big enough to time out (test merging can stripe -within- a project since it is dealing with individual tests).

@markples
Copy link
Member Author

markples commented May 5, 2023

fyi - this is close but I'm waiting for test results

@markples
Copy link
Member Author

@trylek I propose that we move forward with these fixes for now. They might be overkill, and we might change things again in the future, but this gets jit64 gcstress under control and lets us move forward. A few JIT\Regression legs are still slow but working.

(also resetting @kunalspathak 's review since much has changed since then)

@markples markples dismissed kunalspathak’s stale review May 11, 2023 23:47

The code has changed a lot

Copy link
Member

@trylek trylek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thanks Mark!

Copy link
Member

@trylek trylek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thanks Mark!

@markples
Copy link
Member Author

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

MemorySsa is failing elsewhere.

@markples
Copy link
Member Author

running gcstress yet again because my other change restructued the groups

@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

/azp run runtime, runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

Previous test run might have passed.. but the devops machine flaked out. JIT/jit64 and JIT/opt appear to be ok.

@markples
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@markples
Copy link
Member Author

Some of the gc stress legs are still quite slow, suggesting more striping would be desirable. Hopefully this current run is sufficient to unblock testing and that striping can be handled separately, but osx arm64 continues to be stubborn with this.

@markples
Copy link
Member Author

Build analysis is showing a failure from a previous run of runtime-coreclr gcstress0x3-gcstress0xc. (perhaps of interest to @JulieLeeMSFT @trylek @ivdiazsa ?)

Failure was in https://dev.azure.com/dnceng-public/public/_build/results?buildId=277134&view=results
Current run is https://dev.azure.com/dnceng-public/public/_build/results?buildId=277895&view=results

@markples markples merged commit fb0b206 into dotnet:main May 18, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test Failure - jit64_do.0.1
3 participants