Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCStress works slower with R2R on tests with deep recursion. #11947

Open
sandreenko opened this issue Feb 1, 2019 · 6 comments
Open

GCStress works slower with R2R on tests with deep recursion. #11947

sandreenko opened this issue Feb 1, 2019 · 6 comments

Comments

@sandreenko
Copy link
Contributor

sandreenko commented Feb 1, 2019

That issue came from #11252, there we hit timeout when run baseservices\exceptions\regressions\V1\SEH\VJ\RecursiveException with GCStress=0xf and RunCrossGen=1. The issue doesn't repro when you do not run CrossGen.

The small repro is to run this test with GCStress=0x2 and RunCrossGen=1, on my machine it takes ~3 mins, the run without RunCrossGen=1 takes 2 secs.

a small repro:

using System;
using System.Threading;
using System.IO;

public class RecursiveException {
  public static int Main(String [] args) {
    int retVal = 100;
    RecursiveException re = new RecursiveException();
    re.recurse(0);
    return retVal;
  }
  
  public void recurse(int counter) {  
    if (counter == 5000) // Change it to 10000 and it will fail with stack overflow without crossgen, with crossgen it will pass.
    {
      Console.WriteLine("Finish recurse " + GC.CollectionCount(0) + ", " + GC.CollectionCount(1) + ", " + GC.CollectionCount(2));
    }
    else
      recurse(++counter);
  }
}


run with GCStress=0x2, RunCrossGen=1, TieredCompilation=0:
Finish recurse 3077, 3083, 3083 and It takes ~70 seconds.

run GCStress=0x2, RunCrossGen=0, TieredCompilation=0:
Finish recurse 3110, 3116, 3116 and it takes ~2 seconds.

So with the same number of GC collections each takes much longer when we use CrossGen.
Xperf shows strange results (for example for both runs it shows 0 GC cpu time:

•Total CPU Time: 3,282 msec
•Total GC CPU Time: 0 msec
•Total CPU Time: 95,892 msec
•Total GC CPU Time: 0 msec

functions that are in top of exclusively time are:

coreclr!EEContract::DoChecks coreclr!BaseContract::DoChecks
coreclr!BaseContract::DoChecks

that are expensive functions but it looks like we call them more often when use RunCrossGen=1 and it is not clear why.

All steps are for checked x64, release generates different code and numbers there are different.
I was not able to repro this issue without GCStress (with manual calls to GC.Collect()).

I think it is a R2R or GCStress issue, @sergiy-k PTAL.

cc @dotnet/jit-contrib, @Maoni0, @davidwengier .

sandreenko referenced this issue in sandreenko/coreclr Feb 1, 2019
@sandreenko
Copy link
Contributor Author

Actually maybe it is a CodeGen issue if we use TailCall in one case and don't in the other. Let me check.

@sandreenko
Copy link
Contributor Author

Yes, the difference is that we do not do tail call optimization for crossgen.

I think it is by design that bool ZapInfo::canTailCall returns false for if (IsReadyToRunCompilation()) and the issue won't be fixed?

sandreenko referenced this issue in dotnet/coreclr Feb 1, 2019
@sergiy-k
Copy link
Contributor

@janvorli has been working on a new design for tail calls. I don't think that this feature will be implemented for .NET Core 3.0.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@AaronRobinsonMSFT
Copy link
Member

This still takes a long time even with CrossGen2. Resetting the issue state to be triaged again.

@AaronRobinsonMSFT AaronRobinsonMSFT added the untriaged New issue has not been triaged by the area owner label Apr 3, 2021
@AaronRobinsonMSFT AaronRobinsonMSFT removed this from the Future milestone Apr 3, 2021
@AndyAyersMS
Copy link
Member

#35423 tracks R2R support for tail calls.

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2021
@mangod9 mangod9 added this to the 7.0.0 milestone Jul 6, 2021
@jakobbotsch
Copy link
Member

Do we only run gcstress for x64/arm64 on this test? It's unclear to me why it wouldn't hit the same slowness on x86/arm32 that do not support fast tailcalls, regardless of crossgen.

FWIW we do support fast tailcalls with crossgen2 now after #56669, so this issue is probably fixed. But would like to understand why we only see the slowness with crossgen if this is due to fast tailcalls.

@mangod9 mangod9 modified the milestones: 7.0.0, Future Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants