Bimodal behavior with GC Workstation 2GB Pinning Benchmark #1992

L2 · 2021-09-07T22:46:08Z

The following benchmark completes in either 25 seconds or 60 seconds. Across about 30 iterations, roughly 40% of the runs take 60 seconds.

vary: coreclr
test_executables:
  defgcperfsim: C:\foo\performance\artifacts\bin\GCPerfSim\release\netcoreapp5.0\GCPerfSim.dll
coreclrs:
  a:
    core_root: C:\foo\net6\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root
options:
  default_iteration_count: 10
  default_max_seconds: 300
  collect: thread_times
common_config:
  complus_gcserver: false
  complus_gcconcurrent: false
benchmarks:
  2gb_pinning:
    arguments:
      tc: 1
      tagb: 50.0
      tlgb: 2
      lohar: 0
      pohar: 0
      sohsr: 1000-1000
      lohsr: 152400-152400
      pohsr: 1000-1000
      sohsi: 50
      lohsi: 0
      pohsi: 0
      sohpi: 50
      lohpi: 0
      sohfi: 0
      lohfi: 0
      pohfi: 0
      allocType: reference
      testKind: time
scores:
  speed:
    FirstToLastGCSeconds:
      weight: 1
    PauseDurationMSec_95P:
      weight: 1

(Note that sohsr, lohsr, and pohsr are each set to a single value, but written as a range so that the script accepts them.)

From PerfView's GCStats, it looks like we pin about 15k objects at around GC #3470.

In the iterations that take 25 seconds, after this 15k pinned object event we continue doing gen0 and gen1 collections, and no later GC pins 15k objects.
In the 60-second iterations, however, after the initial 15k pinned object event we do only gen1 or gen2 collections, and the number of pinned objects rebounds to 15k or more before every gen2 collection.
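
To make the two modes easier to tell apart across many traces, the description above can be turned into a small check. This is only a minimal sketch: it assumes the per-GC rows have already been exported from the trace into (GC number, generation, pinned object count) tuples, and `GcEvent`, `classify_run`, and the 15k default threshold are names and values chosen here for illustration, not anything from PerfView or the benchmark infrastructure.

```python
from typing import List, NamedTuple, Optional


class GcEvent(NamedTuple):
    index: int           # GC number within the run
    generation: int      # 0, 1, or 2
    pinned_objects: int  # pinned object count reported for this GC


def classify_run(events: List[GcEvent], pin_threshold: int = 15_000) -> Optional[str]:
    """Return 'slow', 'fast', or None if no GC ever reaches pin_threshold.

    'slow' mirrors the 60-second pattern: after the first large pinning
    event, only gen1/gen2 GCs follow and the pinned count reaches the
    threshold again. Anything else after that event is labeled 'fast'.
    """
    first = next(
        (i for i, e in enumerate(events) if e.pinned_objects >= pin_threshold),
        None,
    )
    if first is None:
        return None

    later = events[first + 1:]
    only_gen1_or_gen2 = all(e.generation >= 1 for e in later)
    pinning_rebounds = any(e.pinned_objects >= pin_threshold for e in later)

    if later and only_gen1_or_gen2 and pinning_rebounds:
        return "slow"
    return "fast"
```

Running this over each iteration's GC list would give a per-iteration label that can be compared against the 25 second / 60 second wall-clock split.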

Thanks


danmoseley · 2021-09-08T03:18:10Z

@cshung I believe this is yours

cshung · 2021-09-08T03:31:12Z

Thank you @danmoseley.
@L2, before we look into the details, is the observed bimodal behavior a problem for you, or are you just curious?

L2 · 2021-09-08T18:56:05Z

Thanks @danmoseley and @cshung

Yes, the main concern is the usability of the gc benchmark results if this type of fluctuation is inherently present. It introduces a lot of noise and makes it difficult to produce valid comparisons.
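
As a rough way to flag this kind of fluctuation before comparing results, the per-iteration durations can be split at their largest gap. This is a minimal sketch, not part of the benchmark infrastructure: `split_modes` and `looks_bimodal` are hypothetical helpers, and the rule used here (the gap between the two groups must exceed the spread inside either group) is an assumption rather than an established test for bimodality.

```python
def split_modes(durations):
    """Split durations into two groups at the largest gap between sorted neighbours."""
    xs = sorted(durations)
    if len(xs) < 2:
        raise ValueError("need at least two durations")
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    # Index of the first element of the upper group.
    cut = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return xs[:cut], xs[cut:]


def looks_bimodal(durations):
    """Heuristic: the gap between groups is larger than the spread within either group."""
    low, high = split_modes(durations)
    gap = high[0] - low[-1]
    spread = max(low[-1] - low[0], high[-1] - high[0])
    return gap > spread
```

For example, durations clustered near 25 and near 60 seconds would be flagged, while durations spread evenly over a few seconds would not.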

For this example in particular, it looks like the gc gets stuck in some state where it always needs to do gen1 and gen2 collects after the initial 15k pinned object event, so I'm also curious as to why this happens intermittently between separate benchmark runs using the exact same config.

On your end, are these gc benchmarks the main tool used to validate perf when making changes to the gc? In general, do you experience any significant noise in the gc benchmark results, and if so, what would be the best way to reduce this noise (specifically on Windows x64)?

Thanks

L2 · 2021-09-27T16:39:28Z

@cshung Any luck getting this to reproduce on your end?

Thanks

cshung · 2021-09-27T21:50:59Z

@L2, my apologies for not responding promptly. I was focusing on investigating a stress crash bug for the last few weeks. I have not nailed it down yet.

Since that bug might crash users' applications, I gave it a higher priority.

I will get back to this issue as soon as I can.

cshung · 2021-09-28T17:59:45Z

I ran the benchmark 100 times and here is a histogram of total seconds taken to run the benchmark:

x-axis is total seconds taken, y-axis is the number of runs.

So I conclude I cannot reproduce this locally.
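
For anyone who wants to repeat this on their own machine, a histogram of this kind can be produced from the per-run durations with a few lines of Python. This is a minimal sketch and not the script used for the run above; `text_histogram` and the 5-second bin width are choices made here for illustration.

```python
from collections import Counter


def text_histogram(durations, bin_width=5):
    """Render run durations (seconds) as a text histogram, one row per bin."""
    if not durations:
        return ""
    # Map each duration to the lower edge of its bin.
    bins = Counter(int(d // bin_width) * bin_width for d in durations)
    lines = []
    for start in range(min(bins), max(bins) + bin_width, bin_width):
        # Each '#' is one run whose duration fell in [start, start + bin_width).
        lines.append(f"{start:>4}s | {'#' * bins[start]}")
    return "\n".join(lines)
```

Two separated clusters of `#` rows would correspond to the bimodal behavior reported in this issue; a single cluster corresponds to what I see locally.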

That being said, GC performance is likely to be correlated with the machine-specific parameters (e.g. processor speed, amount of memory, amount of cache available).

How did you come up with this particular set of parameters?
Why did this particular set of parameters matter?
Can you share some traces?

Without knowing exactly what happened, it is difficult for us to investigate. At a very high level, we suspect that you are hitting right at the threshold of some performance tuning heuristic in the GC, so depending on luck, a run falls into one of two possible branches and then stays stuck there.

Maoni0 · 2021-09-29T19:07:27Z

a good way to go about this: if @L2 doesn't mind, they could send us the GCCollectOnly traces from runs that demonstrate the bimodal behavior. the GCCollectOnly traces contain no PII so hopefully that's not a problem.

cshung mentioned this issue Nov 10, 2021

Avoid keep triggering gen 2 GCs because of low ephemeral space dotnet/runtime#61428

Closed

L2 mentioned this issue Jan 5, 2022

[WIP] Avoiding Bimodal GC behavior dotnet/runtime#63408

Closed
