Bimodal behavior with GC Workstation 2GB Pinning Benchmark #1992

Open
L2 opened this issue Sep 7, 2021 · 7 comments

@L2
Contributor

L2 commented Sep 7, 2021

The following benchmark completes in either about 25 seconds or about 60 seconds. Across about 30 iterations of the benchmark, roughly 40% take 60 seconds to finish.

vary: coreclr
test_executables:
  defgcperfsim: C:\foo\performance\artifacts\bin\GCPerfSim\release\netcoreapp5.0\GCPerfSim.dll
coreclrs:
  a:
    core_root: C:\foo\net6\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root
options:
  default_iteration_count: 10
  default_max_seconds: 300
  collect: thread_times
common_config:
  complus_gcserver: false
  complus_gcconcurrent: false
benchmarks:
  2gb_pinning:
    arguments:
      tc: 1
      tagb: 50.0
      tlgb: 2
      lohar: 0
      pohar: 0
      sohsr: 1000-1000
      lohsr: 152400-152400
      pohsr: 1000-1000
      sohsi: 50
      lohsi: 0
      pohsi: 0
      sohpi: 50
      lohpi: 0
      sohfi: 0
      lohfi: 0
      pohfi: 0
      allocType: reference
      testKind: time
scores:
  speed:
    FirstToLastGCSeconds:
      weight: 1
    PauseDurationMSec_95P:
      weight: 1

(Note that sohsr, lohsr, and pohsr are each set to a single value, but written as a range to make the script work.)

From PerfView's GC stats, it looks like we pin about 15k objects at around GC #3470.

  • In the iterations that take 25 seconds, after this 15k pinned object event, we continue doing gen0 and gen1 collections, with no further events pinning 15k objects.
  • In the iterations that take 60 seconds, however, after this initial 15k pinned object event, we do only gen1 or gen2 collections, and the number of pinned objects always rebounds to 15k or more before every gen2 collection.
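
The two patterns above could be told apart mechanically once per-GC data has been exported from a trace. A minimal sketch in Python, assuming a hypothetical `GcRecord` holding the GC number, generation, and pinned-object count (these names are illustrative and not part of PerfView or the benchmark infra), with 15k taken from the observation above as the threshold:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GcRecord:
    """Hypothetical per-GC row; field names are assumptions for this sketch."""
    number: int          # GC index within the run
    generation: int      # 0, 1, or 2
    pinned_objects: int  # pinned object count reported for this GC


def first_heavy_pin(gcs: List[GcRecord], threshold: int = 15_000) -> Optional[int]:
    """Return the list index of the first GC pinning >= threshold objects."""
    for i, gc in enumerate(gcs):
        if gc.pinned_objects >= threshold:
            return i
    return None


def classify_run(gcs: List[GcRecord], threshold: int = 15_000) -> str:
    """Label a run by what happens after the first heavy pinning event."""
    start = first_heavy_pin(gcs, threshold)
    if start is None:
        return "no-heavy-pin"
    tail = gcs[start + 1:]
    saw_gen0 = any(gc.generation == 0 for gc in tail)
    repinned = any(gc.pinned_objects >= threshold for gc in tail)
    if saw_gen0 and not repinned:
        return "fast-like"   # gen0/gen1 GCs continue, no more heavy pinning
    if not saw_gen0 and repinned:
        return "slow-like"   # only gen1/gen2 GCs, pinning rebounds
    return "unclear"
```

Running `classify_run` over each iteration's GC list would give a quick count of how many iterations fall into each pattern, without opening every trace by hand.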

Thanks

@danmoseley
Member

@cshung I believe this is yours

@cshung
Member

cshung commented Sep 8, 2021

Thank you @danmoseley.
@L2, before we look into the details, is the observed bimodal behavior a problem for you, or are you just curious?

@L2
Contributor Author

L2 commented Sep 8, 2021

Thanks @danmoseley and @cshung

Yes, the main concern is the usability of the gc benchmark results if this type of fluctuation is inherently present. It introduces a lot of noise and makes it difficult to produce valid comparisons.

For this example in particular, it looks like the GC gets stuck in some state where it always needs to do gen1 and gen2 collections after the initial 15k pinned object event, so I'm also curious why this happens intermittently between separate benchmark runs using the exact same config.

On your end, are these GC benchmarks the main tool used to validate perf when making changes to the GC? In general, do you see significant noise in the GC benchmark results, and if so, what would be the best way to reduce it (specifically on Windows x64)?
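
As a stopgap for the noise question, one option is to at least detect when a set of iteration times has split into two groups before comparing means. A minimal sketch in Python, assuming the per-iteration wall-clock seconds are already available as a list; the function names and the 1.5x ratio are arbitrary choices for illustration, and cutting at the largest gap is a simple heuristic rather than a proper statistical test:

```python
from statistics import mean
from typing import List, Tuple


def split_by_largest_gap(seconds: List[float]) -> Tuple[List[float], List[float]]:
    """Sort the run times and cut them at the widest gap between neighbors."""
    ordered = sorted(seconds)
    if len(ordered) < 2:
        raise ValueError("need at least two run times")
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, cut = max(gaps)  # index just before the widest gap
    return ordered[:cut + 1], ordered[cut + 1:]


def looks_bimodal(seconds: List[float], min_ratio: float = 1.5) -> bool:
    """Flag a set of runs whose upper group is much slower than the lower one."""
    low, high = split_by_largest_gap(seconds)
    return mean(high) >= min_ratio * mean(low)
```

With made-up times shaped like the ones in this issue, for example `[25.1, 24.8, 60.2, 25.3, 59.7]`, the split gives one group near 25 seconds and one near 60, and `looks_bimodal` returns `True`. Reporting each group separately avoids averaging across the two modes.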

Thanks

@L2
Contributor Author

L2 commented Sep 27, 2021

@cshung Any luck getting this to reproduce on your end?

Thanks

@cshung
Member

cshung commented Sep 27, 2021

@L2, my apologies for not responding promptly. I have been focused on investigating a stress crash bug for the last few weeks and have not nailed it down yet.

Since that bug might crash users' applications, I gave it a higher priority.

I will get back to this issue as soon as I can.

@cshung
Member

cshung commented Sep 28, 2021

I ran the benchmark 100 times and here is a histogram of total seconds taken to run the benchmark:

[image: histogram of total benchmark run times]

x-axis is total seconds taken, y-axis is the number of runs.

So I conclude I cannot reproduce this locally.

That being said, GC performance is likely to be correlated with machine-specific parameters (e.g. processor speed, amount of memory, amount of cache available).

  • How did you come up with this particular set of parameters?
  • Why did this particular set of parameters matter?
  • Can you share some traces?

Without knowing exactly what happened, it is difficult for us to investigate. At a very high level, we suspect that you are hitting right at the threshold of some performance tuning heuristic in the GC, so depending on luck, a run falls into one of the two possible branches and then stays stuck there.

@Maoni0
Member

Maoni0 commented Sep 29, 2021

A good way to go about this: if @L2 doesn't mind, they could send us the GCCollectOnly traces from runs that demonstrate the bimodal behavior. The GCCollectOnly traces contain no PII, so hopefully that's not a problem.
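
For reference, a GCCollectOnly trace is normally captured by running PerfView with the `/GCCollectOnly` qualifier on its `collect` command. A minimal sketch in Python that only builds that command line, assuming `PerfView.exe` is on the path; the helper name and the optional time limit are illustrative, and the benchmark infra may already offer its own way to collect this:

```python
from typing import List, Optional


def perfview_gc_collect_only_args(
    perfview_exe: str = "PerfView.exe",    # assumed location of PerfView
    max_collect_sec: Optional[int] = None  # optional automatic stop, in seconds
) -> List[str]:
    """Build the argument list for a GC-only PerfView collection."""
    args = [perfview_exe, "/GCCollectOnly", "/AcceptEULA", "/nogui"]
    if max_collect_sec is not None:
        args.append(f"/MaxCollectSec:{max_collect_sec}")
    args.append("collect")
    return args
```

The resulting list can be passed to `subprocess.run` from an elevated prompt before starting the benchmark. Passing `max_collect_sec=300` would match the `default_max_seconds` value in the config above.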
