
Pgo phase3 #47558

Merged
merged 9 commits into dotnet:master on Feb 2, 2021

Conversation

davidwrighton
Member

  • Fix class type probes (the increment of the count wasn't correct)
  • Use the 64bit integer encoders for all the pgo data
  • New section for R2R PE Files containing instrumentation data
  • R2RDump functionality to display all the data embedded in the file
  • Enable BBOPT for optimized builds in a more unconditional fashion

Future PGO work will include

  • Move Pgo type handle histogram processing into JIT (which will make type guessing work in crossgen2 as well as in the runtime)
  • Triggers for controlling Pgo data extraction
  • Size control for pgo instrumentation data

With this checkin, the feature is functional from a crossgen2.exe point of view, but it's not polished, and it cannot be used from the Crossgen2 SDK integration (as the SDK does not have the ability to pass an extra pair of arguments to the compiler). That said, the following script can be used to demonstrate the feature.

The example expects that the following environment variables are set

CORE_ROOT = Points to the normal CORE_ROOT
DOTNET_PGO_PATH = Points to the location of the dotnet-pgo binary in the artifacts directory
__TestDotNetCmd = Points at the dotnet.cmd in the root of the enlistment

And that the application is called pgotest.csproj and is in the current directory. Assuming all of that is set up, the following script will compile the application, run it under instrumentation, feed the results back into crossgen2, and then run the application again, this time with the profile-guided optimizations applied.

setlocal

rem These environment variables cause code not compiled by default to be compiled with instrumentation enabled, and keep the instrumented code running for some time before rejit.
set COMPLUS_TieredPGO=1
set COMPLUS_TC_QuickJitForLoops=1
set COMPLUS_TC_CallCountThreshold=10000

rem comment this out to skip the pgo driven devirtualization optimization
set COMPLUS_JitClassProfiling=1

rem Uncomment the line below to generate instrumentation data for all methods, not just the non-R2R methods. Unfortunately, this will also impact dotnet-trace.
rem set COMPLUS_ZapDisable=1

dotnet build -p:Configuration=Release
dotnet-trace collect --providers Microsoft-Windows-DotNETRuntime:0x1E000080018:5 -- %CORE_ROOT%\corerun bin\Release\net5.0\pgotest.dll
%DOTNET_PGO_PATH%\dotnet-pgo.exe --trace-file trace.nettrace --output-file-name trace.mibc --uncompressed --pgo-file-type mibc

set COMPLUS_TieredPGO=
set COMPLUS_TC_QuickJitForLoops=
set COMPLUS_TC_CallCountThreshold=
set COMPLUS_JitClassProfiling=
set COMPLUS_ZapDisable=

rem Produce new output binary with crossgen2, passing the --mibc switch and --embed-pgo-data switch
md crossgenoutput
call %__TestDotNetCmd% %CORE_ROOT%\crossgen2\crossgen2.dll --map -O -o crossgenoutput\pgotest.dll -r %CORE_ROOT%\*.dll --mibc trace.mibc --embed-pgo-data bin\Release\net5.0\pgotest.dll

rem Run the application
call %CORE_ROOT%\corerun crossgenoutput\pgotest.dll

Loop implemented but still chasing bugs

Handle volatility issues

End to end type handle processing working

R2R dump support for pgo instrumentation data
- Fix type histogram processing to properly handle unknown type handles in the histogram
@@ -701,6 +701,22 @@ ReadyToRunInfo::ReadyToRunInfo(Module * pModule, PEImageLayout * pLayout, READYT
m_availableTypesHashtable = NativeHashtable(parser);
}

// For format version 4.1 and later, there is an optional table of instrumentation data
Member

Suggested change
// For format version 4.1 and later, there is an optional table of instrumentation data
// For format version 5.2 and later, there is an optional table of instrumentation data

Member

@AndyAyersMS AndyAyersMS left a comment

Looked over the jit interface and pgo manager changes.

This should address #13672, right?

definedSymbols: new ISymbolDefinitionNode[] { this });
}

public override int ClassCode => 1887299452;
Member

Is there some process by which these values are chosen?

Member Author

It is a random number; it simply needs to not match any of the other ClassCode values, and having a bunch of entropy in the number makes a few things faster. This is (poorly) documented where the base ClassCode property is defined.
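The role a high-entropy ClassCode plays can be sketched roughly like this (a minimal illustration with made-up types; crossgen2's real comparer and node hierarchy look different):

```cpp
#include <cassert>
#include <cstring>

// Sketch of why a random, high-entropy class code helps: nodes are ordered by
// comparing the class code first, so distinct codes let the comparison return
// immediately without falling back to a slower per-node comparison.
// These names are illustrative, not crossgen2's actual types.
struct Node
{
    int classCode;     // arbitrary, but must not collide with other node classes
    const char* name;  // stands in for node-specific comparison state
};

inline int CompareNodes(const Node& a, const Node& b)
{
    if (a.classCode != b.classCode)
        return a.classCode < b.classCode ? -1 : 1;  // fast path: codes differ
    return strcmp(a.name, b.name);                  // slow path: same node class
}
```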

#endif // DACCESS_COMPILE

HRESULT PgoManager::getPgoInstrumentationResultsInstance(MethodDesc* pMD, SArray<ICorJitInfo::PgoInstrumentationSchema>* pSchema, BYTE**pInstrumentationData)
HRESULT PgoManager::getPgoInstrumentationResultsInstance(MethodDesc* pMD, BYTE** pAllocatedData, ICorJitInfo::PgoInstrumentationSchema** ppSchema, UINT32 *pCountSchemaItems, BYTE**pInstrumentationData)
Member

Should this prefer "live" data over R2R data?

For instance, will R2R assemblies loaded under ZapDisable lookup embedded profile data for their methods even though they may have gone through Tier0?

Member Author

Umm, sure. I feel that ideally we would merge the data, but I don't really want to write that code right now, especially as it would be a duplication of the managed merger algorithm. Now that I think about it a bit: if we don't prefer in-memory data, we'll lose all the potential for tiered pgo improving over static pgo, so... yeah, I'll swap it around.
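The lookup order being settled on here can be sketched as follows (illustrative types and helper, not the runtime's actual API):

```cpp
#include <cassert>

// Stand-in for a resolved PGO schema blob; 'source' only exists for this sketch.
struct SchemaData { int source; };  // 1 = live tiered data, 2 = embedded R2R data

// Sketch of the preference order from the discussion above: prefer live
// (tiered) PGO data collected during this run, and fall back to profile data
// embedded in the R2R image. Either input may be null.
inline SchemaData* GetPgoResults(SchemaData* liveData, SchemaData* r2rData)
{
    if (liveData != nullptr)
        return liveData;  // in-memory data can improve on static pgo
    return r2rData;       // may also be null: no data at all for this method
}
```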

maxCount = h.m_histogram[m].m_count;
}
}

if (maxCount > 0)
UINT32 maxKnownLikelihood = (100 * maxKnownCount) / h.m_totalCount;
if ((maxKnownCount > 0) && ((maxKnownCount == maxCount) || (maxKnownLikelihood > 33)))
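For readers following along, the condition in the hunk above amounts to something like the following standalone sketch (made-up histogram types, not the runtime's):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical histogram entry; names are illustrative, not the runtime's types.
struct HistogramEntry { bool knownType; uint32_t count; };

// Sketch of the filter being reviewed: pick the most frequent known type, but
// only report it when it dominates the histogram outright or its likelihood
// across all samples exceeds 33%.
inline bool ShouldReportLikelyClass(const HistogramEntry* entries, size_t n,
                                    uint32_t totalCount)
{
    uint32_t maxCount = 0;       // largest count, known or unknown type
    uint32_t maxKnownCount = 0;  // largest count among known type handles
    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].count > maxCount)
            maxCount = entries[i].count;
        if (entries[i].knownType && entries[i].count > maxKnownCount)
            maxKnownCount = entries[i].count;
    }
    uint32_t maxKnownLikelihood = (100 * maxKnownCount) / totalCount;
    return (maxKnownCount > 0) &&
           ((maxKnownCount == maxCount) || (maxKnownLikelihood > 33));
}
```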
Member

I don't think we should filter results here. The jit can decide if likelihood warrants testing things (currently we have different thresholds for virtual and interface calls, for instance).

Member Author

Ok. Will do.

@davidwrighton
Member Author

@AndyAyersMS This won't quite address #13672 as we haven't yet actually enabled this logic in our build systems, but yes, this is much of the technical work to get us there. Currently the system puts a bit too much of the pgo data into the file, causing too much size bloat to be something we can enable by default, and the dotnet-optimization repo isn't yet generating the mibc data, but we're getting there.

@davidwrighton
Member Author

@AndyAyersMS I just wanted to make sure you were aware of the change to enable the BBOPT mode unconditionally when optimizing. It seemed like a good enough idea to me, but it is a tad scary.

@AndyAyersMS
Member

AndyAyersMS commented Jan 29, 2021

Currently the system puts a bit too much of the pgo data into the file,

Can you say more here? #46882 (which is a few days out) plus not instrumenting single-block methods should reduce the number of count records by roughly a factor of two. The latter may be problematic if we plan to use PGO for partial prejitting or global importance, but otherwise single-block method counts are uninteresting.

the change to enable the BBOPT mode unconditionally

I think it's fine. The gating factor is the availability of PGO data, so for now this shouldn't have any impact (unless you are also making legacy IBC data available at runtime, but I didn't see any code that would do that).

We will probably want to introduce a jit or runtime mode to suppress BBOPT once PGO data becomes more commonplace.

@davidwrighton
Member Author

@AndyAyersMS well, the current implementation puts 100% of the data into the file. While the data is actually pretty well compressed, between the data and the bookkeeping to find the appropriate data there is a measurable increase in the size of the final file that is beyond the current desired level. (I don't have good numbers except for my really small test app, where the increase in size is comparable to the size of the generated code. If that holds for larger binaries, we're talking a 30% increase in file size, which is utterly unacceptable.) My goal is to get numbers out of our optimization tests, see how it affects corelib, etc., and see what is to be done.

I have a number of ideas on how to reduce the size impact once it becomes usefully measurable, but there are likely quite a few more tweaks to come. Some ideas include:

  • A post-processed version of the type histogram that is generated during the merge process. This will be optimized for the actual use case of running getLikelyClass. This will strictly be a size improvement with probably no reduction in meaningful carried data. This could either come in a fairly general purpose form, or we could encode the getLikelyClass behavior directly into the data stream.
  • Don't carry data for methods which have insignificant amounts of execution in the test traces
  • Consider dropping data for methods with only a single block. There are a great many of these, and I strongly suspect that the bookkeeping cost for carrying the data is out of proportion to the value of having instrumentation information. As you note, it may be a better idea to not even bother to collect data for single block methods.
  • Consider holding data in the R2R image at methoddef granularity instead of working with instantiated methods when the results are similar, or use some algorithm to compare the various instantiation data items and conclude they are similar enough.
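The first idea in the list above, collapsing a full type histogram into just what getLikelyClass consumes, could look roughly like this (all names and the record layout are illustrative, not the product's encoding):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compact record holding only what a getLikelyClass-style query needs:
// the dominant type and its likelihood. Illustrative sketch only.
struct LikelyClassRecord
{
    intptr_t typeHandle;  // 0 when no known type dominates
    uint8_t  likelihood;  // percentage of samples hitting typeHandle
};

struct HistogramSample { intptr_t typeHandle; uint32_t count; };

// Post-process a raw histogram into the compact record during merge,
// discarding everything but the most frequent type and its share.
inline LikelyClassRecord CompressHistogram(const HistogramSample* samples, size_t n)
{
    uint32_t total = 0, best = 0;
    intptr_t bestHandle = 0;
    for (size_t i = 0; i < n; i++)
    {
        total += samples[i].count;
        if (samples[i].count > best)
        {
            best = samples[i].count;
            bestHandle = samples[i].typeHandle;
        }
    }
    LikelyClassRecord rec = {0, 0};
    if (total > 0 && bestHandle != 0)
    {
        rec.typeHandle = bestHandle;
        rec.likelihood = (uint8_t)((100 * best) / total);
    }
    return rec;
}
```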

There isn't an official goal yet for exactly the size we're looking for, but my personal goal is that we shouldn't increase the size of our generated assemblies by more than 1% with this work.

@davidwrighton
Member Author

@dotnet/crossgen-contrib

@AndyAyersMS
Member

FYI @dotnet/jit-contrib for the "always set BBOPT when optimizing" bit of this change

@AndyAyersMS
Member

My goal is to get numbers out of our optimization tests, and see how it affects corelib, etc, and see what is to be done.

If you mean "perf numbers" be aware there are still a lot of rough edges in the jit.

... we shouldn't increase the size of our generated assemblies by more than 1%

That seems ambitious.

We should split out the impact of the extra profile payload from the impact of the codegen changes it will engender.

Ideally the jit would be smart about producing compact code for things that are unlikely hot paths, and so "buy room" for the profile payload and/or more ambitious codegen on hot paths, but I don't think we're there yet, save for a few special cases.

@davidwrighton
Member Author

@AndyAyersMS For numbers in this context, I'm more concerned about size numbers than wallclock-style perf. And my opinion on size is that I know we don't actually execute all that much of most of our binaries in the vast majority of applications, so most methods won't have any data at all. That will apply a substantial factor to reduce the size cost of carrying instrumentation data. As I said, I don't want to put down any strong statements about what we'll do until we have some form of numbers, but in the static pgo story (at least for our product-shipped images) we have a much smaller size budget to work with than I'd like.

@@ -236,6 +238,232 @@ private int GetSize()
}
}

public class PgoInfoKey : IEquatable<PgoInfoKey>
Contributor

Move these classes to their own file?

_pgoInfo = _readyToRunReader.GetPgoInfoByKey(PgoInfoKey.FromReadyToRunMethod(this));
if (_pgoInfo == null)
{
_pgoInfo = new object();
Contributor

Why do we init to new object() if we have no pgoInfo for a key?

Member Author

Mmm... my thought was to avoid a bunch of manual null checks, but I'll have to take a look and see if I was thinking straight at the time.
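The pattern being described, caching a sentinel meaning "looked up, nothing found" instead of null, sketches out like this (hypothetical names; the actual code under review is R2RDump's C#):

```cpp
#include <cassert>

struct PgoData { int schemaCount; };

// Dedicated sentinel object meaning "lookup already performed, nothing found".
static PgoData s_noPgoData;

struct MethodEntry
{
    PgoData* cached = nullptr;  // nullptr means "not looked up yet"

    // Look up at most once; cache a sentinel on a miss so callers can
    // distinguish "never searched" from "searched and absent" without
    // repeating the lookup.
    PgoData* GetPgoData()
    {
        if (cached == nullptr)
        {
            cached = LookupPgoData();
            if (cached == nullptr)
                cached = &s_noPgoData;  // cache the miss
        }
        return cached == &s_noPgoData ? nullptr : cached;
    }

    // Stand-in for the real reader lookup; always misses in this sketch.
    PgoData* LookupPgoData() { return nullptr; }
};
```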

Contributor

@nattress nattress left a comment

LGTM thanks David!

@davidwrighton davidwrighton merged commit afa50f9 into dotnet:master Feb 2, 2021
@AndyAyersMS AndyAyersMS mentioned this pull request Feb 4, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 4, 2021
@davidwrighton davidwrighton deleted the pgo_phase3 branch April 20, 2021 17:45