Increase JIT compiler throughput #6857

Open

pkukol opened this issue Oct 20, 2016 · 2 comments
Labels
  area-CodeGen-coreclr - CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
  enhancement - Product code improvement that does NOT require public API changes/additions
  JitThroughput - CLR JIT issues regarding speed of JIT itself
  Priority:2 - Work that is important, but not critical for the release
  tenet-performance - Performance related issue
Milestone
Future
Comments

@pkukol
Contributor

pkukol commented Oct 20, 2016

The following is a list of areas being considered for throughput improvements in the near future. If anyone wants to help with any of these, just add a note here; if we get lots of volunteers we can set up a simple tracking system. Note that everything gathered in this issue addresses the specific goal of speeding up IL -> machine code conversion in RyuJIT itself - in other words, things like better heuristics for deciding when to use (or not use) MinOpts, or finding more ways to avoid compilation at runtime entirely (via crossgen or whatever), are not covered here.

  1. Importing IL (typically takes 25% of overall JIT time):
    1. Cache results of calling through ICorJitInfo, such as IL token resolution (a rough sketch appears after this list):
      1. Option 1 - do this only for large methods, no caching across compilations.
      2. Option 2 - cache things "globally", across methods; requires retention policy.
    2. Cache the internal format of carefully chosen methods that are frequently inlined:
      1. This only helps "normal opts" unless we extend MinOpts to do "fast" inlining.
      2. Needs retention policy so memory (and any overhead for serializing the IR) is not wasted.
      3. Closely tied to the inliner so this should probably be integrated into the inline policy / etc logic.
    3. Recognize a tiny subset of trivial IL body patterns (sketch after the list); for matched methods:
      1. Spit out "canned" IR to bypass the "full" import logic.
      2. Mark "trivial" bodies (or parts of bodies?) and add simplified processing downstream for such bodies (e.g. no jumps, no stores to locals, or whatever).
    4. Find the most frequent paths through the most expensive JIT -> EE calls, and try to speed them up:
      1. For some calls the JIT may not need all the information the EE is currently returning -> add shortcuts and/or subset versions?
      2. Batch / overlap / defer EE calls:
        1. This has ramifications for class loading order / etc, so feasibility is an open question.
        2. When the importer asks about tokens / methods / fields it doesn't need all the info immediately. Split the relevant (expensive) EE calls into two parts: the first - hopefully much faster than the whole - would return only the minimal info the importer must have right away; the rest could come on a separate thread, via an async callback, or something like that (a toy sketch of this split appears after the list).
        3. If we do any "quick look at IL" processing (e.g. to do some of the stuff above for "trivial" methods) we could fire off calls to ask about the tokens we encounter.
  2. Speed up genGenerateCode, i.e. the far back end (takes about 10-20% of total time):
    1. Speed up GC info gathering and encoding:
      1. GC encoder: avoid sorting when possible, speed up bit-twiddling, etc.
      2. Instruction encoding: try to speed up the most frequent emitXxxxx methods and emitter::emitGCregLiveUpd().
      3. Speed up GCInfo::gcMakeRegPtrTable() and related logic.
      4. Other things, such as the scope tracking stuff (CodeGen::siBeginBlock etc).
      5. Note: probably ignore CodeGen::genCodeForTreeNode() even though it's up to 5% - way too many little pieces.
  3. Speed up LSRA (around 20% of total compile time) and make it consume a lot less memory:
    1. Punting this entirely to the RA specialists.
  4. Slim down Lowering::DoPhase (currently up to 10% of JIT time):
    1. Completely bypass fgInterBlockLocalVarLiveness() - up to 2% of total time.
    2. Avoid doing lvaSortByRefCount() for very small numbers of variables (0.5% of total time; sketch after the list).
    3. Other things? Returns probably diminish rapidly.
  5. Speed up Morph (around 8% of total compile time):
    1. Spend less time recursively walking trees for MinOpts.
  6. Global improvements (few percent):
    1. Shrink GenTree nodes.
    2. Speed up tree walks.
    3. Allocate memory in larger chunks (see the arena sketch after the list).
  7. Skip more things for MinOpts (few percent):
    1. Skip (parts of) liveness analysis (also see above)?
    2. Bypass some tree optimizations (just do the simplest / easiest thing).
    3. Skip "ordering" passes (evalOrder/blockOrder) for trivial or reducible CFGs or some such.
    4. Short-cut things like lvaSortByRefCount() for small numbers of variables, etc.
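
To make some of the above more concrete, here are a few rough sketches. They are plain C++ with made-up names standing in for the real JIT/EE interfaces and data structures - none of this is the actual implementation, just the shape of each idea.

For 1.1 (option 1), the cache could be as simple as a per-compilation hash map keyed by IL token. resolveTokenThroughEE() and ResolvedToken below are hypothetical stand-ins for the real ICorJitInfo call and its result:

```cpp
// Minimal sketch of a per-compilation token-resolution cache (item 1.1, option 1).
// ResolvedToken and resolveTokenThroughEE() are hypothetical stand-ins for the
// real JIT/EE interface; only the caching shape matters here.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct ResolvedToken
{
    void*    handle; // e.g. a method/field/class handle returned by the EE
    unsigned kind;   // what kind of entity the token resolved to
};

// Stand-in for the expensive call through ICorJitInfo.
static ResolvedToken resolveTokenThroughEE(uint32_t ilToken)
{
    std::printf("EE call for token %08x\n", ilToken);
    return ResolvedToken{nullptr, ilToken & 0xFF000000};
}

class TokenCache
{
    // Lives only for the current method compilation, so no retention policy needed.
    std::unordered_map<uint32_t, ResolvedToken> m_map;

public:
    const ResolvedToken& resolve(uint32_t ilToken)
    {
        auto it = m_map.find(ilToken);
        if (it == m_map.end())
        {
            // First use of this token in the method: pay the EE cost once.
            it = m_map.emplace(ilToken, resolveTokenThroughEE(ilToken)).first;
        }
        return it->second;
    }
};

int main()
{
    TokenCache cache;
    cache.resolve(0x0A000012); // resolved through the EE
    cache.resolve(0x0A000012); // served from the cache, no second EE call
}
```

The importer would probably only bother with the cache above some IL-size threshold so the hashing overhead doesn't hurt small methods; option 2 is the same shape hung off a global structure plus an eviction policy.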
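For 1.3, matching the very simplest pattern - an instance field getter (ldarg.0; ldfld <tok>; ret) - is just a byte comparison. The opcode values are the standard ECMA-335 encodings; TrivialBody is only an illustrative result type, not a real JIT structure:

```cpp
// Sketch of recognizing one trivial IL pattern: an instance field getter
//   ldarg.0; ldfld <token>; ret
// Opcode values are the standard ECMA-335 encodings.
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct TrivialBody          // illustrative only
{
    bool     isFieldGetter;
    uint32_t fieldToken;    // metadata token of the loaded field, if matched
};

static bool matchFieldGetter(const uint8_t* il, std::size_t ilSize, TrivialBody* out)
{
    // Exactly: ldarg.0 (0x02), ldfld (0x7B) + 4-byte token, ret (0x2A) => 7 bytes.
    if (ilSize != 7 || il[0] != 0x02 || il[1] != 0x7B || il[6] != 0x2A)
    {
        return false;
    }
    out->isFieldGetter = true;
    // IL stores the token little-endian.
    out->fieldToken = (uint32_t)il[2] | ((uint32_t)il[3] << 8) |
                      ((uint32_t)il[4] << 16) | ((uint32_t)il[5] << 24);
    return true;
}

int main()
{
    // ldarg.0; ldfld 0x04000001; ret
    const uint8_t body[] = {0x02, 0x7B, 0x01, 0x00, 0x00, 0x04, 0x2A};
    TrivialBody tb{};
    if (matchFieldGetter(body, sizeof(body), &tb))
    {
        std::printf("trivial getter of field %08x\n", tb.fieldToken);
        // Here the importer would spit out canned IR instead of running the
        // full import loop.
    }
}
```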
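For 1.4.2, the "split the EE call in two" idea could look roughly like this. eeGetQuickMethodInfo() / eeGetFullMethodInfo() and the two info structs are hypothetical, and the class-loading-order concerns in 1.4.2.1 are exactly why this is only a sketch:

```cpp
// Sketch of splitting an expensive EE query into a cheap synchronous part and a
// deferred part (item 1.4.2). All names here are hypothetical stand-ins.
#include <cstdint>
#include <cstdio>
#include <future>
#include <string>

struct QuickMethodInfo { unsigned argCount; bool mayInline; }; // what the importer needs right away
struct FullMethodInfo  { std::string name; unsigned ilSize; }; // the rest, wanted later (if at all)

// Stand-ins for the real (expensive) JIT/EE calls.
static QuickMethodInfo eeGetQuickMethodInfo(uint32_t token) { return {2, token != 0}; }
static FullMethodInfo  eeGetFullMethodInfo(uint32_t token)  { return {"Foo.Bar", token & 0xFF}; }

struct MethodQuery
{
    QuickMethodInfo             quick; // available immediately
    std::future<FullMethodInfo> full;  // computed on another thread

    explicit MethodQuery(uint32_t token)
        : quick(eeGetQuickMethodInfo(token))
        , full(std::async(std::launch::async, eeGetFullMethodInfo, token))
    {
    }
};

int main()
{
    MethodQuery q(0x06000010);                      // fired off when the importer first sees the token
    std::printf("argCount=%u\n", q.quick.argCount); // importer keeps going with the quick part
    FullMethodInfo fi = q.full.get();               // blocks here only if the full answer isn't ready yet
    std::printf("name=%s ilSize=%u\n", fi.name.c_str(), fi.ilSize);
}
```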
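For 4.2 / 7.4, one possible shape of the short-cut: below some (to-be-measured) threshold, do a handful of comparisons in place instead of setting up the full sort. LclVarInfo and SMALL_LCL_COUNT are placeholders:

```cpp
// Sketch of short-cutting the ref-count sort for small variable counts
// (items 4.2 and 7.4). LclVarInfo and SMALL_LCL_COUNT are placeholders;
// the real threshold would be tuned by measurement.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

struct LclVarInfo
{
    unsigned lclNum;
    unsigned refCount;
};

static const std::size_t SMALL_LCL_COUNT = 4; // placeholder threshold

static void sortByRefCount(std::vector<LclVarInfo>& lcls)
{
    if (lcls.size() <= 1)
    {
        return; // nothing to order
    }
    if (lcls.size() <= SMALL_LCL_COUNT)
    {
        // Tiny method: a few in-place comparisons beat the full sort machinery.
        for (std::size_t i = 1; i < lcls.size(); i++)
        {
            LclVarInfo  v = lcls[i];
            std::size_t j = i;
            while (j > 0 && lcls[j - 1].refCount < v.refCount)
            {
                lcls[j] = lcls[j - 1];
                j--;
            }
            lcls[j] = v;
        }
        return;
    }
    // General case: fall back to the full sort, most-referenced first.
    std::sort(lcls.begin(), lcls.end(),
              [](const LclVarInfo& a, const LclVarInfo& b) { return a.refCount > b.refCount; });
}

int main()
{
    std::vector<LclVarInfo> lcls = {{0, 3}, {1, 7}, {2, 1}};
    sortByRefCount(lcls);
    for (const LclVarInfo& v : lcls)
    {
        std::printf("V%02u refCount=%u\n", v.lclNum, v.refCount);
    }
}
```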
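For 6.3, "allocate memory in larger chunks" essentially means leaning harder on a bump-pointer arena along these lines. The JIT already allocates through an arena-style allocator, so the realistic win is probably in chunk sizing and in putting more things on it; PAGE_SIZE is a placeholder and oversized requests aren't handled:

```cpp
// Bump-pointer arena sketch for "allocate memory in larger chunks" (item 6.3).
// PAGE_SIZE is a placeholder; a real allocator would also handle arbitrary
// alignment and requests larger than one page.
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

class Arena
{
    static constexpr std::size_t PAGE_SIZE = 64 * 1024; // placeholder chunk size

    std::vector<void*> m_pages;          // every chunk we grabbed, freed all at once
    char*              m_next  = nullptr;
    char*              m_limit = nullptr;

    void newPage()
    {
        void* page = std::malloc(PAGE_SIZE);
        if (page == nullptr)
        {
            throw std::bad_alloc();
        }
        m_pages.push_back(page);
        m_next  = static_cast<char*>(page);
        m_limit = m_next + PAGE_SIZE;
    }

public:
    ~Arena()
    {
        // One free per page instead of one per node.
        for (void* page : m_pages)
        {
            std::free(page);
        }
    }

    void* alloc(std::size_t size)
    {
        // Keep every allocation max-aligned.
        size = (size + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        if (m_next == nullptr || size > static_cast<std::size_t>(m_limit - m_next))
        {
            newPage(); // sketch assumes size <= PAGE_SIZE
        }
        void* result = m_next;
        m_next += size;
        return result;
    }
};

int main()
{
    Arena arena;
    void* node = arena.alloc(48); // e.g. one tree-node-sized allocation
    (void)node;
}
```

With something like this a compilation does a handful of large allocations plus pointer bumps instead of many small heap calls, and teardown is one free per page.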

category:throughput
theme:throughput
skill-level:expert
cost:extra-large

@pkukol pkukol self-assigned this Oct 20, 2016
@mazong1123
Contributor

Count me in. I'd like to help on this.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@TIHan TIHan modified the milestones: Future, 9.0.0 Nov 1, 2023
@TIHan TIHan removed the JitUntriaged CLR JIT issues needing additional triage label Nov 1, 2023
@TIHan
Contributor

TIHan commented Nov 1, 2023

It's worth looking at each item in this list again, since there may still be throughput wins to be had.

@JulieLeeMSFT JulieLeeMSFT added the Priority:2 Work that is important, but not critical for the release label Apr 9, 2024
@JulieLeeMSFT JulieLeeMSFT modified the milestones: 9.0.0, Future Apr 9, 2024