Increase JIT compiler throughput #6857

Open

pkukol opened this issue Oct 20, 2016 · 2 comments
Labels
  area-CodeGen-coreclr - CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
  enhancement - Product code improvement that does NOT require public API changes/additions
  JitThroughput - CLR JIT issues regarding speed of JIT itself
  Priority:2 - Work that is important, but not critical for the release
  tenet-performance - Performance related issue
Milestone
Future
Comments

@pkukol
Contributor

pkukol commented Oct 20, 2016

The following is a list of areas being considered for throughput improvements in the near future. If anyone wants to help with any of these, just add a note here; if we get lots of volunteers we can set up a simple tracking system. Note that everything gathered in this issue addresses the specific goal of speeding up IL -> machine code conversion in RyuJIT itself - in other words, things like better heuristics for deciding when to use (or not use) MinOpts, or finding more ways to avoid compilation at runtime entirely (via crossgen or whatever), are not covered here.

  1. Importing IL (typically takes 25% of overall JIT time):
    1. Cache results of calling through ICorJitInfo, such as IL token resolution (a rough sketch appears after this list):
      1. Option 1 - do this only for large methods, no caching across compilations.
      2. Option 2 - cache things "globally", across methods; requires retention policy.
    2. Cache the internal format of carefully chosen methods that are frequently inlined:
      1. This only helps "normal opts" unless we extend MinOpts to do "fast" inlining.
      2. Needs retention policy so memory (and any overhead for serializing the IR) is not wasted.
      3. Closely tied to the inliner so this should probably be integrated into the inline policy / etc logic.
    3. Recognize a tiny subset of trivial IL body patterns (sketch after the list); for matched methods:
      1. Spit out "canned" IR to bypass the "full" import logic.
      2. Mark "trivial" bodies (or parts of bodies?) and add simplified processing downstream for such bodies (e.g. no jumps, no stores to locals, or whatever).
    4. Find the most frequent paths through the most expensive JIT -> EE calls, and try to speed them up:
      1. For some calls the JIT may not need all the information the EE is currently returning -> add shortcuts and/or subset versions?
      2. Batch / overlap / defer EE calls:
        1. This has ramifications for class loading order / etc, so feasibility is an open question.
        2. When the importer asks about tokens / methods / fields it doesn't need all the info immediately. Split the relevant (expensive) EE calls into two parts: the first - hopefully much faster than the whole - would return only the minimal info the importer must have right away; the rest could come on a separate thread, via an async callback, or something like that (a toy sketch of this split appears after the list).
        3. If we do any "quick look at IL" processing (e.g. to do some of the stuff above for "trivial" methods) we could fire off calls to ask about the tokens we encounter.
  2. Speed up genGenerateCode, i.e. the far back end (takes about 10-20% of total time):
    1. Speed up GC info gathering and encoding:
      1. GC encoder: avoid sorting when possible, speed up bit-twiddling, etc.
      2. Instruction encoding: try to speed up the most frequent emitXxxxx methods and emitter::emitGCregLiveUpd().
      3. Speed up GCInfo::gcMakeRegPtrTable() and related logic.
      4. Other things, such as the scope tracking stuff (CodeGen::siBeginBlock etc).
      5. Note: probably ignore CodeGen::genCodeForTreeNode() even though it's up to 5% - way too many little pieces.
  3. Speed up LSRA (around 20% of total compile time) and make it consume a lot less memory:
    1. Punting this entirely to the RA specialists.
  4. Slim down Lowering::DoPhase (currently up to 10% of JIT time):
    1. Completely bypass fgInterBlockLocalVarLiveness() - up to 2% of total time.
    2. Avoid doing lvaSortByRefCount() for very small numbers of variables (0.5% of total time; sketch after the list).
    3. Other things? Returns probably diminish rapidly.
  5. Speed up Morph (around 8% of total compile time):
    1. Spend less time recursively walking trees for MinOpts.
  6. Global improvements (few percent):
    1. Shrink GenTree nodes.
    2. Speed up tree walks.
    3. Allocate memory in larger chunks (see the arena sketch after the list).
  7. Skip more things for MinOpts (few percent):
    1. Skip (parts of) liveness analysis (also see above)?
    2. Bypass some tree optimizations (just do the simplest / easiest thing).
    3. Skip "ordering" passes (evalOrder/blockOrder) for trivial or reducible CFGs or some such.
    4. Short-cut things like lvaSortByRefCount() for small numbers of variables, etc.
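
To make some of the above more concrete, here are a few rough sketches. They are plain C++ with made-up names standing in for the real JIT/EE interfaces and data structures - none of this is the actual implementation, just the shape of each idea.

For 1.1 (option 1), the cache could be as simple as a per-compilation hash map keyed by IL token. resolveTokenThroughEE() and ResolvedToken below are hypothetical stand-ins for the real ICorJitInfo call and its result:

```cpp
// Minimal sketch of a per-compilation token-resolution cache (item 1.1, option 1).
// ResolvedToken and resolveTokenThroughEE() are hypothetical stand-ins for the
// real JIT/EE interface; only the caching shape matters here.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct ResolvedToken
{
    void*    handle; // e.g. a method/field/class handle returned by the EE
    unsigned kind;   // what kind of entity the token resolved to
};

// Stand-in for the expensive call through ICorJitInfo.
static ResolvedToken resolveTokenThroughEE(uint32_t ilToken)
{
    std::printf("EE call for token %08x\n", ilToken);
    return ResolvedToken{nullptr, ilToken & 0xFF000000};
}

class TokenCache
{
    // Lives only for the current method compilation, so no retention policy needed.
    std::unordered_map<uint32_t, ResolvedToken> m_map;

public:
    const ResolvedToken& resolve(uint32_t ilToken)
    {
        auto it = m_map.find(ilToken);
        if (it == m_map.end())
        {
            // First use of this token in the method: pay the EE cost once.
            it = m_map.emplace(ilToken, resolveTokenThroughEE(ilToken)).first;
        }
        return it->second;
    }
};

int main()
{
    TokenCache cache;
    cache.resolve(0x0A000012); // resolved through the EE
    cache.resolve(0x0A000012); // served from the cache, no second EE call
}
```

The importer would probably only bother with the cache above some IL-size threshold so the hashing overhead doesn't hurt small methods; option 2 is the same shape hung off a global structure plus an eviction policy.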
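For 1.3, matching the very simplest pattern - an instance field getter (ldarg.0; ldfld <tok>; ret) - is just a byte comparison. The opcode values are the standard ECMA-335 encodings; TrivialBody is only an illustrative result type, not a real JIT structure:

```cpp
// Sketch of recognizing one trivial IL pattern: an instance field getter
//   ldarg.0; ldfld <token>; ret
// Opcode values are the standard ECMA-335 encodings.
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct TrivialBody          // illustrative only
{
    bool     isFieldGetter;
    uint32_t fieldToken;    // metadata token of the loaded field, if matched
};

static bool matchFieldGetter(const uint8_t* il, std::size_t ilSize, TrivialBody* out)
{
    // Exactly: ldarg.0 (0x02), ldfld (0x7B) + 4-byte token, ret (0x2A) => 7 bytes.
    if (ilSize != 7 || il[0] != 0x02 || il[1] != 0x7B || il[6] != 0x2A)
    {
        return false;
    }
    out->isFieldGetter = true;
    // IL stores the token little-endian.
    out->fieldToken = (uint32_t)il[2] | ((uint32_t)il[3] << 8) |
                      ((uint32_t)il[4] << 16) | ((uint32_t)il[5] << 24);
    return true;
}

int main()
{
    // ldarg.0; ldfld 0x04000001; ret
    const uint8_t body[] = {0x02, 0x7B, 0x01, 0x00, 0x00, 0x04, 0x2A};
    TrivialBody tb{};
    if (matchFieldGetter(body, sizeof(body), &tb))
    {
        std::printf("trivial getter of field %08x\n", tb.fieldToken);
        // Here the importer would spit out canned IR instead of running the
        // full import loop.
    }
}
```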
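For 1.4.2, the "split the EE call in two" idea could look roughly like this. eeGetQuickMethodInfo() / eeGetFullMethodInfo() and the two info structs are hypothetical, and the class-loading-order concerns in 1.4.2.1 are exactly why this is only a sketch:

```cpp
// Sketch of splitting an expensive EE query into a cheap synchronous part and a
// deferred part (item 1.4.2). All names here are hypothetical stand-ins.
#include <cstdint>
#include <cstdio>
#include <future>
#include <string>

struct QuickMethodInfo { unsigned argCount; bool mayInline; }; // what the importer needs right away
struct FullMethodInfo  { std::string name; unsigned ilSize; }; // the rest, wanted later (if at all)

// Stand-ins for the real (expensive) JIT/EE calls.
static QuickMethodInfo eeGetQuickMethodInfo(uint32_t token) { return {2, token != 0}; }
static FullMethodInfo  eeGetFullMethodInfo(uint32_t token)  { return {"Foo.Bar", token & 0xFF}; }

struct MethodQuery
{
    QuickMethodInfo             quick; // available immediately
    std::future<FullMethodInfo> full;  // computed on another thread

    explicit MethodQuery(uint32_t token)
        : quick(eeGetQuickMethodInfo(token))
        , full(std::async(std::launch::async, eeGetFullMethodInfo, token))
    {
    }
};

int main()
{
    MethodQuery q(0x06000010);                      // fired off when the importer first sees the token
    std::printf("argCount=%u\n", q.quick.argCount); // importer keeps going with the quick part
    FullMethodInfo fi = q.full.get();               // blocks here only if the full answer isn't ready yet
    std::printf("name=%s ilSize=%u\n", fi.name.c_str(), fi.ilSize);
}
```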
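For 4.2 / 7.4, one possible shape of the short-cut: below some (to-be-measured) threshold, do a handful of comparisons in place instead of setting up the full sort. LclVarInfo and SMALL_LCL_COUNT are placeholders:

```cpp
// Sketch of short-cutting the ref-count sort for small variable counts
// (items 4.2 and 7.4). LclVarInfo and SMALL_LCL_COUNT are placeholders;
// the real threshold would be tuned by measurement.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

struct LclVarInfo
{
    unsigned lclNum;
    unsigned refCount;
};

static const std::size_t SMALL_LCL_COUNT = 4; // placeholder threshold

static void sortByRefCount(std::vector<LclVarInfo>& lcls)
{
    if (lcls.size() <= 1)
    {
        return; // nothing to order
    }
    if (lcls.size() <= SMALL_LCL_COUNT)
    {
        // Tiny method: a few in-place comparisons beat the full sort machinery.
        for (std::size_t i = 1; i < lcls.size(); i++)
        {
            LclVarInfo  v = lcls[i];
            std::size_t j = i;
            while (j > 0 && lcls[j - 1].refCount < v.refCount)
            {
                lcls[j] = lcls[j - 1];
                j--;
            }
            lcls[j] = v;
        }
        return;
    }
    // General case: fall back to the full sort, most-referenced first.
    std::sort(lcls.begin(), lcls.end(),
              [](const LclVarInfo& a, const LclVarInfo& b) { return a.refCount > b.refCount; });
}

int main()
{
    std::vector<LclVarInfo> lcls = {{0, 3}, {1, 7}, {2, 1}};
    sortByRefCount(lcls);
    for (const LclVarInfo& v : lcls)
    {
        std::printf("V%02u refCount=%u\n", v.lclNum, v.refCount);
    }
}
```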
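For 6.3, "allocate memory in larger chunks" essentially means leaning harder on a bump-pointer arena along these lines. The JIT already allocates through an arena-style allocator, so the realistic win is probably in chunk sizing and in putting more things on it; PAGE_SIZE is a placeholder and oversized requests aren't handled:

```cpp
// Bump-pointer arena sketch for "allocate memory in larger chunks" (item 6.3).
// PAGE_SIZE is a placeholder; a real allocator would also handle arbitrary
// alignment and requests larger than one page.
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

class Arena
{
    static constexpr std::size_t PAGE_SIZE = 64 * 1024; // placeholder chunk size

    std::vector<void*> m_pages;          // every chunk we grabbed, freed all at once
    char*              m_next  = nullptr;
    char*              m_limit = nullptr;

    void newPage()
    {
        void* page = std::malloc(PAGE_SIZE);
        if (page == nullptr)
        {
            throw std::bad_alloc();
        }
        m_pages.push_back(page);
        m_next  = static_cast<char*>(page);
        m_limit = m_next + PAGE_SIZE;
    }

public:
    ~Arena()
    {
        // One free per page instead of one per node.
        for (void* page : m_pages)
        {
            std::free(page);
        }
    }

    void* alloc(std::size_t size)
    {
        // Keep every allocation max-aligned.
        size = (size + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        if (m_next == nullptr || size > static_cast<std::size_t>(m_limit - m_next))
        {
            newPage(); // sketch assumes size <= PAGE_SIZE
        }
        void* result = m_next;
        m_next += size;
        return result;
    }
};

int main()
{
    Arena arena;
    void* node = arena.alloc(48); // e.g. one tree-node-sized allocation
    (void)node;
}
```

With something like this a compilation does a handful of large allocations plus pointer bumps instead of many small heap calls, and teardown is one free per page.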

category:throughput
theme:throughput
skill-level:expert
cost:extra-large

@pkukol pkukol self-assigned this Oct 20, 2016
@mazong1123
Contributor

Count me in. I'd like to help on this.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@TIHan TIHan modified the milestones: Future, 9.0.0 Nov 1, 2023
@TIHan TIHan removed the JitUntriaged CLR JIT issues needing additional triage label Nov 1, 2023
@TIHan
Contributor

TIHan commented Nov 1, 2023

It's worth looking at each item in this list again, since there may still be throughput wins to be had.

@JulieLeeMSFT JulieLeeMSFT added the Priority:2 Work that is important, but not critical for the release label Apr 9, 2024
@JulieLeeMSFT JulieLeeMSFT modified the milestones: 9.0.0, Future Apr 9, 2024