Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Rename tiers and config options, add config option JitTier0ForLoops #23597

Closed
wants to merge 9 commits into from

Conversation

kouvel
Copy link
Member

@kouvel kouvel commented Mar 30, 2019

Depends on #23599

  • Renamed tier 0 / tier 1 to StartupTier, OptimizedTier
  • Added config option QuickJitForLoops, which determines whether quick JIT, when enabled, may be used for methods that contain loops. Off by default, so after this change, StartupTierQuickJit=1 or ForceQuickJit=1 would still not use quick JIT for methods that contain loops by default.

@kouvel kouvel added this to the 3.0 milestone Mar 30, 2019
@kouvel kouvel self-assigned this Mar 30, 2019
@kouvel kouvel requested a review from noahfalk March 30, 2019 15:18
@kouvel
Copy link
Member Author

kouvel commented Mar 30, 2019

CC @jkotas @richlander

src/jit/error.cpp Outdated Show resolved Hide resolved
src/jit/error.cpp Outdated Show resolved Hide resolved
@jkotas
Copy link
Member

jkotas commented Mar 30, 2019

I would split this into two PRs:

  • Mush have changes in defaults, etc. for preview4
  • QuickJitForLoops for separate PR. It does not need to be in preview4.

@kouvel
Copy link
Member Author

kouvel commented Mar 30, 2019

QuickJitForLoops for separate PR. It does not need to be in preview4.

QuickJit is disabled by default but if someone wanted to enable it in the startup tier for perf, I would still like the default behavior to be to disable QuickJit for methods that contain loops to avoid exposing people to that issue. Enabling QuickJit for loops would be a more advanced/risky thing to do. So I'm thinking about including the loop detection and switching from tier 0 to tier 1 in preview 4 in some fashion.

@jkotas
Copy link
Member

jkotas commented Mar 30, 2019

I agree that the loop detection is useful. I do not think that the implementation in this PR is good because of it depends on exception handling for control flow.

My suggestion to split this was to reduce risk for P4: Getting the right defaults in place is high priority. Improving the heuristics for optional QuickJit is lower priority.

@kouvel
Copy link
Member Author

kouvel commented Mar 30, 2019

Ok sounds good

@kouvel kouvel changed the title Disable tier 0 JIT (quick JIT) by default, config option and tier renames Rename tiers and config options, add config option QuickJitForLoops Mar 30, 2019
@kouvel
Copy link
Member Author

kouvel commented Mar 30, 2019

I have separated the minimal changes into #23599 for preview 4, and rebased the changes here on top of that commit, so ignore the first commit in this PR.

@kouvel
Copy link
Member Author

kouvel commented Mar 31, 2019

Here are some perf results for startup time. I have excluded steady-state perf results because they don't have any significant diffs, and I have excluded cases with the cold-methods-with-hot-loops issue that are fully resolved by this change and #23599.

Perf results:

  • Tier = default mode after this change
  • Tier+QJ = quick JIT enabled for non-R2R'ed methods except for methods that contain loops
  • Tier+QJ+QJL = default mode before this change - quick JIT enabled for non-R2R'ed methods including for methods that contain loops
  • Diffs are against the NoTier column
  • Lower is better

JitBench startup

  NoTier Tier Diff Tier+QJ Diff Tier+QJ+QJL Diff
DotNetBuildHelloWorld 1332.14 1387.14 4.13% 1337.67 0.41% 1290.33 -3.14%
CscHelloWorld 609.00 632.57 3.87% 571.00 -6.24% 500.00 -17.90%
CscRoslynSource 2316.29 2461.33 6.26% 2397.00 3.48% 2353.80 1.62%
EmptyConsoleProgram 57.86 57.57 -0.49% 56.14 -2.96% 56.67 -2.06%
MusicStoreStartup 597.57 615.60 3.02% 583.29 -2.39% 551.71 -7.67%
MusicStoreFirstRequest 513.29 534.20 4.07% 452.00 -11.94% 406.00 -20.90%

ASP.NET startup

  NoTier Tier Diff Tier+QJ Diff Tier+QJ+QJL Diff
DbFortunesEf Windows 1783.16 1833.70 2.83% 1547.62 -13.21% 1282.57 -28.07%
DbFortunesEf Linux 1774.85 1823.57 2.75% 1525.10 -14.07% 1253.31 -29.39%
DbFortunesRaw Windows 1064.10 1074.16 0.95% 972.62 -8.60% 839.72 -21.09%
DbFortunesRaw Linux 1100.03 1102.17 0.19% 994.59 -9.59% 828.14 -24.72%
Json Windows 565.31 586.11 3.68% 539.14 -4.63% 516.31 -8.67%
Json Linux 518.77 525.45 1.29% 471.86 -9.04% 417.72 -19.48%
MvcJson Windows 735.69 758.05 3.04% 676.88 -7.99% 654.59 -11.02%
MvcJson Linux 671.86 691.51 2.92% 614.09 -8.60% 547.66 -18.49%
MvcPlaintext Windows 656.82 666.21 1.43% 620.26 -5.57% 581.36 -11.49%
MvcPlaintext Linux 611.49 627.18 2.57% 558.31 -8.70% 506.87 -17.11%
Plaintext Windows 487.44 501.95 2.98% 457.81 -6.08% 421.02 -13.63%
Plaintext Linux 427.12 435.44 1.95% 383.82 -10.14% 348.57 -18.39%

JitBench startup - R2R disabled

  NoTier Tier Diff Tier+QJ Diff Tier+QJ+QJL Diff
DotNetBuildHelloWorld 3547.86 3573.43 0.72% 3050.71 -14.01% 2568.17 -27.61%
CscHelloWorld 2194.86 2194.57 -0.01% 1779.00 -18.95% 1212.17 -44.77%
CscRoslynSource 4001.14 3998.43 -0.07% 3890.50 -2.77% 3645.57 -8.89%
EmptyConsoleProgram 101.57 102.14 0.56% 91.29 -10.13% 84.57 -16.74%
MusicStoreStartup 2071.57 2067.71 -0.19% 1654.86 -20.12% 1268.33 -38.77%
MusicStoreFirstRequest 1738.29 1730.71 -0.44% 1285.86 -26.03% 977.14 -43.79%

ASP.NET startup - R2R disabled

  NoTier Tier Diff Tier+QJ Diff Tier+QJ+QJL Diff
DbFortunesEf Windows 2827.39 2811.59 -0.56% 2319.55 -17.96% 1808.65 -36.03%
DbFortunesEf Linux 2694.68 2734.69 1.48% 2256.13 -16.27% 1763.44 -34.56%
DbFortunesRaw Windows 1971.53 1955.90 -0.79% 1627.06 -17.47% 1282.32 -34.96%
DbFortunesRaw Linux 1914.01 1886.68 -1.43% 1630.11 -14.83% 1259.21 -34.21%
Json Windows 1309.95 1297.10 -0.98% 1091.96 -16.64% 865.64 -33.92%
Json Linux 1194.95 1199.75 0.40% 1000.41 -16.28% 797.74 -33.24%
MvcJson Windows 1878.55 1885.21 0.35% 1550.97 -17.44% 1240.70 -33.95%
MvcJson Linux 1740.76 1745.34 0.26% 1468.83 -15.62% 1149.11 -33.99%
MvcPlaintext Windows 1676.32 1669.72 -0.39% 1395.98 -16.72% 1077.02 -35.75%
MvcPlaintext Linux 1567.92 1565.73 -0.14% 1329.23 -15.22% 1031.59 -34.21%
Plaintext Windows 1100.61 1102.61 0.18% 914.07 -16.95% 739.77 -32.79%
Plaintext Linux 1017.82 1020.50 0.26% 855.75 -15.92% 683.04 -32.89%

@kouvel
Copy link
Member Author

kouvel commented Mar 31, 2019

@dotnet-bot test Ubuntu x64 Checked CoreFX Tests

@AndyAyersMS
Copy link
Member

Aside from the preview4 / subsequent split, I would like to see the changes here that are not part of #23599 be split into a pure renaming part and then a policy/logic changes.

In the meantime:

  • How do we avoid jitting a method twice at tier1, if the jit decides to switch codegen from tier0 to tier1? Seems like the jit is feeding information back to the runtime, but I don't see where it gets used.
  • If we have this feedback mechanism, seems like we could use it more broadly to communicate when rejitting is not likely to be beneficial.
  • If we are going to rename, seems like we should rename things everywhere. But my preference would be to stick with Tier0 and Tier1, at least internally. I can see perhaps the new names might be useful for the user-visible config settings.
  • For loop detection, why not just put the logic in fgMarkBackwardJump ?

@kouvel
Copy link
Member Author

kouvel commented Apr 1, 2019

Adding a set of tables that compare perf after this change directly against before this change and to tiering disabled. Also added some steady-state perf numbers. Lower is better for all diffs.


JitBench startup perf - time (ms)

  NoTier Before After Diff from before Diff from NoTier
DotNetBuildHelloWorld 1332.14 1290.33 1387.14 7.50% 4.13%
CscHelloWorld 609.00 500.00 632.57 26.51% 3.87%
CscRoslynSource 2316.29 2353.80 2461.33 4.57% 6.26%
EmptyConsoleProgram 57.86 56.67 57.57 1.60% -0.49%
MusicStoreStartup 597.57 551.71 615.60 11.58% 3.02%
MusicStoreFirstRequest 513.29 406.00 534.20 31.58% 4.07%

ASP.NET startup perf - time (ms), server start + first request

  NoTier Before After Diff from before Diff from NoTier
DbFortunesEf Windows 1783.16 1282.57 1833.70 42.97% 2.83%
DbFortunesEf Linux 1774.85 1253.31 1823.57 45.50% 2.75%
DbFortunesRaw Windows 1064.10 839.72 1074.16 27.92% 0.95%
DbFortunesRaw Linux 1100.03 828.14 1102.17 33.09% 0.19%
Json Windows 565.31 516.31 586.11 13.52% 3.68%
Json Linux 518.77 417.72 525.45 25.79% 1.29%
MvcJson Windows 735.69 654.59 758.05 15.81% 3.04%
MvcJson Linux 671.86 547.66 691.51 26.27% 2.92%
MvcPlaintext Windows 656.82 581.36 666.21 14.60% 1.43%
MvcPlaintext Linux 611.49 506.87 627.18 23.74% 2.57%
Plaintext Windows 487.44 421.02 501.95 19.22% 2.98%
Plaintext Linux 427.12 348.57 435.44 24.92% 1.95%

JitBench startup perf - R2R disabled

  NoTier Before After Diff from before Diff from NoTier
DotNetBuildHelloWorld 3547.86 2568.17 3573.43 39.14% 0.72%
CscHelloWorld 2194.86 1212.17 2194.57 81.05% -0.01%
CscRoslynSource 4001.14 3645.57 3998.43 9.68% -0.07%
EmptyConsoleProgram 101.57 84.57 102.14 20.78% 0.56%
MusicStoreStartup 2071.57 1268.33 2067.71 63.03% -0.19%
MusicStoreFirstRequest 1738.29 977.14 1730.71 77.12% -0.44%

ASP.NET startup perf - R2R disabled - time (ms), server start + first request

  NoTier Before After Diff from before Diff from NoTier
DbFortunesEf Windows 2827.39 1808.65 2811.59 55.45% -0.56%
DbFortunesEf Linux 2694.68 1763.44 2734.69 55.08% 1.48%
DbFortunesRaw Windows 1971.53 1282.32 1955.90 52.53% -0.79%
DbFortunesRaw Linux 1914.01 1259.21 1886.68 49.83% -1.43%
Json Windows 1309.95 865.64 1297.10 49.84% -0.98%
Json Linux 1194.95 797.74 1199.75 50.39% 0.40%
MvcJson Windows 1878.55 1240.70 1885.21 51.95% 0.35%
MvcJson Linux 1740.76 1149.11 1745.34 51.89% 0.26%
MvcPlaintext Windows 1676.32 1077.02 1669.72 55.03% -0.39%
MvcPlaintext Linux 1567.92 1031.59 1565.73 51.78% -0.14%
Plaintext Windows 1100.61 739.77 1102.61 49.05% 0.18%
Plaintext Linux 1017.82 683.04 1020.50 49.41% 0.26%

JitBench steady-state perf - time (ms)

  NoTier Before After Diff from before Diff from NoTier
MusicStoreMedianResponse 2.58 1.87 1.82 -2.37% -29.44%
Word2VecTraining 32511.14 32503.14 32676.43 0.53% 0.51%
Word2VecMedianSearch 21.47 21.59 21.63 0.21% 0.76%

ASP.NET steady-state perf - time (ns) per request

  NoTier NoTier,NoR2r Before After Diff from before Diff from NoTier Diff from NoTier,NoR2r
DbFortunesEf Linux 14965.92 12743.28 12650.50 12865.87 1.70% -14.03% 0.96%
DbFortunesRaw Linux 10653.14 8670.81 8693.86 8793.95 1.15% -17.45% 1.42%
Json Linux 3600.04 2662.96 2639.31 2619.86 -0.74% -27.23% -1.62%
Json https Linux 5788.02 4394.60 4340.89 4338.63 -0.05% -25.04% -1.27%
MvcJson Linux 8867.86 5927.52 5788.18 5843.80 0.96% -34.10% -1.41%
MvcPlaintext Linux 3071.05 1677.61 1636.26 1630.24 -0.37% -46.92% -2.82%
Plaintext Linux 756.31 509.73 504.92 499.42 -1.09% -33.97% -2.02%
Plaintext https Linux 1607.07 1119.20 1092.08 1112.21 1.84% -30.79% -0.62%
PlaintextNonPipelined Linux 2804.53 2101.86 2089.88 2093.27 0.16% -25.36% -0.41%
PlaintextNonPipelined https Linux 4922.10 3943.04 3916.68 3960.39 1.12% -19.54% 0.44%

@kouvel
Copy link
Member Author

kouvel commented Apr 1, 2019

I would like to see the changes here that are not part of #23599 be split into a pure renaming part and then a policy/logic changes.

Done, the second commit now has only the renames, and the commit titled "Functional changes" has the first set of functional changes.

How do we avoid jitting a method twice at tier1, if the jit decides to switch codegen from tier0 to tier1? Seems like the jit is feeding information back to the runtime, but I don't see where it gets used.

The call to setMethodAttribs here:
https://github.com/kouvel/coreclr/blob/9bcfcded882ccc37113c77b49ebb6b42655631dd/src/jit/flowgraph.cpp#L4276-L4277

Ends up here and call counting for the startup tier is disabled for the method:
https://github.com/kouvel/coreclr/blob/9bcfcded882ccc37113c77b49ebb6b42655631dd/src/vm/jitinterface.cpp#L6910-L6913

So the method will not move up a tier. For the default NativeCodeVersion the code above would also cause it to be treated as tier 1 code. I was missing an update of non-default NativeCodeVersion, fixed in the last commit.

If we have this feedback mechanism, seems like we could use it more broadly to communicate when rejitting is not likely to be beneficial.

I think I will move the QuickJitForLoops flag to be a JIT config flag. Since we're not exposing it as a config option from the host, I think it would fit better for it to be a JIT config flag.

Yes in the future the JIT may decide to turn on opts at at point, maybe if the code would be much larger or if missed optimizations would be too expensive. If the tier 0 code is good enough and it would not be beneficial to rejit then it can inform the VM in the same way and the VM will treat it as tier 1 code.

If we are going to rename, seems like we should rename things everywhere. But my preference would be to stick with Tier0 and Tier1, at least internally. I can see perhaps the new names might be useful for the user-visible config settings.

I was thinking if we add a new tier between tier 0 and tier 1 in the future that would be a mess. Giving them a name that reflects what that tier does I think would be good for code readability as well. I skipped renaming the JIT-side flags for now, as I'm not sure if it would be preferred to call them tier 0 and tier 1. The VM-side tiers don't have to necessarily match to JIT-side tiers (or flags). For example there could be a "CallCountingTier" in the future where the entry point would be a stub that counts calls before going to the code entry point (just one possible implementation, could be done in other ways).

For loop detection, why not just put the logic in fgMarkBackwardJump ?

That function is also called for recursive functions, so it's saying that a recursive function has a loop. I was intending on excluding those because they will build up call counts during the recursion to be promoted to the optimized tier.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Apr 1, 2019

I was thinking if we add a new tier between tier 0 and tier 1 in the future that would be a mess.

I had similar thoughts -- at least for me the right analog in naming (at least internally) is the optimization level settings used by C/C++ compilers, generally a numeric scale where larger numbers imply more optimization & slower compilation. Instrumentation (call or block counts) I think would be a somewhat separate concern. Generally systems tend to instrument less optimized code, but it is not a hard and fast rule.

For loop detection, why not just put the logic in fgMarkBackwardJump ?

That function is also called for recursive functions ... I was intending on excluding those... they will build up call counts during the recursion

Perhaps -- but we might also blow the stack in the meantime. So I would be in favor of considering explicitly tail recursive methods as being methods with loops. And similar concerns for methods with explicit tail call sites.

I think one of the things we're seeing with the issues coming up with Tier0 -- at least from the jit standpoint -- is that we don't have as good a handle as we'd like on what codegen features end up being "requirements" in release, and when we vary from what Tier1 does, we end up with surprises.

If we're lucky most of the "required" things will actually be fairly cheap to implement; certainly the box-related opts and devirt fall into that category, as well as tail call opts.

@kouvel
Copy link
Member Author

kouvel commented Apr 1, 2019

So I would be in favor of considering explicitly tail recursive methods as being methods with loops. And similar concerns for methods with explicit tail call sites.

Yea good point, will fix. Should fgMarkBackwardJump continue to be called for all recursive methods, such that it also covers implicit tail calls? I'm supposing that I should check for explicit tail calls specifically for this heuristic and not affect the backward jump flag on the block.

@kouvel
Copy link
Member Author

kouvel commented Apr 1, 2019

Rebased on top of #23599, this PR begins from the commit titled "Rename tiers and config options, add config option QuickJitForLoops"

@kouvel
Copy link
Member Author

kouvel commented Apr 1, 2019

I had similar thoughts -- at least for me the right analog in naming (at least internally) is the optimization level settings used by C/C++ compilers, generally a numeric scale where larger numbers imply more optimization & slower compilation. Instrumentation (call or block counts) I think would be a somewhat separate concern. Generally systems tend to instrument less optimized code, but it is not a hard and fast rule.

I would like to separate the concept of "optimization level" from "tier". Perhaps achieving that would involve some more renames, something along the lines of:

  • Rename JIT-side tier 0 / tier 1 flags to say OptimizationLevel instead of Tier
  • Rename OptimizationTier enumeration to Tier or something like that

I think that will give some flexibility. For example, if we were to introduce instrumentation, that could be a separate tier InstrumentedTier that may choose to instrument code with OptimizationLevel0 or OptimizationLevel1. [Edit] Or maybe that could map to a separate JIT flag and the JIT would choose which optimization level to instrument.

What do you think?

@AndyAyersMS
Copy link
Member

Should fgMarkBackwardJump continue to be called for all recursive methods, such that it also covers implicit tail calls? I'm supposing that I should check for explicit tail calls specifically for this heuristic and not affect the backward jump flag on the block

The jit needs to call fgMarkBackwardJump from the importer as it does now, as some jit behavior is likely dependent on this. We probably don't need to call it unless we're at a tail call site, but let's handle that as a separate issue.

This recursive call detection in the importer comes too late to switch opt levels, as aspects of importer behavior depend on minopts (eg use of the box temp, possibly more). So if you want to detect this early you will need something customized.

I would like to separate the concept of "optimization level" from "tier". Perhaps achieving that would involve some more renames

I think instrumentation should be a flag (as it is now for IBC).

My preference is generally for the runtime to specify intent and the jit to map that to behavior, so that on the jit side we have some level of control over the combinatorics possible in codegen. Whether that is called a tier or an opt level is less critical, but we should probably at the same time revamp the other "intent" style jit flags like CORJIT_FLAG_SIZE_OPT.

@kouvel
Copy link
Member Author

kouvel commented Apr 2, 2019

I think instrumentation should be a flag (as it is now for IBC).

My preference is generally for the runtime to specify intent and the jit to map that to behavior, so that on the jit side we have some level of control over the combinatorics possible in codegen. Whether that is called a tier or an opt level is less critical, but we should probably at the same time revamp the other "intent" style jit flags like CORJIT_FLAG_SIZE_OPT.

Ok I feel that your opinion does not necessarily disagree with my proposal. I agree that there needs to be some sort of mapping between VM concepts and JIT concepts that make sense. My motivation was mostly to separate those two things such that the VM can conceptualize things in a way that make sense there without relevance to JIT concepts. A component may do the mapping from VM concepts to JIT concepts, and the mapping from a VM concept could simply be an intent that the JIT would remap to specific behaviors. Does that align with your thinking?

@AndyAyersMS
Copy link
Member

I think so...? Once we have a concrete proposal we can see if we actually agree or not.

@kouvel
Copy link
Member Author

kouvel commented Apr 2, 2019

I'll take a stab at it, others feel free to weigh in

@noahfalk
Copy link
Member

noahfalk commented Apr 2, 2019

I was thinking if we add a new tier between tier 0 and tier 1 in the future that would be a mess. Giving them a name that reflects what that tier does I think would be good for code readability as well.

I'm fine with rename, but if the goal is future proofing I'm both accepting and expecting that future changes to the functionality will probably result in more name changes. For example if we introduced an interpreter in the future the names would probably churn again because in some places StartupTier refers to 'the 1st tier' and in other places it refers to 'the tier that has low opt jitted code'.

I would like to separate the concept of "optimization level" from "tier".

I'm not sure what tiers would be referring to if they didn't refer to optimization levels? I do think there is value in separating instrumentation from optimization, but I would just label code with 2 variables: JitCompile(JITFLAG_INSTRUMENTATION_X | JITFLAG_OPTIMIZATION_TIER_Y).

For example, if we were to introduce instrumentation, that could be a separate tier InstrumentedTier

I think as tiering matures we are likely to want a variety of instrumented options. For example in my tier2 experiment I had three that I treated distinctly (T0+low instrumentation, T1+low instrumentation, T1+high instrumentation). We could define a combo of instrumentation+optimization level as a tier and give each one a unique name, but personally I found it easier to program against the underlying two variables without trying to abstract it further.

I think instrumentation should be a flag (as it is now for IBC).

👍

@@ -832,7 +832,7 @@ enum CorInfoFlag
CORINFO_FLG_INTRINSIC = 0x00400000, // This method MAY have an intrinsic ID
CORINFO_FLG_CONSTRUCTOR = 0x00800000, // This method is an instance or type initializer
CORINFO_FLG_AGGRESSIVE_OPT = 0x01000000, // The method may contain hot code and should be aggressively optimized if possible
// CORINFO_FLG_UNUSED = 0x02000000,
CORINFO_FLG_ALLOW_TIER0_TO_TIER1 = 0x02000000, // Indicates that for a tier 0 compilation request, the JIT may choose to switch to tier 1 if appropriate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of this slightly differently - that the runtime made a request for an indeterminate optimization level and the JIT resolved it to a specific optimization level based on some criteria. Although arguably this is just semantics, my concern about describing it this way is that new developers are likely to take flags/constants that say Tier0 at face value without necessarily noticing that there is a special case that converts 0 into 1. Using alternate terminology such as TIER0_OR_1 or UNRESOLVED_TIER (and not specifying TIER0) feels much harder to misunderstand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requested intent from the VM seems to be "JIT quickly". The JIT may choose to do so or not for various reasons. For example if the method only has a small loop and a large amount of code outside it may choose to optimize the loop and to not optimize code outside. In that case the VM would still have to consider that as unoptimized code. Or if the JIT finds that the tier 0 code would be too slow because of too many high-impact optimizations being missed, it may choose to generate fully optimized code, in which case the VM would consider that as optimized code. Maybe something like "PREFER_TIER0" would be clearer intent, or something more generic like "JIT_QUICKLY".

I think I'll leave the JIT config flags as they are for now. For this and the preview 4 PR I would like to start with some decent names for the config options. It may not be a big deal though if we have to rename them again.

@kouvel
Copy link
Member Author

kouvel commented Apr 2, 2019

Yea it could be done in different ways:

  1. StartupTier may refer to the whole startup portion, including interpreter, quick JIT, and call counting for each. InstrumentedTier could have various modes of instrumentation / optimization. Optimized tier may be profile-optimized or not.
  2. Have separate tier for each phase, such as InterpretedTier, QuickJittedTier, several variants of InstrumentedTier, several variants of OptimizedTier

(1) may be easier to work with but kind of makes the tier irrelevant. Configuring the whole combination would be a bit challenging, for example the call counting threshold for interpreter and quick JIT to transition to the next tier probably would not be the same.

(2) makes it easy to tie config options to the tier, such as call counting threshold, options for configuring modes of interpreter, quick JIT, or instrumentation in that tier. Tiers would also map to code versions so it would be easy to describe a code version with the tier name.

I'm leaning towards (2) at the moment, with that perhaps I would rename StartupTier to QuickJittedTier and that would keep those config options more stable in the future. Thoughts?

@kouvel
Copy link
Member Author

kouvel commented Apr 3, 2019

In Ubuntu x64 Checked CoreFX Tests it seems to be running an old version of the ThrowExceptionForHR_ErrorInfo_ReturnsValidException test, the issue appears to have been fixed by dotnet/corefx#34968 a while back.

…T for loops

- Renamed tier 0 / tier 1 to StartupTier, OptimizedTier
- Added config option JitTier0ForLoops, which determines whether quick JIT, when enabled, may be used for methods that contain loops. Off by default, so after this change, QuickJit=1 would still not use quick JIT for methods that contain loops by default.
@kouvel kouvel changed the title Rename tiers and config options, add config option QuickJitForLoops Rename tiers and config options, add config option JitTier0ForLoops Apr 9, 2019
@kouvel
Copy link
Member Author

kouvel commented Apr 9, 2019

It looks like a tier 0 compilation is respecting explicit tail calls at least in some cases. I tried the following with QuickJit enabled:

IL:

  .method private hidebysig static int32 
          CalculateSumWithTailCall(int32 n,
                                   [opt] int32 sum) cil managed
  {
    .param [2] = int32(0x00000000)
    // Code size       27 (0x1b)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  ldc.i4.0
    IL_0002:  bgt.s      IL_0006

    IL_0004:  ldc.i4.0
    IL_0005:  ret

    IL_0006:  ldarg.1
    IL_0007:  ldarg.0
    IL_0008:  add
    IL_0009:  starg.s    sum
    IL_000b:  ldarg.0
    IL_000c:  ldc.i4.1
    IL_000d:  bne.un.s   IL_0011

    IL_000f:  ldarg.1
    IL_0010:  ret

    IL_0011:  ldarg.0
    IL_0012:  ldc.i4.1
    IL_0013:  sub
    IL_0014:  ldarg.1
    IL_0015:  tail. call int32 ExplicitTailCallNoSO::CalculateSumWithTailCall(int32,
                                                                              int32)
    IL_001a:  ret
  } // end of method ExplicitTailCallNoSO::CalculateSumWithTailCall

Effectively equivalent to the following except with an explicit tail call:

    private static int CalculateSumWithTailCall(int n, int sum = 0)
    {
        if (n <= 0)
            return 0;
        sum += n;
        if (n == 1)
            return sum;
        return CalculateSumWithTailCall(n - 1, sum);
    }
; Assembly listing for method ExplicitTailCallNoSO:CalculateSumWithTailCall(int,int):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-0 compilation
; compiler->opts.MinOpts() is true
; rbp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  1,  1   )     int  ->  [rbp+0x10]
;  V01 arg1         [V01    ] (  1,  1   )     int  ->  [rbp+0x18]
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;  V03 tmp1         [V03    ] (  1,  1   )     int  ->  [rbp-0x04]   "arg temp"
;
; Lcl frame size = 16

G_M30597_IG01:
       55                   push     rbp
       4883EC10             sub      rsp, 16
       488D6C2410           lea      rbp, [rsp+10H]
       894D10               mov      dword ptr [rbp+10H], ecx
       895518               mov      dword ptr [rbp+18H], edx

G_M30597_IG02:
       837D1000             cmp      dword ptr [rbp+10H], 0
       7F08                 jg       SHORT G_M30597_IG04
       33C0                 xor      eax, eax

G_M30597_IG03:
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret

G_M30597_IG04:
       8B4518               mov      eax, dword ptr [rbp+18H]
       034510               add      eax, dword ptr [rbp+10H]
       894518               mov      dword ptr [rbp+18H], eax
       837D1001             cmp      dword ptr [rbp+10H], 1
       7509                 jne      SHORT G_M30597_IG06
       8B4518               mov      eax, dword ptr [rbp+18H]

G_M30597_IG05:
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret

G_M30597_IG06:
       8B4510               mov      eax, dword ptr [rbp+10H]
       FFC8                 dec      eax
       8945FC               mov      dword ptr [rbp-04H], eax
       8B45FC               mov      eax, dword ptr [rbp-04H]
       894510               mov      dword ptr [rbp+10H], eax
       EBCA                 jmp      SHORT G_M30597_IG02

Are there explicit tail call cases where a tier 0 compilation would overflow the stack and a tier 1 compilation would not?

@kouvel
Copy link
Member Author

kouvel commented Apr 9, 2019

Although methods like above should be treated as though they contain a loop

@noahfalk
Copy link
Member

noahfalk commented Apr 10, 2019

I'm leaning towards (2) at the moment, with that perhaps I would rename StartupTier to QuickJittedTier and that would keep those config options more stable in the future. Thoughts?

Fine by me. To whatever limited degree I can predict the future, I agree this tier name sounds less likely than others to need a future change.

@kouvel
Copy link
Member Author

kouvel commented Apr 11, 2019

RE QuickJittedTier, using that would probably also mean adding a separate tier PrecompiledTier such that precompiled code would not be in the QuickJittedTier. I think it would be useful anyway in PerfView to be able to distinguish time spent in precompiled vs quick-jitted code, although there may be other ways to get that info. There may be other benefits to distinguish those. I figure if for a method the precompiled code is good enough to consider fully optimized, it would instead be in the OptimizedTier.

@kouvel
Copy link
Member Author

kouvel commented Apr 12, 2019

Closing for now

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants