Hardware instruction set support for crossgen2 #33274

davidwrighton · 2020-03-06T05:32:54Z

Add support for the --instruction-set parameter as described in Allow specifying hardware support level for crossgen2 #226 .
- NOTE: As the abi for Vector parameters is not yet stable, support for the --instruction-set parameter is only enabled if --inputbubble is also enabled. Parallel work to stabilize the abi is in progress, but is not complete.
Add concept of baseline instruction set support to R2R file format
- Can be applied at a per method level or at the entire R2R file level
Refactor how support for hardware intrinsics beyond SSE2 support are handled in crossgen2
- Add feature to the JIT to detect which hardware features are actually used
- Tell the JIT unconditionally that SSE42+Lzcnt+Popcnt+Pclmulqdq are supported
- But if support beyond the --instruction-set specified baseline is used, notate the method with a per-method instruction set support fixup.
- This enables usage of many intrinsics in corelib with greater efficiency than today
- This enables usage of SSE42 and below intrinsics safely in non-CoreLib code. Use of higher level intrinsics in non CoreLib code will generate code which does not use the higher level intrinsic, and note that the method's code should not be used in the presence of hardware which does support greater CPU capabilities.
  - In the future a logical enhancement of this work would be to generate multiple bodies of code to handle these more complex cases.
- In combination with the --instruction-set argument, if Avx2 is enabled, then the logic gracefully adds a dependency on Avx2 capability and Vector<T> becomes useable by crossgen'd code.
  - As a tool to control the behavior of crossgen2 and test some of the behavior around hardware intrinsics, add a command line parameter --compile-aggressive-optimization-methods which changes the default R2R compiler suppression of compiling code marked with the MethodImplOption.AggressiveOptimization.

Work items remaining
[x] Finalize on internal JIT api to query/specify the instruction sets used
[x] Reduce the plethora of different representations of instruction set in the product
~~[ ] Add ReadyToRunRejectedPrecompiledCode_InstructionSetMismatch flag to ETW Method event.~~
[ ] Adjust SIMDSupportLevel to be dependent on Sse4.1 instead of 4.2 Also, change handling of intrinsic that generates pcmpgtq instruction to query on support for Sse42 instead of SIMDSupportLevel, so that change is legal to make.
[x] Update JITEEVersionIdentifier
[x] BotR chapter on using hardware intrinsics in AOT code which covers both crossgen1, crossgen2, and runtime behaviors, as well as how the JIT internal logic is supposed to work.

davidwrighton · 2020-03-06T05:44:31Z

@dotnet/crossgen-contrib This PR is a work in progress, and I haven't actually tested a large swath of code through this. Please review for general structure if you have the time, but I don't expect this to be the exact state of what I check in.
@CarolEidt @tannergooding - Carol/Tanner, could you take a look at how I detect usage of instructions in the Jit and tell me if this is a viable approach. In particular, note the new api compSupportsOptional, which needs to be used extremely carefully. In particular, it can only be used when a false result will not generate code that will not run correctly if the result would have been true at runtime.
@AntonLapounov As discussed, I have a new model for how to make corelib intrinsics use safe instead of the previous, risky strategy of trying to identify a magic set of problematic intrinsics. Please take a look.
@richlander This PR establishes the logic for specifying hardware capabilities to crossgen2. In addition to allowing specification of Avx2 I've reworked how intrinsics can be used in most R2R code (especially SSE42 and below intrinsics). I've left out any ability to control the set of speculatively available instruction sets, as there are complications with making Avx2 speculatively available.

MichalStrehovsky · 2020-03-06T11:16:52Z

src/coreclr/src/tools/Common/JitInterface/JitConfigProvider.cs

@@ -28,10 +28,12 @@ public static JitConfigProvider Instance
        private CorJitFlag[] _jitFlags;
        private Dictionary<string, string> _config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        private object _keepAlive; // Keeps callback delegates alive
+        private InstructionSetSupport _instructionSetSupport;


JitConfigProvider is a process-wide configuration (due to RyuJIT hosting limitations and no other reason, really). Putting more things in it than strictly necessary ties our hands when we e.g. want to add multi-versioning later.

I would add a InstructionSetSupport GetInstructionSetSupport(MethodWithGCInfo) method on Compilation and have CorInfoImpl call into that when computing the flags in getJitFlags. It would always return the same thing right now, but it will be an obvious point to update if we ever add multi-versioning. I like to leave obvious points like that because it avoids "creative" solutions like adding policies to JitConfigProvider.

(JitConfigProvider stores a bunch CORJIT_FLAGS because I didn't clean it up when we were moving it around in the past few months. I should have, to remove the precedent. It should only store the COMPlus flags for RyuJIT.)

:) Yeah, agreed, I'll admit I ran out of energy for threading this all around the compiler and wanted to see if it all worked.

MichalStrehovsky · 2020-03-06T12:02:05Z

src/coreclr/src/tools/crossgen2/ILCompiler.ReadyToRun/Compiler/InstructionSetSupport.cs

@@ -0,0 +1,350 @@
+// Licensed to the .NET Foundation under one or more agreements.


Could you move this file to tools\Common\Compiler?

MichalStrehovsky · 2020-03-06T12:04:05Z

src/coreclr/src/tools/crossgen2/ILCompiler.ReadyToRun/Compiler/InstructionSetSupport.cs

+
+namespace ILCompiler
+{
+    public class InstructionSetSupport : ICompilationRootProvider


Once this is in Common, it will trigger the rule that says "stuff in Common shouldn't reference the ReadyToRun namespace".

Instead of implementing ICompilationRootProvider, could you move the piece of logic in that method to the Compilation constructor where we can just _dependencyGraph.AddRoot it? It will also avoid having to expose NodeFactory on IRootingServiceProvider.

tannergooding · 2020-03-06T17:55:31Z

src/coreclr/src/inc/readytorun.h

@@ -342,6 +344,30 @@ enum ReadyToRunHelper
    READYTORUN_HELPER_StackProbe                = 0x111,
 };

+enum ReadyToRunInstructionSet


Should there be a note about keeping this in sync (generally speaking) with https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/instr.h#L295?

I'd imagine we wouldn't want to add support for a new ISA in one place and forget about adding it in the other.

I agree, there is a problem here. There are a thicket of different ways in which various closely related portions of the system treat these instruction sets. I currently count 5 different representations (The string one I introduced throughout crossgen2, the ReadyToRunInstructionSet enum, the CORJIT_Flags, the jit internal instruction set enum, and the managed hardware intrinsics classes.) At the moment, I have logic which mostly enforces that the managed hardware intrinsics and the string names are kept in sync, the logic mapping from jit internal instruction sets to strings is setup to produce noway asserts if there isn't a mapping, and the logic mapping string names to CORJIT flags, produces errors if there is no mapping. Thus there is a decent amount of enforcement, but its really messy, and its likely that any actual attempt to add a new instruction set will be really frustrating as a developer finds more and more enums and such to update.

Is it fundamentally necessary that these be distinct? Why can't they be shared in some way?

All constants and tables used in the ReadyToRun file format are described in the ReadyToRun file format headers. (And nothing else is.) By policy, the common portions of the compiler do not use any ReadyToRun enums or data structures. (Fortunately, the translation between R2R enum and the strings in crossgen is straightforward and transparent.)

More complex is the handling of CORJIT_Flags, and the jit internal instruction set enums. I'd be willing to code up consolidating them on the R2R enum, by removing all of the CORJIT_Flags that are instruction set specific, and either using the R2R enum internally in the JIT, or by making sure the R2R enum values match up with the jit internal enum. (This would involve removing some enum values that have already been defined, under the theory that they cannot be used without feeding them through the rest of the system in any case.) I'd be happy to do this, as the current morass of translations is extremely confusing and risky.

tannergooding · 2020-03-06T18:00:22Z

src/coreclr/src/jit/compiler.h

+    // Answer the question: Is a particular ISA supported?
+    // Use this api when asking the question so that future
+    // ISA questions can be asked correctly.
+    bool compSupportsNoReporting(InstructionSet isa) const


So this one asks "is the ISA supported" without changing any flags, etc
compSupports asks but also marks the ISA as being used
compSupportsOptional asks and marks it as used if it is considered part of the baseline?

Not quite.

compSupportsNoReporting asks is the ISA supported without reporting anything to the VM side. Its extremely dangerous to use, and really only valuable for setting up the set of instruction sets useable internally in the jit. Auditing usage of this is quite easy. If the result of this has any direct impact on codegen, it is a bug.

compSupportsOptional asks is the ISA supported. If it returns true, it will report the instruction set usage to the VM, but if false, it does no reporting. This function is only to be used when returning false will not cause the JIT to generate code that will not work correctly if the code is used on a machine which does support the instruction set. Use of this function should be audited with significant care, as incorrect usage of the return value may cause use of illegal instructions or other incorrect behavior at runtime. Unfortunately, I couldn't see a way to usefully do CPU feature usage detection without implementing an api like this, or more significant restructuring of the jit codebase.

compSupports asks is the ISA supported. The fact that the jit asked such a question is recorded and the generated code will not be used on hardware which does not exactly match the result of the call. For instance, if compSupports(Avx) is called, then if it returns false, the generated code will never be executed on hardware which supports Avx. This is a safe api to use at all times. Over-usage of this api may cause unnecessary prevention of functions from being used at runtime.

I wonder if we could come up with names that make their usage more clear as to the intended usage 🤔

I'm notoriously bad at naming (the names make sense to me...), so I'd welcome suggestions, but really I'd rather someone from the JIT team chime in on that, as they are the ones that will have to maintain them going forward, and I want it to make sense to them. @dotnet/jit-contrib

I totally agree with @tannergooding - these are extremely confusing, and the idea of having anything that's "extremely dangerous to use" in the JIT just sounds like a bad idea. I think we really need to think this through carefully and figure out how to add the appropriate validation.

This statement is quite confusing with the triple negation: "This function is only to be used when returning false will not cause the JIT to generate code that will not work correctly if the code is used on a machine which does support the instruction set." I had to look through the code before I understood what it meant. AFAICT the only reason for this is to prevent mixing code with different ideas of the size of Vector<T>.

How about:

bool compCanUse(InstructionSet isa) - this tells the JIT that it's allowed to use a given ISA, but reports no usage to the VM.

bool compSetUsed(InstructionSet isa) - this checks whether the JIT is allowed to use a given ISA and if so, tells the VM that it is doing so.

void compDisallow(InstructionSet isa) - this tells the VM not to use this code on a machine that supports the given isa. This would handle the Vector<T> case.

An interesting scenario I thought of that this wouldn't cover is if you did a Sse41 check for Math.Round, the JIT said "supported with check", and we didn't emit it (we treat it as "unsupported"); but some other part of the method had its own Sse41.IsSupported check and did emit an Sse41 instruction; then we would miss out on that information (which is that "some path" could have benefited from this).

This wouldn't be an issue if we treat it as "supported", but it is something to consider. Likewise, I think we could potentially miss cases with the other proposal if we only rely on compExactlyDependsOn and compOpportunisticallyDependsOn. Such as if some code path emits an Sse41 instruction without the appropriate check.

@tannergooding - I don't understand how the above captures the scenario where the JIT asked if XXX is supported, the answer was "no", but the code is still valid even if the answer had been "yes" (versus the case where it would be incorrect code for the "yes"/supported case).

@CarolEidt, Could you give a small example of such a scenario in psuedo code?

@tannergooding, @CarolEidt, also from the discussion, I am entirely unconvinced that the compSupports SupportedWithCheck return value is actually useful. It might be, but only as a possible optimization we could use in the future if we find that a true/false result is insufficient.

A scenario where there is a check for an instruction set, and the result was false, but the code should still be used is the Math.Round case, or Vector4.Dot case. More generally, while these cases are somewhat rare, they do exist. In our codebase, they can be seen in the spots where I used compSupportsOptional in my change, or where the code uses a construct like getSIMDSupportLevel() and has a viable fallback path that isn't that inefficient. Remember that R2R code isn't really required to be best in class performance, it is instead intended to be pretty good, and sometimes non-ideal output is ok if it works.

A pseudo code example might look like..

float ComplexMathThing(float a, float b, float c) { return MathF.Round(/*Some very complex expression involving a, b, and c*/ ); }

For all of those sorts of uses, I suspect we'd need to have a compSupportsLike function that doesn't record that an instruction is expected to be used. I suspect it be used in exactly the same locations and way as the compSupportsOptional function that started this whole discussion. Thus I believe the actual impact of following Tanner's model of how to think about the problem is just that we might get a better assert message, and it might be easier to explain the model, but the practical effect is much the same.

I still prefer the approach that I outlined about annotating the questions that are asked, but Tanner's approach has the benefit of allowing us to add an assert to the product that verifies that an instruction set was queried for before it was used by the jit.

I am entirely unconvinced that the compSupports SupportedWithCheck return value is actually useful

I might be inclined to agree. The initial reasoning around this was from the perspective of S.P.Corelib in that we could trust all code paths were "equivalent", and therefore IsSupported could be a cached CPUID check.
But, if we are saying we can't trust user code and therefore we must reJIT if the method could target AVX and the actual hardware supports AVX, then it isn't important.

It might be impactful to some AOT scenarios, but not without some external switch allowing users to opt into that behavior.

tannergooding · 2020-03-06T18:02:42Z

src/coreclr/src/tools/Common/JitInterface/CorInfoImpl.cs

@@ -263,6 +263,38 @@ private void PublishCode()
            _methodCodeNode.InitializeDebugVarInfos(_debugVarInfos);
 #if READYTORUN
            _methodCodeNode.InitializeInliningInfo(_inlinedMethods.ToArray());
+
+            // Detect cases where the instruction set support used is a superset of the baseline instruction set specification


Is this saying it will detect AVX as a superset of SSE2 or it only detects instruction sets that are a superset and share the same encoding?

Its more subtle than that.

The default baseline instruction set is Sse, Sse2. However, the jit compiler is told to generate code assuming that it may use Sse, Sse2, Sse3, Ssse3, Sse41, Sse42, Lzcnt, Popcnt, and Pclmqdq. But if the compiler actually uses any of Sse3, Ssse3, Sse41, Sse42, Lzcnt, Popcnt, or Pclmqdq it will notify the crossgen2 managed code. That notification feeds into this logic and if so, we will then mark the method such that the generated code can only be used if the actual hardware supports the instruction set. That's all well and good, and considering that just about all hardware does support these instruction sets, we get crossgen R2R to generate ideal code for all these cases.

The less optimal case is where code attempts to use an Avx, or Avx2 feature, or such. In thoses cases, we will generate code as if the processor does NOT support the feature. In corelib, the idea is that we will audit our code reviews such that such a decision does not make the logic wrong (just slow) (This code auditing is already necessary for our current crossgen1 intrinsics support for corelib), but in the general case we cannot assume that the code will behave correctly if it runs on hardware that does support Avx, Avx2, etc. In that case, what this logic does is record the decision that Avx use was attempted and forbidden. Then the generated code will not be used at runtime, and the JIT will be invoked, if the hardware actually supports Avx/Avx2.

In addition, the --instruction-set argument allows adjusting the baseline instruction set. For instance, if the compiler is run with --instruction-set Sse42, --instruction-set Popcnt --instruction-set Lzcnt, --instruction-set Pclmqdq, then this logic will never kick in, and do anything, as the generated code will never exceed that of the baseline.

Before this is merged, I'd really like to see copious documentation of the behavior. It is potentially quite confusing, and we need to be sure that the precise behavior is discoverable and understandable. Also, we should make sure that the JITDUMP indicates the information it has communicated to the VM, and that the log files at execution time include information about when and why methods are rejected for reasons such as this.

Agreed that this needs documentation somewhere, the existing scheme is very nearly undescribed, and this is a substantial difference (and when this feature is complete, I don't expect to change the crossgen1 model, and this more advanced handling of intrinsics and such will remain restricted to crossgen2. I expect I'll put a document in the BoTR about this.

Adding JITDUMP data should be easy.

For logging, what sort of thing are you looking for? That's a feature all by itself, as we don't currently have any reasonable form of logging around R2R load behavior for functions. I could add an event, or possibly a STRESSLOG entry or an addition the the debug only LOG. @CarolEidt @tannergooding I'd like some feedback here.

There's a method SetReadyToRunRejectedPrecompiledCode on PrepareCodeConfig that we already use to note when the precompiled R2R version of a method is not usable because of changes in dependent assemblies. This makes it into the ETL so jit event clients can see how many precompiled methods ended up not being usable.

Presumably you could extend this concept.

Ah, that's convenient. I wasn't aware of that bit, but it should be fairly straightforward to add one bit that indicates that it was dropped due to an instruction set mismatch, and from there looking at the r2rdump output to figure out which exact instruction set was at fault should be straightforward.

Also, the SetReadyToRunRejectedPrecompiledCode bit would be set for this condition as well. That bit is not tied to changes in dependent assemblies, its tied to changes in type layout, which are most likely triggered by changes in dependent assemblies, but there are other ways it could be different.

tannergooding · 2020-03-06T18:33:52Z

src/coreclr/src/tools/Common/JitInterface/CorInfoImpl.cs

-                // For ReadyToRun, this list needs to match up with the behavior of FilterNamedIntrinsicMethodAttribs
-                // In particular, that this list of supported hardware will not generate non-SSE2 safe instruction
-                // sequences when paired with the behavior in FilterNamedIntrinsicMethodAttribs
+                // For ReadyToRun we set these hardware features as enabled always, as the overwhelming majority


For reference, this is what the x86 emulator for Windows 10 on ARM reports (for both my Samsung GalaxyBook 2 and my Microsoft Surface Pro X):

Notably, SSE4.2, POPCNT, and LZCNT aren't supported.

SSE4.2 and POPCNT were introduced in November 2008, so 11 years old

LZCNT is newer than both and is June 2013, so 6 years old

AMD introduced LZCNT and POPCNT in 2007 (this was part of their ABM instruction set and uses the same CPUID bits, respectively, that Intel later used)

LZCNT, for Intel, is actually newer than AVX (2011) and same time frame as AVX2 (2013)

BMI1/BMI2 are the same time frame as AVX2 (2013, Haswell)

Oh darn, I had thought that SSE4.2 was supported on the emulator, the general strategy here is optimized based on supporting a set that is extremely widely available. As the Arm emulation system only supports SSE4.1, what are your thoughts around changing the SIMDSupportLevel of SIMD_SSE4_Supported to only require SSE4.1? As far as I can tell, the only difference that the jit takes is around support for the pcmpgtq instruction. If we change line 215 in simdcodegenxarch.cpp to check for compSupports(Sse42) instead of using getSIMDSupportLevel() >= SIMD_SSE4_Supported, and changing getSIMDSupportLevel to only require Sse41, I think that would be the only change needed.

One detail is that supporting a few extra hardware instruction sets isn't problematic, and causes only minimal jit behavior at runtime, but the difference between Sse41 and Sse42 support is very significant as the jit has the concept of simdsupport level which is quite impactful.

I think that would be fine, the biggest improvements (with regards to the S.N.Vector types) are likely to come from SSE4.1 anyways, but I would appreciate hearing @CarolEidt's thoughts as well.

Honestly, I don't feel strongly about this, but it seems reasonable to me. One wonders if there are any plans to update the emulator.

I find it interesting that QEMU's CPU emulator has exactly the same restriction to only supporting SSE4.1. I don't intend to put the change into this PR (as it is too large already), but will prepare something separate when this PR is complete.

src/coreclr/src/vm/jitinterface.cpp

src/coreclr/src/inc/corinfo.h

CarolEidt · 2020-03-14T01:12:45Z

I think that @AndyAyersMS is going in the right direction with compExactlyDependsOn and compOpportunisticallyDependsOn. To me those don't really imply that they will change the state of the method rather than just a query, but I can't come up with anything better, and I think that they convey the basic difference between the two cases.

davidwrighton · 2020-03-23T20:00:08Z

@tannergooding @CarolEidt @AndyAyersMS Could you take a read through the BotR chapter I wrote up for this? Its located at https://github.com/dotnet/runtime/blob/8ad656f3ea7fbad01165da80bb1c760c65a2fdc1/docs/design/coreclr/botr/vectors-and-intrinsics.md

AndyAyersMS

Some initial feedback -- need more time to look at the rest of the jit changes.

docs/design/coreclr/botr/vectors-and-intrinsics.md

AndyAyersMS · 2020-03-26T01:32:20Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+
+The JIT receives flags which instruct it on what instruction sets are valid to use at, and the jit interface api s`notifyInstructionSetUsage(isa, bool supportBehaviorRequired)`. 
+
+The notifyInstructionSetUsage api is used to notify the AOT compiler infrastructure that the code may only execute if the runtime environment of the code is exactly the same as the boolean parameter indicates it should be. For instance, if `notifyInstructionSetUsage(Avx, false)` is used, then the code generated must not be used if the `Avx` instruction set is useable. Similarly `notifyInstructionSetUsage(Avx, true)` will indicate that the code may only be used if the `Avx` instruction set is available.


Suggested change

The notifyInstructionSetUsage api is used to notify the AOT compiler infrastructure that the code may only execute if the runtime environment of the code is exactly the same as the boolean parameter indicates it should be. For instance, if `notifyInstructionSetUsage(Avx, false)` is used, then the code generated must not be used if the `Avx` instruction set is useable. Similarly `notifyInstructionSetUsage(Avx, true)` will indicate that the code may only be used if the `Avx` instruction set is available.

The notifyInstructionSetUsage api is used to notify the AOT compiler infrastructure that the code may only execute if the runtime environment of the code is exactly the same as the boolean parameter indicates it should be. For instance, if `notifyInstructionSetUsage(Avx, false)` is used, then the code generated must not be used if the `Avx` instruction set is available. Similarly `notifyInstructionSetUsage(Avx, true)` will indicate that the code may only be used if the `Avx` instruction set is available.

…roughout the compiler

…e is properly loadable

…reventing it

tannergooding · 2020-03-31T17:57:05Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+Most hardware intrinsics support is tied to the use of various Vector apis. There are 4 major api surfaces that are supported by the runtime
+
+- The fixed length float vectors. `Vector2`, `Vector3`, and `Vector4`. These vector types represent a struct of floats of various lengths. For type layout, ABI and, interop purposes they are represented in exactly the same way as a structure with an appropriate number of floats in it. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.
+- The variable length `Vector<T>`. This represents vector data of runtime-determined length. In any given process the length of a `Vector<T>` is the same in all methods, but this length may differ between various machines or environment variable settings read at startup of the process. The `T` type variable may be any of the primitive types, and allows use of integer or double data within a vector. The length and alignment of `Vector<T>` is unknown to the developer, and `Vector<T>` may not exist in any interop signature. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.


The T type variable may be any of the primitive types

Maybe this could be reworded? It isn't allowed to be char or bool for example. Only the 8 basic integer types (signed and unsigned int8, int16, int32, and int64) and the 2 basic floating-point types (float32 and float64)

The length and alignment of Vector<T> is unknown to the developer, and Vector<T> may not exist in any interop signature

This isn't quite true, it is known at Runtime by checking Vector<T>.Count which specifies it is a struct of exactly Count T fields.

I agree with @tannergooding's comments, and would reword the last phrase (after the comma) as: although they are only hardware accelerated on platforms where the Vector.IsHardwareAccelerated property returns true.

tannergooding · 2020-03-31T18:11:18Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+
+    private static int UseAvx2(uint value)
+    {
+        // THIS IS A BUG!!!!!


Is it worth documenting that this is a bug because PopCount might have been rejitted to know Avx2.IsSupported but UseAvx2 might have not?

Likewise, is this "ok" to do if UseAvx2 is marked AggressiveOptimization?

I wouldn't think that this would be ok even if it is marked AggressiveOptimization - I don't think we should ever rely for correctness on the behavior of optimization.

I agree with Carol here. At some point we may change the policy on AggressiveOptimization, and that shouldn't break the correctness of applications. We do parse another attribute, the Ssytem.Runtime.BypassReadyToRunAttribute, which can be used to simply declare that a method shouldn't be ready to runned. This attribute isn't actually in the BCL, the application must define it itself, but it can be used to safely forbid a method from being precompiled. I've updated the document, and added a comment that this pattern IS acceptable when applied to a subset of the x86 and x64 instruction sets, as we have special treatment for those instruction sets in the aot compiler when compiling CoreLib.

tannergooding · 2020-03-31T18:19:58Z

Reviewed the botr section on vectors-and-intrinsics and overall LGTM 👍

AndyAyersMS

Did one more pass.

@CarolEidt please look over this PR again when you get a chance.

AndyAyersMS · 2020-03-31T23:11:07Z

src/coreclr/src/jit/compiler.h

+        bool     isaSupported = (opts.compSupportsISA & (1ULL << isa)) != 0;
+        if ((opts.compSupportsISAReported & isaBit) == 0)
+        {
+            notifyInstructionSetUsage(isa, isaSupported);


Maybe add a jitdump here? Eg

Suggested change

notifyInstructionSetUsage(isa, isaSupported);

JITDUMP("Notifying runtime that codegen for this method is %susable if runtime HW supports %s\n",

isaSupported ? "" : "NOT ", InstructionSetToString(isa));

notifyInstructionSetUsage(isa, isaSupported);

notifyInstructionSetUsage does a jitdump internally. (All calls to the api are dumped)

AndyAyersMS · 2020-03-31T23:15:52Z

src/coreclr/src/jit/simd.cpp

+            simdBaseType = TYP_UNKNOWN;
+        }
+#endif // TARGET_XARCH
+


Seems odd that we didn't need this before but need this now -- is it just for the R2R case because you modified largestEnregisterableStructSize ?

Yes, this only happens in the R2R case, and only because of my trickery in largestEnregisterableStructSize. It has the nice property of making the getBaseTypeAndSizeOfSIMDType safe to call at all times, which I thought was a nice upgrade.

AndyAyersMS · 2020-03-31T23:19:23Z

src/coreclr/src/jit/importer.cpp

@@ -1550,7 +1550,7 @@ var_types Compiler::impNormStructType(CORINFO_CLASS_HANDLE structHnd, var_types*
        {
            unsigned originalSize = info.compCompHnd->getClassSize(structHnd);

-            if ((originalSize >= minSIMDStructBytes()) && (originalSize <= maxSIMDStructBytes()))
+            if ((originalSize >= minSIMDStructBytes()) && (originalSize <= largestEnregisterableStructSize()))


These changes seem fairly subtle... would be easy enough to think the upper check should still be maxSIMDStructBytes().

I think we may need to make it clearer why maxSIMDStructBytes and largestEnregisterableStructSize differ and which one should be called when.

Looking at this it would make more sense to me to build a helper function which encapsulates the call to minSIMDStructBytes and largestEnregisterableStructSize. Something like structSizeMightRepresentSIMDType(size_t structSize) ?

Yes, I think that might make the intention clearer.

- Change command line to take the same parameters as the mono aot compiler - Properly handle references to the Vector128 and Vector256 apis

- Add structSizeMightRepresentSIMDType instead of using largestEnresterableStructSize directly - Leaves a good spot for a comment

CarolEidt

I've only reviewed the document so far; will review the code next.

CarolEidt · 2020-04-01T17:22:20Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+Most hardware intrinsics support is tied to the use of various Vector apis. There are 4 major api surfaces that are supported by the runtime
+
+- The fixed length float vectors. `Vector2`, `Vector3`, and `Vector4`. These vector types represent a struct of floats of various lengths. For type layout, ABI and, interop purposes they are represented in exactly the same way as a structure with an appropriate number of floats in it. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.
+- The variable length `Vector<T>`. This represents vector data of runtime-determined length. In any given process the length of a `Vector<T>` is the same in all methods, but this length may differ between various machines or environment variable settings read at startup of the process. The `T` type variable may be any of the primitive types, and allows use of integer or double data within a vector. The length and alignment of `Vector<T>` is unknown to the developer, and `Vector<T>` may not exist in any interop signature. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.


I agree with @tannergooding's comments, and would reword the last phrase (after the comma) as: although they are only hardware accelerated on platforms where the Vector.IsHardwareAccelerated property returns true.

CarolEidt · 2020-04-01T17:25:32Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+
+- The fixed length float vectors. `Vector2`, `Vector3`, and `Vector4`. These vector types represent a struct of floats of various lengths. For type layout, ABI and, interop purposes they are represented in exactly the same way as a structure with an appropriate number of floats in it. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.
+- The variable length `Vector<T>`. This represents vector data of runtime-determined length. In any given process the length of a `Vector<T>` is the same in all methods, but this length may differ between various machines or environment variable settings read at startup of the process. The `T` type variable may be any of the primitive types, and allows use of integer or double data within a vector. The length and alignment of `Vector<T>` is unknown to the developer, and `Vector<T>` may not exist in any interop signature. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.
+- `Vector64<T>`, `Vector128<T>`, and `Vector256<T>` represent fixed-sized vectors that closely resemble the fixed- sized vectors available in C++. These structures can be used in any code that runs, but very few features are supported. These vector types define very few direct methods; most operations other than creation use the processor specific hardware intrinsics apis.


I'd suggest reword starting with "very few features" as "very few features are supported directly on these types, other than creation. They are primarily used in the processor specific hardware intrinsic apis."

CarolEidt · 2020-04-01T17:28:33Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+
+1. Usage of `Vector2`, `Vector3`, `Vector4`, and `Vector<T>`. For these, its always safe to just use the types. The jit will generate code that is as optimal as it can for the logic, and will do so unconditionally.
+2. Usage of `Vector64<T>`, `Vector128<T>`, and `Vector256<T>`. These types may be used unconditionally, but are only truly useful when also using the platform specific hardware intrinsics apis.
+3. Usage of platform intrinsics apis. All usage of these apis should be wrapped in an `IsSupported` check of the appropriate kind. Then, within the `IsSupported` check, other platform specific apis may be used.


This is a bit confusing as it tends to gloss over the fact that the developer must ensure that there are IsSupported checks for all apis being used.

CarolEidt · 2020-04-01T18:07:59Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+
+Hardware intrinsics have dramatic impacts on codegen, and the codegen of these hardware intrinsics is dependent on the ISA available for the target machine when the code is compiled.
+
+If the code is compiled at runtime by the JIT in a just-in-time manner, then the JIT will generate the best code it can based on the current processor's ISA when compiling tier 1 code, and generate functionally equivalent code when compiling tier 0 code. MethodImplOptions.AggressiveOptimization may be used to bypass compilation of tier 0 code and always produce tier 1 code for the method.


This tends to imply that the compiler will not recognize the intrinsics when compiling tier 0 code, but that's not the case. The recognition of intrinsics is independent of tier.

CarolEidt · 2020-04-01T18:09:57Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+
+If the code is compiled at runtime by the JIT in a just-in-time manner, then the JIT will generate the best code it can based on the current processor's ISA when compiling tier 1 code, and generate functionally equivalent code when compiling tier 0 code. MethodImplOptions.AggressiveOptimization may be used to bypass compilation of tier 0 code and always produce tier 1 code for the method.
+
+For ahead of time compilation, the situation is far more complex. This is due to the following principles of how our ahead of time compilation model works.


I would suggest to use "ahead-of-time" to make it clear that this term is basically an adjective applied to compilation. In fact, I'd suggest to define it early, along with the acronym, and then use AOT in the rest of the document.

CarolEidt · 2020-04-01T18:19:26Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+#### Characteristics which result from rules
+The rules here provide the following characteristics.
+- Some platform specific hardware intrinsics can be used in CoreLib without encountering a startup time penalty
+- Use of platform specific hardware intrinsics causing runtime jit can be mitigated by careful choice of which intrinsics are used.


This is a bit confusing, perhaps a little elaboration would be useful.

CarolEidt · 2020-04-01T18:20:06Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+The rules here provide the following characteristics.
+- Some platform specific hardware intrinsics can be used in CoreLib without encountering a startup time penalty
+- Use of platform specific hardware intrinsics causing runtime jit can be mitigated by careful choice of which intrinsics are used.
+- Use of `Vector<T>` causes runtime jit and startup time concerns. Current analysis indicates this is acceptable, but it is a perennial concern for applications with tight startup time requirements.


after "concerns" add "because it is never precompiled".

CarolEidt · 2020-04-01T18:27:28Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+
+    private static int UseAvx2(uint value)
+    {
+        // THIS IS A BUG!!!!!


I wouldn't think that this would be ok even if it is marked AggressiveOptimization - I don't think we should ever rely for correctness on the behavior of optimization.

CarolEidt · 2020-04-01T18:30:22Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+### Code written in other assemblies (both first and third party)
+
+#### Crossgen implementation rules
+- Any code which uses an intrinsic out of the `System.Runtime.Intrinsics.Arm` or `System.Runtime.Intrinsics.X86` namespace will not be compiled ahead of time. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`) 


Suggest "out of the" => "from the"

CarolEidt · 2020-04-01T18:33:38Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

+- The baseline instruction set which defaults to (Sse, Sse2), but may be adjusted via compiler option.
+- The optimistic instruction set which defaults to (Sse3, Ssse3, Sse41, Sse42, Popcnt, Pclmulqdq, and Lzcnt).
+
+Code will be compiled using the optimistic instruction set to drive compilation, but any use of an instruction set beyond the baseline instruction set will be recorded, or any failure of use of an instruction set that would affect semantic behavior of code. If the baseline instruction set includes `Avx2` then the size and characteristics of of `Vector<T>` is known. Any other decisions about ABI may also be encoded. For instance, it is likely that the ABI of `Vector256<T>` will vary based on the presence/absence of `Avx` support.


I think the "will be recorded" belongs after the "or any failure" clause.

CarolEidt

A question and a comment; otherwise I think this is getting close.

CarolEidt · 2020-04-01T23:06:48Z

src/coreclr/src/inc/corinfoinstructionset.h

+        case READYTORUN_INSTRUCTION_Popcnt: return InstructionSet_POPCNT;
+#endif // TARGET_AMD64
+#ifdef TARGET_X86
+        case READYTORUN_INSTRUCTION_Sse: return InstructionSet_SSE;


I think you may have addressed this in an earlier discussion, but I believe that these need to be the same as the TARGET_AMD64 ones - is there a reason they can't share code here?

These are autogenerated, and it would add a fair amount of complexity to have the autogenerator write them out once. In this particular part of the code, they don't actually have to be the same (that's restricted to a small bit of logic that is in crossgen2, which have explicit asserts for the details that must remain the same.) At the same time, the way they are authored in the InstructionSetDesc.txt, the autogenerator will happen to keep things the same and as long as new code follows the pattern in there, stuff will remain the same.

Ah yes, I recall that now. Seems reasonable.

CarolEidt · 2020-04-01T23:13:14Z

src/coreclr/src/jit/compiler.h

@@ -8095,6 +8087,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        else
        {
            assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
+            compExactlyDependsOn(InstructionSet_AVX2); // Record that AVX2 isn't supported


This really feels uncomfortable. Might it be worth adding a compExactlyForbids? I know we've been over this before, but this looks like an assertion that it is present, not an assertion that it is not.

We haven't been over this particular usage before, as it might have been obscured by some of the other changes you were looking at. I share your discomfort here, Perhaps

// Ensure that code will not execute if an instruction set is useable. Call only // if the instruction set has previously reported as unuseable, but when // that that status has not yet been recorded to the AOT compiler void compVerifyInstructionSetUnuseable(CORINFO_InstructionSet isa) { // use compExactlyDependsOn to ensure that the exact state of the useability of isa is recorded bool isaUseable = compExactlyDependsOn(isa); // Assert that the useability of the isa is false. If true, this function should never be called. assert(!isaUseable); }

@davidwrighton - that sounds like a great alternative; wordy but certainly a context where a little wordiness is probably justified.

CarolEidt

LGTM - just a couple of minor rewording suggestions on the doc.

CarolEidt · 2020-04-02T21:12:01Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

 # Intrinsics apis
 Most hardware intrinsics support is tied to the use of various Vector apis. There are 4 major api surfaces that are supported by the runtime

 - The fixed length float vectors. `Vector2`, `Vector3`, and `Vector4`. These vector types represent a struct of floats of various lengths. For type layout, ABI and, interop purposes they are represented in exactly the same way as a structure with an appropriate number of floats in it. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.
- The variable length `Vector<T>`. This represents vector data of runtime-determined length. In any given process the length of a `Vector<T>` is the same in all methods, but this length may differ between various machines or environment variable settings read at startup of the process. The `T` type variable may be any of the primitive types, and allows use of integer or double data within a vector. The length and alignment of `Vector<T>` is unknown to the developer, and `Vector<T>` may not exist in any interop signature. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.
- `Vector64<T>`, `Vector128<T>`, and `Vector256<T>` represent fixed-sized vectors that closely resemble the fixed- sized vectors available in C++. These structures can be used in any code that runs, but very few features are supported. These vector types define very few direct methods; most operations other than creation use the processor specific hardware intrinsics apis.
+- The variable length `Vector<T>`. This represents vector data of runtime-determined length. In any given process the length of a `Vector<T>` is the same in all methods, but this length may differ between various machines or environment variable settings read at startup of the process. The `T` type variable may be the following types (`System.Byte`, `System.SByte`, `System.Int16`, `System.UInt16`, `System.Int32`, `System.UInt32`, `System.Int64`, `System.UInt64`, `System.Single`, and `System.Double`), and allows use of integer or double data within a vector. The length and alignment of `Vector<T>` is unknown to the developer at compile time (although discoverable at runtime by using the `Vector<T>.Count` api), and `Vector<T>` may not exist in any interop signature. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations if the `Vector<T>.IsHardwareAccelerated` api.


Needs a "returns true at the end, I think.

CarolEidt · 2020-04-02T21:12:06Z

docs/design/coreclr/botr/vectors-and-intrinsics.md

- Use of platform specific hardware intrinsics causing runtime jit can be mitigated by careful choice of which intrinsics are used.
- Use of `Vector<T>` causes runtime jit and startup time concerns. Current analysis indicates this is acceptable, but it is a perennial concern for applications with tight startup time requirements.
- Ahead of time generated code which could take advantage of more advanced hardware support experiences a performance penalty until rejitted. (If a customer chooses to disable tiered compilation, then customer code may always run slowly).
+- Some use of platform specific hardware intrinsics will force the compiler to be unable to AOT compile the code. However, if care is taken to only use intrinsics from the Sse, Sse2, Sse3, Ssse3, Sse41, Sse42, Popcnt, Pclmulqdq, or Lzcnt instruction sets, then the code may be AOT compiled. Preventing AOT compilation may cause a startup time penalty for important scenarios.


Some use => Some uses?

- This will work for now, and we can fix this later with an easier to review issue

CarolEidt · 2020-04-04T00:40:14Z

Congratulations @davidwrighton on getting this completed!

MichalStrehovsky · 2020-04-06T11:22:18Z

src/coreclr/src/tools/Common/JitInterface/CorInfoImpl.cs

@@ -2938,9 +2930,27 @@ private uint getJitFlags(ref CORJIT_FLAGS flags, uint sizeInBytes)
            return (uint)sizeof(CORJIT_FLAGS);
        }

+
+        InstructionSetFlags _actualInstructionSetSupported;
+        InstructionSetFlags _actualInstructionSetUnsupported;


@davidwrighton I was digging through this over the weekend and this will always have the default value of 0. Should we actually set it somewhere?

Sigh. Yes. Just below the !supportEnabled path should be setting this.

Dotnet-GitSync-Bot added the area-crossgen2-coreclr label Mar 6, 2020

MichalStrehovsky reviewed Mar 6, 2020

View reviewed changes

tannergooding reviewed Mar 6, 2020

View reviewed changes

dotnet deleted a comment from azure-pipelines bot Mar 6, 2020

AntonLapounov reviewed Mar 6, 2020

View reviewed changes

src/coreclr/src/vm/jitinterface.cpp Outdated Show resolved Hide resolved

BruceForstall reviewed Mar 12, 2020

View reviewed changes

src/coreclr/src/inc/corinfo.h Outdated Show resolved Hide resolved

davidwrighton mentioned this pull request Mar 13, 2020

Sse2 Intrinsics and Vector<T> paths no longer being output at R2R #33544

Closed

tannergooding mentioned this pull request Mar 17, 2020

use YMM registers on x64 for BlkUnroll. #33665

Closed

davidwrighton force-pushed the instruction_set_support branch 2 times, most recently from 6e546b4 to 8ad656f Compare March 23, 2020 19:58

davidwrighton force-pushed the instruction_set_support branch from 8ad656f to 619553e Compare March 24, 2020 17:50

davidwrighton changed the title ~~[WIP] Hardware instruction set support for crossgen2~~ Hardware instruction set support for crossgen2 Mar 25, 2020

davidwrighton marked this pull request as ready for review March 25, 2020 01:13

AndyAyersMS reviewed Mar 26, 2020

View reviewed changes

davidwrighton added 8 commits March 27, 2020 16:18

Progress towards -instruction-set command line argument

ad99cce

Specified instruction set support

b3231bd

Specification of instruction set on command line

79f6040

Take advantage of InstructionSetSupport class to reduce hardcoding th…

8427ef9

…roughout the compiler

Funnel baseline instruction set support to R2R image so that the imag…

f3afae3

…e is properly loadable

Basic infra for detecting advanced instruction set usage instead of p…

eff6176

…reventing it

R2RDump Support for ReadyToRunFixupKind.Check_InstructionSetSupport

2c1052a

Tweak build until it actually builds

e3bc096

tannergooding reviewed Mar 31, 2020

View reviewed changes

AndyAyersMS reviewed Mar 31, 2020

View reviewed changes

davidwrighton added 2 commits March 31, 2020 17:31

Code review feedback

617c2ac

- Change command line to take the same parameters as the mono aot compiler - Properly handle references to the Vector128 and Vector256 apis

Improve clarity around computing if a type is a SIMD type

670d042

- Add structSizeMightRepresentSIMDType instead of using largestEnresterableStructSize directly - Leaves a good spot for a comment

davidwrighton mentioned this pull request Apr 1, 2020

[mono] Implement Arm intrinsics: ArmBase, Crc32 #34240

Merged

Fix x86 and arm build break

f200242

CarolEidt reviewed Apr 1, 2020

View reviewed changes

Update BotR chapter in response to feedback

4138f7c

CarolEidt reviewed Apr 1, 2020

View reviewed changes

davidwrighton added 3 commits April 2, 2020 10:42

Code Review feedback from Carol

4814331

Fix formatting error

0ba367f

Michal feedback

e535940

MichalStrehovsky approved these changes Apr 2, 2020

View reviewed changes

CarolEidt approved these changes Apr 2, 2020

View reviewed changes

davidwrighton added 5 commits April 2, 2020 15:19

Last nits

b8e07bc

Fix uninitialized field

9feef51

Tweak additional discovered difference

0b8b9f4

#ifdef

f399ccd

Use conservative fix for isSubRegisterSIMDType

67ed9e4

- This will work for now, and we can fix this later with an easier to review issue

davidwrighton merged commit 5ac25ac into dotnet:master Apr 3, 2020

MichalStrehovsky reviewed Apr 6, 2020

View reviewed changes

This was referenced Apr 6, 2020

Fix unsupported instruction set handling in crossgen2 #34605

Merged

Don't notify crossgen2 about Vector<T> size when not needed #34614

Merged

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020

davidwrighton deleted the instruction_set_support branch April 20, 2021 17:42

		@@ -0,0 +1,350 @@
		// Licensed to the .NET Foundation under one or more agreements.


		The JIT receives flags which instruct it on what instruction sets are valid to use at, and the jit interface api s`notifyInstructionSetUsage(isa, bool supportBehaviorRequired)`.

		The notifyInstructionSetUsage api is used to notify the AOT compiler infrastructure that the code may only execute if the runtime environment of the code is exactly the same as the boolean parameter indicates it should be. For instance, if `notifyInstructionSetUsage(Avx, false)` is used, then the code generated must not be used if the `Avx` instruction set is useable. Similarly `notifyInstructionSetUsage(Avx, true)` will indicate that the code may only be used if the `Avx` instruction set is available.


		Hardware intrinsics have dramatic impacts on codegen, and the codegen of these hardware intrinsics is dependent on the ISA available for the target machine when the code is compiled.

		If the code is compiled at runtime by the JIT in a just-in-time manner, then the JIT will generate the best code it can based on the current processor's ISA when compiling tier 1 code, and generate functionally equivalent code when compiling tier 0 code. MethodImplOptions.AggressiveOptimization may be used to bypass compilation of tier 0 code and always produce tier 1 code for the method.


		If the code is compiled at runtime by the JIT in a just-in-time manner, then the JIT will generate the best code it can based on the current processor's ISA when compiling tier 1 code, and generate functionally equivalent code when compiling tier 0 code. MethodImplOptions.AggressiveOptimization may be used to bypass compilation of tier 0 code and always produce tier 1 code for the method.

		For ahead of time compilation, the situation is far more complex. This is due to the following principles of how our ahead of time compilation model works.

Hardware instruction set support for crossgen2 #33274

Hardware instruction set support for crossgen2 #33274

Conversation

davidwrighton commented Mar 6, 2020 • edited Loading

davidwrighton commented Mar 6, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding Mar 6, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CarolEidt commented Mar 14, 2020

davidwrighton commented Mar 23, 2020

AndyAyersMS left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding commented Mar 31, 2020

AndyAyersMS left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CarolEidt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CarolEidt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CarolEidt left a comment

davidwrighton commented Mar 6, 2020 •

edited

Loading

tannergooding Mar 6, 2020 •

edited

Loading