Introduce local variables with unknown size #122638
Conversation
Adds lvaLclSymbolicSize for handling locals that may have an unknown size at compile time (SIZE_UNKNOWN). Adds various assertions to areas with assumptions based on concrete size values. Updates parts of value numbering and SSA building to use the symbolic size of the type.
@dotnet/arm64-contrib
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@dotnet/arm64-contrib @dotnet/jit-contrib Please could I have a review for this patch? It's a precursor for supporting the rest of the local variable implementation for SVE.
src/coreclr/jit/lclvars.cpp
Outdated
// Return the size of the local variable, allowing the possibility for the local
// to have an unknown size. Returns the size of the local variable in bytes, or
// SIZE_UNKNOWN.
ValueSize Compiler::lvaLclValueSize(unsigned varNum)
{
    assert(varNum < lvaCount);
    return lvaGetDesc(varNum)->lvValueSize();
}
Needs a proper function header (and the comment needs to be updated)
src/coreclr/jit/gentree.h
Outdated
// Comparisons between ValueSize returns true when both values have known
// sizes and these sizes are equal. All other cases return false.
// Type information is needed to determine whether two unknown ValueSizes
// are equivalent.
bool operator==(const ValueSize& other) const
{
    return (m_size == other.m_size) && !IsUnknown() && !other.IsUnknown();
}
I would remove this. It seems like it can return false negatives; I would leave it up to the caller whether that is the wanted behavior or whether a runtime equality check needs to be produced.
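For illustration, a caller-side check along these lines could replace the operator (purely a sketch; the helper name is made up, the ValueSize members are the ones from this PR):

// Hypothetical helper, not part of the patch: only claim equality when both
// sizes are known exactly, and leave the unknown/unknown case to the caller.
bool SizesDefinitelyEqual(const ValueSize& a, const ValueSize& b)
{
    return !a.IsUnknown() && !b.IsUnknown() && (a.GetSize() == b.GetSize());
}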
// TODO-SVE: What are the consequences of excluding Vector<T> here?
if (onStack && !varTypeHasUnknownSize(varDsc))
{
    frameSize += m_pCompiler->lvaLclStackHomeSize(lclNum);
}
I am curious, what is the future plan for structs that contain dynamically sized vectors and for their layout?
It seems here we just want some kind of estimate for the size. It makes me wonder if we should change "unknown" to "vector sized" in various places, e.g. ValueSize would have constant and vector, and this place would just estimate the size if it was a vector size.
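As a rough sketch of that idea (illustrative only, not the patch's actual definition), ValueSize could carry an explicit kind instead of a sentinel "unknown":

// Illustrative shape only: a constant size or a "vector sized" classification.
class ValueSize
{
public:
    enum class Kind { Constant, Vector };

    bool     IsVector() const { return m_kind == Kind::Vector; }
    unsigned GetConstantSize() const { return m_size; } // valid only for Kind::Constant

private:
    Kind     m_kind;
    unsigned m_size;
};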
In my development branch I have allocated Vector<T> locals in a separate region of the stack to other locals (just at the end of the usual local frame, before dynamic stack allocations). If I can prove that a value type consists only of Vector<T>, it will also be allocated in this stack region. Vector<T> are just bump allocated into this region, so the layout of the struct will be preserved.
Structs containing both Vector<T> and other known size types are more complicated and I haven't fully worked through them yet. In JIT mode, currently we're fine because the EE will discover the size of Vector<T> and hand it to the JIT. So things can be placed on the usual stack and offsets can be computed at compile time. Handling TYP_SIMD with an unknown size is not strictly necessary right now, but it's preempting any incompatibilities with AOT in future.
There will be a point where I need to update the ICorJitInfo implementation to report that Vector<T> has an unknown size, and update the JIT to support these kinds of layouts. I am expecting that another region on the stack will be required for this so they can be address exposed. The main consequence of this is that field accesses into these structs will need to add code to the tree for computing field offsets at runtime, as we no longer have them at compile time.
It seems here we just want some kind of estimate for the size.
I could provide a lower bound of 16 bytes or an upper bound of 256 bytes and multiply this by the number of allocated Vector<T>. In JIT mode the vector length will be available to provide an exact number.
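A self-contained illustration of those bounds (constants per the SVE architecture, helper name assumed):

// Estimate the frame contribution of N locals of type Vector<T> using the
// architectural bounds on the SVE vector length.
unsigned EstimateVectorLocalsSize(unsigned numVectorLocals, bool useUpperBound)
{
    const unsigned minVectorSize = 16;  // 128-bit minimum vector length, in bytes
    const unsigned maxVectorSize = 256; // 2048-bit maximum vector length, in bytes
    return numVectorLocals * (useUpperBound ? maxVectorSize : minVectorSize);
}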
Handling TYP_SIMD with an unknown size is not strictly necessary right now
I should clarify 'right now' means while there is a config variable guarding the import of this type into the JIT, until it's fully supported in JIT and AOT.
There will be a point where I need to update the ICorJitInfo implementation to report that Vector<T> has an unknown size, and update the JIT to support these kinds of layouts. I am expecting that another region on the stack will be required for this so they can be address exposed. The main consequence of this is that field accesses into these structs will need to add code to the tree for computing field offsets at runtime, as we no longer have them at compile time.
I think this becomes complicated very quickly. These structs can contain GC pointers at unknown offsets. I do not think we have a mechanism today that would allow us to report something like that to GC.
Is there an issue/document somewhere that captures some of the issues?
Is there an issue/document somewhere that captures some of the issues?
These structs can contain GC pointers at unknown offsets.
Can we handle this by reordering struct fields, such that types with unknown sizes are pushed to the end of the structure and therefore we can compute offsets as normal on GC refs in the structure? Then it's a case of managing compatibility with StructLayoutAttribute.
Can we handle this by reordering struct fields, such that types with unknown sizes are pushed to the end of the structure and therefore we can compute offsets as normal on GC refs in the structure?
It would not solve cases like struct with struct fields.
This will need GC info extension to handle the variable offsets.
It would not solve cases like struct with struct fields.
I see. So we'd have to teach the GC to read VL. And offsets to GC pointers need to be composed of a known size offset and some multiple of VL.
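As an illustration of that composition (names invented here, not a proposal for the actual GC info format):

// A GC root inside such a struct could be located by a fixed byte offset plus
// a multiple of the runtime vector length (VL).
struct SymbolicGcRootOffset
{
    unsigned fixedBytes; // contribution of the known-size fields before the root
    unsigned vlMultiple; // number of variable-length vector fields before the root

    unsigned Resolve(unsigned vectorLengthBytes) const
    {
        return fixedBytes + vlMultiple * vectorLengthBytes;
    }
};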
Nit: GC stack root enumeration is not part of the GC proper.
GC stack root enumeration operates on CPU context. VL is effectively part of the CPU context, so it is not unnatural to take it into account for GC stack root enumeration.
We will need to figure out how to encode this information in the GC info format. The encoding is probably the most complicated part of the whole flow.
src/coreclr/jit/valuenum.cpp
Outdated
-    unsigned lclSize = lvaLclExactSize(defLclNum);
+    ValueSize lclSize = defVarDsc->lvValueSize();

     ValueNumPair newLclValue;
-    if (vnStore->LoadStoreIsEntire(lclSize, defOffset, defSize))
+    if (vnStore->LoadStoreIsEntire(lclSize.GetSize(), defOffset, defSize))
Clearly some work will be needed here. What if this is a TYP_SIMD store? @SingleAccretion do you have any thoughts on this?
I believe we can make a similar test here: if the def has unknown size, but the def type is equivalent to the local type, and the offset is 0, then this is an entire store on that local. A partial store of a TYP_SIMD field in a struct would see a def with unknown size against a local of TYP_STRUCT.
I'm not sure if some sort of indexed assignment on Vector<T> falls into this path yet but a known def size against an unknown local size could also be relevant? E.g. vec[0] = 1 would have a known def size but unknown local size.
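To make the suggestion concrete, a sketch of such a test (using the ValueSize members from this PR; the helper itself is hypothetical and sidesteps the mixed known/unknown cases mentioned above):

// An exact/exact pair keeps the existing size comparison; an unknown/unknown
// pair at offset 0 is treated as covering the whole local.
bool LoadStoreIsEntireSymbolic(ValueSize locationSize, ssize_t offset, ValueSize storeSize)
{
    if (!locationSize.IsUnknown() && !storeSize.IsUnknown())
    {
        return (offset == 0) && (storeSize.GetSize() == locationSize.GetSize());
    }
    return (offset == 0) && locationSize.IsUnknown() && storeSize.IsUnknown();
}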
do you have any thoughts on this?
The whole VN system of physical maps relies on all stores and loads having a definite size/offset. This is how we know that an older store does or does not overlap with a more recent store, that we do or don't need to modify the whole array for a store, etc.
For selection, I wonder if we just need to modify the selector to treat stores of a variable-sized type as if it was "the largest possible" size.
I think the "hardest" question here is what to do about the out-of-bounds store/load cases. In principle, unless we can prove that offset + largest-possible-variable-sized-type-size <= locationSize, all of them need to be treated as out-of-bounds if we accept the current system as-is (it means that we need to address-expose most indirect access to such locals). Or we could bend the physicality rules a bit and say it's actually UB to do out-of-bounds loads/stores (it is already kind of so).
Then it is just a question of which stores are partial or not. It's OK for all stores to be partial, it just makes the optimizer a bit conservative. Or it could be more precise, like @snickolls-arm proposed above. The important thing is that the overall system (liveness, SSA + VN) is self-consistent and we have tests for it.
The important thing is that the overall system (liveness, SSA + VN) is self-consistent and we have tests for it.
For testing, there is a fairly large coverage of Vector<T> in src/tests/JIT/SIMD and src/tests/JIT/HardwareIntrinsics. The tests in the latter exercise Vector<T> as a struct field regularly as well as various other scenarios.
While this is all guarded by DOTNET_JitUseScalableVectorT the CI won't be exercising some of the logic introduced by these code changes, but I could add another test project that runs the tests in src/tests/JIT/HardwareIntrinsics with this config variable set, if that helps with verification. Ultimately the goal is to remove this config variable as soon as possible so this is exercised by normal CI runs, but we're still a few patches away from the CI being green.
@tannergooding FYI as we've discussed this^ previously on #121548
In principle, unless we can prove that offset + largest-possible-variable-sized-type-size <= locationSize, all of them need to be treated as out-of-bounds if we accept the current system as-is
Ignoring struct fields for now and just considering an indirect access somewhere inside of a Vector<T>, we have a guaranteed minimum vector size (128-bit) and therefore can decide if offset + accessSize <= minimumVectorSize. This could cover small accesses/small offsets by proving they are within bounds.
Extending this to a struct, if we have N Vector<T> fields in the struct, we could similarly try to prove offset + accessSize <= SUM(known size members) + N * minimumVectorSize.
In short, try to get a conservative estimate for the size of the structure using the minimum bound of sizeof(Vector<T>) and use it to prove that accesses are within bounds, as opposed to out-of-bounds.
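A self-contained sketch of that conservative check (helper name and parameters are assumptions):

// Use the architectural minimum vector length as a lower bound on the size of
// a struct with numVectorFields variable-length vector fields.
bool AccessProvablyInBounds(unsigned offset, unsigned accessSize,
                            unsigned knownFieldsSize, unsigned numVectorFields)
{
    const unsigned minimumVectorSize = 16; // bytes; SVE guarantees VL >= 128 bits
    unsigned lowerBoundSize = knownFieldsSize + numVectorFields * minimumVectorSize;
    return (offset + accessSize) <= lowerBoundSize;
}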
src/coreclr/jit/valuenum.cpp
Outdated
-    lclFld->gtVNPair = vnStore->VNPairForLoad(lclVarValue, lvaLclExactSize(lclNum), lclFld->TypeGet(),
-                                              lclFld->GetLclOffs(), lclFld->GetSize());
+    lclFld->gtVNPair =
+        vnStore->VNPairForLoad(lclVarValue, lvaLclValueSize(lclNum).GetSize(), lclFld->TypeGet(),
This change feels unnecessary.
src/coreclr/jit/valuenum.cpp
Outdated
     {
         GenTreeLclVarCommon* lcl = store->AsLclVarCommon();
-        fgValueNumberLocalStore(store, lcl, 0, lvaLclExactSize(lcl->GetLclNum()), valueVNPair,
+        fgValueNumberLocalStore(store, lcl, 0, lvaLclValueSize(lcl->GetLclNum()).GetSize(), valueVNPair,
I would expect that fgValueNumberLocalStore should be updated to take ValueSize.
After looking through things it looks like it would be better to be more explicit in ValueSize and make it able to represent "constant" and "vector" sizes instead of "constant" and "unknown" sizes. I expect we will have a number of IR patterns that will be disallowed for TYP_SIMD -- for instance LCL_FLD with TYP_SIMD. It would be nice for fgValueNumberLocalStore to be able to use these invariants to know that size.IsVector() means the store is not partial, for example.
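A sketch of how such an invariant could be used once the function takes a ValueSize (assumed shape, not the PR's code):

// A vector-sized store can only target a vector-typed local under the assumed
// IR restrictions (no LCL_FLD of TYP_SIMD), so it is never a partial definition.
bool StoreIsPartial(const ValueSize& locationSize, ssize_t offset, const ValueSize& storeSize)
{
    if (storeSize.IsVector())
    {
        return false;
    }
    if (locationSize.IsVector())
    {
        // An exact-size store into a variable-length local (e.g. a lane store)
        // is conservatively treated as partial.
        return true;
    }
    return (offset != 0) || (storeSize.GetSize() != locationSize.GetSize());
}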
I would expect that fgValueNumberLocalStore should be updated to take ValueSize.
Agreed, do you want to see this in this PR?
"constant" and "vector" sizes instead of "constant" and "unknown" sizes
I've kept it generic so far thinking that TYP_MASK will also fall in this category, but you could argue this is also a vector.
I've kept it generic so far thinking that TYP_MASK will also fall in this category, but you could argue this is also a vector.
ValueSize could allow for both IsVector() and IsMask(), e.g. represent VECTOR_SIZE = 0xFFFFFFFF, MASK_SIZE = 0xFFFFFFFE internally.
It is not 100% clear to me whether this ends up being beneficial or not, but it seems there is some evidence that it may be beneficial even in this PR already. What do you think?
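A sketch of that sentinel encoding (values and member names are illustrative):

// ValueSize distinguishing exact sizes from vector- and mask-sized values.
class ValueSize
{
    static const unsigned VECTOR_SIZE = 0xFFFFFFFF;
    static const unsigned MASK_SIZE   = 0xFFFFFFFE;
    unsigned m_size;

public:
    explicit ValueSize(unsigned size) : m_size(size) {}
    bool     IsVector() const { return m_size == VECTOR_SIZE; }
    bool     IsMask() const { return m_size == MASK_SIZE; }
    bool     IsExact() const { return !IsVector() && !IsMask(); }
    unsigned GetSize() const { return m_size; } // meaningful only when IsExact()
};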
We have sizeof(TYP_SIMD) = VL = 8 * sizeof(TYP_MASK), so there should be a different value number for them both and I think what you're suggesting is beneficial.
Initially I thought that value numbers contain type information already that would be enough to make this difference without further specialization in the size field (do correct me if I haven't understood this correctly, it's quite complicated). But it would make it more robust if their size components are also different, and this is true to the hardware which is another plus.
The ValueSize class represents the size of a type for optimization purposes. It allows symbolic representations of sizes that don't have an exact numerical value at compile time, but have some fixed value at runtime. This value must be equivalent for all instances of the same type. The patch replaces any instances of exact sizes being used to construct value numbers with processing that supports ValueSize instead. SSA semantics are applied with the following considerations:
* A load/store can be considered entire when the load/store ValueSize is equivalent to the source/destination ValueSize, and the offset of the operation is zero. Creates a complete definition.
* A load/store on a structure can be considered a partial load/store when the ValueSize of the load/store is not equivalent to the source/destination local variable's ValueSize and/or the offset of the operation is not zero. Creates a use-assign definition.
* Complete definitions of non-exact ValueSizes propagate definitions as usual.
* Partial definitions of non-exact ValueSizes create unique value numbers, as the bounds of the load/store operation cannot be determined.
The patch also applies ValueSize to TYP_SIMD on ARM64. ValueSize kinds are added for Vector and Mask types which have unknown size at compile time when supported by the ARM64 Scalable Vector Extension.
I've pushed some further value numbering implementation based on your suggestions, and some assertions firing in my branch after introducing
@@ -4,8 +4,6 @@
 #ifndef _SIMD_H_
@tannergooding and others, what is the plan for constant masks generated by PTRUE and its various enum constants?
With the current code, at import the jit turns CreateTrueMask(type) into a constant value, CNS_MSK. This constant can then be subject to standard optimisations. At codegen, any CNS_MSK nodes are converted back into the relevant ptrue instruction (or a load from memory if no enum pattern can be matched).
With variable vector lengths this becomes tricky.
Consider LargestMultipleOf3 for a UINT vector. With 128 bits, it sets the first 3 lanes and clears the rest. On 256 bits it's 6 lanes. On 512 bits it's 15 lanes.
To represent as a constant we need to know the vector length at compile time. That is one way we could go, but it breaks the whole vector length agnostic nature.
We can't have a mask that's big enough for any possible vector length, because LargestMultipleOf3 needs to know the max value.
There is no repeating pattern, so that can't be used.
The only other way (that I can see) is to go back to keeping everything as a TrueMask node with attached enum throughout the compilation. That means we lose all the constant folding code that I added last year, which is a shame. However, in reality it will be rare for CreateTrueMask() to be called with anything other than allTrue.
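To illustrate the dependence on the runtime vector length (self-contained, names assumed):

// The number of lanes LargestMultipleOf3 sets varies with the vector length,
// so no single compile-time constant mask can represent it.
unsigned LargestMultipleOf3Lanes(unsigned vectorLengthBytes, unsigned elementSizeBytes)
{
    unsigned laneCount = vectorLengthBytes / elementSizeBytes;
    return (laneCount / 3) * 3; // 16B -> 3, 32B -> 6, 64B -> 15 for 4-byte lanes
}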
src/coreclr/jit/compiler.hpp
Outdated
noway_assert(TypeGet(value) != TYP_SIMD);
noway_assert(TypeGet(value) != TYP_MASK);
-noway_assert(TypeGet(value) != TYP_SIMD);
-noway_assert(TypeGet(value) != TYP_MASK);
+assert(TypeGet(value) != TYP_SIMD);
+assert(TypeGet(value) != TYP_MASK);
I don't think we need these checks in all builds.
src/coreclr/jit/valuenum.cpp
Outdated
        assert(genTypeSize(type) != SIZE_UNKNOWN);
        return ValueSize(genTypeSize(type));
    }
}
Put this in gentree.cpp instead?
jakobbotsch left a comment
This is looking good to me now, with some nits
* Move ValueSize::FromJitType to gentree.cpp
* Demote assertions to debug only
src/coreclr/jit/valuenum.cpp
Outdated
| printf(" VNForHandle(%s) is " FMT_VN ", fieldType is %s", fldName, fldHndVN, varTypeName(fieldType)); | ||
|
|
||
| if (size != 0) | ||
| if (fieldType == TYP_STRUCT) |
It is not ideal to lose this logging for primitive fields. Use pSize->IsExact() instead?
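Roughly what that could look like (the dump format string here is assumed, not from the patch):

// Keep the size in the JIT dump for any exactly-sized field, not only structs.
if (pSize->IsExact())
{
    printf(", size = %u", pSize->GetSize());
}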
src/coreclr/jit/valuenum.cpp
Outdated
     assert(!LoadStoreIsEntire(locationSize, offset, storeSize));

     unsigned storeOffset = static_cast<unsigned>(offset);
+#ifdef TARGET_ARM64
This addition of ARM64 ifdefs is inconsistent. Hide them under IsExact instead?
src/coreclr/jit/valuenum.cpp
Outdated
{
    // TODO-SVE: This is conservative, we may be able to select better value numbers for stores
    // that we can prove are within the bounds of the minimum size of a vector/mask.
    assert(locationSize.IsVector() || storeSize.IsVector());
Is this an important assert?
No, this can be removed.
src/coreclr/jit/valuenum.cpp
Outdated
    // that we can prove are within the bounds of the minimum size of a vector/mask.
    assert(locationSize.IsVector() || storeSize.IsVector());
    JITDUMP(" *** VNForStore: location or store size is unknown\n");
    return VNForExpr(m_pComp->compCurBB, TYP_SIMD);
I think this should follow the logic for out-of-bounds stores with return NoVN.
This will force the callers to invalidate the whole heap. It is meant to handle cases like:
class X
{
    int Y;
    int Z;
}

X x = ...;
*(long*)&x.Y = 1; // Invalidates both heap[Y] and heap[Z].
Here we can have a similar situation with a too-small locationSize or too-large storeSize.
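A sketch of what taking that path could look like at this point in VNForStore (assumed shape, not the final code):

// Cannot prove the store stays within the location: treat it like an
// out-of-bounds store and let callers invalidate the whole location.
if (!locationSize.IsExact() || !storeSize.IsExact())
{
    JITDUMP("  *** VNForStore: location or store size is not exact\n");
    return ValueNumStore::NoVN;
}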
src/coreclr/jit/gentree.cpp
Outdated
        }
    }

    return false;
Shouldn't the answer for unknown sizes here be true?
If the size of the field is unknown, and the store offset != the offset of the field, then we can't prove that the store offset is within bounds of the field (but we could work on inequalities related to the maximum/minimum bounds).
What would the consequences of this be, if we incorrectly number the field store as if it is not affected by a store?
What would the consequences of this be, if we incorrectly number the field store as if it is not affected by a store?
It would "miss" the store. SSA/VN would number the following uses as if they represented the preceding definition.
Something like:
struct X {
    int A;
    int B;
    // Bunch more fields.
};

X x = ...;
x.B = 1;
*(Vector*)&x = Vector.Zero;
Use(x.B); // Should be '0' not '1'

By the way, for these VN issues we're discussing, it would really be a good idea to add them as tests too (even say disabled ones). I doubt the general testing we have with vectors would cover these trickier scenarios.
Thanks for the explanation, I see what you mean.
So we should be assuming all stores to unknown-size locations are going to define that location, and allowing VNForStore to produce NoVN where relevant? In this case, maybe we should rename to gtStoreMayDefineField as it's no longer certain?
it would really be a good idea to add them as tests too (even say disabled ones)
I'll add a test project for this, we'll be able to enable it after a few more patches.
So we should be assuming all stores to unknown-size locations are going to define that location, and allowing VNForStore to produce NoVN where relevant?
Right. You will need to add some handling for this possibility in fgValueNumberLocalStore.
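A sketch of that handling (assumed names and shape, not the PR's final code):

// In fgValueNumberLocalStore, fall back to an opaque value number for the
// local when VNForStore could not produce one.
if (newLclValue.GetLiberal() == ValueNumStore::NoVN)
{
    ValueNum uniqueVN = vnStore->VNForExpr(compCurBB, lclDefNode->TypeGet());
    newLclValue.SetBoth(uniqueVN);
}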
In this case, maybe we should rename to gtStoreMayDefineField as it's no longer certain?
That would make sense to me.
* Remove TARGET_ARM64 ifdefs
* Remove assertion
* Return NoVN for non-entire vector load/store
* Assume inexact field stores are always definitions
* Add tests for value numbering Vector<T>
Vector<int> data = new Vector<int>();
*(Vector<int>*)&data = Vector<int>.One;
*(Vector<int>*)&data = Vector<int>.Zero;
*(Vector<int>*)&data = Vector<int>.One;
*(Vector<int>*)&data = Vector<int>.Zero;
return (data == Vector<int>.Zero);
This doesn't seem valuable. All these whole stores will be simplified to STORE_LCL_VAR, and even then, the last and first one will all be zeroes.
Adds a ValueSize class for handling sizes that may be unknown at compile time during phases, such as value numbering, that aren't concerned with the exact numeric value (only properties such as equivalence etc.). Adds various assertions to areas of value numbering and SSA building with assumptions based on exact size values. Adds LclVarDsc::lvValueSize to produce a ValueSize for a local variable, which provides the criteria for which locals have unknown size at compile time.