Make normal statics simpler #99183

Merged: 103 commits, Jun 13, 2024

@davidwrighton (Member) commented Mar 2, 2024

This change makes access to statics much simpler to document and also removes some performance penalties that we've had for a long time due to the old model. Most static accesses should be equally fast or faster.

This change converts static variables from a model where statics are associated with the module that defined the metadata of the static to a model where each individual type allocates its statics independently. In addition, it moves the flags that indicate whether a type is initialized, and whether its statics have been allocated, into the MethodTable structures instead of storing them in a DomainLocalModule as was done before.

Particularly notable changes

  • All statics are now considered "dynamic" statics.
  • Statics for collectible assemblies now have an identical path for lookup of the static variable addresses as compared to statics for non-collectible assemblies. It is now reasonable for the process of reading static variables to be inlined into shared generic code, although this PR does not attempt to do so.
  • Lifetime management for collectible non-thread-local statics uses a combination of a LOADERHANDLE, which keeps the static alive, and a new handle type, HNDTYPE_WEAK_INTERIOR_POINTER, which keeps the pointers to managed objects in the MethodTable structures up to date with the latest addresses of the static variables.
  • Each individual type in thread statics has a unique object holding the statics for the type. This means that each type has a separate object[] (for GC statics) and/or double[] (for non-GC statics) per thread for TLS statics. This isn't necessarily ideal for non-collectible types, but it's not terrible either.
  • Thread statics for collectible types are reported directly to the GC instead of being handled via a GCHandle. While this is needed to avoid complex lifetime rules for collectible types, it may not be ideal for non-collectible types.
  • Since the DomainLocalModule no longer exists, the ISOSDacInterface has been augmented with a new interface, ISOSDacInterface14, which adds the ability to query the static base and initialization status of an individual type directly.
  • Significant changes for generated code include
    • All the helpers are renamed
    • The statics of generics which have not yet been initialized can now be referenced using a single constant pointer + a helper call instead of needing a pair of pointers. In practice, this was a rare condition in perf-critical code due to the presence of tiered compilation, so this is not a significant change to optimized code.
    • The pre-initialization of statics can now occur for types which have non-primitive valuetype statics as long as the type does not have a class constructor.
    • Thread static non-gc statics are now returned as byrefs. (It turns out that for collectible assemblies, there is currently a small GC hole if a function returns the address of a non-gc threadstatic. CoreCLR at this time does not attempt to keep the collectible assembly alive if that is the only live pointer to the collectible static in the system.)
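A minimal sketch of the per-type thread static layout described in the list above, under simplifying assumptions: each thread owns a table indexed by a per-type TLS index, and each slot holds that type's own storage, allocated lazily on first access. `ThreadStaticTable`, `TypeTlsStorage` and the index values are hypothetical names for illustration, not CoreCLR's data structures, and `std::vector` stands in for the managed object[]/double[] arrays (nothing here is reported to a GC).

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the per-type, per-thread storage:
// GC statics would live in an object[]; non-GC statics in a double[].
struct TypeTlsStorage {
    std::vector<void*> gcStatics;     // stand-in for object[]
    std::vector<double> nonGcStatics; // stand-in for double[] (8-byte slots)
};

// Hypothetical per-thread table; the TLS index is assigned once per type.
class ThreadStaticTable {
public:
    // Returns this thread's storage for the type with the given TLS index,
    // allocating (and zeroing) it on first access from this thread.
    TypeTlsStorage& GetOrAllocate(uint32_t tlsIndex, std::size_t gcCount, std::size_t nonGcSlots) {
        if (tlsIndex >= slots_.size())
            slots_.resize(tlsIndex + 1);
        std::unique_ptr<TypeTlsStorage>& slot = slots_[tlsIndex];
        if (!slot) {
            slot = std::make_unique<TypeTlsStorage>();
            slot->gcStatics.assign(gcCount, nullptr);
            slot->nonGcStatics.assign(nonGcSlots, 0.0);
        }
        return *slot;
    }

    bool IsAllocated(uint32_t tlsIndex) const {
        return tlsIndex < slots_.size() && slots_[tlsIndex] != nullptr;
    }

private:
    // Heap-allocated entries keep references stable when the table grows.
    std::vector<std::unique_ptr<TypeTlsStorage>> slots_;
};

// One table per thread, so each type gets separate storage on each thread.
thread_local ThreadStaticTable t_threadStatics;
```

Because `t_threadStatics` is `thread_local`, a second thread calling `GetOrAllocate` with the same index receives its own freshly zeroed storage, which is the per-thread, per-type separation the list describes.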

With this change, the pointers to normal static data are located at a fixed offset from the start of the MethodTableAuxiliaryData, and indices for thread static variables are also stored at such a fixed offset. Concepts such as the DomainLocalModule, ThreadLocalModule, ModuleId and ModuleIndex no longer exist.
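To illustrate why a fixed offset matters, here is a sketch assuming a simplified auxiliary-data layout: a static's address is the type's statics base, loaded from a known offset, plus the field's offset, and the allocation/initialization flags sit alongside it rather than in a per-module structure. `AuxData`, `DynamicStatics`, the flag values and the helper names are hypothetical; the real MethodTableAuxiliaryData and DynamicStaticsInfo layouts differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical simplified stand-in for DynamicStaticsInfo:
// the two statics bases for one type.
struct DynamicStatics {
    uint8_t* gcStaticsBase = nullptr;    // would point into an object[]
    uint8_t* nonGcStaticsBase = nullptr; // points at this type's non-GC storage
};

// Hypothetical per-type auxiliary data. The statics pointers always sit at
// the same offset, so generated code could address them with a constant.
struct AuxData {
    DynamicStatics statics;
    uint32_t flags = 0;
    static constexpr uint32_t kStaticsAllocated = 0x1;
    static constexpr uint32_t kClassInited = 0x2;
};

// The compile-time constant offset of the non-GC base within AuxData.
constexpr std::size_t kNonGcBaseOffset =
    offsetof(AuxData, statics) + offsetof(DynamicStatics, nonGcStaticsBase);

// Address of a non-GC static = per-type base + field offset.
inline uint8_t* NonGcStaticAddress(const AuxData& aux, uint32_t fieldOffset) {
    return aux.statics.nonGcStaticsBase + fieldOffset;
}

// The "is initialized" check reads a flag on the type itself.
inline bool IsClassInited(const AuxData& aux) {
    return (aux.flags & AuxData::kClassInited) != 0;
}
```

The point of the layout is that no module lookup or module-relative index is needed: given the type, one load yields the base and one add yields the field.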

Lifetime management for collectible statics

  • For normal collectible statics, each type will allocate a separate object[] for the GC statics and a double[] for the non-GC statics. A pointer to the data of these arrays will be stored in the DynamicStaticsInfo structure, and when relocation occurs, if the collectible type's managed LoaderAllocator is still alive, the static field address will be updated if the object moves. This is done by means of the new weak interior pointer GC handle type.
  • For collectible thread-local statics, lifetime management is substantially more complicated, because either the thread or the collectible type may be collected first. Thus the collection algorithm is as follows.
    • The system shall maintain a global mapping of TLS indices to MethodTable structures
    • When a native LoaderAllocator is being cleaned up, before the WeakTrackResurrection GCHandle that points at the managed LoaderAllocator object is destroyed, the mapping from TLS indices to collectible LoaderAllocator structures shall be cleared of all relevant entries (and the current GC index shall be stored in the TLS to MethodTable mapping).
    • When a GC promotion or collection scan occurs, for every TLS index that was freed (and therefore now points at a GC index), the relevant entry in the TLS table shall be set to NULL in preparation for that entry being reused in the future. In addition, if the TLS index refers to a MethodTable in a collectible assembly whose associated LoaderAllocator has been freed, the relevant entry shall be set to NULL.
    • When allocating new entries from the TLS mapping table for new collectible thread local structures, do not re-use an entry in the table until at least 2 GCs have occurred. This is to allow every thread to have NULL'd out the relevant entry in its thread local table.
    • When allocating new TLS entries for collectible TLS statics on a per-thread basis, allocate a LOADERHANDLE for each object allocated, and associate it with the TLS index on that thread.
    • When cleaning up a thread, there will be a LOADERHANDLE for each collectible thread static that is still allocated. If the collectible type still has a live managed LoaderAllocator, free the LOADERHANDLE.
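The index-reuse rule in the protocol above (do not hand out a freed TLS index again until at least 2 GCs have occurred) can be sketched as a small allocator. This assumes a monotonically increasing GC count supplied by the caller; `TlsIndexAllocator` and its methods are hypothetical, and the sketch omits the TLS-index-to-MethodTable mapping, the per-thread NULLing of entries and any locking the real protocol needs.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical allocator for collectible TLS indices.
class TlsIndexAllocator {
public:
    // Hands out a TLS index. A freed index is reused only once at least two
    // GCs have occurred since it was freed, so that every thread has had a
    // chance to NULL out its entry for that index.
    uint32_t Allocate(uint64_t currentGcCount) {
        if (!freed_.empty() && currentGcCount >= freed_.front().freedAtGc + 2) {
            uint32_t index = freed_.front().index;
            freed_.pop_front();
            return index;
        }
        return next_++;
    }

    // Called when the owning collectible LoaderAllocator is cleaned up;
    // records the GC count at the time the index was freed.
    void Free(uint32_t index, uint64_t currentGcCount) {
        freed_.push_back({index, currentGcCount});
    }

private:
    struct FreedEntry {
        uint32_t index;
        uint64_t freedAtGc;
    };
    std::deque<FreedEntry> freed_; // FIFO: oldest frees come first
    uint32_t next_ = 0;            // next never-used index
};
```

Because frees are queued in order and the GC count only grows, the front of the queue is always the oldest free, so checking only the front is enough to decide whether any index is eligible for reuse.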

Expected cost model for extra GC interactions associated with this change

This change adds 3 possible ways in which the GC may have to perform additional work beyond what it used to do.

  1. For normal statics on collectible types, it uses a weak interior pointer GC handle for each such statics allocation. This is purely pay-for-play, and trades faster runtime access to collectible statics for the cost of maintaining a GCHandle in the GC. As the number of statics increases, this could in theory become a performance problem, but given the typical usages of collectible assemblies, we do not expect this to be significant.
  2. For non-collectible thread statics, there is 1 GC pointer that is unconditionally reported for each thread. Usage of this removes a single indirection from every non-collectible thread local access. Given that this pointer is reported unconditionally, and is only a single pointer, this is not expected to be a significant cost.
  3. For collectible thread statics, there is a complex protocol to keep thread statics alive for just long enough, and to clean them up as needed. This is expected to be completely pay for play with regard to usage of thread local variables in collectible assemblies, and while slightly more expensive to run than the current logic, will reduce the cost of creation/destruction of threads by a much more significant factor. In addition, if there are no collectible thread statics used on the thread, the cost of this is only a few branches per lookup.
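Item 1 above depends on the weak interior pointer handle. A sketch of its intended behavior, assuming a simulated relocation step: the handle remembers the object and a slot holding an interior pointer into it (such as a statics base), and when the object moves, the slot is shifted by the same distance, unless the handle's target has already died. `WeakInteriorPointerHandle`, `OnObjectRelocated` and `OnObjectCollected` are hypothetical names, not the GC handle table's API.

```cpp
#include <cstdint>

// Hypothetical model of a weak interior pointer handle.
struct WeakInteriorPointerHandle {
    uintptr_t target;        // current address of the object (0 once dead)
    uintptr_t* interiorSlot; // e.g. the statics base pointer stored for a type
};

// Simulates the GC updating the handle after relocation: if the tracked
// object moved from oldAddr to newAddr, the interior pointer is moved by
// the same delta, preserving its offset into the object.
inline void OnObjectRelocated(WeakInteriorPointerHandle& h, uintptr_t oldAddr, uintptr_t newAddr) {
    if (h.target == 0 || h.target != oldAddr)
        return; // dead handle, or a different object moved
    uintptr_t offset = *h.interiorSlot - oldAddr;
    h.target = newAddr;
    *h.interiorSlot = newAddr + offset;
}

// Simulates the target dying (for example, its LoaderAllocator is gone):
// the handle is weak, so it stops tracking and the slot is left alone.
inline void OnObjectCollected(WeakInteriorPointerHandle& h) {
    h.target = 0;
}
```

The cost item 1 refers to is this per-handle bookkeeping during GC, paid only for collectible types that actually allocate statics.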

Perf impact of this change

I've run the .NET Microbenchmark suite as well as a variety of ASP.NET Benchmarks. (Unfortunately the publicly visible infrastructure for running tests is incompatible with this change, so results are not public.) The results are generally quite hard to interpret. ASP.NET Benchmarks are generally (very) slightly better, and the microbenchmarks are generally equivalent in performance, although some tests that had not previously shown variability now do, and for tests with any significant amount of code the differences are within the margin of error of our perf testing. When performance differences have been examined in detail, they tend to be in code which has not changed in any way due to this change, and when run in isolation the performance deltas have disappeared in all cases that I have examined. Thus, I assume they are caching side effects. Performance testing has led me to add a change such that all NonGC, NonCollectible statics are allocated in a separate LoaderHeap, which appears to have reduced the variability in some of the tests by a small fraction, although the results are not consistent enough for me to be extremely confident in that statement.

- Delete DomainLocalModule and ThreadLocalModule
- Replumb the JIT to use a new set of helpers (in progress)
- Allocate static data on a per type basis instead of a per module basis
- Thread Local Statics are now stored in the same structures that the JIT can optimize (in progress)
- More scenarios can support pre-init, notably support for pre-init for cases with valuetype statics, but no Cctor
- Remove ModuleForStatics concept
- Remove ModuleId concept
- Remove ModuleIndex concept
- Remove ClassDomainID concept

Work still to be done
1. Finish support for R2R, and see if we can make it backcompat with the old R2R version
2. Support for the more optimized helpers (dynamic and pinned)
3. Re-enable jit helper expansions
4. Make sure SOS and the debugger continue to work
- GenericDictionaryExpansion re-used the DomainLocalBlock Crst type, so it now has a new one with the same Crst rules as it used to have
Use <= instead of < for TLS index compare
Unallocated TLSIndex is not 0xFFFFFFFF, which will make the existing checks fall back to doing the full work for generic TLS lookups.
- While I didn't do this for most of the Microsoft maintained architectures, there isn't much evidence at the moment that the hand coded assembly actually provides any value
…cInterface14

- This is to compensate for the existing GetDomainLocalModule* api no longer working
@mikem8361 (Member) left a comment

The new DAC APIs look good to me. The implementation looks good too.

Merge branch 'main' of github.com:dotnet/runtime into make_normal_statics_simpler
@davidwrighton removed the NO-MERGE label ("The PR is not ready for merge yet (see discussion for detailed reasons)") on Jun 5, 2024
Static variables in CoreCLR are handled by a combination of getting the "static base", and then adjusting it by an offset to get a pointer to the actual value.
We define the statics base as either non-gc or gc for each field.
Currently non-gc statics are any statics which are represented by primitive types (byte, sbyte, char, int, uint, long, ulong, float, double, pointers of various forms), and enums.
GC statics are any statics which are represented by classes or by non-primitive valuetypes.
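The gc/non-gc rule above can be written as a small classification function. `ElementKind` is a hypothetical enum that loosely mirrors the element types listed in the text; it is not CorElementType, and the real runtime logic (discussed below in terms of GetVerifierCorElementType) handles more element types than this sketch.

```cpp
#include <cstdint>

// Hypothetical subset of element kinds, loosely mirroring the text.
enum class ElementKind : uint8_t {
    U1, I1, Char, I4, U4, I8, U8, R4, R8, // primitive types listed above
    Ptr, FnPtr,                           // pointers of various forms
    Enum,                                 // stored like its underlying primitive
    Class,                                // reference types
    ValueType,                            // non-primitive valuetypes (structs)
};

enum class StaticsKind { NonGC, GC };

// Decides which statics base a static field of the given kind lives in.
constexpr StaticsKind ClassifyStatic(ElementKind kind) {
    switch (kind) {
    case ElementKind::Class:
    case ElementKind::ValueType:
        return StaticsKind::GC;    // object references, or boxed structs
    default:
        return StaticsKind::NonGC; // primitives, pointers and enums
    }
}
```

Non-GC statics hold no object references, so their storage never needs to be scanned by the GC; anything that may contain a reference goes to the GC statics base.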
Member

Does "non-primitive" in this case mean non-enums or non-blittable or some other concept? I assume it isn't the same as C# unmanaged.

Member Author

Ah, it's any valuetype which, when examined via GetVerifierCorElementType, will return a CorElementType which is not ELEMENT_TYPE_VALUETYPE or a pointer type. Effectively, it's defined as the set of valuetypes which are not eligible for being a non-gc static.

For struct statics, the static variable is actually a pointer to a boxed instance of the structure.
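Because a struct static is stored as a pointer to a boxed instance, reading it costs one extra indirection compared with a primitive static. A sketch assuming `std::shared_ptr<void>` as a stand-in for a GC-tracked reference; `Point`, `GcStaticsArea` and the helper functions are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// A non-primitive valuetype, e.g. a two-field struct.
struct Point {
    int x;
    int y;
};

// Hypothetical GC statics area for one type: every slot is a reference.
// std::shared_ptr<void> stands in for a GC-tracked object reference.
struct GcStaticsArea {
    std::vector<std::shared_ptr<void>> slots;
};

// Allocates the "box" once, when the type's statics are allocated.
inline void AllocateBoxedStatic(GcStaticsArea& area, std::size_t slot) {
    area.slots.at(slot) = std::make_shared<Point>(Point{0, 0});
}

// Reading the static: load the reference from the slot, then reach into the box.
inline Point& GetPointStatic(GcStaticsArea& area, std::size_t slot) {
    return *static_cast<Point*>(area.slots.at(slot).get());
}
```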
Member

I see "valuetypes" above but "struct" here. Just want to make sure we are being consistent about the topic. I personally prefer "valuetypes", but "struct" works if that is clearer.

#endif //!DACCESS_COMPILE

// Do not use except in DAC and profiler scenarios
inline PTR_BYTE GetNonGCThreadStaticsBasePointer(PTR_Thread pThread);
inline PTR_BYTE GetGCThreadStaticsBasePointer(PTR_Thread pThread);

inline DWORD IsDynamicStatics()
Member

Are we doing this?


void MethodTable::GetStaticsOffsets(StaticsOffsetType offsetType, bool fGenericStatics, uint32_t *dwGCOffset, uint32_t *dwNonGCOffset)
{
if (offsetType == StaticsOffsetType::Normal)
Member

Contract?

@AaronRobinsonMSFT (Member)

One last suggestion: I would build a Checked runtime and run a few tests locally with DOTNET_GCStress=3 or DOTNET_GCStress=C.

@AaronRobinsonMSFT (Member)

/azp list

CI/CD Pipelines for this repository:

@AaronRobinsonMSFT (Member)

run runtime-coreclr gcstress0x3-gcstress0xc

@davidwrighton (Member Author)

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 1 pipeline(s).

@davidwrighton (Member Author)

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 1 pipeline(s).

@davidwrighton (Member Author)

Failures in GCStress are not new. It turns out that the tests on Windows Arm64 run somewhat faster under GCStress with this change, which actually causes the test failures to be reported. The test failures are simply timeouts, which are difficult to reproduce locally.

Also, this fix now contains a change to the src/tests/JIT/Regression/CLR-x86-JIT/V1.2-M02/b138117/b138117.il test, which was testing an invalid scenario. Together with Jakob, I went back to the original bug this test was a regression test for, and changed the test so that it both covers the problem that bug covered and is fully legal and reliable IL.

Remaining test failure in normal PR leg is a known issue.

@@ -509,6 +509,7 @@ void BulkStaticsLogger::LogAllStatics()
CONTRACTL_END;

{
// TODO: This code does not appear to find all generic instantiations of types, and thus does not log ALL statics
Member

@davidwrighton should this be a GH issue?

9 participants