[mono][interp] Reduce false pinning from interp stack #100400

BrzVlad · 2024-03-28T11:33:06Z

Interpreter opcodes operate on the interp stack, an area of memory separately allocated. Each interp var will have an allocated stack offset in the current interpreter stack frame. When we allocate the storage for an interp var we can take into account the var type. If the type can represent a potential ref to an object or an interior ref then we mark the pointer slot as potentially containing refs, for the method that is being compiled.

During GC, we used to conservatively scan the entire interp stack space used by each thread. After this change, in the first stage, we do a stack walkwhere we detect slots in each interp frame where no refs can reside. We mark these slots in a bit array. Afterwards we conservatively scan the interp stack of the thread, while ignoring slots that were previously marked as not containing any refs.

System.Runtime.Tests suite was used for testing the effectiveness of the change, by computing the cumulative number of pinned objects throughout all GCs (about 1100).
minijit - avg 702000 pinned objects
old-interp - avg 641000 pinned objects
precise-interp - avg 578000 pinned objects

This resulted in 10% reduction in the number of pinned objects during collection. This change is meant to reduce memory usage of apps by making objects die earlier. We could further improve by being more precise. For example, for call sites we could reuse liveness information to precisely know which slots actually contain refs. This is a bit more complex to implement and it is unclear yet how impactful it would be.

lambdageek · 2024-04-02T23:20:03Z

src/mono/mono/mini/interp/interp.c

+
+	int slot_index = 0;
+	for (gpointer *p = (gpointer*)context->stack_start; p < (gpointer*)context->stack_pointer; p++) {
+		if (context->no_ref_slots && (context->no_ref_slots [slot_index / 8] & (1 << (slot_index % 8))))
+			;// This slot is marked as no ref, we don't scan it
+		else
+			func (p, gc_data);
+		slot_index++;


Just summarizing so I understand how this all works.

The way this works is that sgen calls interp_mark_stack twice: once during the conservative phase and once during the precise phase (if the precise phase is enabled). This code runs during the conservative phase. The no_ref_slots bits are conservative: we're either sure there's definitely no managed pointer in the slot, or we're not sure what's in the slot. This is because sometimes we push managed pointers with MONO_TYPE_I.

So if we're sure the slot isn't a pointer, we don't scan it at all. otherwise if we're not sure - we scan it, and possibly create false pinning.

So it's not important that this PR precisely tracks every single managed pointer opcode. But if we see a slot that can't possibly contain a managed pointer, we may mark it and potentially avoid some false pinning.

/cc @cshung FYI

I would add for clarity that the ambiguity of MONO_TYPE_I is not really that relevant to the pinning story. Let's say we have two vars, one int32 and one of type object that are allocated at the same offset, since the liveness doesn't interesect. Because we mark these ref bits for the whole scope of the method, the offset will be marked as potentially containing a ref. At the moment of suspend, in theory, we could tell exactly whether the int32 var or the object var is alive at the moment of execution, or maybe none of them. This would probably reduce most of the remaining false pinning.

kotlarmilos · 2024-04-03T07:58:35Z

src/mono/mono/mini/interp/interp.c

@@ -412,6 +412,8 @@ get_context (void)
 	if (context == NULL) {
 		context = g_new0 (ThreadContext, 1);
 		context->stack_start = (guchar*)mono_valloc_aligned (INTERP_STACK_SIZE, MINT_STACK_ALIGNMENT, MONO_MMAP_READ | MONO_MMAP_WRITE, MONO_MEM_ACCOUNT_INTERP_STACK);
+		// A bit for every pointer sized slot in the stack. FIXME don't allocate whole bit array
+		context->no_ref_slots = (guchar*)mono_valloc (NULL, INTERP_STACK_SIZE / (8 * sizeof (gpointer)), MONO_MMAP_READ | MONO_MMAP_WRITE, MONO_MEM_ACCOUNT_INTERP_STACK);


Is INTERP_STACK_SIZE the max stack size?

Yes. It is 1MB statically allocated and never changed / increased.

kotlarmilos · 2024-04-03T08:01:26Z

src/mono/mono/mini/interp/interp.c

+			gsize global_slot_index = current - (gpointer*)context->stack_start;
+			gsize table_index = global_slot_index / 8;
+			int bit_index = global_slot_index % 8;
+			context->no_ref_slots [table_index] |= 1 << bit_index;


Can we use mono_bitset_set_fast here?

We could, but the use case was simple enough that I went with the manual implementation.

src/mono/mono/mini/interp/transform.c

kotlarmilos · 2024-04-03T08:51:34Z

src/mono/mono/mini/interp/transform.c

+		guint32 old_size = td->ref_slots ? (guint32)td->ref_slots->size : 0;
+		guint32 new_size = old_size ? old_size * 2 : 32;
+
+		gpointer mem = mono_mempool_alloc0 (td->mempool, mono_bitset_alloc_size (new_size, 0));


How do you decide whether to use mono_mem_manager_alloc0 or mono_mem_manager_alloc0?

td->mempool is a mempool used by interpreter compiler for any temporary data used during compilation. The mempool is destroyed once method is finished compiling. mono_mem_manager_alloc0 is used for memory used during execution, so it remains alive for the entire duration of the application.

Interpreter opcodes operate on the interp stack, an area of memory separately allocated. Each interp var will have an allocated stack offset in the current interpreter stack frame. When we allocate the storage for an interp var we can take into account the var type. If the type can represent a potential ref to an object or an interior ref then we mark the pointer slot as potentially containing refs, for the method that is being compiled. During GC, we used to conservatively scan the entire interp stack space used by each thread. After this change, in the first stage, we do a stack walkwhere we detect slots in each interp frame where no refs can reside. We mark these slots in a bit array. Afterwards we conservatively scan the interp stack of the thread, while ignoring slots that were previously marked as not containing any refs. System.Runtime.Tests suite was used for testing the effectiveness of the change, by computing the cumulative number of pinned objects throughout all GCs (about 1100). minijit - avg 702000 pinned objects old-interp - avg 641000 pinned objects precise-interp - avg 578000 pinned objects This resulted in 10% reduction in the number of pinned objects during collection. This change is meant to reduce memory usage of apps by making objects die earlier. We could further improve by being more precise. For example, for call sites we could reuse liveness information to precisely know which slots actually contain refs. This is a bit more complex to implement and it is unclear yet how impactful it would be.

A lot of times, when we were pushing a byref type on the stack during compilation, we would first get the mint_type which would be MINT_TYPE_I4/I8. From the mint_type we would then obtain the STACK_TYPE_I4/I8, losing information because it should have been STACK_TYPE_MP. Because of this, the underlying interp var would end up being created as MONO_TYPE_I4/I8 instead of MONO_TYPE_I. Add another method for pushing directly a MonoType, with less confusing indirections. Code around here could further be refactored. This is only relevant for GC stack scanning, since we would want to scan only slots containing MONO_TYPE_I.

* [mono][interp] Reduce false pinning from interp stack Interpreter opcodes operate on the interp stack, an area of memory separately allocated. Each interp var will have an allocated stack offset in the current interpreter stack frame. When we allocate the storage for an interp var we can take into account the var type. If the type can represent a potential ref to an object or an interior ref then we mark the pointer slot as potentially containing refs, for the method that is being compiled. During GC, we used to conservatively scan the entire interp stack space used by each thread. After this change, in the first stage, we do a stack walkwhere we detect slots in each interp frame where no refs can reside. We mark these slots in a bit array. Afterwards we conservatively scan the interp stack of the thread, while ignoring slots that were previously marked as not containing any refs. System.Runtime.Tests suite was used for testing the effectiveness of the change, by computing the cumulative number of pinned objects throughout all GCs (about 1100). minijit - avg 702000 pinned objects old-interp - avg 641000 pinned objects precise-interp - avg 578000 pinned objects This resulted in 10% reduction in the number of pinned objects during collection. This change is meant to reduce memory usage of apps by making objects die earlier. We could further improve by being more precise. For example, for call sites we could reuse liveness information to precisely know which slots actually contain refs. This is a bit more complex to implement and it is unclear yet how impactful it would be. * [mono][interp] Add option to disable precise scanning of stack * [mono][interp] Fix pushing of byrefs on execution stack A lot of times, when we were pushing a byref type on the stack during compilation, we would first get the mint_type which would be MINT_TYPE_I4/I8. From the mint_type we would then obtain the STACK_TYPE_I4/I8, losing information because it should have been STACK_TYPE_MP. Because of this, the underlying interp var would end up being created as MONO_TYPE_I4/I8 instead of MONO_TYPE_I. Add another method for pushing directly a MonoType, with less confusing indirections. Code around here could further be refactored. This is only relevant for GC stack scanning, since we would want to scan only slots containing MONO_TYPE_I.

dotnet-issue-labeler bot added the area-Codegen-Interpreter-mono label Mar 28, 2024

dotnet-policy-service bot assigned BrzVlad Mar 28, 2024

BrzVlad force-pushed the feature-interp-semi-precise branch 6 times, most recently from 697faf5 to fa185fc Compare April 2, 2024 09:53

build-analysis bot mentioned this pull request Apr 2, 2024

Wasm.Build.Tests.MT.Blazor.SimpleMultiThreadedTests timed out #100373

Open

BrzVlad marked this pull request as ready for review April 2, 2024 18:04

BrzVlad requested review from kotlarmilos, lambdageek and thaystg as code owners April 2, 2024 18:04

lambdageek reviewed Apr 2, 2024

View reviewed changes

lambdageek approved these changes Apr 2, 2024

View reviewed changes

lambdageek requested a review from cshung April 2, 2024 23:21

kotlarmilos approved these changes Apr 3, 2024

View reviewed changes

BrzVlad added 3 commits April 4, 2024 11:46

[mono][interp] Add option to disable precise scanning of stack

076a17e

BrzVlad force-pushed the feature-interp-semi-precise branch from fa42813 to 7efd380 Compare April 4, 2024 08:46

This was referenced Apr 4, 2024

The hosted runner encountered an error while running your job. (Error Type: Disconnect). dotnet/dnceng#1919

Open

[MT][browser] MutexTests.AbandonExisting #98830

Closed

BrzVlad merged commit 997f09c into dotnet:main Apr 5, 2024
76 of 79 checks passed

This was referenced Apr 5, 2024

Blazor WASM does not free memory - possibly a memory leak #64047

Open

iOS: "MONO interpreter: NIY encountered in method ReadTestCase_DataFromJson" when deserializing class with 200 fields #100753

Closed

lewing mentioned this pull request Apr 9, 2024

[Perf] Linux/x64: 37 Improvements on 4/5/2024 11:33:11 AM dotnet/perf-autofiling-issues#32481

Open

kotlarmilos mentioned this pull request Apr 11, 2024

[Perf] Linux/x64: 60 Improvements on 4/5/2024 11:33:11 AM dotnet/perf-autofiling-issues#32495

Closed

github-actions bot locked and limited conversation to collaborators May 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mono][interp] Reduce false pinning from interp stack #100400

[mono][interp] Reduce false pinning from interp stack #100400

BrzVlad commented Mar 28, 2024

lambdageek Apr 2, 2024

lambdageek Apr 2, 2024

BrzVlad Apr 3, 2024

kotlarmilos Apr 3, 2024

BrzVlad Apr 3, 2024

kotlarmilos Apr 3, 2024

BrzVlad Apr 3, 2024

kotlarmilos Apr 3, 2024

BrzVlad Apr 3, 2024

[mono][interp] Reduce false pinning from interp stack #100400

[mono][interp] Reduce false pinning from interp stack #100400

Conversation

BrzVlad commented Mar 28, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment