Add randomized allocation sampling #100356

Closed
95 commits (changes shown from 59 commits)
3815aa8
This is an incomplete implementation of a new randomized allocation s…
noahfalk Feb 8, 2024
6a92fcb
Initial update
chrisnas Feb 21, 2024
cdbffc3
Update based on feedback
chrisnas Feb 22, 2024
924afd0
Rename fast_alloc_helper_limit_ptr to alloc_sampling
chrisnas Feb 23, 2024
e0ce093
Remove trailing whitespace in markdown
chrisnas Feb 23, 2024
396c05e
Add sampling threshold computation
chrisnas Feb 23, 2024
91b8b79
Take feedback into account and start to implement AllocationSampled e…
chrisnas Feb 27, 2024
65ff391
Fix typo
chrisnas Feb 27, 2024
b25adc6
Take review into account
chrisnas Feb 29, 2024
d939cdd
Emit the AllocationSampled event
chrisnas Mar 1, 2024
9b0f7da
Fix threshold computation error
chrisnas Mar 14, 2024
8875a4d
Deal with empty allocation context when fixing them
chrisnas Mar 16, 2024
f39a732
Handle the case of objects larger than an allocation context
chrisnas Mar 18, 2024
fbc0b6b
Take review into account
chrisnas Mar 21, 2024
32eb2fa
Take review into account for dynamically sampling
chrisnas Mar 22, 2024
518b4f7
Adding a first test that AllocationSampled is emitted when keyword/ve…
chrisnas Mar 26, 2024
b596f33
Check type name in AllocationSampled test
chrisnas Mar 27, 2024
984ed95
Compare perf impact of allocation sampling
chrisnas Mar 28, 2024
1c46a5d
Fix tests
chrisnas Apr 10, 2024
44e914b
Update tests and doc
chrisnas Apr 16, 2024
9eba36e
Update review
chrisnas Apr 19, 2024
b7614fd
Update based on review
chrisnas Apr 19, 2024
cbc5a12
Add tests and simple framework to measure statistical distribution of…
chrisnas Apr 19, 2024
826d8eb
Add x1/x2/x3 ratio check
chrisnas Apr 22, 2024
a56a5ee
Update array of double implementation for 32 bits
chrisnas Apr 22, 2024
48dbfa1
Updates
chrisnas May 7, 2024
7280c13
Fix markdown
chrisnas May 7, 2024
fd8a1b9
Add AllocationsRunEventSource to compute percentiles more easily
chrisnas May 8, 2024
a4aef1a
Allow percentiles computation
chrisnas May 9, 2024
08ec139
Add ratio based array allocations
chrisnas May 10, 2024
0bbbfa9
take review into account
chrisnas May 13, 2024
cf29e9b
Refactor the code and update upscaling method
chrisnas May 14, 2024
26e8fe0
Add runs for allocations
chrisnas May 14, 2024
52d4071
Handle padding for objects alignment
chrisnas May 16, 2024
db506a3
take GC padding into account
chrisnas May 16, 2024
8522866
Update allocation sampling distribution results
chrisnas May 16, 2024
e1f8ece
Update benchmarks with events
chrisnas May 16, 2024
32f6d31
Beginning of NativeAOT support
chrisnas May 18, 2024
9e31a6e
Fix trailing spaces
chrisnas May 21, 2024
f9f5224
Updates based on PR comments
chrisnas May 21, 2024
97a3bbd
Proposed changes
noahfalk May 21, 2024
734b773
Fix README
noahfalk May 21, 2024
a6120ac
- fix Alloc helper to get the address of the allocated object for All…
chrisnas May 22, 2024
3d0bcd1
Update the design doc
noahfalk May 23, 2024
549e7b0
Merge branch 'main' into chrisnas/alloc_sampling
chrisnas May 23, 2024
147447a
Fix compilation errors for Linux and related to merged changes in Thr…
chrisnas May 24, 2024
2b41997
Fix compilation issue
chrisnas May 24, 2024
4618b37
Restore Thread randomizer initialization
chrisnas May 24, 2024
918fe0c
Remove the randomizer from Thread to have just a singleton
chrisnas May 27, 2024
e82162d
Add a lazily allocated CLRRandom in Thread class
chrisnas May 28, 2024
09eae74
Start a no op implementation of randomized sampling for NativeAOT
chrisnas May 28, 2024
6badfb2
Fix typos in markdown
chrisnas May 29, 2024
d0e46a4
Fix linux compilation errors
chrisnas May 29, 2024
ef5b819
Make alloc context randomizer per thread lazily allocated
chrisnas May 29, 2024
ff77f7a
Fix missing rename in assembly code to use combined_limit instead of …
chrisnas May 29, 2024
77cb736
Fix build errors dues to explicit net8.0 usage in tests
chrisnas May 29, 2024
c40b1a2
Filter out AllocationTick when AllocationSampled is enabled
chrisnas May 30, 2024
a62ea75
Try a fix for build errors related to .csproj that are not really tests
chrisnas May 30, 2024
07ab88b
Avoid touching the IGCToCLREventSink interface because AllocationSampled
chrisnas May 30, 2024
d04b655
Update AllocationTick filtering implementation
chrisnas May 31, 2024
d565045
Fix double[] implementation
chrisnas May 31, 2024
b7bd18f
Fix test due to added field in the event payload
chrisnas May 31, 2024
7cbcb43
Implement emitting the AllocationSampled event in NativeAOT
chrisnas May 31, 2024
86f24c6
First no op implementation for NativeAOT with updated assembly code
chrisnas May 31, 2024
51d9935
Fix typos
chrisnas Jun 3, 2024
a9ef0e5
Fix DEBUG compilation error
chrisnas Jun 3, 2024
dd7e00d
Fix linux/x86 compilation errors
chrisnas Jun 3, 2024
a86a98b
Fix build error
chrisnas Jun 3, 2024
512f011
Try to fix "static" related compilation errors
chrisnas Jun 3, 2024
532f095
Fix test in 32 bit
chrisnas Jun 4, 2024
4fc9b8d
Avoid running AllocationSampled test with Mono + Fix compilation issue
chrisnas Jun 8, 2024
9174212
Start to implement the randomizer with existing 32 bit random
chrisnas Jun 8, 2024
541f59f
Fix compilation error
chrisnas Jun 12, 2024
40f96f3
Allow eventing functions to be called from thread.cpp
chrisnas Jun 14, 2024
29c16b5
Merge branch 'main' into chrisnas/alloc_sampling
chrisnas Jun 17, 2024
20c6d51
Fix assert in AOT
chrisnas Jun 17, 2024
c5ccad8
Fix link issue
chrisnas Jun 24, 2024
39afdf2
Fix NativeAOT build
chrisnas Jun 25, 2024
b2db19e
Merge branch 'main' into chrisnas/alloc_sampling
chrisnas Jun 26, 2024
e4ef681
Fix missing references to gc_alloc_context
chrisnas Jun 26, 2024
abfe6cb
Fix compilation issues
chrisnas Jun 27, 2024
997a293
Update x86 implementation
chrisnas Jun 28, 2024
041e397
Fix
chrisnas Jun 28, 2024
e312bf6
Fixes
chrisnas Jul 1, 2024
ec16fa0
Merge remote-tracking branch 'origin' into chrisnas/alloc_sampling
chrisnas Jul 5, 2024
fd382d3
Fix rebase
chrisnas Jul 5, 2024
a2e30cc
Fix linux gcc build issues
chrisnas Jul 5, 2024
83611b8
Add missing alloc context fixup and randomizer code
chrisnas Jul 9, 2024
c908fe4
Merge remote-tracking branch 'origin' into chrisnas/alloc_sampling
chrisnas Jul 9, 2024
dc02b7c
Fix issue when TLS for alloc context is not initialized
chrisnas Jul 10, 2024
85625cd
Merge remote-tracking branch 'origin' into chrisnas/alloc_sampling
chrisnas Jul 10, 2024
911d5d7
Change randomizer to use sxoshiro128++ for better statistical distrib…
chrisnas Jul 11, 2024
91d58dc
Fix possible crash in GCInterface_GetTotalAllocatedBytesPrecise
chrisnas Jul 11, 2024
11177d9
Fix possible desync between gc_alloc_context and combined_limit
chrisnas Jul 12, 2024
ce40d3d
Merge remote-tracking branch 'origin' into chrisnas/alloc_sampling
chrisnas Jul 12, 2024
332 changes: 332 additions & 0 deletions docs/design/features/RandomizedAllocationSampling.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions src/coreclr/debug/daccess/dacdbiimpl.cpp
@@ -6543,10 +6543,10 @@ HRESULT DacHeapWalker::Init(CORDB_ADDRESS start, CORDB_ADDRESS end)
j++;
}
}
- if ((&g_global_alloc_context)->alloc_ptr != nullptr)
+ if (g_global_alloc_context->alloc_ptr != nullptr)
{
- mAllocInfo[j].Ptr = (CORDB_ADDRESS)(&g_global_alloc_context)->alloc_ptr;
- mAllocInfo[j].Limit = (CORDB_ADDRESS)(&g_global_alloc_context)->alloc_limit;
+ mAllocInfo[j].Ptr = (CORDB_ADDRESS)g_global_alloc_context->alloc_ptr;
+ mAllocInfo[j].Limit = (CORDB_ADDRESS)g_global_alloc_context->alloc_limit;
}

mThreadCount = j;
12 changes: 6 additions & 6 deletions src/coreclr/debug/daccess/request.cpp
@@ -720,8 +720,8 @@ ClrDataAccess::GetThreadAllocData(CLRDATA_ADDRESS addr, struct DacpAllocData *da

Thread* thread = PTR_Thread(TO_TADDR(addr));

- data->allocBytes = TO_CDADDR(thread->m_alloc_context.alloc_bytes);
- data->allocBytesLoh = TO_CDADDR(thread->m_alloc_context.alloc_bytes_uoh);
+ data->allocBytes = TO_CDADDR(thread->m_alloc_context.gc_alloc_context.alloc_bytes);
+ data->allocBytesLoh = TO_CDADDR(thread->m_alloc_context.gc_alloc_context.alloc_bytes_uoh);

SOSDacLeave();
return hr;
@@ -775,8 +775,8 @@ ClrDataAccess::GetThreadData(CLRDATA_ADDRESS threadAddr, struct DacpThreadData *
threadData->osThreadId = (DWORD)thread->m_OSThreadId;
threadData->state = thread->m_State;
threadData->preemptiveGCDisabled = thread->m_fPreemptiveGCDisabled;
- threadData->allocContextPtr = TO_CDADDR(thread->m_alloc_context.alloc_ptr);
- threadData->allocContextLimit = TO_CDADDR(thread->m_alloc_context.alloc_limit);
+ threadData->allocContextPtr = TO_CDADDR(thread->m_alloc_context.gc_alloc_context.alloc_ptr);
+ threadData->allocContextLimit = TO_CDADDR(thread->m_alloc_context.gc_alloc_context.alloc_limit);

threadData->fiberData = (CLRDATA_ADDRESS)NULL;

@@ -5347,8 +5347,8 @@ HRESULT ClrDataAccess::GetGlobalAllocationContext(
}

SOSDacEnter();
- *allocPtr = (CLRDATA_ADDRESS)((&g_global_alloc_context)->alloc_ptr);
- *allocLimit = (CLRDATA_ADDRESS)((&g_global_alloc_context)->alloc_limit);
+ *allocPtr = (CLRDATA_ADDRESS)(g_global_alloc_context->alloc_ptr);
+ *allocLimit = (CLRDATA_ADDRESS)(g_global_alloc_context->alloc_limit);
SOSDacLeave();
return hr;
}
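
The access-path changes in these hunks (`m_alloc_context.alloc_ptr` becoming `m_alloc_context.gc_alloc_context.alloc_ptr`, and `g_global_alloc_context` being dereferenced as a pointer) suggest that the GC-owned allocation context is now nested inside a runtime-side wrapper. The following is a minimal sketch of that shape, not the PR's actual definitions: `GcAllocContextSketch`, `EeAllocContextSketch`, `ThreadSketch`, and `ReadAllocBytes` are hypothetical names, and the presence and placement of `combined_limit` in the wrapper is an assumption.

```cpp
#include <cstdint>

// Simplified stand-in for the GC-owned allocation context.
// The real structure has more fields; only those touched by the diff are shown.
struct GcAllocContextSketch
{
    uint8_t* alloc_ptr       = nullptr; // next free byte in the thread's buffer
    uint8_t* alloc_limit     = nullptr; // end of the buffer handed out by the GC
    int64_t  alloc_bytes     = 0;       // bytes allocated on the small object heap
    int64_t  alloc_bytes_uoh = 0;       // bytes allocated on the user old heaps
};

// Hypothetical runtime-side wrapper: extra runtime state sits next to the
// GC context, so every reader must go through one more member.
struct EeAllocContextSketch
{
    uint8_t*             combined_limit = nullptr; // assumption: limit checked by the fast path
    GcAllocContextSketch gc_alloc_context;         // member name taken from the diff
};

struct ThreadSketch
{
    EeAllocContextSketch m_alloc_context;
};

// Mirrors the shape of the updated reads in request.cpp.
inline int64_t ReadAllocBytes(const ThreadSketch& thread)
{
    return thread.m_alloc_context.gc_alloc_context.alloc_bytes;
}
```

Nesting the GC structure unchanged keeps its layout intact for the GC, at the cost of touching every call site that used to read the fields directly.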
8 changes: 1 addition & 7 deletions src/coreclr/gc/gc.cpp
@@ -44076,7 +44076,7 @@ size_t gc_heap::decommit_region (heap_segment* region, int bucket, int h_number)
{
#ifdef MULTIPLE_HEAPS
// In return_free_region, we set heap_segment_heap (region) to nullptr so we cannot use it here.
- // but since all heaps share the same mark array we simply pick the 0th heap to use. 
+ // but since all heaps share the same mark array we simply pick the 0th heap to use.
gc_heap* hp = g_heaps [0];
#else
gc_heap* hp = pGenGCHeap;
@@ -49293,7 +49293,6 @@ bool GCHeap::StressHeap(gc_alloc_context * context)
} \
} while (false)

- #ifdef FEATURE_64BIT_ALIGNMENT
// Allocate small object with an alignment requirement of 8-bytes.
Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t flags)
{
@@ -49359,7 +49358,6 @@ Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t

return newAlloc;
}
- #endif // FEATURE_64BIT_ALIGNMENT

Object*
GCHeap::Alloc(gc_alloc_context* context, size_t size, uint32_t flags REQD_ALIGN_DCL)
@@ -49420,15 +49418,11 @@ GCHeap::Alloc(gc_alloc_context* context, size_t size, uint32_t flags REQD_ALIGN_
}
else
{
- #ifdef FEATURE_64BIT_ALIGNMENT
if (flags & GC_ALLOC_ALIGN8)
{
newAlloc = AllocAlign8 (acontext, hp, size, flags);
}
else
- #else
- assert ((flags & GC_ALLOC_ALIGN8) == 0);
- #endif
{
newAlloc = (Object*) hp->allocate (size + ComputeMaxStructAlignPad(requiredAlignment), acontext, flags);
}
6 changes: 6 additions & 0 deletions src/coreclr/gc/gcee.cpp
@@ -308,6 +308,12 @@ void gc_heap::fire_etw_allocation_event (size_t allocation_amount,
uint8_t* object_address,
size_t object_size)
{
+ // do not emit AllocationTick event if AllocationSampled is enabled
+ if (EVENT_ENABLED(AllocationSampled))
+ {
+ return;
+ }

#ifdef FEATURE_NATIVEAOT
FIRE_EVENT(GCAllocationTick_V1, (uint32_t)allocation_amount, (uint32_t)gen_to_oh (gen_number));
#else
5 changes: 5 additions & 0 deletions src/coreclr/gc/gcevents.h
@@ -52,5 +52,10 @@ DYNAMIC_EVENT(CommittedUsage, GCEventLevel_Information, GCEventKeyword_GC, 1)
DYNAMIC_EVENT(HeapCountTuning, GCEventLevel_Information, GCEventKeyword_GC, 1)
DYNAMIC_EVENT(HeapCountSample, GCEventLevel_Information, GCEventKeyword_GC, 1)

+ // non GC event that overrides AllocationTick
+ // KNOWN_EVENT macro cannot be used because we don't want to emit the event; i.e. adding a method to IGCToCLREventSink
+ // KNOWN_EVENT(AllocationSampled, GCEventProvider_Default, GCEventLevel_Information, EventKeyword_AllocationSampling)
+ inline bool GCEventEnabledAllocationSampled() { return GCEventStatus::IsEnabled(GCEventProvider_Default, EventKeyword_AllocationSampling, GCEventLevel_Information); }

#undef KNOWN_EVENT
#undef DYNAMIC_EVENT
8 changes: 7 additions & 1 deletion src/coreclr/gc/gcinterface.h
@@ -265,7 +265,7 @@ enum GCEventLevel
// Event keywords corresponding to events that can be fired by the GC. These
// numbers come from the ETW manifest itself - please make changes to this enum
// if you add, remove, or change keyword sets that are used by the GC!
- enum GCEventKeyword
+ enum GCEventKeyword : uint64_t
Member

@jkotas jkotas May 30, 2024

This enum is used by the GC/EE interface. Changing the size of the enum to uint64_t would require revving GC_INTERFACE_MAJOR_VERSION

Member

I expect that @Maoni0 would prefer to avoid revving the GC interface major version and do this in some other way.

Contributor Author

@noahfalk @brianrob: the current implementation of the provider/keyword/verbosity status update in EtwCallbackCommon casts the received 64-bit keyword into a 32-bit enum for the GC. Unfortunately, the AllocationSamplingKeyword is beyond this range.
I did not realize (thanks for pointing this out @jkotas) that supporting it would require changing the signature of the GC interface.
Any ideas?
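
The truncation described in this comment can be checked with a few lines. This is a minimal sketch: the keyword value `0x80000000000` is taken from the diff, while `NarrowKeywords` and the constant names are hypothetical stand-ins for the 64-bit to 32-bit cast, not code from the runtime.

```cpp
#include <cstdint>

// Value of EventKeyword_AllocationSampling as it appears in the diff (bit 43).
constexpr uint64_t kAllocationSamplingKeyword = 0x80000000000ULL;
// Value of GCEventKeyword_GC as it appears in the diff (bit 0).
constexpr uint64_t kGCKeyword = 0x1ULL;

// Hypothetical helper that mimics casting a 64-bit keyword mask
// into a 32-bit enum: every bit above bit 31 is dropped.
constexpr uint32_t NarrowKeywords(uint64_t keywords)
{
    return static_cast<uint32_t>(keywords);
}

// The sampling keyword disappears entirely after narrowing...
static_assert(NarrowKeywords(kAllocationSamplingKeyword) == 0,
              "bit 43 does not survive a 32-bit cast");
// ...while keywords in the low 32 bits pass through unchanged.
static_assert(NarrowKeywords(kGCKeyword | kAllocationSamplingKeyword) == kGCKeyword,
              "only the low 32 bits remain");
```

Because the bit is lost before the GC sees the mask, the GC cannot learn about the keyword through the existing 32-bit path, which is what motivates the alternatives discussed in the rest of this thread.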

Member

So far, I'm not seeing something that I would recommend. I can think of at least one way to do this, but I don't even want to say it because I think it could eventually back us into a corner. The idea would be to use a bit that is already used on the ETW keyword mask, but not sent to the GC because the GC doesn't currently use it. The risk is that if the GC ever wanted to use said bit, then there would be a problem.

So far, revving the interface seems best to me.

Member

I think we could just do the filtering on the VM side in the GCToCLREventSink, no?

LIMITED_METHOD_CONTRACT;

Member

Yes, that would be fine. I had been trying to avoid the interface call cost, but I think it will not be a big deal, and it is certainly preferable to changing the interface.

Contributor Author

@chrisnas chrisnas May 31, 2024

By following the execution path of GCHeapUtilities::RecordEventStateChange (for Core and NativeAOT), I think that it should be possible to call a new GCHeapUtilities::AllocationSampledEventState(bool enable) based on the value of the received keyword. This would be non-generic processing, similar to what is done for triggering a full GC.
On the GC side, a simple boolean field in GCEventStatus would keep track of the state and be exposed, as in my current code, via GCEventEnabledAllocationSampled().
This would also allow enable/disable detection, which does not seem possible today (once enabled, a provider/keyword/verbosity stays enabled), and... avoid the virtual call ;^)

Anyway... I've pushed your idea @noahfalk
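
The boolean-flag alternative described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not what was pushed: `AllocationSampledStateSketch`, `OnKeywordsChanged`, and `ShouldFireAllocationTick` are hypothetical names, and only the keyword value comes from the diff.

```cpp
#include <atomic>
#include <cstdint>

// Value of EventKeyword_AllocationSampling as it appears in the diff.
constexpr uint64_t kAllocationSamplingKeyword = 0x80000000000ULL;

// Hypothetical state holder: the runtime side inspects the full 64-bit
// keyword mask and forwards only a boolean, so no 64-bit keyword ever
// has to cross the GC interface.
class AllocationSampledStateSketch
{
public:
    // Called whenever the provider's keyword mask changes.
    // Handles both enabling and disabling.
    void OnKeywordsChanged(uint64_t keywords)
    {
        const bool enabled = (keywords & kAllocationSamplingKeyword) != 0;
        m_enabled.store(enabled, std::memory_order_relaxed);
    }

    bool IsEnabled() const
    {
        return m_enabled.load(std::memory_order_relaxed);
    }

private:
    std::atomic<bool> m_enabled{false};
};

// AllocationTick is suppressed while AllocationSampled is enabled.
inline bool ShouldFireAllocationTick(const AllocationSampledStateSketch& state)
{
    return !state.IsEnabled();
}
```

A plain flag read is cheaper than a call through the event sink interface, which is the cost trade-off raised earlier in the thread.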

{
GCEventKeyword_None = 0x0,
GCEventKeyword_GC = 0x1,
@@ -280,6 +280,12 @@ enum GCEventKeyword
GCEventKeyword_ManagedHeapCollect = 0x800000,
GCEventKeyword_GCHeapAndTypeNames = 0x1000000,
GCEventKeyword_GCSampledObjectAllocationLow = 0x2000000,

+ // Keyword outside of the GC where AllocationSampled event is defined.
+ // This is used to disable AllocationTick when AllocationSampling is enabled.
+ // Not counted in GCEventKeyword_All
+ EventKeyword_AllocationSampling = 0x80000000000,

GCEventKeyword_All = GCEventKeyword_GC
| GCEventKeyword_GCPrivate
| GCEventKeyword_GCHandle
2 changes: 0 additions & 2 deletions src/coreclr/gc/gcpriv.h
@@ -1465,9 +1465,7 @@ class gc_heap
friend struct ::alloc_context;
friend void ProfScanRootsHelper(Object** object, ScanContext *pSC, uint32_t dwFlags);
friend void GCProfileWalkHeapWorker(BOOL fProfilerPinned, BOOL fShouldWalkHeapRootsForEtw, BOOL fShouldWalkHeapObjectsForEtw);
- #ifdef FEATURE_64BIT_ALIGNMENT
friend Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t flags);
- #endif //FEATURE_64BIT_ALIGNMENT
friend class t_join;
friend class gc_mechanisms;
friend class seg_free_spaces;
2 changes: 1 addition & 1 deletion src/coreclr/inc/dacvars.h
@@ -140,7 +140,7 @@ DEFINE_DACVAR(ProfControlBlock, dac__g_profControlBlock, ::g_profControlBlock)
DEFINE_DACVAR(PTR_DWORD, dac__g_card_table, ::g_card_table)
DEFINE_DACVAR(PTR_BYTE, dac__g_lowest_address, ::g_lowest_address)
DEFINE_DACVAR(PTR_BYTE, dac__g_highest_address, ::g_highest_address)
- DEFINE_DACVAR(gc_alloc_context, dac__g_global_alloc_context, ::g_global_alloc_context)
+ DEFINE_DACVAR(UNKNOWN_POINTER_TYPE, dac__g_global_alloc_context, ::g_global_alloc_context)

DEFINE_DACVAR(IGCHeap, dac__g_pGCHeap, ::g_pGCHeap)

12 changes: 7 additions & 5 deletions src/coreclr/inc/eventtracebase.h
@@ -1323,17 +1323,19 @@ namespace ETW
#define ETWLoaderStaticLoad 0 // Static reference load
#define ETWLoaderDynamicLoad 1 // Dynamic assembly load

+ #if defined(FEATURE_EVENT_TRACE)
+ EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context;
+ EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PRIVATE_PROVIDER_DOTNET_Context;
+ EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_RUNDOWN_PROVIDER_DOTNET_Context;
+ EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_STRESS_PROVIDER_DOTNET_Context;
+ #endif // FEATURE_EVENT_TRACE

#if defined(FEATURE_EVENT_TRACE) && !defined(HOST_UNIX)
//
// The ONE and only ONE global instantiation of this class
//
extern ETW::CEtwTracer * g_pEtwTracer;

- EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context;
- EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PRIVATE_PROVIDER_DOTNET_Context;
- EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_RUNDOWN_PROVIDER_DOTNET_Context;
- EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_STRESS_PROVIDER_DOTNET_Context;

//
// Special Handling of Startup events
//
1 change: 1 addition & 0 deletions src/coreclr/nativeaot/Runtime/AsmOffsets.h
@@ -55,6 +55,7 @@ ASM_OFFSET( 44, 70, Thread, m_pvHijackedReturnAddress)
ASM_OFFSET( 48, 78, Thread, m_uHijackedReturnValueFlags)
ASM_OFFSET( 4c, 80, Thread, m_pExInfoStackHead)
ASM_OFFSET( 50, 88, Thread, m_threadAbortException)
+ ASM_OFFSET( 54, 90, Thread, m_combined_limit)

ASM_SIZEOF( 14, 20, EHEnum)
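
Entries like the new `m_combined_limit` line pair a field with hard-coded hexadecimal offsets that hand-written assembly relies on. Assuming the first column is the 32-bit offset and the second the 64-bit offset, a compile-time check of such a table might be sketched as follows. `ASM_OFFSET_SKETCH` and `ThreadLayoutSketch` are hypothetical, and the padding array merely stands in for the fields that precede `m_combined_limit` in the real `Thread`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: opaque padding replaces the real preceding fields so
// that m_combined_limit lands at 0x54 (32-bit) or 0x90 (64-bit).
struct ThreadLayoutSketch
{
    uint8_t  padding[sizeof(void*) == 8 ? 0x90 : 0x54];
    uint8_t* m_combined_limit;
};

// Hypothetical checker: pastes "0x" onto the hex digits from the table and
// compares against the compiler's view of the layout for the current target.
#define ASM_OFFSET_SKETCH(offset32, offset64, cls, member)                     \
    static_assert(offsetof(cls, member) ==                                     \
                      (sizeof(void*) == 8 ? 0x##offset64 : 0x##offset32),      \
                  "assembly offset is out of sync with the C++ layout")

// Same shape as the line added in the diff.
ASM_OFFSET_SKETCH(54, 90, ThreadLayoutSketch, m_combined_limit);
```

With a check of this kind, adding or reordering a field without updating the table fails the build instead of silently corrupting the assembly fast paths.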

50 changes: 50 additions & 0 deletions src/coreclr/nativeaot/Runtime/GCHelpers.cpp
@@ -29,6 +29,16 @@

#include "gcdesc.h"

+ #ifdef FEATURE_EVENT_TRACE
+ #include "clretwallmain.h"
+ #else // FEATURE_EVENT_TRACE
+ #include "etmdummy.h"
+ #endif // FEATURE_EVENT_TRACE
+
+ // TODO: used for dynamic allocation sampling
+ // but generate duplicated symbols
+ //#include "..\..\..\inc\sstring.h"

#define RH_LARGE_OBJECT_SIZE 85000

MethodTable g_FreeObjectEEType;
@@ -471,6 +481,43 @@ EXTERN_C int64_t QCALLTYPE RhGetTotalAllocatedBytesPrecise()
return allocated;
}

+ inline void FireAllocationSampled(GC_ALLOC_FLAGS flags, size_t size, size_t samplingBudgetOffset, Object* orObject)
+ {
+ // Note: this code is duplicated from GCToCLREventSink::FireGCAllocationTick_V4
+ void* typeId = nullptr;
+ const WCHAR* name = nullptr;
+ // TODO: this does not compile due to duplicated symbols when sstring.h is included
+ //InlineSString<MAX_CLASSNAME_LENGTH> strTypeName;
+ //EX_TRY
+ //{
+ // TypeHandle th = GetThread()->GetTHAllocContextObj();
+
+ // if (th != 0)
+ // {
+ // th.GetName(strTypeName);
+ // name = strTypeName.GetUnicode();
+ // typeId = th.GetMethodTable();
+ // }
+ //}
+ //EX_CATCH{}
+ //EX_END_CATCH(SwallowAllExceptions)
+ // end of duplication
+
+ if (typeId != nullptr)
+ {
+ unsigned int allocKind =
+ (flags & GC_ALLOC_PINNED_OBJECT_HEAP) ? 2 :
+ (flags & GC_ALLOC_LARGE_OBJECT_HEAP) ? 1 :
+ 0; // SOH
+ unsigned int heapIndex = 0;
+ #ifdef BACKGROUND_GC
+ gc_heap* hp = gc_heap::heap_of((BYTE*)orObject);
+ heapIndex = hp->heap_number;
+ #endif
+ FireEtwAllocationSampled(allocKind, GetClrInstanceId(), typeId, name, heapIndex, (BYTE*)orObject, size, samplingBudgetOffset);
Member

You may already have this on a list somewhere, but name is always null (I'm not sure that nativeaot even has the type name data available...)

Contributor Author

> You may already have this on a list somewhere, but name is always null (I'm not sure that nativeaot even has the type name data available...)

I'm reusing the same code as AllocationTick to get the type name, and it is working in the tests for Core. I'll try the same code for NativeAOT when the compilation errors are fixed.

Member

Sounds good. I suspect that nativeaot doesn't even have the type names, so it may just always need to be null, but wanted to call it out.

Contributor Author

Sorry @brianrob!
I missed that your point was for NativeAOT only!

+ }
+ }
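
The `allocKind` computation in `FireAllocationSampled` maps allocation flags to a small integer (0 for the small object heap, 1 for the large object heap, 2 for the pinned object heap), with the pinned flag taking precedence. A standalone sketch of that mapping follows. The flag values below are illustrative placeholders rather than the definitions from `gcinterface.h`, and `AllocKindFromFlags` is a hypothetical name.

```cpp
#include <cstdint>

// Illustrative flag bits only; the real GC_ALLOC_* values live in gcinterface.h.
enum AllocFlagsSketch : uint32_t
{
    kAllocLargeObjectHeap  = 0x20,
    kAllocPinnedObjectHeap = 0x40,
};

// Same precedence as the nested conditional in the diff:
// pinned is tested first, then large, and everything else is SOH.
constexpr unsigned int AllocKindFromFlags(uint32_t flags)
{
    return (flags & kAllocPinnedObjectHeap) ? 2u
         : (flags & kAllocLargeObjectHeap)  ? 1u
         : 0u; // SOH
}

static_assert(AllocKindFromFlags(0) == 0u, "no heap flag means SOH");
static_assert(AllocKindFromFlags(kAllocLargeObjectHeap) == 1u, "LOH");
static_assert(AllocKindFromFlags(kAllocPinnedObjectHeap) == 2u, "POH");
```

Note that in the version shown in the diff the event is only fired when `typeId` is non-null, and `typeId` is never assigned while the type lookup stays commented out.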

static Object* GcAllocInternal(MethodTable* pEEType, uint32_t uFlags, uintptr_t numElements, Thread* pThread)
{
ASSERT(!pThread->IsDoNotTriggerGcSet());
@@ -539,6 +586,9 @@ static Object* GcAllocInternal(MethodTable* pEEType, uint32_t uFlags, uintptr_t
// Save the MethodTable for instrumentation purposes.
tls_pLastAllocationEEType = pEEType;

+ // TODO: handle dynamic allocation sampling
+ //ee_alloc_context* acontext = pThread->GetEEAllocContext();

Object* pObject = GCHeapUtilities::GetGCHeap()->Alloc(pThread->GetAllocContext(), cbSize, uFlags);
if (pObject == NULL)
return NULL;
6 changes: 3 additions & 3 deletions src/coreclr/nativeaot/Runtime/amd64/AllocFast.S
@@ -28,7 +28,7 @@ NESTED_ENTRY RhpNewFast, _TEXT, NoHandler

mov rsi, [rax + OFFSETOF__Thread__m_alloc_context__alloc_ptr]
add rdx, rsi
- cmp rdx, [rax + OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ cmp rdx, [rax + OFFSETOF__Thread__m_combined_limit]
ja LOCAL_LABEL(RhpNewFast_RarePath)

// set the new alloc pointer
@@ -143,7 +143,7 @@ NESTED_ENTRY RhNewString, _TEXT, NoHandler
// rcx == Thread*
// rdx == string size
// r12 == element count
- cmp rax, [rcx + OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ cmp rax, [rcx + OFFSETOF__Thread__m_combined_limit]
ja LOCAL_LABEL(RhNewString_RarePath)

mov [rcx + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax
@@ -226,7 +226,7 @@ NESTED_ENTRY RhpNewArray, _TEXT, NoHandler
// rcx == Thread*
// rdx == array size
// r12 == element count
- cmp rax, [rcx + OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ cmp rax, [rcx + OFFSETOF__Thread__m_combined_limit]
ja LOCAL_LABEL(RhpNewArray_RarePath)

mov [rcx + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax
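
Each of these hunks changes one thing: the fast path now compares the prospective end of the object against `m_combined_limit` instead of `alloc_limit`. A C++ rendering of that bump-pointer check is sketched below. The idea that the combined limit is the lower of the GC limit and a sampling limit is an assumption drawn from the field name, and `AllocContextSketch`, `sampling_limit`, `UpdateCombinedLimit`, and `TryAllocFast` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

struct AllocContextSketch
{
    uint8_t* alloc_ptr;      // next free byte
    uint8_t* alloc_limit;    // end of the buffer owned by this thread
    uint8_t* sampling_limit; // assumption: next sampling point, nullptr when sampling is off
    uint8_t* combined_limit; // the single value the fast path compares against
};

// Assumption: the combined limit is whichever boundary comes first,
// so one comparison covers both "buffer exhausted" and "time to sample".
inline void UpdateCombinedLimit(AllocContextSketch& ctx)
{
    ctx.combined_limit = (ctx.sampling_limit != nullptr)
        ? std::min(ctx.alloc_limit, ctx.sampling_limit)
        : ctx.alloc_limit;
}

// Returns the new object's address, or nullptr when the caller must take
// the slow path (the `ja ..._RarePath` branch in the assembly).
inline uint8_t* TryAllocFast(AllocContextSketch& ctx, size_t size)
{
    uint8_t* result = ctx.alloc_ptr;
    uint8_t* end = result + size;
    if (end > ctx.combined_limit)
    {
        return nullptr;
    }
    ctx.alloc_ptr = end; // set the new alloc pointer
    return result;
}
```

Folding both conditions into one limit keeps the fast path at a single compare and branch, so a design like this would add no instructions to the common case.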
6 changes: 3 additions & 3 deletions src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm
@@ -25,7 +25,7 @@ LEAF_ENTRY RhpNewFast, _TEXT

mov rax, [rdx + OFFSETOF__Thread__m_alloc_context__alloc_ptr]
add r8, rax
- cmp r8, [rdx + OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ cmp r8, [rdx + OFFSETOF__Thread__m_combined_limit]
ja RhpNewFast_RarePath

;; set the new alloc pointer
@@ -118,7 +118,7 @@ LEAF_ENTRY RhNewString, _TEXT
; rdx == element count
; r8 == array size
; r10 == thread
- cmp rax, [r10 + OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ cmp rax, [r10 + OFFSETOF__Thread__m_combined_limit]
ja RhpNewArrayRare

mov [r10 + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax
@@ -179,7 +179,7 @@ LEAF_ENTRY RhpNewArray, _TEXT
; rdx == element count
; r8 == array size
; r10 == thread
- cmp rax, [r10 + OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ cmp rax, [r10 + OFFSETOF__Thread__m_combined_limit]
ja RhpNewArrayRare

mov [r10 + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax
2 changes: 0 additions & 2 deletions src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc
@@ -337,8 +337,6 @@ TSF_DoNotTriggerGc equ 10h
;; Rename fields of nested structs
;;
OFFSETOF__Thread__m_alloc_context__alloc_ptr equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr
- OFFSETOF__Thread__m_alloc_context__alloc_limit equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit



;; GC type flags
10 changes: 5 additions & 5 deletions src/coreclr/nativeaot/Runtime/arm/AllocFast.S
@@ -26,7 +26,7 @@ LEAF_ENTRY RhpNewFast, _TEXT

ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_ptr]
add r2, r3
- ldr r1, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ ldr r1, [r0, #OFFSETOF__Thread__m_combined_limit]
cmp r2, r1
bhi LOCAL_LABEL(RhpNewFast_RarePath)

@@ -132,7 +132,7 @@ LEAF_ENTRY RhNewString, _TEXT
adds r6, r12
bcs LOCAL_LABEL(RhNewString_RarePath) // if we get a carry here, the string is too large to fit below 4 GB

- ldr r12, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ ldr r12, [r0, #OFFSETOF__Thread__m_combined_limit]
cmp r6, r12
bhi LOCAL_LABEL(RhNewString_RarePath)

@@ -213,7 +213,7 @@ LOCAL_LABEL(ArrayAlignSize):
adds r6, r12
bcs LOCAL_LABEL(RhpNewArray_RarePath) // if we get a carry here, the array is too large to fit below 4 GB

- ldr r12, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ ldr r12, [r0, #OFFSETOF__Thread__m_combined_limit]
cmp r6, r12
bhi LOCAL_LABEL(RhpNewArray_RarePath)

@@ -349,7 +349,7 @@ LEAF_ENTRY RhpNewFastAlign8, _TEXT
// Determine whether the end of the object would lie outside of the current allocation context. If so,
// we abandon the attempt to allocate the object directly and fall back to the slow helper.
add r2, r3
- ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ ldr r3, [r0, #OFFSETOF__Thread__m_combined_limit]
cmp r2, r3
bhi LOCAL_LABEL(Alloc8Failed)

@@ -412,7 +412,7 @@ LEAF_ENTRY RhpNewFastMisalign, _TEXT
// Determine whether the end of the object would lie outside of the current allocation context. If so,
// we abandon the attempt to allocate the object directly and fall back to the slow helper.
add r2, r3
- ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit]
+ ldr r3, [r0, #OFFSETOF__Thread__m_combined_limit]
cmp r2, r3
bhi LOCAL_LABEL(BoxAlloc8Failed)
