diff --git a/docs/design/features/RandomizedAllocationSampling.md b/docs/design/features/RandomizedAllocationSampling.md new file mode 100644 index 00000000000000..b27ff7cd208e46 --- /dev/null +++ b/docs/design/features/RandomizedAllocationSampling.md @@ -0,0 +1,332 @@
+# Randomized Allocation Sampling
+
+Christophe Nasarre (@chrisnas), Noah Falk (@noahfalk) - 2024
+
+## Introduction
+
+.NET developers want to understand the GC allocation behavior of their programs both for general observability and specifically to better understand performance costs. Although the runtime has a very high performance GC, reducing the number of bytes allocated in a scenario can have a notable impact on the total execution time and frequency of GC pauses. Some ways developers understand these costs are by measuring allocated bytes in:
+1. Microbenchmarks such as Benchmark.DotNet
+2. .NET APIs such as [GC.GetAllocatedBytesForCurrentThread()](https://learn.microsoft.com/dotnet/api/system.gc.getallocatedbytesforcurrentthread)
+3. Memory profiling tools such as VS profiler, PerfView, and dotTrace
+4. Metrics or other production telemetry
+
+Analysis of allocation behavior often starts simply, using the total bytes allocated while executing a block of code or during some time duration. However, for any non-trivial scenario, gaining a deeper understanding requires attributing allocations to specific lines of source code, callstacks, types, and object sizes. .NET's current state-of-the-art technique for doing this is using a profiling tool to sample using the [AllocationTick](https://learn.microsoft.com/en-us/dotnet/fundamentals/diagnostics/runtime-garbage-collection-events#gcallocationtick_v3-event) event. When enabled, this event triggers approximately every time 100KB has been allocated. However, this sampling is not random. It has a fixed starting point and stride, which can lead to significant sampling error for allocation patterns that are periodic. This has been observed in practice, so it isn't merely a theoretical concern. The new randomized allocation sampling feature is intended to address the shortcomings of AllocationTick and offer more rigorous estimations of allocation behavior and probabilistic error bounds. We do this by creating a new `AllocationSampled` event that profilers can opt into via any of our standard event tracing technologies (ETW, EventPipe, LTTng, EventListener). The new event is completely independent of AllocationTick, and we expect profilers will prefer to use the AllocationSampled event on runtime versions where it is available.
+
+The initial part of this document describes the conceptual sampling model and how we suggest the data be interpreted by profilers. The latter portion describes how the sampling model is implemented efficiently in runtime code.
+
+## The sampling model
+
+When the new AllocationSampled event is enabled, each managed thread starts sampling independently of one another. For a given thread there will be a sequence of allocated objects Object_1, Object_2, etc. that may continue indefinitely. Each object has a corresponding .NET type and size. The size of an object includes the object header, method table, object fields, and trailing padding that aligns the size to be a multiple of the pointer size. It does not include any additional memory the GC may optionally allocate for more than pointer-sized alignment, filling gaps that are impossible/inefficient to use for objects, or other GC bookkeeping data structures. Also note that .NET does have a non-GC heap where some objects that stay alive for the process lifetime are allocated. Those non-GC heap objects are ignored by this feature.
+
+When each new object is allocated, conceptually the runtime starts doing independent [Bernoulli Trials](https://en.wikipedia.org/wiki/Bernoulli_trial) (weighted coin flips) for every byte in the object. Each trial has probability p = 1/102,400 of being considered a success. As soon as one successful trial is observed, no more trials are performed for that object and an AllocationSampled event is emitted. This event contains the object type, its size, and the 0-based offset of the byte where the successful trial occurred. This means that for a given object, if an event was generated, `offset` failed trials occurred followed by a successful trial, and if no event was generated, `size` failed trials occurred. This process continues indefinitely for each newly allocated object.
+
+This sampling process is closely related to the [Bernoulli process](https://en.wikipedia.org/wiki/Bernoulli_process) and is a well-studied area of statistics. Skipping ahead to the end of an object once a successful sample has been produced does require some accommodations in the analysis, but many key results are still applicable.
+
+## Using the feature
+
+### Enabling sample events
+
+The allocation sampled events are enabled on the `Microsoft-Windows-DotNetRuntime` provider using keyword `0x80000000000` at informational level or higher. For more details on how to do this using different event tracing technologies see [here](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/eventsource-collect-and-view-traces).
+
+### Interpreting sample events
+
+Although diagnostic tools are free to interpret the data in whatever way they choose, we have some recommendations for analysis that we expect are useful and statistically sound.
+
+#### Definitions
+
+For all of this section, assume that we enabled the AllocationSampled events and observed that `s` such sample events were generated from a specific thread - `event_1`, `event_2`, ... `event_s`. Each `event_i` contains corresponding fields `type_i`, `size_i`, and `offset_i`. Let `u_i = size_i - offset_i`. `u_i` represents the successful trial byte + the number of bytes which remained after it in the same object. Let `u` = the sum of all the `u_i`, `i` = 1 to `s`. `p` is the constant 1/102400, the probability that each trial is successful. `q` is the complement 1 - 1/102400.
+
+#### Estimation strategies
+
+We have explored two different mathematical approaches for [estimating](https://en.wikipedia.org/wiki/Estimator) the number of bytes that were allocated given a set of observed sample events. Both approaches are unbiased, which means that if we repeated the same sampling procedure many times we would expect the average of the estimates to match the number of bytes allocated. Where the approaches differ is in the particular distribution of the estimates.
+
+#### Estimation Approach 1: Weighted samples
+
+We believe this approach gives estimates with lower [Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error), but the exact shape of the distribution is hard to calculate, so we don't know a good way to produce [confidence intervals](https://en.wikipedia.org/wiki/Confidence_interval) based on small numbers of samples. The distribution does approach a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) as the number of samples increases ([Central Limit Theorem](https://en.wikipedia.org/wiki/Central_limit_theorem)), but we haven't done any analysis attempting to define how rapidly that convergence occurs.
+
+To estimate the number of bytes using this technique, let `estimate_i = size_i/(1 - q^size_i)` for each sample `i`. Then sum `estimate_i` over all samples to get a total estimate of the allocated bytes. With sufficiently many samples, the estimate distribution should converge to a normal distribution with variance at most `N*q/p` for `N` total bytes of allocated objects.
+
+##### Statistics stuff
+
+Understanding this part isn't necessary to use the estimation formula above but may be helpful.
+
+Proving the weighted sample estimator is unbiased:
+Consider the sequence of all objects allocated on a thread. Let `X_j` be a random indicator variable that has value `size_j/(1 - q^size_j)` if the `j`th object is sampled, otherwise zero. Our estimation formula above is the sum of all `X_j` because only the sampled objects will contribute a non-zero term. Based on our sampling procedure, the probability for an object to be sampled is `1-q^size_j`, which means `E(X_j) = size_j/(1 - q^size_j) * Pr(object j is sampled) = size_j/(1 - q^size_j) * (1 - q^size_j) = size_j`. By linearity of expectation, the expected value of the sum is the sum of the expected values = sum of `size_j` for all `j` = total size of allocated objects.
+
+The variance for this estimation is the sum of variances for each `X_j` term, `(size_j^2)*(q^size_j)/(1-q^size_j)`. If we assume there are `N` total bytes of objects divided up into `N/n` objects of size `n`, then the total variance for that set of objects would be `(N/n)*n^2*q^n/(1-q^n) = N*n*q^n/(1-q^n)`. That expression is maximized when `n=1`, so the maximum variance for any collection of objects with total size `N` is `N*1*q^1/(1-q^1) = N*q/(1-q) = N*q/p`.
+
+#### Estimation Approach 2: Estimating failed trials
+
+This is an alternative estimate that has a more predictable distribution, but potentially higher [Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error). You could use this approach to produce both estimates and confidence intervals, or use the weighted sample formula to produce estimates and use this one solely as a conservative confidence interval for the estimate.
+
+The estimation formula is `sq/p + u`.
+
+This estimate is based on the [Negative Binomial distribution](https://en.wikipedia.org/wiki/Negative_binomial_distribution) with `s` successes and `p` chance of success. The `sq/p` term is the mean of this distribution and represents the expected number of failed trials necessary to observe `s` successful trials. The `u` term then adds in the number of successful trials (1 per sample) and the number of bytes for which no trials were performed (`u_i-1` per sample).
+
+Here is an approach to calculate a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) estimate based on this distribution:
+
+1. Decide on some probability `C` that the actual number of allocated bytes `N` should fall within the interval. You can pick a probability arbitrarily close to 1; however, the higher the probability, the wider the estimated interval will be. For the remaining `1-C` probability that `N` is not within the interval, we will pick the upper and lower bounds so that there is a `(1-C)/2` chance that `N` is below the interval and a `(1-C)/2` chance that `N` is above the interval. We think `C=0.95` would be a reasonable choice for many tools, which means there would be a 2.5% chance the actual value is below the lower bound, a 95% chance it is between the lower and upper bound, and a 2.5% chance it is above the upper bound.
+
+2. Implement some method to calculate the Negative Binomial [CDF](https://en.wikipedia.org/wiki/Cumulative_distribution_function). Unfortunately there is no trivial formula for this, but there are a couple of potential approaches:
+   a. The Negative Binomial Distribution has a CDF defined based on the [regularized incomplete beta function](https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function). There are various numerical libraries such as scipy in Python that will calculate this for you. Alternately you could directly implement numerical approximation techniques to evaluate the function, either approximating the integral form or approximating the continued fraction expansion.
+   b. The Camp-Paulson approximation described in [Bartko (66)](https://www.stat.cmu.edu/technometrics/59-69/VOL-08-02/v0802345.pdf). We validated that for p=0.00001 this approximation was within ~0.01 of the true CDF for any number of failures at s=1, within ~0.001 of the true CDF at s=5, and continues to get more accurate as the sample count increases.
+
+3. Do a binary search on the CDF to locate the input number of failures for which `CDF(failures, s, p)` is closest to `(1-C)/2` and `C + (1-C)/2`. Assuming that `CDF(low_failures, s, p) = (1-C)/2` and `CDF(high_failures, s, p) = C + (1-C)/2`, the confidence interval for `N` is `[low_failures+u, high_failures+u]`. (A sketch of one way to implement this appears after the table below.)
+
+For example, if we select `C=0.95`, observe 8 samples, and `u=10,908`, then we'd use binary search to find `CDF(353666, 8, 1/102400) ~= 0.025` and `CDF(1476870, 8, 1/102400) ~= 0.975`. Our interval estimate for the number of bytes allocated would be `[353666 + 10908, 1476870 + 10908]`.
+
+To get a rough idea of the error in proportion to the number of samples, here is a table of calculated Negative Binomial failed trials for the 0.025 and 0.975 thresholds of the CDF:
+
+| # of samples (successes) | failures at CDF = 0.025 | failures at CDF = 0.975 |
+| ---------------------------| ------------------------| --------------------------- |
+| 1 | 2591 | 377738 |
+| 2 | 24800 | 570531 |
+| 3 | 63349 | 739802 |
+| 4 | 111599 | 897761 |
+| 5 | 166241 | 1048730 |
+| 6 | 225469 | 1194827 |
+| 7 | 288185 | 1337279 |
+| 8 | 353666 | 1476870 |
+| 9 | 421407 | 1614137 |
+| 10 | 491039 | 1749469 |
+| 20 | 1250954 | 3038270 |
+| 30 | 2072639 | 4264804 |
+| 40 | 2926207 | 5459335 |
+| 50 | 3800118 | 6633475 |
+| 100 | 8331581 | 12342053 |
+| 200 | 17739679 | 23413825 |
+| 300 | 27341465 | 34291862 |
+| 400 | 37043463 | 45069676 |
+| 500 | 46809487 | 55783459 |
+| 1000 | 96149867 | 108842093 |
+| 2000 | 195919830 | 213870137 |
+| 3000 | 296301551 | 318286418 |
+| 4000 | 396999923 | 422386047 |
+| 5000 | 497900649 | 526283322 |
+| 10000 | 1004017229 | 1044156743 |
+
+Notice that if we compare the expected total number of trials (102400 * # of samples) to the estimated ranges, at 10 samples the error bars extend more than 50% in each direction, showing that predictions based on so few samples are very imprecise. However, at 1,000 samples the error is ~6% in each direction and at 10,000 samples ~2% in each direction.
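+
+To make steps 2 and 3 above concrete, here is a minimal, illustrative sketch (not part of the runtime, and deliberately avoiding both the incomplete beta function and the Camp-Paulson approximation). It evaluates the Negative Binomial CDF exactly via the identity `P(failures <= f) = P(at least s successes in the first s+f trials)`, summing the `s` Binomial terms in log space with `lgamma`, and then binary searches for the bounds. All names are hypothetical, and the bounds it finds may differ by a count or two from the table above depending on how the boundary is rounded:
+
+```
+// Illustrative only - not runtime code.
+#include <cmath>
+#include <cstdint>
+#include <cstdio>
+
+// P(failures <= f) for a Negative Binomial with s successes and success probability p.
+// Uses CDF(f; s, p) = 1 - sum_{j=0}^{s-1} C(s+f, j) * p^j * (1-p)^(s+f-j),
+// evaluating each term in log space so the binomial coefficients don't overflow.
+double NegBinomialCdf(int64_t f, int s, double p)
+{
+    double n = (double)f + s;
+    double tail = 0.0;
+    for (int j = 0; j < s; j++)
+    {
+        double logTerm = std::lgamma(n + 1) - std::lgamma(j + 1) - std::lgamma(n - j + 1)
+                       + j * std::log(p) + (n - j) * std::log1p(-p);
+        tail += std::exp(logTerm);
+    }
+    return 1.0 - tail;
+}
+
+// Smallest failure count whose CDF is >= target (the CDF is monotonic in f).
+int64_t InvertCdf(double target, int s, double p)
+{
+    int64_t lo = 0, hi = (int64_t)(20.0 * s / p); // generous upper bound
+    while (lo < hi)
+    {
+        int64_t mid = lo + (hi - lo) / 2;
+        if (NegBinomialCdf(mid, s, p) < target) lo = mid + 1; else hi = mid;
+    }
+    return lo;
+}
+
+int main()
+{
+    const double p = 1.0 / 102400;
+    const double C = 0.95;
+    const int s = 8;           // observed samples, as in the example above
+    const int64_t u = 10908;   // sum of u_i over the samples
+
+    int64_t lowFailures  = InvertCdf((1 - C) / 2, s, p);     // ~353,666
+    int64_t highFailures = InvertCdf(C + (1 - C) / 2, s, p); // ~1,476,870
+    printf("estimated allocated bytes in [%lld, %lld]\n",
+        (long long)(lowFailures + u), (long long)(highFailures + u));
+    return 0;
+}
+```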
+
+The variance for the Negative Binomial Distribution is `sq/p^2`. In the limit where all allocated objects have size 1 byte, `E(s)=Np`, which gives an expected variance of `Nq/p`, the same as with the weighted sample approach. However, as object sizes increase, the variance of approach 1 decreases more rapidly than in this approach.
+
+#### Compensating for bytes allocated on a thread in between events
+
+It is likely you want to estimate allocations starting and ending at arbitrary points in time that do not correspond exactly with the moment a sampling event was emitted. This means the initial sampling event covered more time than the allocation period we are interested in, and the allocations at the end aren't included in any sampling event. You can conservatively adjust the error bounds to account for the uncertainty in the starting and ending allocations. If the starting point is not aligned with a sampling event, calculate the lower bound of allocated bytes as if there was one fewer sample received. If the ending point is not aligned with a sampling event, calculate the upper bound as if there was one more sample received.
+
+#### Estimating the total number of bytes allocated on all threads
+
+The per-thread estimations can be repeated for all threads and summed up.
+
+#### Estimating the number of bytes allocated for objects of a specific type, size, or other characteristic
+
+Select from the sampling events only those events which occurred in objects matching your criteria. For example, if you want to estimate the number of bytes allocated for Foo-typed objects, select the samples in Foo-typed objects. Using this reduced set of samples, apply the same estimation technique as above. The error on this estimation will also be based on the number of samples in your filtered subset. If there were 1000 initial samples but only 3 of those samples were in Foo-typed objects, that might generate an estimate of 310K bytes of Foo objects, but beware that the potential sampling error for such a small number of samples is very large.
+
+## Implementation design
+
+Overall, the implementation needs to do a few steps:
+1. Determine if sampling events are enabled. If not, there is nothing else to do; if so, we need to do steps (2) and (3).
+2. Use a random number generator to simulate random trials for each allocated byte and determine which objects contain the successful trials.
+3. When a successful trial occurs, emit the AllocationSampled event.
+
+Steps (1) and (3) are straightforward, but step (2) is non-trivial to do correctly and performantly. For step (1) we use the existing macro ETW_TRACING_CATEGORY_ENABLED(), which despite its name works for all our event tracing technologies. For step (3) we defined a method FireAllocationSampled() in gchelpers.cpp, and the code to emit the event is in there. Like all runtime events, the definition for the event itself is in ClrEtwAll.man. All the remaining discussion is about how we accomplish step (2).
+
+Our conceptual sampling model involves doing Bernoulli trials for every byte of an object. In theory we could implement that very literally. Each object allocation would run a for loop for n iterations for an n-byte object and generate random coin flips with a pseudo-random number generator (PRNG). However, doing this would be incredibly slow. A good way to understand the actual implementation is to imagine we started with this simple slow approach and then did several iterative transformations to make it run faster while maintaining the same output. Imagine that we have some function `bool GetNextTrialResult(CLRRandom* pRandom)` that takes a PRNG and should randomly return true with probability 1 in 102,400. It might be implemented:
+
+```
+bool GetNextTrialResult(CLRRandom* pRandom)
+{
+    return pRandom->NextDouble() < 1.0/102400;
+}
+```
+
+We don't have to generate random numbers at the instant we need them, however; we are allowed to cache a batch of them at a time and dispense them later. For simplicity, treat all the apparent global variables in these examples as being thread-local. In pseudo-code that looks like:
+
+```
+List _cachedTrials = PopulateTrials(pRandom);
+
+List PopulateTrials(CLRRandom* pRandom)
+{
+    List trials = new List();
+    for(int i = 0; i < 100; i++)
+    {
+        trials.Push(pRandom->NextDouble() < 1.0/102400);
+    }
+    return trials;
+}
+
+
+bool GetNextTrialResult(CLRRandom* pRandom)
+{
+    bool ret = _cachedTrials.Pop();
+
+    // if we are out of trials, cache some more for next time
+    if(_cachedTrials.Count == 0)
+    {
+        _cachedTrials = PopulateTrials(pRandom);
+    }
+
+    return ret;
+}
+```
+
+Notice that almost every entry in the cached list will be false, so this is an inefficient way to store it. Rather than storing a large number of false bits, we could store a single number that represents a run of zero or more contiguous false bools followed by a single true bool. There is also no requirement that our cached batches of trials are the same size, so we could cache exactly one run of false results. In pseudo-code that looks like:
+
+```
+BigInteger _cachedFailedTrials = PopulateTrials(pRandom);
+
+BigInteger PopulateTrials(CLRRandom* pRandom)
+{
+    BigInteger failedTrials = 0;
+    while(pRandom->NextDouble() >= 1.0/102400)
+    {
+        failedTrials++;
+    }
+    return failedTrials;
+}
+
+bool GetNextTrialResult(CLRRandom* pRandom)
+{
+    bool ret = (_cachedFailedTrials == 0);
+    _cachedFailedTrials--;
+
+    // if we are out of trials, cache some more for next time
+    if(_cachedFailedTrials < 0)
+    {
+        _cachedFailedTrials = PopulateTrials(pRandom);
+    }
+
+    return ret;
+}
+```
+
+Rather than generating `_cachedFailedTrials` by doing many independent queries to a random number generator, we can use some math to speed this up. The probability that `_cachedFailedTrials` has some particular value `X` is given by the [Geometric distribution](https://en.wikipedia.org/wiki/Geometric_distribution). We can use [Inverse Transform Sampling](https://en.wikipedia.org/wiki/Inverse_transform_sampling) to generate random values for this distribution directly. The CDF for the Geometric distribution is `1-(1-p)^(floor(x)+1)`, which means the inverse is `floor(ln(1-y)/ln(1-p))`.
+
+We've been using BigInteger so far because mathematically there is a non-zero probability of getting an arbitrarily large number of failed trials in a row. In practice, however, our PRNG's outputs are constrained to be a floating point number with value k/MAX_INT for an integer value of k between 0 and MAX_INT-1. The largest value PopulateTrials() can return under these constraints is ~2.148M, which means a 32-bit integer can easily accommodate the value. The perfect mathematical model of the Geometric distribution has a 0.00000005% chance of getting a larger run of failed trials, but our PRNG rounds that incredibly unlikely case to zero probability.
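+
+As a quick numerical sanity check on the inverse transform formula (again just a sketch, not runtime code; `std::mt19937_64` stands in for CLRRandom), drawing run lengths with `floor(ln(1-y)/ln(1-p))` should reproduce the Geometric distribution's mean of `q/p`, roughly 102,399 failed trials per success:
+
+```
+// Illustrative only - not runtime code.
+#include <cmath>
+#include <cstdio>
+#include <random>
+
+int main()
+{
+    const double p = 1.0 / 102400;
+    std::mt19937_64 rng(12345);
+    std::uniform_real_distribution<double> uniform(0.0, 1.0); // stand-in for CLRRandom::NextDouble()
+
+    const int draws = 1000000;
+    double sum = 0.0;
+    for (int i = 0; i < draws; i++)
+    {
+        // inverse transform of the Geometric CDF: number of failed trials before a success
+        sum += std::floor(std::log(1.0 - uniform(rng)) / std::log(1.0 - p));
+    }
+    printf("sample mean = %.0f, expected q/p = %.0f\n", sum / draws, (1.0 - p) / p);
+    return 0;
+}
+```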
+
+Both of these changes combined give the pseudo-code:
+
+```
+int _cachedFailedTrials = CalculateGeometricRandom(pRandom);
+
+// Previously this method was called PopulateTrials()
+// Use Inverse Transform Sampling to calculate a random value from the Geometric distribution
+int CalculateGeometricRandom(CLRRandom* pRandom)
+{
+    return floor(log(1-pRandom->NextDouble())/log(1-1.0/102400));
+}
+
+bool GetNextTrialResult(CLRRandom* pRandom)
+{
+    bool ret = (_cachedFailedTrials == 0);
+    _cachedFailedTrials--;
+
+    // if we are out of trials, cache some more for next time
+    if(_cachedFailedTrials < 0)
+    {
+        _cachedFailedTrials = CalculateGeometricRandom(pRandom);
+    }
+
+    return ret;
+}
+```
+
+When allocating an object, we need to do many trials at once, one for each byte. A naive implementation of that would look like:
+
+```
+bool DoesAnyTrialSucceed(CLRRandom* pRandom, int countOfTrials)
+{
+    for(int i = 0; i < countOfTrials; i++)
+    {
+        if(GetNextTrialResult(pRandom)) return true;
+    }
+    return false;
+}
+```
+
+However, the `_cachedFailedTrials` representation lets us speed this up by checking whether the number of failed trials in the cache covers the number of trials we need to perform, without iterating through them one at a time:
+
+```
+bool DoesAnyTrialSucceed(CLRRandom* pRandom, int countOfTrials)
+{
+    bool ret = _cachedFailedTrials < countOfTrials;
+    _cachedFailedTrials -= countOfTrials;
+
+    // a trial succeeded somewhere in this object; the remaining cached results cover
+    // bytes we skip over, so generate a fresh batch for the next object
+    if(ret)
+    {
+        _cachedFailedTrials = CalculateGeometricRandom(pRandom);
+    }
+
+    return ret;
+}
+```
+
+
+We are getting closer to mapping our pseudo-code implementation to the real CLR code. The current CLR implementation for memory allocation has the GC hand out blocks of memory 8KB in size which the runtime is allowed to sub-allocate from. The GC gives out an `alloc_context` to each thread, which has `alloc_ptr` and `alloc_limit` fields. These fields define the memory range [alloc_ptr, alloc_limit) which can be used to sub-allocate objects. The runtime has optimized assembly code helper functions to increment `alloc_ptr` directly for objects that are small enough to fit in the current range and don't require any special handling. For all other objects, the runtime invokes a slower allocation path that ultimately calls the GC's Alloc() function. If the alloc_context is exhausted, calling GC Alloc() also allocates a new 8KB block for future fast object allocations to use. In order to allocate objects, we could naively do this:
+
+```
+void* FastAssemblyAllocate(int objectSize)
+{
+    Thread* pThread = GetThread();
+    CLRRandom* pRandom = pThread->GetRandom();
+    alloc_context* pAllocContext = pThread->GetAllocContext();
+    void* alloc_end = pAllocContext->alloc_ptr + objectSize;
+    if(IsSamplingEnabled() && DoesAnyTrialSucceed(pRandom, objectSize))
+    {
+        PublishSamplingEvent();
+    }
+    if(pAllocContext->alloc_limit < alloc_end)
+    {
+        return SlowAlloc(objectSize);
+    }
+    else
+    {
+        void* objectAddr = pAllocContext->alloc_ptr;
+        pAllocContext->alloc_ptr = alloc_end;
+        *objectAddr = methodTable;
+        return objectAddr;
+    }
+}
+```
+
+Although orders of magnitude faster than where we started, this is still too slow. We don't want to put extra conditional checks for IsSamplingEnabled() and DoesAnyTrialSucceed() in the fast path of every allocation. Instead, we want to combine the two if conditions down to a single compare and jump, then handle publishing a sample event as part of the slow allocation path. Note that the value of the expression `alloc_ptr + _cachedFailedTrials` doesn't change across repeated calls to FastAssemblyAllocate() as long as we don't go down the SlowAlloc path or the PublishSamplingEvent() path. Each invocation increments `alloc_ptr` by `objectSize` and decrements `_cachedFailedTrials` by the same amount, leaving the sum unchanged. Let's define that sum: `alloc_ptr + _cachedFailedTrials = sampling_limit`. You can imagine that if we started allocating objects contiguously from `alloc_ptr`, `sampling_limit` represents the point in the memory range where whatever object overlaps it contains the successful trial and emits the sampling event. A little more rigorously, `DoesAnyTrialSucceed()` returns true when `_cachedFailedTrials < objectSize`. Adding `alloc_ptr` to each side shows this is the same as the condition `sampling_limit < alloc_end`:
+
+```
+_cachedFailedTrials < objectSize
+_cachedFailedTrials + alloc_ptr < objectSize + alloc_ptr
+sampling_limit < alloc_end
+```
+
+Lastly, to combine the two if conditionals, we can define a new field `combined_limit = min(sampling_limit, alloc_limit)`. If sampling events aren't enabled, then we define `combined_limit = alloc_limit`. This means that a single check `if(combined_limit < alloc_end)` detects when the object either exceeds `alloc_limit` or overlaps `sampling_limit`. The runtime actually has a bunch of different fast paths depending on the type of the object being allocated and the CPU architecture, but converted to pseudo-code they all look approximately like this:
+
+```
+void* FastAssemblyAllocate(int objectSize)
+{
+    Thread* pThread = GetThread();
+    alloc_context* pAllocContext = pThread->GetAllocContext();
+    void* alloc_end = pAllocContext->alloc_ptr + objectSize;
+    if(combined_limit < alloc_end)
+    {
+        return SlowAlloc(objectSize);
+    }
+    else
+    {
+        void* objectAddr = pAllocContext->alloc_ptr;
+        pAllocContext->alloc_ptr = alloc_end;
+        *objectAddr = methodTable;
+        return objectAddr;
+    }
+}
+```
+
+The only change we've made in the assembly helpers is doing a comparison against combined_limit instead of alloc_limit, which should have no performance impact. Look at [JIT_TrialAllocSFastMP_InlineGetThread](https://github.com/dotnet/runtime/blob/5c8bb402e6a8274e8135dd00eda2248b4f57102f/src/coreclr/vm/amd64/JitHelpers_InlineGetThread.asm#L38) for an example of what one of these helpers looks like in assembly code.
+
+The pseudo-code and concepts we've been describing here are now close to matching the runtime code, but there are still some important differences to call out to map it more exactly:
+
+1. In the real runtime code, the assembly helpers call a variety of different C++ helpers depending on object type, and all of those helpers in turn call into [Alloc()](https://github.com/dotnet/runtime/blob/5c8bb402e6a8274e8135dd00eda2248b4f57102f/src/coreclr/vm/gchelpers.cpp#L201). Here we've omitted the different per-type intermediate functions and represented all of them as the SlowAlloc() function in the pseudo-code.
+
+2. The combined_limit field is a member of ee_alloc_context rather than alloc_context. This was done to avoid creating a breaking change in the EE<->GC interface. The ee_alloc_context contains an alloc_context within it as well as any additional fields we want to add that are only visible to the EE.
+
+3. In order to reduce the number of per-thread fields being managed, the real implementation doesn't have an explicit `sampling_limit`. Instead this only exists as the transient calculation of `alloc_ptr + CalculateGeometricRandom()` that is used when computing an updated value for `combined_limit`. Whenever `combined_limit < alloc_limit`, it is implied that `sampling_limit = combined_limit` and `_cachedFailedTrials = combined_limit - alloc_ptr`. However, if `combined_limit == alloc_limit`, that represents one of two possible states:
+- Sampling is disabled
+- Sampling is enabled and we have a batch of cached failed trials with size `alloc_limit - alloc_ptr`. In the examples above, our batches were N failures followed by a success, but this is just N failures without any success at the end. This means no objects allocated in the current AC are going to be sampled, and whenever we allocate the N+1st byte we'll need to generate a new batch of trial results to determine whether that byte was sampled.
+If it turns out to be easier to track `sampling_limit` with an explicit field when sampling is enabled, we could do that; it just requires an extra pointer per thread. As memory overhead it's not much, but it will probably land in the L1 cache and wind up evicting some other field on the Thread object that now no longer fits in the cache line. The current implementation tries to minimize this cache impact. We never did any perf testing on alternative implementations that do track sampling_limit explicitly, so it's possible the difference isn't that meaningful.
+
+4. When we generated batches of trial results in the examples above, we always used all the results before generating a new batch; however, the real implementation sometimes discards part of a batch. Implicitly this happens when we calculate a value for `sampling_limit=alloc_ptr+CalculateGeometricRandom()`, determine that `alloc_limit` is smaller than `sampling_limit`, and then set `combined_limit=alloc_limit`. Discarding also happens any time we recompute the `sampling_limit` based on a new random value without having fully allocated bytes up to `combined_limit`. It may seem suspicious that we can do this and still generate the correct distribution of samples, but it is OK if done properly. Bernoulli trials are independent of one another, so it is legal to discard trials from our cache as long as the decision to discard a given trial result is independent of what that trial result is. For example, in the very first pseudo-code sample with the List, it would be legal to generate 100 boolean trials and then arbitrarily truncate the list to size 50. The first 50 values in the list are still valid Bernoulli trials with the original probability p=1/102,400 of being true, as will be all the future ones from the batches that are populated later. However, if we scanned the list and conditionally discarded any trials that we observed had a success result, that would be problematic. This type of selective removal changes the probability distribution for the items that remain.
+
+5. The GC Alloc() function isn't the only time that the GC updates alloc_ptr and alloc_limit. They also get updated during a GC in the callback inside of GCToEEInterface::GcEnumAllocContexts(). This is another place where combined_limit needs to be updated to ensure it stays synchronized with alloc_ptr and alloc_limit.
+
+
+## Thanks
+
+Thanks to Christophe Nasarre (@chrisnas) at DataDog for implementing this feature and to Mikelle Rogers for investigating the Camp-Paulson approximation.
\ No newline at end of file diff --git a/src/coreclr/debug/daccess/dacdbiimpl.cpp b/src/coreclr/debug/daccess/dacdbiimpl.cpp index a6dda591278557..d49cfecca6379c 100644 --- a/src/coreclr/debug/daccess/dacdbiimpl.cpp +++ b/src/coreclr/debug/daccess/dacdbiimpl.cpp @@ -6551,10 +6551,10 @@ HRESULT DacHeapWalker::Init(CORDB_ADDRESS start, CORDB_ADDRESS end) j++; } } - if ((&g_global_alloc_context)->alloc_ptr != nullptr) + if (g_global_alloc_context->alloc_ptr != nullptr) { - mAllocInfo[j].Ptr = (CORDB_ADDRESS)(&g_global_alloc_context)->alloc_ptr; - mAllocInfo[j].Limit = (CORDB_ADDRESS)(&g_global_alloc_context)->alloc_limit; + mAllocInfo[j].Ptr = (CORDB_ADDRESS)g_global_alloc_context->alloc_ptr; + mAllocInfo[j].Limit = (CORDB_ADDRESS)g_global_alloc_context->alloc_limit; } mThreadCount = j; diff --git a/src/coreclr/debug/daccess/request.cpp b/src/coreclr/debug/daccess/request.cpp index 2dc737db2e7007..69c68309099d08 100644 --- a/src/coreclr/debug/daccess/request.cpp +++ b/src/coreclr/debug/daccess/request.cpp @@ -5493,8 +5493,8 @@ HRESULT ClrDataAccess::GetGlobalAllocationContext( } SOSDacEnter(); - *allocPtr = (CLRDATA_ADDRESS)((&g_global_alloc_context)->alloc_ptr); - *allocLimit = (CLRDATA_ADDRESS)((&g_global_alloc_context)->alloc_limit); + *allocPtr = (CLRDATA_ADDRESS)(g_global_alloc_context->alloc_ptr); + *allocLimit = (CLRDATA_ADDRESS)(g_global_alloc_context->alloc_limit); SOSDacLeave(); return hr; } diff --git a/src/coreclr/gc/gc.cpp b/src/coreclr/gc/gc.cpp index 66e9efcaa15872..5d22871191ca42 100644 --- a/src/coreclr/gc/gc.cpp +++ b/src/coreclr/gc/gc.cpp @@ -44127,7 +44127,7 @@ size_t gc_heap::decommit_region (heap_segment* region, int bucket, int h_number) { #ifdef MULTIPLE_HEAPS // In return_free_region, we set heap_segment_heap (region) to nullptr so we cannot use it here. - // but since all heaps share the same mark array we simply pick the 0th heap to use.  + // but since all heaps share the same mark array we simply pick the 0th heap to use. gc_heap* hp = g_heaps [0]; #else gc_heap* hp = pGenGCHeap; @@ -49370,7 +49370,6 @@ bool GCHeap::StressHeap(gc_alloc_context * context) } \ } while (false) -#ifdef FEATURE_64BIT_ALIGNMENT // Allocate small object with an alignment requirement of 8-bytes. 
Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t flags) { @@ -49436,7 +49435,6 @@ Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t return newAlloc; } -#endif // FEATURE_64BIT_ALIGNMENT Object* GCHeap::Alloc(gc_alloc_context* context, size_t size, uint32_t flags REQD_ALIGN_DCL) @@ -49497,15 +49495,11 @@ GCHeap::Alloc(gc_alloc_context* context, size_t size, uint32_t flags REQD_ALIGN_ } else { -#ifdef FEATURE_64BIT_ALIGNMENT if (flags & GC_ALLOC_ALIGN8) { newAlloc = AllocAlign8 (acontext, hp, size, flags); } else -#else - assert ((flags & GC_ALLOC_ALIGN8) == 0); -#endif { newAlloc = (Object*) hp->allocate (size + ComputeMaxStructAlignPad(requiredAlignment), acontext, flags); } diff --git a/src/coreclr/gc/gcpriv.h b/src/coreclr/gc/gcpriv.h index 1005d002029379..ed26fd10fc1b81 100644 --- a/src/coreclr/gc/gcpriv.h +++ b/src/coreclr/gc/gcpriv.h @@ -1465,9 +1465,7 @@ class gc_heap friend struct ::alloc_context; friend void ProfScanRootsHelper(Object** object, ScanContext *pSC, uint32_t dwFlags); friend void GCProfileWalkHeapWorker(BOOL fProfilerPinned, BOOL fShouldWalkHeapRootsForEtw, BOOL fShouldWalkHeapObjectsForEtw); -#ifdef FEATURE_64BIT_ALIGNMENT friend Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t flags); -#endif //FEATURE_64BIT_ALIGNMENT friend class t_join; friend class gc_mechanisms; friend class seg_free_spaces; diff --git a/src/coreclr/inc/dacvars.h b/src/coreclr/inc/dacvars.h index 03995176313c24..18fb2c382313b9 100644 --- a/src/coreclr/inc/dacvars.h +++ b/src/coreclr/inc/dacvars.h @@ -140,7 +140,7 @@ DEFINE_DACVAR(ProfControlBlock, dac__g_profControlBlock, ::g_profControlBlock) DEFINE_DACVAR(PTR_DWORD, dac__g_card_table, ::g_card_table) DEFINE_DACVAR(PTR_BYTE, dac__g_lowest_address, ::g_lowest_address) DEFINE_DACVAR(PTR_BYTE, dac__g_highest_address, ::g_highest_address) -DEFINE_DACVAR(gc_alloc_context, dac__g_global_alloc_context, ::g_global_alloc_context) +DEFINE_DACVAR(UNKNOWN_POINTER_TYPE, dac__g_global_alloc_context, ::g_global_alloc_context) DEFINE_DACVAR(IGCHeap, dac__g_pGCHeap, ::g_pGCHeap) diff --git a/src/coreclr/inc/eventtracebase.h b/src/coreclr/inc/eventtracebase.h index 316104f649a1d8..ca3a559aa235da 100644 --- a/src/coreclr/inc/eventtracebase.h +++ b/src/coreclr/inc/eventtracebase.h @@ -1333,17 +1333,19 @@ namespace ETW #define ETWLoaderStaticLoad 0 // Static reference load #define ETWLoaderDynamicLoad 1 // Dynamic assembly load +#if defined(FEATURE_EVENT_TRACE) +EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context; +EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PRIVATE_PROVIDER_DOTNET_Context; +EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_RUNDOWN_PROVIDER_DOTNET_Context; +EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_STRESS_PROVIDER_DOTNET_Context; +#endif // FEATURE_EVENT_TRACE + #if defined(FEATURE_EVENT_TRACE) && !defined(HOST_UNIX) // // The ONE and only ONE global instantiation of this class // extern ETW::CEtwTracer * g_pEtwTracer; -EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context; -EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PRIVATE_PROVIDER_DOTNET_Context; -EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_RUNDOWN_PROVIDER_DOTNET_Context; -EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_STRESS_PROVIDER_DOTNET_Context; - // // Special Handling of Startup events // diff --git 
a/src/coreclr/nativeaot/Runtime/AsmOffsets.h b/src/coreclr/nativeaot/Runtime/AsmOffsets.h index 32abd406175e76..36efed6a3d4951 100644 --- a/src/coreclr/nativeaot/Runtime/AsmOffsets.h +++ b/src/coreclr/nativeaot/Runtime/AsmOffsets.h @@ -46,15 +46,16 @@ ASM_OFFSET( 0, 0, MethodTable, m_uFlags) ASM_OFFSET( 4, 4, MethodTable, m_uBaseSize) ASM_OFFSET( 14, 18, MethodTable, m_VTable) -ASM_OFFSET( 0, 0, Thread, m_rgbAllocContextBuffer) -ASM_OFFSET( 28, 38, Thread, m_ThreadStateFlags) -ASM_OFFSET( 2c, 40, Thread, m_pTransitionFrame) -ASM_OFFSET( 30, 48, Thread, m_pDeferredTransitionFrame) -ASM_OFFSET( 40, 68, Thread, m_ppvHijackedReturnAddressLocation) -ASM_OFFSET( 44, 70, Thread, m_pvHijackedReturnAddress) -ASM_OFFSET( 48, 78, Thread, m_uHijackedReturnValueFlags) -ASM_OFFSET( 4c, 80, Thread, m_pExInfoStackHead) -ASM_OFFSET( 50, 88, Thread, m_threadAbortException) +ASM_OFFSET( 0, 0, Thread, m_combined_limit) +ASM_OFFSET( 4, 8, Thread, m_rgbAllocContextBuffer) +ASM_OFFSET( 2c, 40, Thread, m_ThreadStateFlags) +ASM_OFFSET( 30, 48, Thread, m_pTransitionFrame) +ASM_OFFSET( 34, 50, Thread, m_pDeferredTransitionFrame) +ASM_OFFSET( 44, 70, Thread, m_ppvHijackedReturnAddressLocation) +ASM_OFFSET( 48, 78, Thread, m_pvHijackedReturnAddress) +ASM_OFFSET( 4c, 80, Thread, m_uHijackedReturnValueFlags) +ASM_OFFSET( 50, 88, Thread, m_pExInfoStackHead) +ASM_OFFSET( 54, 90, Thread, m_threadAbortException) ASM_SIZEOF( 14, 20, EHEnum) diff --git a/src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt b/src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt index f9b390e18d117a..fa3f5d0f8112c0 100644 --- a/src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt +++ b/src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt @@ -127,4 +127,4 @@ if (CLR_CMAKE_TARGET_ARCH_AMD64) if (CLR_CMAKE_TARGET_WIN32) install_static_library(Runtime.VxsortEnabled.GuardCF aotsdk nativeaot) endif (CLR_CMAKE_TARGET_WIN32) -endif (CLR_CMAKE_TARGET_ARCH_AMD64) \ No newline at end of file +endif (CLR_CMAKE_TARGET_ARCH_AMD64) diff --git a/src/coreclr/nativeaot/Runtime/GCHelpers.cpp b/src/coreclr/nativeaot/Runtime/GCHelpers.cpp index b038d9d33541bd..a5952315900bfa 100644 --- a/src/coreclr/nativeaot/Runtime/GCHelpers.cpp +++ b/src/coreclr/nativeaot/Runtime/GCHelpers.cpp @@ -29,6 +29,12 @@ #include "gcdesc.h" +#ifdef FEATURE_EVENT_TRACE + #include "clretwallmain.h" +#else // FEATURE_EVENT_TRACE + #include "etmdummy.h" +#endif // FEATURE_EVENT_TRACE + #define RH_LARGE_OBJECT_SIZE 85000 MethodTable g_FreeObjectEEType; @@ -471,6 +477,32 @@ EXTERN_C int64_t QCALLTYPE RhGetTotalAllocatedBytesPrecise() return allocated; } +inline void FireAllocationSampled(GC_ALLOC_FLAGS flags, size_t size, size_t samplingBudgetOffset, Object* orObject) +{ + void* typeId = GetLastAllocEEType(); + // Note: like for AllocationTick, the type name cannot be retrieved + WCHAR* name = nullptr; + + if (typeId != nullptr) + { + unsigned int allocKind = + (flags & GC_ALLOC_PINNED_OBJECT_HEAP) ? 2 : + (flags & GC_ALLOC_LARGE_OBJECT_HEAP) ? 
1 : + 0; // SOH + unsigned int heapIndex = 0; +#ifdef BACKGROUND_GC + gc_heap* hp = gc_heap::heap_of((BYTE*)orObject); + heapIndex = hp->heap_number; +#endif + FireEtwAllocationSampled(allocKind, GetClrInstanceId(), typeId, name, heapIndex, (BYTE*)orObject, size, samplingBudgetOffset); + } +} + +inline size_t AlignUp(size_t value, uint32_t alignment) +{ + return (value + alignment - 1) & ~(size_t)(alignment - 1); +} + static Object* GcAllocInternal(MethodTable* pEEType, uint32_t uFlags, uintptr_t numElements, Thread* pThread) { ASSERT(!pThread->IsDoNotTriggerGcSet()); @@ -539,10 +571,66 @@ static Object* GcAllocInternal(MethodTable* pEEType, uint32_t uFlags, uintptr_t // Save the MethodTable for instrumentation purposes. tls_pLastAllocationEEType = pEEType; + // check for dynamic allocation sampling + gc_alloc_context* acontext = pThread->GetAllocContext(); + bool isSampled = false; + size_t availableSpace = 0; + size_t aligned_size = 0; + size_t samplingBudget = 0; + + bool isRandomizedSamplingEnabled = Thread::IsRandomizedSamplingEnabled(); + if (isRandomizedSamplingEnabled) + { + // object allocations are always padded up to pointer size + aligned_size = AlignUp(cbSize, sizeof(uintptr_t)); + + // The number bytes we can allocate before we need to emit a sampling event. + // This calculation is only valid if combined_limit < alloc_limit. + samplingBudget = (size_t)(*pThread->GetCombinedLimit() - acontext->alloc_ptr); + + // The number of bytes available in the current allocation context + availableSpace = (size_t)(acontext->alloc_limit - acontext->alloc_ptr); + + // Check to see if the allocated object overlaps a sampled byte + // in this AC. This happens when both: + // 1) The AC contains a sampled byte (combined_limit < alloc_limit) + // 2) The object is large enough to overlap it (samplingBudget < aligned_size) + // + // Note that the AC could have no remaining space for allocations (alloc_ptr = + // alloc_limit = combined_limit). When a thread hasn't done any SOH allocations + // yet it also starts in an empty state where alloc_ptr = alloc_limit = + // combined_limit = nullptr. The (1) check handles both of these situations + // properly as an empty AC can not have a sampled byte inside of it. + isSampled = + (*pThread->GetCombinedLimit() < acontext->alloc_limit) && + (samplingBudget < aligned_size); + + // if the object overflows the AC, we need to sample the remaining bytes + // the sampling budget only included at most the bytes inside the AC + if (aligned_size > availableSpace && !isSampled) + { + samplingBudget = pThread->ComputeGeometricRandom() + availableSpace; + isSampled = (samplingBudget < aligned_size); + } + } + Object* pObject = GCHeapUtilities::GetGCHeap()->Alloc(pThread->GetAllocContext(), cbSize, uFlags); if (pObject == NULL) return NULL; + if (isSampled) + { + FireAllocationSampled((GC_ALLOC_FLAGS)uFlags, aligned_size, samplingBudget, pObject); + } + + // There are a variety of conditions that may have invalidated the previous combined_limit value + // such as not allocating the object in the AC memory region (UOH allocations), moving the AC, adding + // extra alignment padding, allocating a new AC, or allocating an object that consumed the sampling budget. + // Rather than test for all the different invalidation conditions individually we conservatively always + // recompute it. If sampling isn't enabled this inlined function is just trivially setting + // combined_limit=alloc_limit. 
+ pThread->UpdateCombinedLimit(isRandomizedSamplingEnabled); + pObject->set_EEType(pEEType); if (pEEType->HasComponentSize()) { @@ -555,7 +643,6 @@ static Object* GcAllocInternal(MethodTable* pEEType, uint32_t uFlags, uintptr_t #ifdef _DEBUG // We assume that the allocation quantum is never big enough for LARGE_OBJECT_SIZE. - gc_alloc_context* acontext = pThread->GetAllocContext(); ASSERT(acontext->alloc_limit - acontext->alloc_ptr <= RH_LARGE_OBJECT_SIZE); #endif diff --git a/src/coreclr/nativeaot/Runtime/amd64/AllocFast.S b/src/coreclr/nativeaot/Runtime/amd64/AllocFast.S index 6cb85bcc507a09..e6891cb26d61a2 100644 --- a/src/coreclr/nativeaot/Runtime/amd64/AllocFast.S +++ b/src/coreclr/nativeaot/Runtime/amd64/AllocFast.S @@ -28,7 +28,7 @@ NESTED_ENTRY RhpNewFast, _TEXT, NoHandler mov rsi, [rax + OFFSETOF__Thread__m_alloc_context__alloc_ptr] add rdx, rsi - cmp rdx, [rax + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rdx, [rax + OFFSETOF__Thread__m_combined_limit] ja LOCAL_LABEL(RhpNewFast_RarePath) // set the new alloc pointer @@ -143,7 +143,7 @@ NESTED_ENTRY RhNewString, _TEXT, NoHandler // rcx == Thread* // rdx == string size // r12 == element count - cmp rax, [rcx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rax, [rcx + OFFSETOF__Thread__m_combined_limit] ja LOCAL_LABEL(RhNewString_RarePath) mov [rcx + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax @@ -226,7 +226,7 @@ NESTED_ENTRY RhpNewArray, _TEXT, NoHandler // rcx == Thread* // rdx == array size // r12 == element count - cmp rax, [rcx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rax, [rcx + OFFSETOF__Thread__m_combined_limit] ja LOCAL_LABEL(RhpNewArray_RarePath) mov [rcx + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax diff --git a/src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm b/src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm index 37be558c3cef1d..ad3dd89821a97c 100644 --- a/src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm +++ b/src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm @@ -25,7 +25,7 @@ LEAF_ENTRY RhpNewFast, _TEXT mov rax, [rdx + OFFSETOF__Thread__m_alloc_context__alloc_ptr] add r8, rax - cmp r8, [rdx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp r8, [rdx + OFFSETOF__Thread__m_combined_limit] ja RhpNewFast_RarePath ;; set the new alloc pointer @@ -118,7 +118,7 @@ LEAF_ENTRY RhNewString, _TEXT ; rdx == element count ; r8 == array size ; r10 == thread - cmp rax, [r10 + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rax, [r10 + OFFSETOF__Thread__m_combined_limit] ja RhpNewArrayRare mov [r10 + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax @@ -179,7 +179,7 @@ LEAF_ENTRY RhpNewArray, _TEXT ; rdx == element count ; r8 == array size ; r10 == thread - cmp rax, [r10 + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rax, [r10 + OFFSETOF__Thread__m_combined_limit] ja RhpNewArrayRare mov [r10 + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax diff --git a/src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc b/src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc index 33089b6643d382..96d3be1ee31a8a 100644 --- a/src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc +++ b/src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc @@ -337,8 +337,6 @@ TSF_DoNotTriggerGc equ 10h ;; Rename fields of nested structs ;; OFFSETOF__Thread__m_alloc_context__alloc_ptr equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit - ;; GC 
type flags diff --git a/src/coreclr/nativeaot/Runtime/arm/AllocFast.S b/src/coreclr/nativeaot/Runtime/arm/AllocFast.S index 31b54d1bca313a..501923cc77f204 100644 --- a/src/coreclr/nativeaot/Runtime/arm/AllocFast.S +++ b/src/coreclr/nativeaot/Runtime/arm/AllocFast.S @@ -26,7 +26,7 @@ LEAF_ENTRY RhpNewFast, _TEXT ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_ptr] add r2, r3 - ldr r1, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r1, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r2, r1 bhi LOCAL_LABEL(RhpNewFast_RarePath) @@ -132,7 +132,7 @@ LEAF_ENTRY RhNewString, _TEXT adds r6, r12 bcs LOCAL_LABEL(RhNewString_RarePath) // if we get a carry here, the string is too large to fit below 4 GB - ldr r12, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r12, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r6, r12 bhi LOCAL_LABEL(RhNewString_RarePath) @@ -213,7 +213,7 @@ LOCAL_LABEL(ArrayAlignSize): adds r6, r12 bcs LOCAL_LABEL(RhpNewArray_RarePath) // if we get a carry here, the array is too large to fit below 4 GB - ldr r12, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r12, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r6, r12 bhi LOCAL_LABEL(RhpNewArray_RarePath) @@ -349,7 +349,7 @@ LEAF_ENTRY RhpNewFastAlign8, _TEXT // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add r2, r3 - ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r3, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r2, r3 bhi LOCAL_LABEL(Alloc8Failed) @@ -412,7 +412,7 @@ LEAF_ENTRY RhpNewFastMisalign, _TEXT // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add r2, r3 - ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r3, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r2, r3 bhi LOCAL_LABEL(BoxAlloc8Failed) diff --git a/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S b/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S index 966b052a2b9f9e..6cd6f044965b8d 100644 --- a/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S +++ b/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S @@ -11,8 +11,6 @@ GC_ALLOC_FINALIZE = 1 // Rename fields of nested structs // OFFSETOF__Thread__m_alloc_context__alloc_ptr = OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit = OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit - // Allocate non-array, non-finalizable object. If the allocation doesn't fit into the current thread's @@ -44,7 +42,7 @@ OFFSETOF__Thread__m_alloc_context__alloc_limit = OFFSETOF__Thread__m_rgbAll // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add x2, x2, x12 - ldr x13, [x1, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x13, [x1, #OFFSETOF__Thread__m_combined_limit] cmp x2, x13 bhi LOCAL_LABEL(RhpNewFast_RarePath) @@ -139,7 +137,7 @@ LOCAL_LABEL(NewOutOfMemory): // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. 
add x2, x2, x12 - ldr x12, [x3, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x12, [x3, #OFFSETOF__Thread__m_combined_limit] cmp x2, x12 bhi LOCAL_LABEL(RhNewString_Rare) @@ -207,7 +205,7 @@ LOCAL_LABEL(RhNewString_Rare): // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add x2, x2, x12 - ldr x12, [x3, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x12, [x3, #OFFSETOF__Thread__m_combined_limit] cmp x2, x12 bhi LOCAL_LABEL(RhpNewArray_Rare) diff --git a/src/coreclr/nativeaot/Runtime/arm64/AllocFast.asm b/src/coreclr/nativeaot/Runtime/arm64/AllocFast.asm index e6849b87312669..54176ad2920e6f 100644 --- a/src/coreclr/nativeaot/Runtime/arm64/AllocFast.asm +++ b/src/coreclr/nativeaot/Runtime/arm64/AllocFast.asm @@ -30,7 +30,7 @@ ;; Determine whether the end of the object would lie outside of the current allocation context. If so, ;; we abandon the attempt to allocate the object directly and fall back to the slow helper. add x2, x2, x12 - ldr x13, [x1, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x13, [x1, #OFFSETOF__Thread__m_combined_limit] cmp x2, x13 bhi RhpNewFast_RarePath @@ -118,7 +118,7 @@ NewOutOfMemory ;; Determine whether the end of the object would lie outside of the current allocation context. If so, ;; we abandon the attempt to allocate the object directly and fall back to the slow helper. add x2, x2, x12 - ldr x12, [x3, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x12, [x3, #OFFSETOF__Thread__m_combined_limit] cmp x2, x12 bhi RhpNewArrayRare @@ -179,7 +179,7 @@ StringSizeOverflow ;; Determine whether the end of the object would lie outside of the current allocation context. If so, ;; we abandon the attempt to allocate the object directly and fall back to the slow helper. 
add x2, x2, x12 - ldr x12, [x3, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x12, [x3, #OFFSETOF__Thread__m_combined_limit] cmp x2, x12 bhi RhpNewArrayRare diff --git a/src/coreclr/nativeaot/Runtime/arm64/AsmMacros.h b/src/coreclr/nativeaot/Runtime/arm64/AsmMacros.h index 94a559df719e02..8bce14dd02a3e4 100644 --- a/src/coreclr/nativeaot/Runtime/arm64/AsmMacros.h +++ b/src/coreclr/nativeaot/Runtime/arm64/AsmMacros.h @@ -88,7 +88,6 @@ STATUS_REDHAWK_THREAD_ABORT equ 0x43 ;; Rename fields of nested structs ;; OFFSETOF__Thread__m_alloc_context__alloc_ptr equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit ;; ;; IMPORTS diff --git a/src/coreclr/nativeaot/Runtime/disabledeventtrace.cpp b/src/coreclr/nativeaot/Runtime/disabledeventtrace.cpp index f0944fdf295179..886c9bb5cbb091 100644 --- a/src/coreclr/nativeaot/Runtime/disabledeventtrace.cpp +++ b/src/coreclr/nativeaot/Runtime/disabledeventtrace.cpp @@ -13,6 +13,8 @@ void EventTracing_Initialize() { } void ETW::GCLog::FireGcStart(ETW_GC_INFO * pGcInfo) { } +bool IsRuntimeProviderEnabled(uint8_t level, uint64_t keyword) { return false; } + #ifdef FEATURE_ETW BOOL ETW::GCLog::ShouldTrackMovementForEtw() { return FALSE; } diff --git a/src/coreclr/nativeaot/Runtime/eventpipe/gen-eventing-event-inc.lst b/src/coreclr/nativeaot/Runtime/eventpipe/gen-eventing-event-inc.lst index 901af659ff84b6..77c9d8cb15a3da 100644 --- a/src/coreclr/nativeaot/Runtime/eventpipe/gen-eventing-event-inc.lst +++ b/src/coreclr/nativeaot/Runtime/eventpipe/gen-eventing-event-inc.lst @@ -1,5 +1,6 @@ # Native runtime events supported by aot runtime. +AllocationSampled BGC1stConEnd BGC1stNonConEnd BGC1stSweepEnd diff --git a/src/coreclr/nativeaot/Runtime/eventtrace.cpp b/src/coreclr/nativeaot/Runtime/eventtrace.cpp index 8b3d134f5c4f24..a7d72b55fca53c 100644 --- a/src/coreclr/nativeaot/Runtime/eventtrace.cpp +++ b/src/coreclr/nativeaot/Runtime/eventtrace.cpp @@ -39,6 +39,23 @@ DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PRIVATE_PROVIDER_DOTNET_Con volatile LONGLONG ETW::GCLog::s_l64LastClientSequenceNumber = 0; +bool IsRuntimeProviderEnabled(uint8_t level, uint64_t keyword) +{ + // EventPipe is always taken into account + bool isEnabled = DotNETRuntimeProvider_IsEnabled(level, keyword); + +#ifdef FEATURE_ETW + // ETW is also taken into account on Windows + isEnabled |= ( + ETW_TRACING_INITIALIZED(MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_Context.RegistrationHandle) && + ETW_CATEGORY_ENABLED(MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_Context, level, keyword) + ); +#endif // FEATURE_ETW + + return isEnabled; +} + + //--------------------------------------------------------------------------------------- // // Helper to fire the GCStart event. 
Figures out which version of GCStart to fire, and @@ -245,4 +262,4 @@ void EventPipeEtwCallbackDotNETRuntimePrivate( _Inout_opt_ PVOID CallbackContext) { EtwCallbackCommon(DotNETRuntimePrivate, ControlCode, Level, MatchAnyKeyword, FilterData, true); -} \ No newline at end of file +} diff --git a/src/coreclr/nativeaot/Runtime/eventtrace.h b/src/coreclr/nativeaot/Runtime/eventtrace.h index 72f0ffa0f7a1fc..2483b692ee02ae 100644 --- a/src/coreclr/nativeaot/Runtime/eventtrace.h +++ b/src/coreclr/nativeaot/Runtime/eventtrace.h @@ -50,6 +50,8 @@ struct ProfilingScanContext : ScanContext }; #endif // defined(FEATURE_EVENT_TRACE) +bool IsRuntimeProviderEnabled(uint8_t level, uint64_t keyword); + namespace ETW { // Class to wrap all GC logic for ETW diff --git a/src/coreclr/nativeaot/Runtime/eventtracebase.h b/src/coreclr/nativeaot/Runtime/eventtracebase.h index 241c795c0d02fc..f0c1a6a99cfa12 100644 --- a/src/coreclr/nativeaot/Runtime/eventtracebase.h +++ b/src/coreclr/nativeaot/Runtime/eventtracebase.h @@ -102,6 +102,7 @@ struct ProfilingScanContext; #define CLR_GCHEAPSURVIVALANDMOVEMENT_KEYWORD 0x400000 #define CLR_MANAGEDHEAPCOLLECT_KEYWORD 0x800000 #define CLR_GCHEAPANDTYPENAMES_KEYWORD 0x1000000 +#define CLR_ALLOCATIONSAMPLING_KEYWORD 0x80000000000 // // Using KEYWORDZERO means when checking the events category ignore the keyword diff --git a/src/coreclr/nativeaot/Runtime/gcenv.ee.cpp b/src/coreclr/nativeaot/Runtime/gcenv.ee.cpp index f041e499c11d4b..0fdf4642f22a34 100644 --- a/src/coreclr/nativeaot/Runtime/gcenv.ee.cpp +++ b/src/coreclr/nativeaot/Runtime/gcenv.ee.cpp @@ -132,11 +132,41 @@ void GCToEEInterface::GcScanRoots(ScanFunc* fn, int condemned, int max_gen, Scan sc->thread_under_crawl = NULL; } +void InvokeGCAllocCallback(Thread* pThread, enum_alloc_context_func* fn, void* param) +{ + // NOTE: Its possible that alloc_ptr = alloc_limit = combined_limit = NULL at this point + gc_alloc_context* pAllocContext = pThread->GetAllocContext(); + + // The allocation context might be modified by the callback, so we need to save + // the remaining sampling budget and restore it after the callback if needed. + size_t currentSamplingBudget = (size_t)(*pThread->GetCombinedLimit() - pAllocContext->alloc_ptr); + size_t currentSize = (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr); + + fn(pAllocContext, param); + + // If the GC changed the size of the allocation context, we need to recompute the sampling limit + // This includes the case where the AC was initially zero-sized/uninitialized. + // Functionally we'd get valid results if we called UpdateCombinedLimit() unconditionally but its + // empirically a little more performant to only call it when the AC size has changed. + if (currentSize != (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr)) + { + pThread->UpdateCombinedLimit(); + } + else + { + // Restore the remaining sampling budget as the size is the same. 
+ *pThread->GetCombinedLimit() = pAllocContext->alloc_ptr + currentSamplingBudget; + } +} + void GCToEEInterface::GcEnumAllocContexts(enum_alloc_context_func* fn, void* param) { FOREACH_THREAD(thread) { - (*fn) (thread->GetAllocContext(), param); + //(*fn) (thread->GetAllocContext(), param); + + // update the combined limit is needed + InvokeGCAllocCallback(thread, fn, param); } END_FOREACH_THREAD } diff --git a/src/coreclr/nativeaot/Runtime/i386/AllocFast.asm b/src/coreclr/nativeaot/Runtime/i386/AllocFast.asm index 8d28e94c944177..4ddfab93ed1dbe 100644 --- a/src/coreclr/nativeaot/Runtime/i386/AllocFast.asm +++ b/src/coreclr/nativeaot/Runtime/i386/AllocFast.asm @@ -29,7 +29,7 @@ FASTCALL_FUNC RhpNewFast, 4 ;; add eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_ptr] - cmp eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp eax, [edx + OFFSETOF__Thread__m_combined_limit] ja AllocFailed ;; set the new alloc pointer @@ -165,7 +165,7 @@ FASTCALL_FUNC RhNewString, 8 mov ecx, eax add eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_ptr] jc StringAllocContextOverflow - cmp eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp eax, [edx + OFFSETOF__Thread__m_combined_limit] ja StringAllocContextOverflow ; ECX == allocation size @@ -282,7 +282,7 @@ ArrayAlignSize: mov ecx, eax add eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_ptr] jc ArrayAllocContextOverflow - cmp eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp eax, [edx + OFFSETOF__Thread__m_combined_limit] ja ArrayAllocContextOverflow ; ECX == array size diff --git a/src/coreclr/nativeaot/Runtime/i386/AsmMacros.inc b/src/coreclr/nativeaot/Runtime/i386/AsmMacros.inc index 896bf8e67dab53..f22b8f0bb5b814 100644 --- a/src/coreclr/nativeaot/Runtime/i386/AsmMacros.inc +++ b/src/coreclr/nativeaot/Runtime/i386/AsmMacros.inc @@ -141,7 +141,6 @@ STATUS_REDHAWK_THREAD_ABORT equ 43h ;; Rename fields of nested structs ;; OFFSETOF__Thread__m_alloc_context__alloc_ptr equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit ;; ;; CONSTANTS -- SYMBOLS diff --git a/src/coreclr/nativeaot/Runtime/inc/rhbinder.h b/src/coreclr/nativeaot/Runtime/inc/rhbinder.h index db238e24acbc16..6cf67845d86d30 100644 --- a/src/coreclr/nativeaot/Runtime/inc/rhbinder.h +++ b/src/coreclr/nativeaot/Runtime/inc/rhbinder.h @@ -496,15 +496,15 @@ struct PInvokeTransitionFrame #define PInvokeTransitionFrame_MAX_SIZE (sizeof(PInvokeTransitionFrame) + (POINTER_SIZE * PInvokeTransitionFrame_SaveRegs_count)) #ifdef TARGET_AMD64 -#define OFFSETOF__Thread__m_pTransitionFrame 0x40 +#define OFFSETOF__Thread__m_pTransitionFrame 0x48 #elif defined(TARGET_ARM64) -#define OFFSETOF__Thread__m_pTransitionFrame 0x40 +#define OFFSETOF__Thread__m_pTransitionFrame 0x48 #elif defined(TARGET_LOONGARCH64) -#define OFFSETOF__Thread__m_pTransitionFrame 0x40 +#define OFFSETOF__Thread__m_pTransitionFrame 0x48 #elif defined(TARGET_X86) -#define OFFSETOF__Thread__m_pTransitionFrame 0x2c +#define OFFSETOF__Thread__m_pTransitionFrame 0x30 #elif defined(TARGET_ARM) -#define OFFSETOF__Thread__m_pTransitionFrame 0x2c +#define OFFSETOF__Thread__m_pTransitionFrame 0x30 #endif typedef DPTR(MethodTable) PTR_EEType; diff --git a/src/coreclr/nativeaot/Runtime/loongarch64/AllocFast.S b/src/coreclr/nativeaot/Runtime/loongarch64/AllocFast.S index dc344183e927ba..6974bebfb829bf 100644 --- 
a/src/coreclr/nativeaot/Runtime/loongarch64/AllocFast.S +++ b/src/coreclr/nativeaot/Runtime/loongarch64/AllocFast.S @@ -11,9 +11,7 @@ GC_ALLOC_FINALIZE = 1 // Rename fields of nested structs // OFFSETOF__Thread__m_alloc_context__alloc_ptr = OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit = OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit - - +// OFFSETOF__Thread__m_combined_limit is the sampling limit of the allocation context (or the end of it if no sampling - former alloc_limit) // Allocate non-array, non-finalizable object. If the allocation doesn't fit into the current thread's // allocation context then automatically fallback to the slow allocation path. @@ -44,7 +42,7 @@ OFFSETOF__Thread__m_alloc_context__alloc_limit = OFFSETOF__Thread__m_rgbAll // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add.d $a2, $a2, $t3 - ld.d $t4, $a1, OFFSETOF__Thread__m_alloc_context__alloc_limit + ld.d $t4, $a1, OFFSETOF__Thread__m_combined_limit bltu $t4, $a2, RhpNewFast_RarePath // Update the alloc pointer to account for the allocation. @@ -137,7 +135,7 @@ NewOutOfMemory: // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add.d $a2, $a2, $t3 - ld.d $t3, $a3, OFFSETOF__Thread__m_alloc_context__alloc_limit + ld.d $t3, $a3, OFFSETOF__Thread__m_combined_limit bltu $t3, $a2, RhNewString_Rare // Reload new object address into r12. @@ -199,7 +197,7 @@ RhNewString_Rare: // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add.d $a2, $a2, $t3 - ld.d $t3, $a3, OFFSETOF__Thread__m_alloc_context__alloc_limit + ld.d $t3, $a3, OFFSETOF__Thread__m_combined_limit bltu $t3, $a2, RhpNewArray_Rare // Reload new object address into t3. 
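The `InvokeGCAllocCallback` helper above preserves the thread's remaining sampling budget across the GC's allocation-context enumeration callback (a matching helper appears later in this change for CoreCLR's gcenv.ee.cpp). A minimal standalone sketch of that invariant, using hypothetical stand-in types rather than the runtime's real `Thread`/`gc_alloc_context`:

```cpp
// Sketch only: stand-in types, not the runtime's actual structures.
#include <cassert>
#include <cstddef>
#include <cstdint>

struct AllocContext { uint8_t* alloc_ptr; uint8_t* alloc_limit; };

struct SketchThread {
    AllocContext ac;
    uint8_t* combined_limit;   // min(alloc_limit, sampling limit)
};

// The callback may move or resize the allocation context, but the number of bytes
// the thread may still allocate before the next sample (the "budget") should survive
// the call whenever the AC size is unchanged.
template <typename Fn>
void invoke_with_budget_preserved(SketchThread& t, Fn&& callback) {
    size_t budget = (size_t)(t.combined_limit - t.ac.alloc_ptr);
    size_t size   = (size_t)(t.ac.alloc_limit - t.ac.alloc_ptr);

    callback(t.ac);

    if (size != (size_t)(t.ac.alloc_limit - t.ac.alloc_ptr)) {
        // AC was resized/replaced: a fresh sampling limit would be computed here.
        t.combined_limit = t.ac.alloc_limit;   // placeholder for UpdateCombinedLimit()
    } else {
        // Same-sized AC (possibly relocated): re-anchor the old budget.
        t.combined_limit = t.ac.alloc_ptr + budget;
    }
}

int main() {
    uint8_t buf1[256], buf2[256];
    SketchThread t{{buf1, buf1 + 256}, buf1 + 100};   // 100 bytes of budget left

    // Callback that relocates the AC without changing its size.
    invoke_with_budget_preserved(t, [&](AllocContext& ac) {
        ac.alloc_ptr = buf2;
        ac.alloc_limit = buf2 + 256;
    });

    assert((size_t)(t.combined_limit - t.ac.alloc_ptr) == 100);   // budget preserved
    return 0;
}
```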
diff --git a/src/coreclr/nativeaot/Runtime/thread.cpp b/src/coreclr/nativeaot/Runtime/thread.cpp index b796b052182260..2ba31535d1703a 100644 --- a/src/coreclr/nativeaot/Runtime/thread.cpp +++ b/src/coreclr/nativeaot/Runtime/thread.cpp @@ -28,6 +28,9 @@ #include "RhConfig.h" #include "GcEnum.h" +#include "eventtracebase.h" +#include "eventtrace.h" + #ifndef DACCESS_COMPILE static int (*g_RuntimeInitializationCallback)(); @@ -193,10 +196,13 @@ void * Thread::GetCurrentThreadPInvokeReturnAddress() } #endif // !DACCESS_COMPILE -#if defined(FEATURE_GC_STRESS) & !defined(DACCESS_COMPILE) void Thread::SetRandomSeed(uint32_t seed) { +#ifndef FEATURE_GC_STRESS ASSERT(!IsStateSet(TSF_IsRandSeedSet)); +#endif + + m_rng.InitSeed(seed); m_uRand = seed; SetState(TSF_IsRandSeedSet); } @@ -243,7 +249,6 @@ bool Thread::IsRandInited() { return IsStateSet(TSF_IsRandSeedSet); } -#endif // FEATURE_GC_STRESS & !DACCESS_COMPILE PTR_ExInfo Thread::GetCurExInfo() { @@ -300,11 +305,19 @@ void Thread::Construct() ASSERT(m_pGCFrameRegistrations == NULL); ASSERT(m_threadAbortException == NULL); + ASSERT(m_combined_limit == NULL); #ifdef FEATURE_SUSPEND_REDIRECTION ASSERT(m_redirectionContextBuffer == NULL); #endif //FEATURE_SUSPEND_REDIRECTION ASSERT(m_interruptedContext == NULL); + + if (!IsStateSet(TSF_IsRandSeedSet)) + { + // Initialize the random number generator seed + uint32_t seed = (uint32_t)PalGetTickCount64(); + SetRandomSeed(seed); + } } bool Thread::IsInitialized() @@ -347,12 +360,15 @@ uint64_t Thread::GetDeadThreadsNonAllocBytes() #endif } +uint32_t SamplingDistributionMean = (100 * 1024); + void Thread::Detach() { // clean up the alloc context gc_alloc_context* context = GetAllocContext(); s_DeadThreadsNonAllocBytes += context->alloc_limit - context->alloc_ptr; GCHeapUtilities::GetGCHeap()->FixAllocContext(context, NULL, NULL); + m_combined_limit = NULL; SetDetached(); } @@ -1321,6 +1337,45 @@ FCIMPL1(void, RhpReversePInvokeReturn, ReversePInvokeFrame * pFrame) } FCIMPLEND + +bool Thread::IsRandomizedSamplingEnabled() +{ + return IsRuntimeProviderEnabled(TRACE_LEVEL_INFORMATION, CLR_ALLOCATIONSAMPLING_KEYWORD); +} + +int Thread::ComputeGeometricRandom() +{ + const double maxValue = 0xFFFFFFFF; + + // compute a random sample from the Geometric distribution + double probability = (maxValue - (double)m_rng.next()) / maxValue; + int threshold = (int)(-log(1 - probability) * SamplingDistributionMean); + return threshold; +} + +void Thread::UpdateCombinedLimit(bool samplingEnabled) +{ + gc_alloc_context* alloc_context = GetAllocContext(); + if (!samplingEnabled) + { + m_combined_limit = alloc_context->alloc_limit; + } + else + { + // compute the next sampling limit based on a geometric distribution + uint8_t* sampling_limit = alloc_context->alloc_ptr + ComputeGeometricRandom(); + + // if the sampling limit is larger than the allocation context, no sampling will occur in this AC + m_combined_limit = (sampling_limit < alloc_context->alloc_limit) ? sampling_limit : alloc_context->alloc_limit; + } +} + +// Regenerate the randomized sampling limit and update the m_combined_limit field. 
+void Thread::UpdateCombinedLimit() +{ + UpdateCombinedLimit(IsRandomizedSamplingEnabled()); +} + #ifdef USE_PORTABLE_HELPERS FCIMPL1(void, RhpPInvoke2, PInvokeTransitionFrame* pFrame) diff --git a/src/coreclr/nativeaot/Runtime/thread.h b/src/coreclr/nativeaot/Runtime/thread.h index 4c0a21e9f9ab7f..f26cd3b3413813 100644 --- a/src/coreclr/nativeaot/Runtime/thread.h +++ b/src/coreclr/nativeaot/Runtime/thread.h @@ -6,6 +6,7 @@ #include "StackFrameIterator.h" #include "slist.h" // DefaultSListTraits +#include "xoshiro128plusplus.h" struct gc_alloc_context; class RuntimeInstance; @@ -83,8 +84,33 @@ struct InlinedThreadStaticRoot TypeManager* m_typeManager; }; +extern uint32_t SamplingDistributionMean; + struct RuntimeThreadLocals { + // Any allocation that would overlap combined_limit needs to be handled by the allocation slow path. + // combined_limit is the minimum of: + // - gc_alloc_context.alloc_limit (the end of the current AC) + // - the sampling_limit + // + // In the simple case that randomized sampling is disabled, combined_limit is always equal to alloc_limit. + // + // There are two different useful interpretations for the sampling_limit. One is to treat the sampling_limit + // as an address and when we allocate an object that overlaps that address we should emit a sampling event. + // The other is that we can treat (sampling_limit - alloc_ptr) as a budget of how many bytes we can allocate + // before emitting a sampling event. If we always allocated objects contiguously in the AC and incremented + // alloc_ptr by the size of the object, these two interpretations would be equivalent. However, when objects + // don't fit in the AC we allocate them in some other address range. The budget interpretation is more + // flexible to handle those cases. + // + // The sampling limit isn't stored in any separate field explicitly, instead it is implied: + // - if combined_limit == alloc_limit there is no sampled byte in the AC. In the budget interpretation + // we can allocate (alloc_limit - alloc_ptr) unsampled bytes. We'll need a new random number after + // that to determine whether future allocated bytes should be sampled. + // This occurs either because the sampling feature is disabled, or because the randomized selection + // of sampled bytes didn't select a byte in this AC. + // - if combined_limit < alloc_limit there is a sample limit in the AC. sample_limit = combined_limit. + uint8_t* m_combined_limit; uint8_t m_rgbAllocContextBuffer[SIZEOF_ALLOC_CONTEXT]; uint32_t volatile m_ThreadStateFlags; // see Thread::ThreadStateFlags enum PInvokeTransitionFrame* m_pTransitionFrame; @@ -99,6 +125,7 @@ struct RuntimeThreadLocals #endif // FEATURE_HIJACK PTR_ExInfo m_pExInfoStackHead; Object* m_threadAbortException; // ThreadAbortException instance -set only during thread abort + #ifdef TARGET_X86 PCODE m_LastRedirectIP; uint64_t m_SpinCount; @@ -115,9 +142,9 @@ struct RuntimeThreadLocals uint8_t* m_redirectionContextBuffer; // storage for redirection context, allocated on demand #endif //FEATURE_SUSPEND_REDIRECTION -#ifdef FEATURE_GC_STRESS uint32_t m_uRand; // current per-thread random number -#endif // FEATURE_GC_STRESS + // TODO: replace m_uRand with m_rng + sxoshiro128pp m_rng; // random number generator }; struct ReversePInvokeFrame @@ -144,9 +171,7 @@ class Thread : private RuntimeThreadLocals TSF_DoNotTriggerGc = 0x00000010, // Do not allow hijacking of this thread, also intended to // ...be checked during allocations in debug builds. 
TSF_IsGcSpecialThread = 0x00000020, // Set to indicate a GC worker thread used for background GC -#ifdef FEATURE_GC_STRESS - TSF_IsRandSeedSet = 0x00000040, // set to indicate the random number generator for GCStress was inited -#endif // FEATURE_GC_STRESS + TSF_IsRandSeedSet = 0x00000040, // set to indicate the random number generator was inited (used by GCSTRESS and AllocationSampled) #ifdef FEATURE_SUSPEND_REDIRECTION TSF_Redirected = 0x00000080, // Set to indicate the thread is redirected and will inevitably @@ -216,6 +241,12 @@ class Thread : private RuntimeThreadLocals bool IsInitialized(); gc_alloc_context * GetAllocContext(); + static bool IsRandomizedSamplingEnabled(); + uint8_t** GetCombinedLimit(); + int ComputeGeometricRandom(); + void UpdateCombinedLimit(); + // TODO: probably private + void UpdateCombinedLimit(bool samplingEnabled); uint64_t GetPalThreadIdForLogging(); @@ -256,11 +287,9 @@ class Thread : private RuntimeThreadLocals #ifndef DACCESS_COMPILE void SetThreadStressLog(void * ptsl); #endif // DACCESS_COMPILE -#ifdef FEATURE_GC_STRESS void SetRandomSeed(uint32_t seed); uint32_t NextRand(); bool IsRandInited(); -#endif // FEATURE_GC_STRESS PTR_ExInfo GetCurExInfo(); bool IsCurrentThreadInCooperativeMode(); diff --git a/src/coreclr/nativeaot/Runtime/thread.inl b/src/coreclr/nativeaot/Runtime/thread.inl index 2daffd06922134..fb148d5e8c6faa 100644 --- a/src/coreclr/nativeaot/Runtime/thread.inl +++ b/src/coreclr/nativeaot/Runtime/thread.inl @@ -1,6 +1,16 @@ // Licensed to the .NET Foundation under one or more agreements. // The .NET Foundation licenses this file to you under the MIT license. +#ifndef __thread_inl__ +#define __thread_inl__ + +// TODO: try to find out where the events symbols are defined +//#include "eventtracebase.h" +//#include "ClrEtwAll.h" + +#include "thread.h" + + #ifndef DACCESS_COMPILE // Set the m_pDeferredTransitionFrame field for GC allocation helpers that setup transition frame // in assembly code. Do not use anywhere else. 
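The comment block above describes `m_combined_limit` as the minimum of the AC's end and the next sampled byte, with the "budget" interpretation covering objects that don't fit in the AC. A small, self-contained sketch of that computation (hypothetical names; the runtime versions are `Thread::UpdateCombinedLimit` here and `ee_alloc_context::UpdateCombinedLimit` on the CoreCLR side):

```cpp
// Sketch of how the combined limit is derived; not the runtime's actual code.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>

constexpr double kSamplingMeanBytes = 100 * 1024;   // matches SamplingDistributionMean

// Distance (in bytes) to the next sampled byte: inverse-CDF of an exponential with
// mean 100 KiB, which approximates a geometric distribution with p = 1/102400.
static size_t next_sampling_budget(std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double u = uniform(rng);
    return (size_t)(-std::log(1.0 - u) * kSamplingMeanBytes);
}

struct SketchAllocContext {
    uint8_t* alloc_ptr;
    uint8_t* alloc_limit;
    uint8_t* combined_limit;

    void update_combined_limit(bool sampling_enabled, std::mt19937& rng) {
        if (!sampling_enabled) {
            combined_limit = alloc_limit;          // fast path behaves as before
            return;
        }
        uint8_t* sampling_limit = alloc_ptr + next_sampling_budget(rng);
        // If the sampled byte lands beyond this AC, no sampling happens in it;
        // a fresh budget is drawn for the next AC or an overflowing allocation.
        combined_limit = std::min(sampling_limit, alloc_limit);
    }
};

int main() {
    std::mt19937 rng(1);
    uint8_t buffer[64 * 1024];
    SketchAllocContext ac{buffer, buffer + sizeof(buffer), nullptr};
    ac.update_combined_limit(/*sampling_enabled*/ true, rng);
    // combined_limit always stays within [alloc_ptr, alloc_limit]; most 64 KiB ACs
    // will contain no sampled byte at a 100 KiB mean gap.
    return ac.combined_limit <= ac.alloc_limit ? 0 : 1;
}
```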
@@ -64,6 +74,12 @@ inline gc_alloc_context* Thread::GetAllocContext() return (gc_alloc_context*)m_rgbAllocContextBuffer; } +inline uint8_t** Thread::GetCombinedLimit() +{ + return &m_combined_limit; +} + + inline bool Thread::IsStateSet(ThreadStateFlags flags) { return ((m_ThreadStateFlags & flags) == (uint32_t)flags); @@ -156,3 +172,5 @@ FORCEINLINE bool Thread::InlineTryFastReversePInvoke(ReversePInvokeFrame* pFrame return true; } + +#endif // __thread_inl__ diff --git a/src/coreclr/nativeaot/Runtime/threadstore.cpp b/src/coreclr/nativeaot/Runtime/threadstore.cpp index fb6255ba118a8e..10687f08ae1eeb 100644 --- a/src/coreclr/nativeaot/Runtime/threadstore.cpp +++ b/src/coreclr/nativeaot/Runtime/threadstore.cpp @@ -127,7 +127,7 @@ void ThreadStore::AttachCurrentThread(bool fAcquireThreadStoreLock) // Init the thread buffer // pAttachingThread->Construct(); - ASSERT(pAttachingThread->m_ThreadStateFlags == Thread::TSF_Unknown); + ASSERT(pAttachingThread->m_ThreadStateFlags == Thread::TSF_IsRandSeedSet); // fAcquireThreadStoreLock is false when threads are created/attached for GC purpose // in such case the lock is already held and GC takes care to ensure safe access to the threadstore @@ -138,7 +138,7 @@ void ThreadStore::AttachCurrentThread(bool fAcquireThreadStoreLock) // // Set thread state to be attached // - ASSERT(pAttachingThread->m_ThreadStateFlags == Thread::TSF_Unknown); + ASSERT(pAttachingThread->m_ThreadStateFlags == Thread::TSF_IsRandSeedSet); pAttachingThread->m_ThreadStateFlags = Thread::TSF_Attached; pTS->m_ThreadList.PushHead(pAttachingThread); diff --git a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc index f8ec8f5037b1b2..78d1a461d1628f 100644 --- a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc +++ b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc @@ -241,7 +241,6 @@ C_FUNC(\Name): // Rename fields of nested structs // #define OFFSETOF__Thread__m_alloc_context__alloc_ptr OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -#define OFFSETOF__Thread__m_alloc_context__alloc_limit OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit // GC type flags #define GC_ALLOC_FINALIZE 1 diff --git a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm.inc b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm.inc index 68631819f7dee4..eea96fdd17d812 100644 --- a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm.inc +++ b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm.inc @@ -29,7 +29,6 @@ // Rename fields of nested structs #define OFFSETOF__Thread__m_alloc_context__alloc_ptr (OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr) -#define OFFSETOF__Thread__m_alloc_context__alloc_limit (OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit) // GC minimal sized object. We use this to switch between 4 and 8 byte alignment in the GC heap (see AllocFast.asm). 
#define SIZEOF__MinObject 12 diff --git a/src/coreclr/nativeaot/Runtime/xoshiro128plusplus.h b/src/coreclr/nativeaot/Runtime/xoshiro128plusplus.h new file mode 100644 index 00000000000000..ad275526a51155 --- /dev/null +++ b/src/coreclr/nativeaot/Runtime/xoshiro128plusplus.h @@ -0,0 +1,131 @@ +#pragma once + +/* Written in 2019 by David Blackman and Sebastiano Vigna (vigna@acm.org) + +To the extent possible under law, the author has dedicated all copyright +and related and neighboring rights to this software to the public domain +worldwide. + +Permission to use, copy, modify, and/or distribute this software for any +purpose with or without fee is hereby granted. + +THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES +WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR +ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES +WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN +ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR +IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ + +#include + +/* This is xoshiro128++ 1.0, one of our 32-bit all-purpose, rock-solid + generators. It has excellent speed, a state size (128 bits) that is + large enough for mild parallelism, and it passes all tests we are aware + of. + + For generating just single-precision (i.e., 32-bit) floating-point + numbers, xoshiro128+ is even faster. + + The state must be seeded so that it is not everywhere zero. */ + +// Note: the code has been changed to avoid static state in multi-threaded usage + +struct sxoshiro128pp +{ + static inline uint32_t rotl(const uint32_t x, int k) { + return (x << k) | (x >> (32 - k)); + } + + uint32_t s[4]; + + void InitSeed(uint32_t seed) + { + if (seed == 0) + { + seed = 997; + } + + s[0] = seed; + s[1] = seed; + s[2] = seed; + s[3] = seed; + jump(); + } + + uint32_t next(void) { + const uint32_t result = rotl(s[0] + s[3], 7) + s[0]; + + const uint32_t t = s[1] << 9; + + s[2] ^= s[0]; + s[3] ^= s[1]; + s[1] ^= s[2]; + s[0] ^= s[3]; + + s[2] ^= t; + + s[3] = rotl(s[3], 11); + + return result; + } + + + /* This is the jump function for the generator. It is equivalent + to 2^64 calls to next(); it can be used to generate 2^64 + non-overlapping subsequences for parallel computations. */ + + void jump(void) { + static const uint32_t JUMP[] = { 0x8764000b, 0xf542d2d3, 0x6fa035c3, 0x77f2db5b }; + + uint32_t s0 = 0; + uint32_t s1 = 0; + uint32_t s2 = 0; + uint32_t s3 = 0; + for (int i = 0; i < sizeof JUMP / sizeof * JUMP; i++) + for (int b = 0; b < 32; b++) { + if (JUMP[i] & UINT32_C(1) << b) { + s0 ^= s[0]; + s1 ^= s[1]; + s2 ^= s[2]; + s3 ^= s[3]; + } + next(); + } + + s[0] = s0; + s[1] = s1; + s[2] = s2; + s[3] = s3; + } + + + /* This is the long-jump function for the generator. It is equivalent to + 2^96 calls to next(); it can be used to generate 2^32 starting points, + from each of which jump() will generate 2^32 non-overlapping + subsequences for parallel distributed computations. 
*/ + + void long_jump(void) { + static const uint32_t LONG_JUMP[] = { 0xb523952e, 0x0b6f099f, 0xccf5a0ef, 0x1c580662 }; + + uint32_t s0 = 0; + uint32_t s1 = 0; + uint32_t s2 = 0; + uint32_t s3 = 0; + for (int i = 0; i < sizeof LONG_JUMP / sizeof * LONG_JUMP; i++) + for (int b = 0; b < 32; b++) { + if (LONG_JUMP[i] & UINT32_C(1) << b) { + s0 ^= s[0]; + s1 ^= s[1]; + s2 ^= s[2]; + s3 ^= s[3]; + } + next(); + } + + s[0] = s0; + s[1] = s1; + s[2] = s2; + s[3] = s3; + } +}; \ No newline at end of file diff --git a/src/coreclr/vm/ClrEtwAll.man b/src/coreclr/vm/ClrEtwAll.man index 265d7a07726cf6..8309a0eea51979 100644 --- a/src/coreclr/vm/ClrEtwAll.man +++ b/src/coreclr/vm/ClrEtwAll.man @@ -91,6 +91,8 @@ message="$(string.RuntimePublisher.ProfilerKeywordMessage)" symbol="CLR_PROFILER_KEYWORD" /> + @@ -461,7 +463,13 @@ - + + + + + @@ -998,7 +1006,7 @@ - + - + + + @@ -3566,7 +3598,7 @@ keywords ="ThreadingKeyword" opcode="Wait" task="ThreadPoolWorkerThread" symbol="ThreadPoolWorkerThreadWait" message="$(string.RuntimePublisher.ThreadPoolWorkerThreadEventMessage)"/> - + @@ -4257,6 +4289,12 @@ task="WaitHandleWait" symbol="WaitHandleWaitStop" message="$(string.RuntimePublisher.WaitHandleWaitStopEventMessage)"/> + + + @@ -4372,14 +4410,14 @@ - + - + @@ -7297,7 +7335,7 @@ keywords="PrivateFusionKeyword" opcode="NgenBind" task="CLRNgenBinder" symbol="NgenBindEvent" message="$(string.PrivatePublisher.NgenBinderMessage)"/> - + - + - + - + - + - + + @@ -8659,7 +8698,7 @@ - + @@ -8791,6 +8830,7 @@ + @@ -9155,6 +9195,7 @@ + @@ -9287,7 +9328,7 @@ - + diff --git a/src/coreclr/vm/amd64/JitHelpers_InlineGetThread.asm b/src/coreclr/vm/amd64/JitHelpers_InlineGetThread.asm new file mode 100644 index 00000000000000..b5ee78274d7f14 --- /dev/null +++ b/src/coreclr/vm/amd64/JitHelpers_InlineGetThread.asm @@ -0,0 +1,263 @@ +; Licensed to the .NET Foundation under one or more agreements. +; The .NET Foundation licenses this file to you under the MIT license. + +; *********************************************************************** +; File: JitHelpers_InlineGetThread.asm, see history in jithelp.asm +; +; Notes: These routinues will be patched at runtime with the location in +; the TLS to find the Thread* and are the fastest implementation +; of their specific functionality. +; *********************************************************************** + +include AsmMacros.inc +include asmconstants.inc + +; Min amount of stack space that a nested function should allocate. +MIN_SIZE equ 28h + +JIT_NEW equ ?JIT_New@@YAPEAVObject@@PEAUCORINFO_CLASS_STRUCT_@@@Z +CopyValueClassUnchecked equ ?CopyValueClassUnchecked@@YAXPEAX0PEAVMethodTable@@@Z +JIT_Box equ ?JIT_Box@@YAPEAVObject@@PEAUCORINFO_CLASS_STRUCT_@@PEAX@Z +g_pStringClass equ ?g_pStringClass@@3PEAVMethodTable@@EA +FramedAllocateString equ ?FramedAllocateString@@YAPEAVStringObject@@K@Z +JIT_NewArr1 equ ?JIT_NewArr1@@YAPEAVObject@@PEAUCORINFO_CLASS_STRUCT_@@_J@Z + +INVALIDGCVALUE equ 0CCCCCCCDh + +extern JIT_NEW:proc +extern CopyValueClassUnchecked:proc +extern JIT_Box:proc +extern g_pStringClass:QWORD +extern FramedAllocateString:proc +extern JIT_NewArr1:proc + +extern JIT_InternalThrow:proc + +; IN: rcx: MethodTable* +; OUT: rax: new object +LEAF_ENTRY JIT_TrialAllocSFastMP_InlineGetThread, _TEXT + mov edx, [rcx + OFFSET__MethodTable__m_BaseSize] + + ; m_BaseSize is guaranteed to be a multiple of 8. 
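A short usage sketch for the `sxoshiro128pp` struct added above, mirroring how the NativeAOT `Thread::ComputeGeometricRandom` turns the raw 32-bit output into a uniform value in [0, 1] (this assumes the header above is available on the include path as `xoshiro128plusplus.h`):

```cpp
// Usage sketch only; relies on the xoshiro128plusplus.h added in this change.
#include <cstdint>
#include <cstdio>
#include "xoshiro128plusplus.h"

int main() {
    sxoshiro128pp rng;
    rng.InitSeed(12345);          // per-thread seed; a zero seed is remapped internally

    const double maxValue = 0xFFFFFFFF;
    for (int i = 0; i < 4; i++) {
        uint32_t raw = rng.next();
        // Same mapping used by ComputeGeometricRandom before the -log() transform.
        double probability = (maxValue - (double)raw) / maxValue;
        printf("raw=%u p=%f\n", (unsigned)raw, probability);
    }
    return 0;
}
```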
+ + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add rdx, rax + + cmp rdx, r10 + ja AllocFailed + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], rdx + mov [rax], rcx + + ret + + AllocFailed: + jmp JIT_NEW +LEAF_END JIT_TrialAllocSFastMP_InlineGetThread, _TEXT + +; HCIMPL2(Object*, JIT_Box, CORINFO_CLASS_HANDLE type, void* unboxedData) +NESTED_ENTRY JIT_BoxFastMP_InlineGetThread, _TEXT + + ; m_BaseSize is guaranteed to be a multiple of 8. + mov r8d, [rcx + OFFSET__MethodTable__m_BaseSize] + + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add r8, rax + + cmp r8, r10 + ja AllocFailed + + test rdx, rdx + je NullRef + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], r8 + mov [rax], rcx + + ; Check whether the object contains pointers + test dword ptr [rcx + OFFSETOF__MethodTable__m_dwFlags], MethodTable__enum_flag_ContainsPointers + jnz ContainsPointers + + ; We have no pointers - emit a simple inline copy loop + ; Copy the contents from the end + mov ecx, [rcx + OFFSET__MethodTable__m_BaseSize] + sub ecx, 18h ; sizeof(ObjHeader) + sizeof(Object) + last slot + +align 16 + CopyLoop: + mov r8, [rdx+rcx] + mov [rax+rcx+8], r8 + sub ecx, 8 + jge CopyLoop + REPRET + + ContainsPointers: + ; Do call to CopyValueClassUnchecked(object, data, pMT) + push_vol_reg rax + alloc_stack 20h + END_PROLOGUE + + mov r8, rcx + lea rcx, [rax + 8] + call CopyValueClassUnchecked + + add rsp, 20h + pop rax + ret + + AllocFailed: + NullRef: + jmp JIT_Box +NESTED_END JIT_BoxFastMP_InlineGetThread, _TEXT + +LEAF_ENTRY AllocateStringFastMP_InlineGetThread, _TEXT + ; We were passed the number of characters in ECX + + ; we need to load the method table for string from the global + mov r9, [g_pStringClass] + + ; Instead of doing elaborate overflow checks, we just limit the number of elements + ; to (LARGE_OBJECT_SIZE - 256)/sizeof(WCHAR) or less. + ; This will avoid all overflow problems, as well as making sure + ; big string objects are correctly allocated in the big object heap. + + cmp ecx, (ASM_LARGE_OBJECT_SIZE - 256)/2 + jae OversizedString + + ; Calculate the final size to allocate. + ; We need to calculate baseSize + cnt*2, then round that up by adding 7 and anding ~7. + + lea edx, [STRING_BASE_SIZE + ecx*2 + 7] + and edx, -8 + + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add rdx, rax + + cmp rdx, r10 + ja AllocFailed + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], rdx + mov [rax], r9 + + mov [rax + OFFSETOF__StringObject__m_StringLength], ecx + + ret + + OversizedString: + AllocFailed: + jmp FramedAllocateString +LEAF_END AllocateStringFastMP_InlineGetThread, _TEXT + +; HCIMPL2(Object*, JIT_NewArr1VC_MP_InlineGetThread, CORINFO_CLASS_HANDLE arrayMT, INT_PTR size) +LEAF_ENTRY JIT_NewArr1VC_MP_InlineGetThread, _TEXT + ; We were passed a (shared) method table in RCX, which contains the element type. + + ; The element count is in RDX + + ; NOTE: if this code is ported for CORINFO_HELP_NEWSFAST_ALIGN8, it will need + ; to emulate the double-specific behavior of JIT_TrialAlloc::GenAllocArray. + + ; Do a conservative check here. This is to avoid overflow while doing the calculations. 
We don't + ; have to worry about "large" objects, since the allocation quantum is never big enough for + ; LARGE_OBJECT_SIZE. + + ; For Value Classes, this needs to be 2^16 - slack (2^32 / max component size), + ; The slack includes the size for the array header and round-up ; for alignment. Use 256 for the + ; slack value out of laziness. + + ; In both cases we do a final overflow check after adding to the alloc_ptr. + + cmp rdx, (65535 - 256) + jae OversizedArray + + movzx r8d, word ptr [rcx + OFFSETOF__MethodTable__m_dwFlags] ; component size is low 16 bits + imul r8d, edx + add r8d, dword ptr [rcx + OFFSET__MethodTable__m_BaseSize] + + ; round the size to a multiple of 8 + + add r8d, 7 + and r8d, -8 + + + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add r8, rax + jc AllocFailed + + cmp r8, r10 + ja AllocFailed + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], r8 + mov [rax], rcx + + mov dword ptr [rax + OFFSETOF__ArrayBase__m_NumComponents], edx + + ret + + OversizedArray: + AllocFailed: + jmp JIT_NewArr1 +LEAF_END JIT_NewArr1VC_MP_InlineGetThread, _TEXT + + +; HCIMPL2(Object*, JIT_NewArr1OBJ_MP_InlineGetThread, CORINFO_CLASS_HANDLE arrayMT, INT_PTR size) +LEAF_ENTRY JIT_NewArr1OBJ_MP_InlineGetThread, _TEXT + ; We were passed a (shared) method table in RCX, which contains the element type. + + ; The element count is in RDX + + ; NOTE: if this code is ported for CORINFO_HELP_NEWSFAST_ALIGN8, it will need + ; to emulate the double-specific behavior of JIT_TrialAlloc::GenAllocArray. + + ; Verifies that LARGE_OBJECT_SIZE fits in 32-bit. This allows us to do array size + ; arithmetic using 32-bit registers. + .erre ASM_LARGE_OBJECT_SIZE lt 100000000h + + cmp rdx, (ASM_LARGE_OBJECT_SIZE - 256)/8 ; sizeof(void*) + jae OversizedArray + + ; In this case we know the element size is sizeof(void *), or 8 for x64 + ; This helps us in two ways - we can shift instead of multiplying, and + ; there's no need to align the size either + + mov r8d, dword ptr [rcx + OFFSET__MethodTable__m_BaseSize] + lea r8d, [r8d + edx * 8] + + ; No need for rounding in this case - element size is 8, and m_BaseSize is guaranteed + ; to be a multiple of 8. 
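These manually written helpers all follow the same shape: load `combined_limit` instead of `alloc_limit`, bump `alloc_ptr`, and jump to the slow helper on overflow, so any allocation that reaches the sampled byte is forced onto the slow path where the event can be emitted. A C++ rendering of that fast-path control flow (a sketch of the pattern, not a drop-in replacement for the assembly):

```cpp
// Sketch of the allocation fast path encoded in the assembly above.
#include <cstdint>
#include <cstring>

struct SketchAllocContext {
    uint8_t* alloc_ptr;
    uint8_t* alloc_limit;      // end of the AC (not consulted on the fast path)
    uint8_t* combined_limit;   // min(alloc_limit, sampling limit)
};

// Returns nullptr when the slow helper (JIT_New / the *_Rare paths) must run,
// which is also the only place a sampling event can be fired.
inline void* try_fast_alloc(SketchAllocContext& ac, void* methodTable, size_t size) {
    uint8_t* result = ac.alloc_ptr;
    uint8_t* newPtr = result + size;
    if (newPtr > ac.combined_limit) {
        return nullptr;                  // AC exhausted *or* a sampled byte was hit
    }
    ac.alloc_ptr = newPtr;
    std::memcpy(result, &methodTable, sizeof(void*));   // store the MethodTable pointer
    return result;
}

int main() {
    alignas(8) uint8_t buffer[1024];
    SketchAllocContext ac{buffer, buffer + sizeof(buffer), buffer + 200};
    int dummyMT = 0;                                   // stands in for a MethodTable
    void* obj = try_fast_alloc(ac, &dummyMT, 64);      // fits under combined_limit
    void* big = try_fast_alloc(ac, &dummyMT, 512);     // crosses it -> slow path
    return (obj != nullptr && big == nullptr) ? 0 : 1;
}
```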
+ + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add r8, rax + + cmp r8, r10 + ja AllocFailed + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], r8 + mov [rax], rcx + + mov dword ptr [rax + OFFSETOF__ArrayBase__m_NumComponents], edx + + ret + + OversizedArray: + AllocFailed: + jmp JIT_NewArr1 +LEAF_END JIT_NewArr1OBJ_MP_InlineGetThread, _TEXT + + + end + diff --git a/src/coreclr/vm/amd64/JitHelpers_Slow.asm b/src/coreclr/vm/amd64/JitHelpers_Slow.asm index 6d322248cdeeec..41a80794c97bbe 100644 --- a/src/coreclr/vm/amd64/JitHelpers_Slow.asm +++ b/src/coreclr/vm/amd64/JitHelpers_Slow.asm @@ -169,7 +169,7 @@ endif extern g_global_alloc_lock:dword -extern g_global_alloc_context:qword +extern g_global_ee_alloc_context:qword LEAF_ENTRY JIT_TrialAllocSFastSP, _TEXT @@ -180,15 +180,15 @@ LEAF_ENTRY JIT_TrialAllocSFastSP, _TEXT inc [g_global_alloc_lock] jnz JIT_NEW - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax cmp r8, r10 ja AllocFailed - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], rcx mov [g_global_alloc_lock], -1 @@ -208,8 +208,8 @@ NESTED_ENTRY JIT_BoxFastUP, _TEXT inc [g_global_alloc_lock] jnz JIT_Box - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax @@ -219,7 +219,7 @@ NESTED_ENTRY JIT_BoxFastUP, _TEXT test rdx, rdx je NullRef - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], rcx mov [g_global_alloc_lock], -1 @@ -287,15 +287,15 @@ LEAF_ENTRY AllocateStringFastUP, _TEXT inc [g_global_alloc_lock] jnz FramedAllocateString - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax cmp r8, r10 ja AllocFailed - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], r11 mov [g_global_alloc_lock], -1 @@ -343,8 +343,8 @@ LEAF_ENTRY JIT_NewArr1VC_UP, _TEXT inc [g_global_alloc_lock] jnz JIT_NewArr1 - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, 
[g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax jc AllocFailed @@ -352,7 +352,7 @@ LEAF_ENTRY JIT_NewArr1VC_UP, _TEXT cmp r8, r10 ja AllocFailed - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], rcx mov [g_global_alloc_lock], -1 @@ -396,15 +396,15 @@ LEAF_ENTRY JIT_NewArr1OBJ_UP, _TEXT inc [g_global_alloc_lock] jnz JIT_NewArr1 - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax cmp r8, r10 ja AllocFailed - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], rcx mov [g_global_alloc_lock], -1 diff --git a/src/coreclr/vm/amd64/asmconstants.h b/src/coreclr/vm/amd64/asmconstants.h index 524e1fd40b7ae8..0eefdbbf9c5d97 100644 --- a/src/coreclr/vm/amd64/asmconstants.h +++ b/src/coreclr/vm/amd64/asmconstants.h @@ -111,11 +111,24 @@ ASMCONSTANTS_C_ASSERT(OFFSETOF__Thread__m_pFrame #define Thread_m_pFrame OFFSETOF__Thread__m_pFrame -#define OFFSETOF__gc_alloc_context__alloc_ptr 0x0 -ASMCONSTANT_OFFSETOF_ASSERT(gc_alloc_context, alloc_ptr); +// ---------------------------------- +// TODO: all these offsets are now invalid because the allocation context is now in a TLS instead of being relative to a Thread instance -#define OFFSETOF__gc_alloc_context__alloc_limit 0x8 -ASMCONSTANT_OFFSETOF_ASSERT(gc_alloc_context, alloc_limit); +#define OFFSET__Thread__m_alloc_context__alloc_ptr 0x50 +//ASMCONSTANTS_C_ASSERT(OFFSET__Thread__m_alloc_context__alloc_ptr == offsetof(Thread, m_alloc_context) + offsetof(ee_alloc_context, gc_alloc_context) + offsetof(gc_alloc_context, alloc_ptr)); + +#define OFFSET__Thread__m_alloc_context__combined_limit 0x48 +//ASMCONSTANTS_C_ASSERT(OFFSET__Thread__m_alloc_context__combined_limit == offsetof(Thread, m_alloc_context) + offsetof(ee_alloc_context, combined_limit)); + +#define OFFSETOF__ee_alloc_context__alloc_ptr 0x8 +//ASMCONSTANTS_C_ASSERT(OFFSETOF__ee_alloc_context__alloc_ptr == offsetof(ee_alloc_context, gc_alloc_context) + offsetof(gc_alloc_context, alloc_ptr)); + +// if we keep the ee_alloc_context idea, this should be the offset of the alloc_ptr (after the combined_limit field +#define OFFSETOF__gc_alloc_context__alloc_ptr 0x8 +// ---------------------------------- + +#define OFFSETOF__ee_alloc_context__combined_limit 0x0 +ASMCONSTANTS_C_ASSERT(OFFSETOF__ee_alloc_context__combined_limit == offsetof(ee_alloc_context, combined_limit)); #define OFFSETOF__ThreadExceptionState__m_pCurrentTracker 0x000 ASMCONSTANTS_C_ASSERT(OFFSETOF__ThreadExceptionState__m_pCurrentTracker diff --git a/src/coreclr/vm/common.h b/src/coreclr/vm/common.h index 92e9c5f1d58a6e..48630557f22aa2 100644 --- a/src/coreclr/vm/common.h +++ b/src/coreclr/vm/common.h @@ -159,6 +159,7 @@ typedef VPTR(class VirtualCallStubManager) PTR_VirtualCallStubManager; typedef VPTR(class 
VirtualCallStubManagerManager) PTR_VirtualCallStubManagerManager; typedef VPTR(class IGCHeap) PTR_IGCHeap; typedef VPTR(class ModuleBase) PTR_ModuleBase; +typedef DPTR(struct gc_alloc_context) PTR_gc_alloc_context; // // _UNCHECKED_OBJECTREF is for code that can't deal with DEBUG OBJECTREFs diff --git a/src/coreclr/vm/comutilnative.cpp b/src/coreclr/vm/comutilnative.cpp index a281ac7505d089..eca0a8b80803b0 100644 --- a/src/coreclr/vm/comutilnative.cpp +++ b/src/coreclr/vm/comutilnative.cpp @@ -848,7 +848,7 @@ FCIMPL0(INT64, GCInterface::GetAllocatedBytesForCurrentThread) INT64 currentAllocated = 0; Thread *pThread = GetThread(); - gc_alloc_context* ac = &t_runtime_thread_locals.alloc_context; + gc_alloc_context* ac = &t_runtime_thread_locals.alloc_context.gc_allocation_context; currentAllocated = ac->alloc_bytes + ac->alloc_bytes_uoh - (ac->alloc_limit - ac->alloc_ptr); return currentAllocated; diff --git a/src/coreclr/vm/gccover.cpp b/src/coreclr/vm/gccover.cpp index b7ae97613d507d..ab564c6ba17730 100644 --- a/src/coreclr/vm/gccover.cpp +++ b/src/coreclr/vm/gccover.cpp @@ -1834,7 +1834,7 @@ void DoGcStress (PCONTEXT regs, NativeCodeVersion nativeCodeVersion) // BUG(github #10318) - when not using allocation contexts, the alloc lock // must be acquired here. Until fixed, this assert prevents random heap corruption. assert(GCHeapUtilities::UseThreadAllocationContexts()); - GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context); + GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context.gc_allocation_context); // StressHeap can exit early w/o forcing a SuspendEE to trigger the instruction update // We can not rely on the return code to determine if the instruction update happened diff --git a/src/coreclr/vm/gcenv.ee.cpp b/src/coreclr/vm/gcenv.ee.cpp index 852c655cf9591e..038267e80ea100 100644 --- a/src/coreclr/vm/gcenv.ee.cpp +++ b/src/coreclr/vm/gcenv.ee.cpp @@ -443,7 +443,34 @@ gc_alloc_context * GCToEEInterface::GetAllocContext() return nullptr; } - return &t_runtime_thread_locals.alloc_context; + return &t_runtime_thread_locals.alloc_context.gc_allocation_context; +} + +void InvokeGCAllocCallback(ee_alloc_context* pEEAllocContext, enum_alloc_context_func* fn, void* param) +{ + // NOTE: Its possible that alloc_ptr = alloc_limit = combined_limit = NULL at this point + gc_alloc_context* pAllocContext = &pEEAllocContext->gc_allocation_context; + + // The allocation context might be modified by the callback, so we need to save + // the remaining sampling budget and restore it after the callback if needed. + size_t currentSamplingBudget = (size_t)(pEEAllocContext->combined_limit - pAllocContext->alloc_ptr); + size_t currentSize = (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr); + + fn(pAllocContext, param); + + // If the GC changed the size of the allocation context, we need to recompute the sampling limit + // This includes the case where the AC was initially zero-sized/uninitialized. + // Functionally we'd get valid results if we called UpdateCombinedLimit() unconditionally but its + // empirically a little more performant to only call it when the AC size has changed. + if (currentSize != (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr)) + { + pEEAllocContext->UpdateCombinedLimit(); + } + else + { + // Restore the remaining sampling budget as the size is the same. 
+ pEEAllocContext->combined_limit = pAllocContext->alloc_ptr + currentSamplingBudget; + } } void GCToEEInterface::GcEnumAllocContexts(enum_alloc_context_func* fn, void* param) @@ -460,16 +487,12 @@ void GCToEEInterface::GcEnumAllocContexts(enum_alloc_context_func* fn, void* par Thread * pThread = NULL; while ((pThread = ThreadStore::GetThreadList(pThread)) != NULL) { - gc_alloc_context* palloc_context = pThread->GetAllocContext(); - if (palloc_context != nullptr) - { - fn(palloc_context, param); - } + InvokeGCAllocCallback(pThread->GetEEAllocContext(), fn, param); } } else { - fn(&g_global_alloc_context, param); + InvokeGCAllocCallback(&g_global_ee_alloc_context, fn, param); } } diff --git a/src/coreclr/vm/gcheaputilities.cpp b/src/coreclr/vm/gcheaputilities.cpp index cd0259eef45d83..65d47130765044 100644 --- a/src/coreclr/vm/gcheaputilities.cpp +++ b/src/coreclr/vm/gcheaputilities.cpp @@ -41,7 +41,10 @@ bool g_sw_ww_enabled_for_gc_heap = false; #endif // FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP -GVAL_IMPL_INIT(gc_alloc_context, g_global_alloc_context, {}); +ee_alloc_context g_global_ee_alloc_context = {}; +GPTR_IMPL_INIT(gc_alloc_context, g_global_alloc_context, &(g_global_ee_alloc_context.gc_allocation_context)); + +thread_local ee_alloc_context::CLRRandomHolder ee_alloc_context::t_instance = CLRRandomHolder(); enum GC_LOAD_STATUS { GC_LOAD_STATUS_BEFORE_START, diff --git a/src/coreclr/vm/gcheaputilities.h b/src/coreclr/vm/gcheaputilities.h index c652cc52bf417c..b558a1a7f18712 100644 --- a/src/coreclr/vm/gcheaputilities.h +++ b/src/coreclr/vm/gcheaputilities.h @@ -4,7 +4,12 @@ #ifndef _GCHEAPUTILITIES_H_ #define _GCHEAPUTILITIES_H_ +#include "eventtracebase.h" #include "gcinterface.h" +#include "math.h" + +// TODO: trying to use Thread members but compilation errors +// #include "threads.h" // The singular heap instance. GPTR_DECL(IGCHeap, g_pGCHeap); @@ -12,6 +17,113 @@ GPTR_DECL(IGCHeap, g_pGCHeap); #ifndef DACCESS_COMPILE extern "C" { #endif // !DACCESS_COMPILE + + +const DWORD SamplingDistributionMean = (100 * 1024); + +// This struct adds some state that is only visible to the EE onto the standard gc_alloc_context +typedef struct _ee_alloc_context +{ + // Any allocation that would overlap combined_limit needs to be handled by the allocation slow path. + // combined_limit is the minimum of: + // - gc_alloc_context.alloc_limit (the end of the current AC) + // - the sampling_limit + // + // In the simple case that randomized sampling is disabled, combined_limit is always equal to alloc_limit. + // + // There are two different useful interpretations for the sampling_limit. One is to treat the sampling_limit + // as an address and when we allocate an object that overlaps that address we should emit a sampling event. + // The other is that we can treat (sampling_limit - alloc_ptr) as a budget of how many bytes we can allocate + // before emitting a sampling event. If we always allocated objects contiguously in the AC and incremented + // alloc_ptr by the size of the object, these two interpretations would be equivalent. However, when objects + // don't fit in the AC we allocate them in some other address range. The budget interpretation is more + // flexible to handle those cases. + // + // The sampling limit isn't stored in any separate field explicitly, instead it is implied: + // - if combined_limit == alloc_limit there is no sampled byte in the AC. In the budget interpretation + // we can allocate (alloc_limit - alloc_ptr) unsampled bytes. 
We'll need a new random number after + // that to determine whether future allocated bytes should be sampled. + // This occurs either because the sampling feature is disabled, or because the randomized selection + // of sampled bytes didn't select a byte in this AC. + // - if combined_limit < alloc_limit there is a sample limit in the AC. sample_limit = combined_limit. + uint8_t* combined_limit; + gc_alloc_context gc_allocation_context; + + public: + void init() + { + LIMITED_METHOD_CONTRACT; + combined_limit = nullptr; + gc_allocation_context.init(); + } + + static inline bool IsRandomizedSamplingEnabled() + { +#ifdef FEATURE_EVENT_TRACE + return ETW_TRACING_CATEGORY_ENABLED(MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context, + TRACE_LEVEL_INFORMATION, + CLR_ALLOCATIONSAMPLING_KEYWORD); +#else + return false; +#endif // FEATURE_EVENT_TRACE + } + + // Regenerate the randomized sampling limit and update the combined_limit field. + inline void UpdateCombinedLimit() + { + UpdateCombinedLimit(IsRandomizedSamplingEnabled()); + } + + inline void UpdateCombinedLimit(bool samplingEnabled) + { + if (!samplingEnabled) + { + combined_limit = gc_allocation_context.alloc_limit; + } + else + { + // compute the next sampling limit based on a geometric distribution + uint8_t* sampling_limit = gc_allocation_context.alloc_ptr + ComputeGeometricRandom(); + + // if the sampling limit is larger than the allocation context, no sampling will occur in this AC + combined_limit = Min(sampling_limit, gc_allocation_context.alloc_limit); + } + } + + static inline int ComputeGeometricRandom() + { + // compute a random sample from the Geometric distribution + double probability = GetRandomizer()->NextDouble(); + int threshold = (int)(-log(1 - probability) * SamplingDistributionMean); + return threshold; + } + +// per thread lazily allocated randomizer + struct CLRRandomHolder + { + CLRRandom* _p; + + CLRRandomHolder() + { + _p = new CLRRandom(); + _p->Init(); + } + + ~CLRRandomHolder() + { + delete _p; + } + }; + + static thread_local CLRRandomHolder t_instance; + +public: + static inline CLRRandom* GetRandomizer() + { + return t_instance._p; + } +} ee_alloc_context; + GPTR_DECL(uint8_t,g_lowest_address); GPTR_DECL(uint8_t,g_highest_address); GPTR_DECL(uint32_t,g_card_table); @@ -21,7 +133,11 @@ GVAL_DECL(GCHeapType, g_heap_type); // for all allocations. In order to avoid extra indirections in assembly // allocation helpers, the EE owns the global allocation context and the // GC will update it when it needs to. 
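As a sanity check on `ComputeGeometricRandom`, the expected gap between sampled bytes should come out near `SamplingDistributionMean` (100 KiB), consistent with a per-byte sampling probability of 1/102,400. A quick simulation of the same inverse-CDF transform, using standard `<random>` in place of the runtime's `CLRRandom`/xoshiro state:

```cpp
// Simulation sketch: checks that the sampling gap has the intended mean.
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double mean = 100 * 1024;          // SamplingDistributionMean
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    const int draws = 1000000;
    double total = 0;
    for (int i = 0; i < draws; i++) {
        double p = uniform(rng);
        total += -std::log(1.0 - p) * mean;  // same transform as ComputeGeometricRandom
    }
    // Expect roughly 102400 bytes (within a fraction of a percent for this many draws).
    printf("average gap: %.1f bytes\n", total / draws);
    return 0;
}
```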
-GVAL_DECL(gc_alloc_context, g_global_alloc_context); +extern "C" ee_alloc_context g_global_ee_alloc_context; + +// This is a pointer into the g_global_ee_alloc_context for the GC visible +// subset of the data +GPTR_DECL(gc_alloc_context, g_global_alloc_context); #ifndef DACCESS_COMPILE } #endif // !DACCESS_COMPILE diff --git a/src/coreclr/vm/gchelpers.cpp b/src/coreclr/vm/gchelpers.cpp index 335bd3cb25caba..ce9a3bec72dfa7 100644 --- a/src/coreclr/vm/gchelpers.cpp +++ b/src/coreclr/vm/gchelpers.cpp @@ -40,7 +40,7 @@ // //======================================================================== -inline gc_alloc_context* GetThreadAllocContext() +inline ee_alloc_context* GetThreadAllocContext() { WRAPPER_NO_CONTRACT; @@ -183,6 +183,116 @@ inline void CheckObjectSize(size_t alloc_size) } } +inline void FireAllocationSampled(GC_ALLOC_FLAGS flags, size_t size, size_t samplingBudgetOffset, Object* orObject) +{ + // Note: this code is duplicated from GCToCLREventSink::FireGCAllocationTick_V4 + void* typeId = nullptr; + const WCHAR* name = nullptr; + InlineSString strTypeName; + EX_TRY + { + TypeHandle th = GetThread()->GetTHAllocContextObj(); + + if (th != 0) + { + th.GetName(strTypeName); + name = strTypeName.GetUnicode(); + typeId = th.GetMethodTable(); + } + } + EX_CATCH{} + EX_END_CATCH(SwallowAllExceptions) + // end of duplication + + if (typeId != nullptr) + { + unsigned int allocKind = + (flags & GC_ALLOC_PINNED_OBJECT_HEAP) ? 2 : + (flags & GC_ALLOC_LARGE_OBJECT_HEAP) ? 1 : + 0; // SOH + unsigned int heapIndex = 0; +#ifdef BACKGROUND_GC + gc_heap* hp = gc_heap::heap_of((BYTE*)orObject); + heapIndex = hp->heap_number; +#endif + FireEtwAllocationSampled(allocKind, GetClrInstanceId(), typeId, name, heapIndex, (BYTE*)orObject, size, samplingBudgetOffset); + } +} + +inline Object* Alloc(ee_alloc_context* pEEAllocContext, size_t size, GC_ALLOC_FLAGS flags) +{ + CONTRACTL { + THROWS; + GC_TRIGGERS; + MODE_COOPERATIVE; // returns an objref without pinning it => cooperative + } CONTRACTL_END; + + Object* retVal = nullptr; + gc_alloc_context* pAllocContext = &pEEAllocContext->gc_allocation_context; + auto pCurrentThread = GetThread(); + + bool isSampled = false; + size_t availableSpace = 0; + size_t aligned_size = 0; + size_t samplingBudget = 0; + bool isRandomizedSamplingEnabled = ee_alloc_context::IsRandomizedSamplingEnabled(); + if (isRandomizedSamplingEnabled) + { + // object allocations are always padded up to pointer size + aligned_size = AlignUp(size, sizeof(uintptr_t)); + + // The number bytes we can allocate before we need to emit a sampling event. + // This calculation is only valid if combined_limit < alloc_limit. + samplingBudget = (size_t)(pEEAllocContext->combined_limit - pAllocContext->alloc_ptr); + + // The number of bytes available in the current allocation context + availableSpace = (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr); + + // Check to see if the allocated object overlaps a sampled byte + // in this AC. This happens when both: + // 1) The AC contains a sampled byte (combined_limit < alloc_limit) + // 2) The object is large enough to overlap it (samplingBudget < aligned_size) + // + // Note that the AC could have no remaining space for allocations (alloc_ptr = + // alloc_limit = combined_limit). When a thread hasn't done any SOH allocations + // yet it also starts in an empty state where alloc_ptr = alloc_limit = + // combined_limit = nullptr. 
The (1) check handles both of these situations + // properly as an empty AC can not have a sampled byte inside of it. + isSampled = + (pEEAllocContext->combined_limit < pAllocContext->alloc_limit) && + (samplingBudget < aligned_size); + + // if the object overflows the AC, we need to sample the remaining bytes + // the sampling budget only included at most the bytes inside the AC + if (aligned_size > availableSpace && !isSampled) + { + samplingBudget = ee_alloc_context::ComputeGeometricRandom() + availableSpace; + isSampled = (samplingBudget < aligned_size); + } + } + + GCStress::MaybeTrigger(pAllocContext); + + // for SOH, if there is enough space in the current allocation context, then + // the allocation will be done in place (like in the fast path), + // otherwise a new allocation context will be provided + retVal = GCHeapUtilities::GetGCHeap()->Alloc(pAllocContext, size, flags); + + if (isSampled) + { + FireAllocationSampled(flags, aligned_size, samplingBudget, retVal); + } + + // There are a variety of conditions that may have invalidated the previous combined_limit value + // such as not allocating the object in the AC memory region (UOH allocations), moving the AC, adding + // extra alignment padding, allocating a new AC, or allocating an object that consumed the sampling budget. + // Rather than test for all the different invalidation conditions individually we conservatively always + // recompute it. If sampling isn't enabled this inlined function is just trivially setting + // combined_limit=alloc_limit. + pEEAllocContext->UpdateCombinedLimit(isRandomizedSamplingEnabled); + + return retVal; +} // There are only two ways to allocate an object. // * Call optimized helpers that were generated on the fly. This is how JIT compiled code does most @@ -222,16 +332,12 @@ inline Object* Alloc(size_t size, GC_ALLOC_FLAGS flags) if (GCHeapUtilities::UseThreadAllocationContexts()) { - gc_alloc_context *threadContext = GetThreadAllocContext(); - GCStress::MaybeTrigger(threadContext); - retVal = GCHeapUtilities::GetGCHeap()->Alloc(threadContext, size, flags); + retVal = Alloc(GetThreadAllocContext(), size, flags); } else { GlobalAllocLockHolder holder(&g_global_alloc_lock); - gc_alloc_context *globalContext = &g_global_alloc_context; - GCStress::MaybeTrigger(globalContext); - retVal = GCHeapUtilities::GetGCHeap()->Alloc(globalContext, size, flags); + retVal = Alloc(&g_global_ee_alloc_context, size, flags); } @@ -424,70 +530,26 @@ OBJECTREF AllocateSzArray(MethodTable* pArrayMT, INT32 cElements, GC_ALLOC_FLAGS } else { -#ifndef FEATURE_64BIT_ALIGNMENT - if ((DATA_ALIGNMENT < sizeof(double)) && (pArrayMT->GetArrayElementType() == ELEMENT_TYPE_R8) && - (totalSize < GCHeapUtilities::GetGCHeap()->GetLOHThreshold() - MIN_OBJECT_SIZE)) +#ifdef FEATURE_DOUBLE_ALIGNMENT_HINT + if (pArrayMT->GetArrayElementType() == ELEMENT_TYPE_R8) { - // Creation of an array of doubles, not in the large object heap. - // We want to align the doubles to 8 byte boundaries, but the GC gives us pointers aligned - // to 4 bytes only (on 32 bit platforms). To align, we ask for 12 bytes more to fill with a - // dummy object. - // If the GC gives us a 8 byte aligned address, we use it for the array and place the dummy - // object after the array, otherwise we put the dummy object first, shifting the base of - // the array to an 8 byte aligned address. Also, we need to make sure that the syncblock of the - // second object is zeroed. GC won't take care of zeroing it out with GC_ALLOC_ZEROING_OPTIONAL. 
- // - // Note: on 64 bit platforms, the GC always returns 8 byte aligned addresses, and we don't - // execute this code because DATA_ALIGNMENT < sizeof(double) is false. - - _ASSERTE(DATA_ALIGNMENT == sizeof(double) / 2); - _ASSERTE((MIN_OBJECT_SIZE % sizeof(double)) == DATA_ALIGNMENT); // used to change alignment - _ASSERTE(pArrayMT->GetComponentSize() == sizeof(double)); - _ASSERTE(g_pObjectClass->GetBaseSize() == MIN_OBJECT_SIZE); - _ASSERTE(totalSize < totalSize + MIN_OBJECT_SIZE); - orArray = (ArrayBase*)Alloc(totalSize + MIN_OBJECT_SIZE, flags); - - Object* orDummyObject; - if (((size_t)orArray % sizeof(double)) != 0) - { - orDummyObject = orArray; - orArray = (ArrayBase*)((size_t)orArray + MIN_OBJECT_SIZE); - if (flags & GC_ALLOC_ZEROING_OPTIONAL) - { - // clean the syncblock of the aligned array. - *(((void**)orArray)-1) = 0; - } - } - else - { - orDummyObject = (Object*)((size_t)orArray + totalSize); - if (flags & GC_ALLOC_ZEROING_OPTIONAL) - { - // clean the syncblock of the dummy object. - *(((void**)orDummyObject)-1) = 0; - } - } - _ASSERTE(((size_t)orArray % sizeof(double)) == 0); - orDummyObject->SetMethodTable(g_pObjectClass); + flags |= GC_ALLOC_ALIGN8; } - else -#endif // FEATURE_64BIT_ALIGNMENT - { -#ifdef FEATURE_64BIT_ALIGNMENT - MethodTable* pElementMT = pArrayMT->GetArrayElementTypeHandle().GetMethodTable(); - if (pElementMT->RequiresAlign8() && pElementMT->IsValueType()) - { - // This platform requires that certain fields are 8-byte aligned (and the runtime doesn't provide - // this guarantee implicitly, e.g. on 32-bit platforms). Since it's the array payload, not the - // header that requires alignment we need to be careful. However it just so happens that all the - // cases we care about (single and multi-dim arrays of value types) have an even number of DWORDs - // in their headers so the alignment requirements for the header and the payload are the same. - _ASSERTE(((pArrayMT->GetBaseSize() - SIZEOF_OBJHEADER) & 7) == 0); - flags |= GC_ALLOC_ALIGN8; - } #endif - orArray = (ArrayBase*)Alloc(totalSize, flags); +#ifdef FEATURE_64BIT_ALIGNMENT + MethodTable* pElementMT = pArrayMT->GetArrayElementTypeHandle().GetMethodTable(); + if (pElementMT->RequiresAlign8() && pElementMT->IsValueType()) + { + // This platform requires that certain fields are 8-byte aligned (and the runtime doesn't provide + // this guarantee implicitly, e.g. on 32-bit platforms). Since it's the array payload, not the + // header that requires alignment we need to be careful. However it just so happens that all the + // cases we care about (single and multi-dim arrays of value types) have an even number of DWORDs + // in their headers so the alignment requirements for the header and the payload are the same. + _ASSERTE(((pArrayMT->GetBaseSize() - SIZEOF_OBJHEADER) & 7) == 0); + flags |= GC_ALLOC_ALIGN8; } +#endif + orArray = (ArrayBase*)Alloc(totalSize, flags); orArray->SetMethodTable(pArrayMT); } diff --git a/src/coreclr/vm/gcstress.h b/src/coreclr/vm/gcstress.h index 23b11d9989fcf6..a5626da1b6961c 100644 --- a/src/coreclr/vm/gcstress.h +++ b/src/coreclr/vm/gcstress.h @@ -298,7 +298,7 @@ namespace _GCStress // BUG(github #10318) - when not using allocation contexts, the alloc lock // must be acquired here. Until fixed, this assert prevents random heap corruption. 
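The sampling decision in the new `Alloc` overload above is the subtle part of this change: an object is sampled either because it overlaps the sampled byte inside the current AC, or because it spills past the AC and a freshly drawn budget lands inside the spilled portion. A condensed sketch of just that predicate (hypothetical helper names; `draw_budget` stands in for `ee_alloc_context::ComputeGeometricRandom`):

```cpp
// Sketch of the isSampled computation in gchelpers.cpp's Alloc(); not runtime code.
#include <cstddef>
#include <cstdint>

struct SketchEEAllocContext {
    uint8_t* alloc_ptr;
    uint8_t* alloc_limit;
    uint8_t* combined_limit;   // <= alloc_limit; equal when no sample is pending here
};

bool is_allocation_sampled(const SketchEEAllocContext& ac, size_t aligned_size,
                           size_t (*draw_budget)()) {
    size_t budget    = (size_t)(ac.combined_limit - ac.alloc_ptr);
    size_t available = (size_t)(ac.alloc_limit - ac.alloc_ptr);

    // Case 1: this AC contains a sampled byte and the object is large enough to cover it.
    bool sampled = (ac.combined_limit < ac.alloc_limit) && (budget < aligned_size);

    // Case 2: the object does not fit in the AC. The bytes beyond it get a fresh budget;
    // the object is sampled if that budget falls inside the object.
    if (!sampled && aligned_size > available) {
        sampled = (available + draw_budget()) < aligned_size;
    }
    return sampled;
}

static size_t fixed_budget() { return 16; }

int main() {
    uint8_t heap[256];
    // Sampled byte 40 bytes ahead; a 64-byte object overlaps it.
    SketchEEAllocContext ac{heap, heap + 256, heap + 40};
    return is_allocation_sampled(ac, 64, fixed_budget) ? 0 : 1;
}
```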
_ASSERTE(GCHeapUtilities::UseThreadAllocationContexts()); - GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context); + GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context.gc_allocation_context); } FORCEINLINE diff --git a/src/coreclr/vm/gctoclreventsink.cpp b/src/coreclr/vm/gctoclreventsink.cpp index fff929d51567a5..ce75e4cc661830 100644 --- a/src/coreclr/vm/gctoclreventsink.cpp +++ b/src/coreclr/vm/gctoclreventsink.cpp @@ -162,6 +162,16 @@ void GCToCLREventSink::FireGCAllocationTick_V4(uint64_t allocationAmount, { LIMITED_METHOD_CONTRACT; +#ifdef FEATURE_EVENT_TRACE + if (ETW_TRACING_CATEGORY_ENABLED(MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context, + TRACE_LEVEL_INFORMATION, + CLR_ALLOCATIONSAMPLING_KEYWORD)) + { + // skip AllocationTick if AllocationSampled is emitted + return; + } +#endif // FEATURE_EVENT_TRACE + void * typeId = nullptr; const WCHAR * name = nullptr; InlineSString strTypeName; diff --git a/src/coreclr/vm/i386/jitinterfacex86.cpp b/src/coreclr/vm/i386/jitinterfacex86.cpp index 3807b00a8ca6e1..ecad50a00a8644 100644 --- a/src/coreclr/vm/i386/jitinterfacex86.cpp +++ b/src/coreclr/vm/i386/jitinterfacex86.cpp @@ -237,8 +237,9 @@ void JIT_TrialAlloc::EmitCore(CPUSTUBLINKER *psl, CodeLabel *noLock, CodeLabel * if (flags & (ALIGN8 | SIZE_IN_EAX | ALIGN8OBJ)) { - // MOV EBX, [edx]gc_alloc_context.alloc_ptr - psl->X86EmitOffsetModRM(0x8B, kEBX, kEDX, offsetof(gc_alloc_context, alloc_ptr)); + // MOV EBX, [edx]alloc_context.gc_allocation_context.alloc_ptr + psl->X86EmitOffsetModRM(0x8B, kEBX, kEDX, offsetof(ee_alloc_context, gc_allocation_context) + offsetof(gc_alloc_context, alloc_ptr)); + // add EAX, EBX psl->Emit16(0xC303); if (flags & ALIGN8) @@ -246,20 +247,20 @@ void JIT_TrialAlloc::EmitCore(CPUSTUBLINKER *psl, CodeLabel *noLock, CodeLabel * } else { - // add eax, [edx]gc_alloc_context.alloc_ptr - psl->X86EmitOffsetModRM(0x03, kEAX, kEDX, offsetof(gc_alloc_context, alloc_ptr)); + // add eax, [edx]alloc_context.gc_allocation_context.alloc_ptr + psl->X86EmitOffsetModRM(0x03, kEAX, kEDX, offsetof(ee_alloc_context, gc_allocation_context) + offsetof(gc_alloc_context, alloc_ptr)); } - // cmp eax, [edx]gc_alloc_context.alloc_limit - psl->X86EmitOffsetModRM(0x3b, kEAX, kEDX, offsetof(gc_alloc_context, alloc_limit)); + // cmp eax, [edx]alloc_context.combined_limit + psl->X86EmitOffsetModRM(0x3b, kEAX, kEDX, offsetof(ee_alloc_context, combined_limit)); // ja noAlloc psl->X86EmitCondJump(noAlloc, X86CondCode::kJA); // Fill in the allocation and get out. 
- // mov [edx]gc_alloc_context.alloc_ptr, eax - psl->X86EmitIndexRegStore(kEDX, offsetof(gc_alloc_context, alloc_ptr), kEAX); + // mov [edx]alloc_context.gc_allocation_context.alloc_ptr, eax + psl->X86EmitIndexRegStore(kEDX, offsetof(ee_alloc_context, gc_allocation_context) + offsetof(gc_alloc_context, alloc_ptr), kEAX); if (flags & (ALIGN8 | SIZE_IN_EAX | ALIGN8OBJ)) { diff --git a/src/coreclr/vm/i386/stublinkerx86.cpp b/src/coreclr/vm/i386/stublinkerx86.cpp index cfe9eec74af2e5..2a3cbc765dfc52 100644 --- a/src/coreclr/vm/i386/stublinkerx86.cpp +++ b/src/coreclr/vm/i386/stublinkerx86.cpp @@ -2432,7 +2432,7 @@ VOID StubLinkerCPU::X86EmitCurrentThreadFetch(X86Reg dstreg, unsigned preservedR #ifdef TARGET_UNIX namespace { - gc_alloc_context* STDCALL GetAllocContextHelper() + ee_alloc_context* STDCALL GetAllocContextHelper() { return &t_runtime_thread_locals.alloc_context; } diff --git a/src/coreclr/vm/jithelpers.cpp b/src/coreclr/vm/jithelpers.cpp index 1bfeaf2b039289..b039b76a55c046 100644 --- a/src/coreclr/vm/jithelpers.cpp +++ b/src/coreclr/vm/jithelpers.cpp @@ -1668,7 +1668,8 @@ HCIMPL1_RAW(Object*, JIT_NewS_MP_FastPortable, CORINFO_CLASS_HANDLE typeHnd_) } CONTRACTL_END; _ASSERTE(GCHeapUtilities::UseThreadAllocationContexts()); - gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context *eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context *allocContext = &eeAllocContext->gc_allocation_context; TypeHandle typeHandle(typeHnd_); _ASSERTE(!typeHandle.IsTypeDesc()); // heap objects must have method tables @@ -1678,13 +1679,15 @@ HCIMPL1_RAW(Object*, JIT_NewS_MP_FastPortable, CORINFO_CLASS_HANDLE typeHnd_) _ASSERTE(size % DATA_ALIGNMENT == 0); BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (size > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (size > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL1(JIT_New, typeHnd_); } + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + size; _ASSERTE(allocPtr != nullptr); @@ -1785,7 +1788,8 @@ HCIMPL1_RAW(StringObject*, AllocateString_MP_FastPortable, DWORD stringLength) return HCCALL1(FramedAllocateString, stringLength); } - gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context *eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context *allocContext = &eeAllocContext->gc_allocation_context; SIZE_T totalSize = StringObject::GetSize(stringLength); @@ -1798,12 +1802,15 @@ HCIMPL1_RAW(StringObject*, AllocateString_MP_FastPortable, DWORD stringLength) totalSize = alignedTotalSize; BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (totalSize > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (totalSize > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL1(FramedAllocateString, stringLength); } + + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + totalSize; _ASSERTE(allocPtr != nullptr); @@ -1901,7 +1908,8 @@ HCIMPL2_RAW(Object*, JIT_NewArr1VC_MP_FastPortable, CORINFO_CLASS_HANDLE arrayMT return HCCALL2(JIT_NewArr1, arrayMT, size); } - 
gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context* eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context* allocContext = &eeAllocContext->gc_allocation_context; MethodTable *pArrayMT = (MethodTable *)arrayMT; @@ -1919,12 +1927,15 @@ HCIMPL2_RAW(Object*, JIT_NewArr1VC_MP_FastPortable, CORINFO_CLASS_HANDLE arrayMT totalSize = alignedTotalSize; BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (totalSize > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (totalSize > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL2(JIT_NewArr1, arrayMT, size); } + + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + totalSize; _ASSERTE(allocPtr != nullptr); @@ -1970,14 +1981,18 @@ HCIMPL2_RAW(Object*, JIT_NewArr1OBJ_MP_FastPortable, CORINFO_CLASS_HANDLE arrayM _ASSERTE(ALIGN_UP(totalSize, DATA_ALIGNMENT) == totalSize); - gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context* eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context* allocContext = &eeAllocContext->gc_allocation_context; BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (totalSize > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (totalSize > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL2(JIT_NewArr1, arrayMT, size); } + + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + totalSize; _ASSERTE(allocPtr != nullptr); @@ -2120,7 +2135,8 @@ HCIMPL2_RAW(Object*, JIT_Box_MP_FastPortable, CORINFO_CLASS_HANDLE type, void* u } _ASSERTE(GCHeapUtilities::UseThreadAllocationContexts()); - gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context* eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context* allocContext = &eeAllocContext->gc_allocation_context; TypeHandle typeHandle(type); _ASSERTE(!typeHandle.IsTypeDesc()); // heap objects must have method tables @@ -2139,13 +2155,15 @@ HCIMPL2_RAW(Object*, JIT_Box_MP_FastPortable, CORINFO_CLASS_HANDLE type, void* u _ASSERTE(size % DATA_ALIGNMENT == 0); BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (size > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (size > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL2(JIT_Box, type, unboxedData); } + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + size; _ASSERTE(allocPtr != nullptr); diff --git a/src/coreclr/vm/threads.cpp b/src/coreclr/vm/threads.cpp index f98a5cf58a2251..723aa1e90fd4b5 100644 --- a/src/coreclr/vm/threads.cpp +++ b/src/coreclr/vm/threads.cpp @@ -2766,8 +2766,8 @@ void Thread::CooperativeCleanup() // If the GC heap is initialized, we need to fix the alloc context for this detaching thread. 
// GetTotalAllocatedBytes reads dead_threads_non_alloc_bytes, but will suspend EE, being in COOP mode we cannot race with that // however, there could be other threads terminating and doing the same Add. - InterlockedExchangeAdd64((LONG64*)&dead_threads_non_alloc_bytes, t_runtime_thread_locals.alloc_context.alloc_limit - t_runtime_thread_locals.alloc_context.alloc_ptr); - GCHeapUtilities::GetGCHeap()->FixAllocContext(&t_runtime_thread_locals.alloc_context, NULL, NULL); + InterlockedExchangeAdd64((LONG64*)&dead_threads_non_alloc_bytes, t_runtime_thread_locals.alloc_context.gc_allocation_context.alloc_limit - t_runtime_thread_locals.alloc_context.gc_allocation_context.alloc_ptr); + GCHeapUtilities::GetGCHeap()->FixAllocContext(&t_runtime_thread_locals.alloc_context.gc_allocation_context, NULL, NULL); t_runtime_thread_locals.alloc_context.init(); // re-initialize the context. // Clear out the alloc context pointer for this thread. When TLS is gone, this pointer will point into freed memory. diff --git a/src/coreclr/vm/threads.h b/src/coreclr/vm/threads.h index 429031cf5493a1..5155097f2c9a3c 100644 --- a/src/coreclr/vm/threads.h +++ b/src/coreclr/vm/threads.h @@ -453,7 +453,7 @@ struct RuntimeThreadLocals { // on MP systems, each thread has its own allocation chunk so we can avoid // lock prefixes and expensive MP cache snooping stuff - gc_alloc_context alloc_context; + ee_alloc_context alloc_context; }; #ifdef _MSC_VER @@ -971,7 +971,14 @@ class Thread public: inline void InitRuntimeThreadLocals() { LIMITED_METHOD_CONTRACT; m_pRuntimeThreadLocals = PTR_RuntimeThreadLocals(&t_runtime_thread_locals); } - inline PTR_gc_alloc_context GetAllocContext() { LIMITED_METHOD_CONTRACT; return PTR_gc_alloc_context(&m_pRuntimeThreadLocals->alloc_context); } + inline ee_alloc_context *GetEEAllocContext() { LIMITED_METHOD_CONTRACT; return &m_pRuntimeThreadLocals->alloc_context; } + inline PTR_gc_alloc_context GetAllocContext() + { + LIMITED_METHOD_CONTRACT; + return (m_pRuntimeThreadLocals == nullptr) + ? nullptr + : PTR_gc_alloc_context(&m_pRuntimeThreadLocals->alloc_context.gc_allocation_context); + } // This is the type handle of the first object in the alloc context at the time // we fire the AllocationTick event. It's only for tooling purpose. @@ -3723,6 +3730,13 @@ class Thread // See ThreadStore::TriggerGCForDeadThreadsIfNecessary() bool m_fHasDeadThreadBeenConsideredForGCTrigger; + // lazily allocated + CLRRandom* m_pRandom; + + public: + // TODO: where to delete the allocated CLRRandom object? + CLRRandom* GetRandom() { if (m_pRandom == nullptr) { m_pRandom = new CLRRandom(); m_pRandom->Init(); } return m_pRandom; } + #ifdef FEATURE_COMINTEROP private: // Cookie returned from CoRegisterInitializeSpy diff --git a/src/coreclr/vm/threadsuspend.cpp b/src/coreclr/vm/threadsuspend.cpp index 9cdb8689984339..9649599df5181a 100644 --- a/src/coreclr/vm/threadsuspend.cpp +++ b/src/coreclr/vm/threadsuspend.cpp @@ -2363,7 +2363,7 @@ void Thread::PerformPreemptiveGC() // BUG(github #10318) - when not using allocation contexts, the alloc lock // must be acquired here. Until fixed, this assert prevents random heap corruption. 
_ASSERTE(GCHeapUtilities::UseThreadAllocationContexts()); - GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context); + GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context.gc_allocation_context); m_bGCStressing = FALSE; } m_GCOnTransitionsOK = TRUE; diff --git a/src/native/managed/cdacreader/src/Data/RuntimeThreadLocals.cs b/src/native/managed/cdacreader/src/Data/RuntimeThreadLocals.cs index 2d7f92cb4cb247..13634724422117 100644 --- a/src/native/managed/cdacreader/src/Data/RuntimeThreadLocals.cs +++ b/src/native/managed/cdacreader/src/Data/RuntimeThreadLocals.cs @@ -11,6 +11,15 @@ static RuntimeThreadLocals IData.Create(Target target, Targ public RuntimeThreadLocals(Target target, TargetPointer address) { Target.TypeInfo type = target.GetTypeInfo(DataType.RuntimeThreadLocals); + + // TODO: Before the GCAllocationContext, there is a pointer to the "combined limit" used to randomly sample allocations. + // How to get the size of a pointer here so the offset should be correct? + //ex: + // AllocContext = target.ProcessedData.GetOrAdd( + // address + + // (ulong)type.Fields[nameof(AllocContext)].Offset + // + sizeof(pointer) + // ); AllocContext = target.ProcessedData.GetOrAdd(address + (ulong)type.Fields[nameof(AllocContext)].Offset); } diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.cs new file mode 100644 index 00000000000000..1856bfc082cff9 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.cs @@ -0,0 +1,174 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. 
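+//
+// This test enables the Microsoft-Windows-DotNETRuntime provider over EventPipe with the
+// AllocationSamplingKeyword (0x80000000000) at Informational level, allocates 2,000,000
+// Object128 instances, and validates that at least one AllocationSampled event
+// (event ID 303) whose type name is "Tracing.Tests.Object128" was received.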
+ +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Diagnostics.Tracing; +using System.IO; +using System.Linq; +using System.Text; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Diagnostics.Tracing; +using Microsoft.Diagnostics.Tracing.Parsers.Clr; +using Microsoft.Diagnostics.NETCore.Client; +using Tracing.Tests.Common; +using Xunit; + +namespace Tracing.Tests +{ + public class AllocationSamplingValidation + { + [Fact] + public static int TestEntryPoint() + { + // check that AllocationSampled events are generated and size + type name are correct + var ret = IpcTraceTest.RunAndValidateEventCounts( + new Dictionary() { { "Microsoft-Windows-DotNETRuntime", -1 } }, + _eventGeneratingActionForAllocations, + // AllocationSamplingKeyword (0x80000000000): 0b1000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000 + new List() { new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Informational, 0x80000000000) }, + 1024, _DoesTraceContainEnoughAllocationSampledEvents, enableRundownProvider: false); + if (ret != 100) + return ret; + + return 100; + } + + const int InstanceCount = 2000000; + const int MinExpectedEvents = 1; + static List _objects128s = new List(InstanceCount); + + // allocate objects to trigger dynamic allocation sampling events + private static Action _eventGeneratingActionForAllocations = () => + { + _objects128s.Clear(); + for (int i = 0; i < InstanceCount; i++) + { + if ((i != 0) && (i % (InstanceCount/5) == 0)) + Logger.logger.Log($"Allocated {i} instances..."); + + Object128 obj = new Object128(); + _objects128s.Add(obj); + } + + Logger.logger.Log($"{_objects128s.Count} instances allocated"); + }; + + private static Func> _DoesTraceContainEnoughAllocationSampledEvents = (source) => + { + int AllocationSampledEvents = 0; + int Object128Count = 0; + source.Dynamic.All += (eventData) => + { + if (eventData.ID == (TraceEventID)303) // AllocationSampled is not defined in TraceEvent yet + { + AllocationSampledEvents++; + + AllocationSampledData payload = new AllocationSampledData(eventData, source.PointerSize); + // uncomment to see the allocation events payload + //Logger.logger.Log($"{payload.HeapIndex} - {payload.AllocationKind} | ({payload.ObjectSize}) {payload.TypeName} = 0x{payload.Address}"); + if (payload.TypeName == "Tracing.Tests.Object128") + { + Object128Count++; + } + } + }; + return () => { + Logger.logger.Log("AllocationSampled counts validation"); + Logger.logger.Log("Nb events: " + AllocationSampledEvents); + Logger.logger.Log("Nb object128: " + Object128Count); + return (AllocationSampledEvents >= MinExpectedEvents) && (Object128Count != 0) ? 
100 : -1; + }; + }; + } + + internal class Object0 + { + } + + internal class Object128 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + private readonly UInt64 _x3; + private readonly UInt64 _x4; + private readonly UInt64 _x5; + private readonly UInt64 _x6; + private readonly UInt64 _x7; + private readonly UInt64 _x8; + private readonly UInt64 _x9; + private readonly UInt64 _x10; + private readonly UInt64 _x11; + private readonly UInt64 _x12; + private readonly UInt64 _x13; + private readonly UInt64 _x14; + private readonly UInt64 _x15; + private readonly UInt64 _x16; + } + + // AllocationSampled is not defined in TraceEvent yet + // + // + // + // + // + // + // + // + // + // + class AllocationSampledData + { + const int EndOfStringCharLength = 2; + private TraceEvent _payload; + private int _pointerSize; + public AllocationSampledData(TraceEvent payload, int pointerSize) + { + _payload = payload; + _pointerSize = pointerSize; + TypeName = "?"; + + ComputeFields(); + } + + public GCAllocationKind AllocationKind; + public int ClrInstanceID; + public UInt64 TypeID; + public string TypeName; + public int HeapIndex; + public UInt64 Address; + public long ObjectSize; + public long SampledByteOffset; + + private void ComputeFields() + { + int offsetBeforeString = 4 + 2 + _pointerSize; + + Span data = _payload.EventData().AsSpan(); + AllocationKind = (GCAllocationKind)BitConverter.ToInt32(data.Slice(0, 4)); + ClrInstanceID = BitConverter.ToInt16(data.Slice(4, 2)); + if (_pointerSize == 4) + { + TypeID = BitConverter.ToUInt32(data.Slice(6, _pointerSize)); + } + else + { + TypeID = BitConverter.ToUInt64(data.Slice(6, _pointerSize)); + } + TypeName = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, _payload.EventDataLength - offsetBeforeString - EndOfStringCharLength - 4 - _pointerSize - 8 - 8)); + HeapIndex = BitConverter.ToInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength, 4)); + if (_pointerSize == 4) + { + Address = BitConverter.ToUInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); + } + else + { + Address = BitConverter.ToUInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); + } + ObjectSize = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + _pointerSize, 8)); + SampledByteOffset = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + _pointerSize + 8, 8)); + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.csproj b/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.csproj new file mode 100644 index 00000000000000..040aac14727f59 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.csproj @@ -0,0 +1,26 @@ + + + + true + .NETCoreApp + true + true + + true + true + + + true + + + + + guard + + + + + + + + diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Allocate.csproj b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Allocate.csproj new file mode 100644 index 00000000000000..01e8ecfa42a698 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Allocate.csproj @@ -0,0 +1,9 @@ + + + + true + Exe + .NETCoreApp + + + diff --git 
a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateArraysOfDoubles.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateArraysOfDoubles.cs new file mode 100644 index 00000000000000..ee18309c547acf --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateArraysOfDoubles.cs @@ -0,0 +1,21 @@ +using System; +using System.Collections.Generic; +using System.Linq; + +namespace Allocate +{ + public class AllocateArraysOfDoubles : IAllocations + { + public void Allocate(int count) + { + List arrays = new List(count); + + for (int i = 0; i < count; i++) + { + arrays.Add(new double[1] { i }); + } + + Console.WriteLine($"Sum {arrays.Count} arrays of one double = {arrays.Sum(doubles => doubles[0])}"); + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateDifferentTypes.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateDifferentTypes.cs new file mode 100644 index 00000000000000..8dfaecb0cf3509 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateDifferentTypes.cs @@ -0,0 +1,49 @@ +using System; +using System.Collections.Generic; + +namespace Allocate +{ + public class AllocateDifferentTypes : IAllocations + { + public void Allocate(int count) + { + List objects = new List(count); + + for (int i = 0; i < count; i++) + { + objects.Add(new string('c', 37)); + objects.Add(new WithFinalizer(i)); + objects.Add(new byte[173]); + int[,] matrix = { { 1, 2 }, { 3, 4 }, { 5, 6 }, { 7, 8 } }; + objects.Add(matrix); + } + + Console.WriteLine($"{objects.Count} objects"); + } + } + + public class WithFinalizer + { + private static int _counter; + + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + + public static int Counter => _counter; + + public WithFinalizer(int id) + { + _counter++; + + _x1 = (UInt16)(id % 10); + _x2 = (UInt16)(id % 100); + _x3 = (UInt16)(id % 1000); + } + + ~WithFinalizer() + { + _counter--; + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateRatioSizedArrays.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateRatioSizedArrays.cs new file mode 100644 index 00000000000000..5af08b3991593f --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateRatioSizedArrays.cs @@ -0,0 +1,44 @@ +using System; +using System.Collections.Generic; + +namespace Allocate +{ + public class AllocateRatioSizedArrays : IAllocations + { + public void Allocate(int count) + { + // We can't keep the objects in memory, just keep their size + List sizes= new List(count * 5); + + var gcCount = GC.CollectionCount(0); + + for (int i = 0; i < count; i++) + { + var bytes1 = new byte[1024]; + bytes1[1] = 1; + sizes.Add(bytes1.Length); + var bytes2 = new byte[10240]; + bytes2[2] = 2; + sizes.Add(bytes2.Length); + var bytes3 = new byte[102400]; + bytes3[3] = 3; + sizes.Add(bytes3.Length); + var bytes4 = new byte[1024000]; + bytes4[4] = 4; + sizes.Add(bytes4.Length); + var bytes5 = new byte[10240000]; + bytes5[5] = 5; + sizes.Add(bytes5.Length); + } + + Console.WriteLine($"+ {GC.CollectionCount(0) - gcCount} collections"); + + long totalAllocated = 0; + foreach (int size in sizes) + { + totalAllocated += size; + } + Console.WriteLine($"{sizes.Count} arrays for {totalAllocated / 1024} KB"); + } + } +} diff --git 
a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateSmallAndBig.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateSmallAndBig.cs new file mode 100644 index 00000000000000..5f8660be6a74d3 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateSmallAndBig.cs @@ -0,0 +1,180 @@ +#pragma warning disable CS0169 // Remove unused private members +#pragma warning disable IDE0049 // Simplify Names + +using System; +using System.Collections.Generic; + +namespace Allocate +{ + public class AllocateSmallAndBig : IAllocations + { + public void Allocate(int count) + { + Dictionary allocations = Initialize(); + List objects = new List(1024 * 1024); + + AllocateSmallThenBig(count/2, objects, allocations); + Console.WriteLine(); + AllocateBigThenSmall(count/2, objects, allocations); + Console.WriteLine(); + } + + private void AllocateSmallThenBig(int count, List objects, Dictionary allocations) + { + for (int i = 0; i < count; i++) + { + // allocate from smaller to larger + objects.Add(new Object24()); + objects.Add(new Object32()); + objects.Add(new Object48()); + objects.Add(new Object80()); + objects.Add(new Object144()); + } + + allocations[nameof(Object24)].Count = count; + allocations[nameof(Object24)].Size = count * 24; + allocations[nameof(Object32)].Count = count; + allocations[nameof(Object32)].Size = count * 32; + allocations[nameof(Object48)].Count = count; + allocations[nameof(Object48)].Size = count * 48; + allocations[nameof(Object80)].Count = count; + allocations[nameof(Object80)].Size = count * 80; + allocations[nameof(Object144)].Count = count; + allocations[nameof(Object144)].Size = count * 144; + + DumpAllocations(allocations); + Clear(allocations); + objects.Clear(); + } + + private void AllocateBigThenSmall(int count, List objects, Dictionary allocations) + { + for (int i = 0; i < count; i++) + { + // allocate from larger to smaller + objects.Add(new Object144()); + objects.Add(new Object80()); + objects.Add(new Object48()); + objects.Add(new Object32()); + objects.Add(new Object24()); + } + + allocations[nameof(Object24)].Count = count; + allocations[nameof(Object24)].Size = count * 24; + allocations[nameof(Object32)].Count = count; + allocations[nameof(Object32)].Size = count * 32; + allocations[nameof(Object48)].Count = count; + allocations[nameof(Object48)].Size = count * 48; + allocations[nameof(Object80)].Count = count; + allocations[nameof(Object80)].Size = count * 80; + allocations[nameof(Object144)].Count = count; + allocations[nameof(Object144)].Size = count * 144; + + DumpAllocations(allocations); + Clear(allocations); + objects.Clear(); + } + + private Dictionary Initialize() + { + var allocations = new Dictionary(16); + allocations[nameof(Object24)] = new AllocStats(); + allocations[nameof(Object32)] = new AllocStats(); + allocations[nameof(Object48)] = new AllocStats(); + allocations[nameof(Object80)] = new AllocStats(); + allocations[nameof(Object144)] = new AllocStats(); + + Clear(allocations); + return allocations; + } + + private void Clear(Dictionary allocations) + { + allocations[nameof(Object24)].Count = 0; + allocations[nameof(Object24)].Size = 0; + allocations[nameof(Object32)].Count = 0; + allocations[nameof(Object32)].Size = 0; + allocations[nameof(Object48)].Count = 0; + allocations[nameof(Object48)].Size = 0; + allocations[nameof(Object80)].Count = 0; + allocations[nameof(Object80)].Size = 0; + allocations[nameof(Object144)].Count = 0; + 
allocations[nameof(Object144)].Size = 0; + } + + private void DumpAllocations(Dictionary objects) + { + Console.WriteLine("Allocations start"); + foreach (var allocation in objects) + { + Console.WriteLine($"{allocation.Key}={allocation.Value.Count},{allocation.Value.Size}"); + } + + Console.WriteLine("Allocations end"); + } + + internal class AllocStats + { + public int Count { get; set; } + public long Size { get; set; } + } + + internal class Object0 + { + } + + internal class Object24 : Object0 + { + private readonly UInt32 _x1; + private readonly UInt32 _x2; + } + + internal class Object32 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + } + + internal class Object48 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + private readonly UInt64 _x3; + private readonly UInt64 _x4; + } + + internal class Object80 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + private readonly UInt64 _x3; + private readonly UInt64 _x4; + private readonly UInt64 _x5; + private readonly UInt64 _x6; + private readonly UInt64 _x7; + private readonly UInt64 _x8; + } + + internal class Object144 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + private readonly UInt64 _x3; + private readonly UInt64 _x4; + private readonly UInt64 _x5; + private readonly UInt64 _x6; + private readonly UInt64 _x7; + private readonly UInt64 _x8; + private readonly UInt64 _x9; + private readonly UInt64 _x10; + private readonly UInt64 _x11; + private readonly UInt64 _x12; + private readonly UInt64 _x13; + private readonly UInt64 _x14; + private readonly UInt64 _x15; + private readonly UInt64 _x16; + } + } +} +#pragma warning restore IDE0049 // Simplify Names +#pragma warning restore CS0169 // Remove unused private members \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocationsRunEventSource.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocationsRunEventSource.cs new file mode 100644 index 00000000000000..ee21414d2c2ddb --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocationsRunEventSource.cs @@ -0,0 +1,34 @@ +using System.Diagnostics.Tracing; + +namespace Allocate +{ + [EventSource(Name = "Allocations-Run")] + public class AllocationsRunEventSource : EventSource + { + public static readonly AllocationsRunEventSource Log = new AllocationsRunEventSource(); + + [Event(600, Level = EventLevel.Informational)] + public void StartRun(int iterationsCount, int allocationCount, string listOfTypes) + { + WriteEvent(eventId: 600, iterationsCount, allocationCount, listOfTypes); + } + + [Event(601, Level = EventLevel.Informational)] + public void StopRun() + { + WriteEvent(eventId: 601); + } + + [Event(602, Level = EventLevel.Informational)] + public void StartIteration(int iteration) + { + WriteEvent(eventId: 602, iteration); + } + + [Event(603, Level = EventLevel.Informational)] + public void StopIteration(int iteration) + { + WriteEvent(eventId: 603, iteration); + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/IAllocations.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/IAllocations.cs new file mode 100644 index 00000000000000..3ee00f39adcdfe --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/IAllocations.cs @@ -0,0 +1,8 @@ + +namespace Allocate +{ + public interface 
IAllocations + { + public void Allocate(int count); + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Program.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Program.cs new file mode 100644 index 00000000000000..f7220a11289752 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Program.cs @@ -0,0 +1,133 @@ +using System; +using System.Diagnostics; + +namespace Allocate +{ + public enum Scenario + { + SmallAndBig = 1, + PerThread = 2, + ArrayOfDouble = 3, + FinalizerAndArraysAndStrings = 4, + RatioSizedArrays = 5, + } + + + internal class Program + { + static void Main(string[] args) + { + if (args.Length < 1) + { + Console.WriteLine("Usage: Allocate --scenario (1|2|3|4|5) [--iterations (number of iterations)] [--allocations (allocations count)]"); + Console.WriteLine(" 1: small and big allocations"); + Console.WriteLine(" 2: allocations per thread"); + Console.WriteLine(" 3: arrays of double (for x86)"); + Console.WriteLine(" 4: different types of objects"); + Console.WriteLine(" 5: ratio sized arrays"); + return; + } + ParseCommandLine(args, out Scenario scenario, out int allocationsCount, out int iterations); + + IAllocations allocationsRun = null; + string allocatedTypes = string.Empty; + + switch(scenario) + { + case Scenario.SmallAndBig: + allocationsRun = new AllocateSmallAndBig(); + allocatedTypes = "Object24;Object32;Object48;Object80;Object144"; + break; + case Scenario.PerThread: + allocationsRun = new ThreadedAllocations(); + allocatedTypes = "Object24;Object48;Object72;Object32;Object64;Object96"; + break; + case Scenario.ArrayOfDouble: + allocationsRun = new AllocateArraysOfDoubles(); + allocatedTypes = "System.Double[]"; + break; + case Scenario.FinalizerAndArraysAndStrings: + allocationsRun = new AllocateDifferentTypes(); + allocatedTypes = "System.String;Allocate.WithFinalizer;System.Byte[]"; + break; + case Scenario.RatioSizedArrays: + allocationsRun = new AllocateRatioSizedArrays(); + allocatedTypes = "System.Byte[]"; + break; + default: + Console.WriteLine($"Invalid scenario: '{scenario}'"); + return; + } + + Console.WriteLine($"pid = {Process.GetCurrentProcess().Id}"); + Console.ReadLine(); + + if (allocationsRun != null) + { + Stopwatch clock = new Stopwatch(); + clock.Start(); + + AllocationsRunEventSource.Log.StartRun(iterations, allocationsCount, allocatedTypes); + for (int i = 0; i < iterations; i++) + { + AllocationsRunEventSource.Log.StartIteration(i); + allocationsRun.Allocate(allocationsCount); + AllocationsRunEventSource.Log.StopIteration(i); + } + AllocationsRunEventSource.Log.StopRun(); + + clock.Stop(); + Console.WriteLine($"Duration = {clock.ElapsedMilliseconds} ms"); + } + } + + private static void ParseCommandLine(string[] args, out Scenario scenario, out int allocationsCount, out int iterations) + { + iterations = 100; + allocationsCount = 1_000_000; + scenario = Scenario.SmallAndBig; + + for (int i = 0; i < args.Length; i++) + { + string arg = args[i]; + + if ("--scenario".Equals(arg, StringComparison.OrdinalIgnoreCase)) + { + int valueOffset = i + 1; + if (valueOffset < args.Length && int.TryParse(args[valueOffset], out var number)) + { + scenario = (Scenario)number; + } + } + else + if ("--iterations".Equals(arg, StringComparison.OrdinalIgnoreCase)) + { + int valueOffset = i + 1; + if (valueOffset < args.Length && int.TryParse(args[valueOffset], out var number)) + { + if (number <= 0) + { + throw new ArgumentOutOfRangeException($"Invalid 
iterations count '{number}': must be > 0"); + } + + iterations = number; + } + } + else + if ("--allocations".Equals(arg, StringComparison.OrdinalIgnoreCase)) + { + int valueOffset = i + 1; + if (valueOffset < args.Length && int.TryParse(args[valueOffset], out var number)) + { + if (number <= 0) + { + throw new ArgumentOutOfRangeException($"Invalid numbers of allocations '{number}: must be > 0"); + } + + allocationsCount = number; + } + } + } + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/ThreadedAllocations.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/ThreadedAllocations.cs new file mode 100644 index 00000000000000..8172a19a9fa822 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/ThreadedAllocations.cs @@ -0,0 +1,176 @@ +#pragma warning disable CS0169 // Remove unused private members +#pragma warning disable IDE0049 // Simplify Names + +using System; +using System.Collections.Generic; +using System.Threading; + +namespace Allocate +{ + public class ThreadedAllocations : IAllocations + { + public void Allocate(int count) + { + List objects1 = new List(1024 * 1024); + List objects2 = new List(1024 * 1024); + + Thread[] threads = new Thread[2]; + threads[0] = new Thread(() => Allocate1(count, objects1)); + threads[1] = new Thread(() => Allocate2(count, objects2)); + + for (int i = 0; i < threads.Length; i++) { threads[i].Start(); } + for (int i = 0; i < threads.Length; i++) { threads[i].Join(); } + + Console.WriteLine($"Allocated {objects1.Count + objects2.Count} objects"); + } + + private void Allocate1(int count, List objects) + { + for (int i = 0; i < count; i++) + { + objects.Add(new Object24()); + objects.Add(new Object48()); + objects.Add(new Object72()); + } + } + + private void Allocate2(int count, List objects) + { + for (int i = 0; i < count; i++) + { + objects.Add(new Object32()); + objects.Add(new Object64()); + objects.Add(new Object96()); + } + } + + internal class Object0 + { + } + + internal class Object24 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + } + + internal class Object32 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + private readonly UInt16 _x4; + private readonly UInt16 _x5; + private readonly UInt16 _x6; + private readonly UInt16 _x7; + } + + internal class Object48 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + private readonly UInt16 _x4; + private readonly UInt16 _x5; + private readonly UInt16 _x6; + private readonly UInt16 _x7; + private readonly UInt16 _x8; + private readonly UInt16 _x9; + private readonly UInt16 _x10; + private readonly UInt16 _x11; + private readonly UInt16 _x12; + private readonly UInt16 _x13; + private readonly UInt16 _x14; + private readonly UInt16 _x15; + } + + internal class Object64 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + private readonly UInt16 _x4; + private readonly UInt16 _x5; + private readonly UInt16 _x6; + private readonly UInt16 _x7; + private readonly UInt16 _x8; + private readonly UInt16 _x9; + private readonly UInt16 _x10; + private readonly UInt16 _x11; + private readonly UInt16 _x12; + private readonly UInt16 _x13; + private readonly UInt16 _x14; + private readonly UInt16 _x15; + private readonly UInt16 _x16; + private readonly UInt16 _x17; + 
private readonly UInt16 _x18; + private readonly UInt16 _x19; + private readonly UInt16 _x20; + private readonly UInt16 _x21; + private readonly UInt16 _x22; + private readonly UInt16 _x23; + private readonly UInt16 _x24; + } + + internal class Object72 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + private readonly UInt16 _x4; + private readonly UInt16 _x5; + private readonly UInt16 _x6; + private readonly UInt16 _x7; + private readonly UInt16 _x8; + private readonly UInt16 _x9; + private readonly UInt16 _x10; + private readonly UInt16 _x11; + private readonly UInt16 _x12; + private readonly UInt16 _x13; + private readonly UInt16 _x14; + private readonly UInt16 _x15; + private readonly UInt16 _x16; + private readonly UInt16 _x17; + private readonly UInt16 _x18; + private readonly UInt16 _x19; + private readonly UInt16 _x20; + private readonly UInt16 _x21; + private readonly UInt16 _x22; + private readonly UInt16 _x23; + private readonly UInt16 _x24; + private readonly UInt16 _x25; + private readonly UInt16 _x26; + private readonly UInt16 _x27; + private readonly UInt16 _x28; + } + + internal class Object96 : Object0 + { + private readonly UInt32 _x1; + private readonly UInt32 _x2; + private readonly UInt32 _x3; + private readonly UInt32 _x4; + private readonly UInt32 _x5; + private readonly UInt32 _x6; + private readonly UInt32 _x7; + private readonly UInt32 _x8; + private readonly UInt32 _x9; + private readonly UInt32 _x10; + private readonly UInt32 _x11; + private readonly UInt32 _x12; + private readonly UInt32 _x13; + private readonly UInt32 _x14; + private readonly UInt32 _x15; + private readonly UInt32 _x16; + private readonly UInt32 _x17; + private readonly UInt32 _x18; + private readonly UInt32 _x19; + private readonly UInt32 _x20; + } + } +} + + +#pragma warning restore IDE0049 // Simplify Names +#pragma warning restore CS0169 // Remove unused private members \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.csproj b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.csproj new file mode 100644 index 00000000000000..4a1f3d25c23b34 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.csproj @@ -0,0 +1,14 @@ + + + + true + Exe + .NETCoreApp + + + + + + + + diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.sln b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.sln new file mode 100644 index 00000000000000..6e5beeaa3691f3 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.sln @@ -0,0 +1,51 @@ + +Microsoft Visual Studio Solution File, Format Version 12.00 +# Visual Studio Version 17 +VisualStudioVersion = 17.9.34616.47 +MinimumVisualStudioVersion = 10.0.40219.1 +Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "AllocationProfiler", "AllocationProfiler.csproj", "{1530D7FB-8635-4267-A7B0-EB1280780CAA}" +EndProject +Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Allocate", "Allocate\Allocate.csproj", "{883FD439-6B92-421F-A68B-D22FFC21BF0A}" +EndProject +Global + GlobalSection(SolutionConfigurationPlatforms) = preSolution + Debug|Any CPU = Debug|Any CPU + Debug|x64 = Debug|x64 + Debug|x86 = Debug|x86 + Release|Any CPU = Release|Any CPU + Release|x64 = Release|x64 + Release|x86 = Release|x86 + EndGlobalSection + 
GlobalSection(ProjectConfigurationPlatforms) = postSolution + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|Any CPU.ActiveCfg = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|Any CPU.Build.0 = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|x64.ActiveCfg = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|x64.Build.0 = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|x86.ActiveCfg = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|x86.Build.0 = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|Any CPU.ActiveCfg = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|Any CPU.Build.0 = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|x64.ActiveCfg = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|x64.Build.0 = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|x86.ActiveCfg = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|x86.Build.0 = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|Any CPU.ActiveCfg = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|Any CPU.Build.0 = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|x64.ActiveCfg = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|x64.Build.0 = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|x86.ActiveCfg = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|x86.Build.0 = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|Any CPU.ActiveCfg = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|Any CPU.Build.0 = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|x64.ActiveCfg = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|x64.Build.0 = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|x86.ActiveCfg = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|x86.Build.0 = Release|Any CPU + EndGlobalSection + GlobalSection(SolutionProperties) = preSolution + HideSolutionNode = FALSE + EndGlobalSection + GlobalSection(ExtensibilityGlobals) = postSolution + SolutionGuid = {64F6D2D8-C43C-41D5-8CEA-8F45ADF2EC6C} + EndGlobalSection +EndGlobal diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Program.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Program.cs new file mode 100644 index 00000000000000..72719d6bf97395 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Program.cs @@ -0,0 +1,470 @@ +using Microsoft.Diagnostics.NETCore.Client; +using Microsoft.Diagnostics.Tracing.Parsers; +using Microsoft.Diagnostics.Tracing; +using System.Diagnostics.Tracing; +using Microsoft.Diagnostics.Tracing.Parsers.Clr; +using System.Text; +using System.Runtime.CompilerServices; + +namespace DynamicAllocationSampling +{ + internal class TypeInfo + { + public string TypeName = "?"; + public int Count; + public long Size; + public long TotalSize; + public long RemainderSize; + + public override int GetHashCode() + { + return (TypeName+Size).GetHashCode(); + } + + public override bool Equals(object obj) + { + if (obj == null) + { + return false; + } + + if (!(obj is TypeInfo)) + { + return false; + } + + return (TypeName+Size).Equals(((TypeInfo)obj).TypeName+Size); + } + } + + internal class Program + { + private static Dictionary _sampledTypes = new Dictionary(); + private static Dictionary _tickTypes = new Dictionary(); + private static List> _sampledTypesInRun = null; + private static List> 
_tickTypesInRun = null; + private static int _allocationsCount = 0; + private static List _allocatedTypes = new List(); + private static EventPipeEventSource _source; +; + + static void Main(string[] args) + { + if (args.Length == 0) + { + Console.WriteLine("No process ID specified"); + return; + } + + int pid = -1; + if (!int.TryParse(args[0], out pid)) + { + Console.WriteLine($"Invalid specified process ID '{args[0]}'"); + return; + } + + try + { + PrintEventsLive(pid); + } + catch (Exception x) + { + Console.WriteLine(x.Message); + } + } + + + public static void PrintEventsLive(int processId) + { + var providers = new List() + { + new EventPipeProvider( + "Microsoft-Windows-DotNETRuntime", + EventLevel.Verbose, // verbose is required for AllocationTick + (long)0x80000000001 // new AllocationSamplingKeyword + GCKeyword + ), + new EventPipeProvider( + "Allocations-Run", + EventLevel.Informational + ), + }; + var client = new DiagnosticsClient(processId); + + using (var session = client.StartEventPipeSession(providers, false)) + { + Console.WriteLine(); + + Task streamTask = Task.Run(() => + { + var source = new EventPipeEventSource(session.EventStream); + _source = source; + + ClrTraceEventParser clrParser = new ClrTraceEventParser(source); + clrParser.GCAllocationTick += OnAllocationTick; + source.Dynamic.All += OnEvents; + + try + { + source.Process(); + } + catch (Exception e) + { + Console.WriteLine($"Error encountered while processing events: {e.Message}"); + } + }); + + Task inputTask = Task.Run(() => + { + while (Console.ReadKey().Key != ConsoleKey.Enter) + { + Thread.Sleep(100); + } + session.Stop(); + }); + + Task.WaitAny(streamTask, inputTask); + } + + // not all cases are emitting allocations run events + if ((_sampledTypesInRun == null) && (_sampledTypes.Count > 0)) + { + ShowIterationResults(); + } + } + + private const long SAMPLING_MEAN = 100 * 1024; + private const double SAMPLING_RATIO = 0.999990234375 / 0.000009765625; + private static long UpscaleSize(long totalSize, int count, long mean, long sizeRemainder) + { + //// This is the Poisson process based scaling + //var averageSize = (double)totalSize / (double)count; + //var scale = 1 / (1 - Math.Exp(-averageSize / mean)); + //return (long)(totalSize * scale); + + // use the upscaling method detailed in the PR + // = sq/p + u + // s = # of samples for a type + // q = 1 - 1/102400 + // p = 1/102400 + // u = sum of object remainders = Sum(object_size - sampledByteOffset) for all samples + return (long)(SAMPLING_RATIO * count + sizeRemainder); + } + + private static void OnAllocationTick(GCAllocationTickTraceData payload) + { + // skip unexpected types + if (!_allocatedTypes.Contains(payload.TypeName)) return; + + if (!_tickTypes.TryGetValue(payload.TypeName + payload.ObjectSize, out TypeInfo typeInfo)) + { + typeInfo = new TypeInfo() { TypeName = payload.TypeName, Count = 0, Size = payload.ObjectSize, TotalSize = 0 }; + _tickTypes.Add(payload.TypeName + payload.ObjectSize, typeInfo); + } + typeInfo.Count++; + typeInfo.TotalSize += (int)payload.ObjectSize; + } + + private static void OnEvents(TraceEvent eventData) + { + if (eventData.ID == (TraceEventID)303) + { + AllocationSampledData payload = new AllocationSampledData(eventData, _source.PointerSize); + + // skip unexpected types + if (!_allocatedTypes.Contains(payload.TypeName)) return; + + if (!_sampledTypes.TryGetValue(payload.TypeName+payload.ObjectSize, out TypeInfo typeInfo)) + { + typeInfo = new TypeInfo() { TypeName = payload.TypeName, Count = 0, Size = 
(int)payload.ObjectSize, TotalSize = 0, RemainderSize = payload.ObjectSize - payload.SampledByteOffset }; + _sampledTypes.Add(payload.TypeName + payload.ObjectSize, typeInfo); + } + typeInfo.Count++; + typeInfo.TotalSize += (int)payload.ObjectSize; + typeInfo.RemainderSize += (payload.ObjectSize - payload.SampledByteOffset); + + return; + } + + if (eventData.ID == (TraceEventID)600) + { + AllocationsRunData payload = new AllocationsRunData(eventData); + Console.WriteLine($"> starts {payload.Iterations} iterations allocating {payload.Count} instances"); + + _sampledTypesInRun = new List>(payload.Iterations); + _tickTypesInRun = new List>(payload.Iterations); + _allocationsCount = payload.Count; + string allocatedTypes = payload.AllocatedTypes; + if (allocatedTypes.Length > 0) + { + _allocatedTypes = allocatedTypes.Split(';').ToList(); + } + + return; + } + + if (eventData.ID == (TraceEventID)601) + { + Console.WriteLine("\n< run stops\n"); + + ShowRunResults(); + return; + } + + if (eventData.ID == (TraceEventID)602) + { + AllocationsRunIterationData payload = new AllocationsRunIterationData(eventData); + Console.Write($"{payload.Iteration}"); + + _sampledTypes.Clear(); + _tickTypes.Clear(); + return; + } + + if (eventData.ID == (TraceEventID)603) + { + Console.WriteLine("|"); + ShowIterationResults(); + + _sampledTypesInRun.Add(_sampledTypes); + _sampledTypes = new Dictionary(); + _tickTypesInRun.Add(_tickTypes); + _tickTypes = new Dictionary(); + return; + } + } + + private static void ShowRunResults() + { + var iterations = _sampledTypesInRun.Count; + + // for each type, get the percent diff between upscaled count and expected _allocationsCount + Dictionary> typeDistribution = new Dictionary>(); + foreach (var iteration in _sampledTypesInRun) + { + foreach (var info in iteration.Values) + { + // ignore types outside of the allocations run + if (info.Count < 16) continue; + + if (!typeDistribution.TryGetValue(info, out List distribution)) + { + distribution = new List(iterations); + typeDistribution.Add(info, distribution); + } + + var upscaledCount = (long)info.Count * UpscaleSize(info.TotalSize, info.Count, SAMPLING_MEAN, info.RemainderSize) / info.TotalSize; + var percentDiff = (double)(upscaledCount - _allocationsCount) / (double)_allocationsCount; + distribution.Add(percentDiff); + } + } + + foreach (var type in typeDistribution.Keys.OrderBy(t => t.Size)) + { + var distribution = typeDistribution[type]; + + string typeName = type.TypeName; + if (typeName.Contains("[]")) + { + typeName += $" ({type.Size} bytes)"; + } + Console.WriteLine(typeName); + Console.WriteLine("-------------------------"); + int current = 1; + foreach (var diff in distribution.OrderBy(v => v)) + { + if (iterations > 20) + { + if ((current <= 5) || ((current >= 49) && (current < 52)) || (current >= 96)) + { + Console.WriteLine($"{current,4} {diff,8:0.0 %}"); + } + else + if ((current == 6) || (current == 95)) + { + Console.WriteLine(" ..."); + } + } + else + { + Console.WriteLine($"{current,4} {diff,8:0.0 %}"); + } + + current++; + } + Console.WriteLine(); + } + } + + private static void ShowIterationResults() + { + // NOTE: need to take the size into account for array types + // print the sampled types for both AllocationTick and AllocationSampled + Console.WriteLine("Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name"); + Console.WriteLine("--------------------------------------------------------------------------------------------------"); + foreach (var type in 
_sampledTypes.Values.OrderBy(v => v.Size)) + { + string tag = "S"; + if (_tickTypes.TryGetValue(type.TypeName + type.Size, out TypeInfo tickType)) + { + tag += "T"; + } + + Console.Write($"{tag,3} {type.Count,6}"); + if (tag == "S") + { + Console.Write($" {0,6}"); + } + else + { + Console.Write($" {tickType.Count,6}"); + } + + Console.Write($" {type.TotalSize,13}"); + if (tag == "S") + { + Console.Write($" {0,13}"); + } + else + { + Console.Write($" {tickType.TotalSize,13}"); + } + + string typeName = type.TypeName; + if (typeName.Contains("[]")) + { + typeName += $" ({type.Size} bytes)"; + } + + if (type.Count != 0) + { + Console.WriteLine($" {type.TotalSize / type.Count,9} {UpscaleSize(type.TotalSize, type.Count, SAMPLING_MEAN, type.RemainderSize),13} {(long)type.Count * UpscaleSize(type.TotalSize, type.Count, SAMPLING_MEAN, type.RemainderSize) / type.TotalSize,10} {typeName}"); + } + } + + foreach (var type in _tickTypes.Values) + { + string tag = "T"; + + if (!_sampledTypes.ContainsKey(type.TypeName + type.Size)) + { + string typeName = type.TypeName; + if (typeName.Contains("[]")) + { + typeName += $" ({type.Size} bytes)"; + } + + Console.WriteLine($"{tag,3} {"0",6} {type.Count,6} {"0",13} {type.TotalSize,13} {type.TotalSize / type.Count,9} {"0",13} {"0",10} {typeName}"); + } + } + } + } + + + // + // + // + // + // + // + // + // + class AllocationSampledData + { + const int EndOfStringCharLength = 2; + private TraceEvent _payload; + private int _pointerSize; + public AllocationSampledData(TraceEvent payload, int pointerSize) + { + _payload = payload; + _pointerSize = pointerSize; + TypeName = "?"; + + ComputeFields(); + } + + public GCAllocationKind AllocationKind; + public int ClrInstanceID; + public UInt64 TypeID; + public string TypeName; + public int HeapIndex; + public UInt64 Address; + public long ObjectSize; + public long SampledByteOffset; + + private void ComputeFields() + { + int offsetBeforeString = 4 + 2 + _pointerSize; + + Span data = _payload.EventData().AsSpan(); + AllocationKind = (GCAllocationKind)BitConverter.ToInt32(data.Slice(0, 4)); + ClrInstanceID = BitConverter.ToInt16(data.Slice(4, 2)); + if (_pointerSize == 4) + { + TypeID = BitConverter.ToUInt32(data.Slice(6, _pointerSize)); + } + else + { + TypeID = BitConverter.ToUInt64(data.Slice(6, _pointerSize)); + } + // \0 should not be included for GetString to work + TypeName = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, _payload.EventDataLength - offsetBeforeString - EndOfStringCharLength - 4 - _pointerSize - 8 - 8)); + HeapIndex = BitConverter.ToInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength, 4)); + if (_pointerSize == 4) + { + Address = BitConverter.ToUInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); + } + else + { + Address = BitConverter.ToUInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); + } + ObjectSize = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + _pointerSize, 8)); + SampledByteOffset = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + _pointerSize + 8, 8)); + } + } + + class AllocationsRunData + { + const int EndOfStringCharLength = 2; + private TraceEvent _payload; + + public AllocationsRunData(TraceEvent payload) + { + _payload = payload; + + ComputeFields(); + } + + public int Iterations; + public int Count; + public string 
AllocatedTypes; + + private void ComputeFields() + { + int offsetBeforeString = 4 + 4; + + Span data = _payload.EventData().AsSpan(); + Iterations = BitConverter.ToInt32(data.Slice(0, 4)); + Count = BitConverter.ToInt32(data.Slice(4, 4)); + AllocatedTypes = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, _payload.EventDataLength - offsetBeforeString - EndOfStringCharLength)); + } + } + + class AllocationsRunIterationData + { + private TraceEvent _payload; + public AllocationsRunIterationData(TraceEvent payload) + { + _payload = payload; + + ComputeFields(); + } + + public int Iteration; + + private void ComputeFields() + { + Span data = _payload.EventData().AsSpan(); + Iteration = BitConverter.ToInt32(data.Slice(0, 4)); + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/README.md b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/README.md new file mode 100644 index 00000000000000..7f0d274ba1530b --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/README.md @@ -0,0 +1,112 @@ +# Manual Testing for Randomized Allocation Sampling + +This folder has a test app (Allocate sub-folder) and a profiler (AllocationProfiler.csproj) that together can be used to experimentally +observe the distribution of sampling events that are generated for different allocation scenarios. To run it: + +1. Build both projects +2. Run the Allocate app with corerun and use the --scenario argument to select an allocation scenario you want to validate +3. The Allocate app will print its own PID to the console and wait. +4. Run the AllocationProfiler passing in the allocate app PID as an argument +5. Hit Enter in the Allocate app to begin the allocations. You will see output in the profiler app's console showing the measurements. For example: + + ``` + Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name + ------------------------------------------------------------------------------------------- + S 1 0 24 0 24 102412 4267 System.Int16 + ST 44 61 1056 1464 24 4506128 187755 Object8 + ST 1 1 32 32 32 102416 3200 System.Reflection.MetadataImport + ST 67 30 2144 960 32 6861872 214433 Object16 + ST 80 169 3840 8112 48 8193920 170706 Object32 + S 1 0 56 0 56 102428 1829 MemberInfoCache`1[System.Reflection.RuntimeMethodInfo] + ST 2 3 160 240 80 204880 2561 System.String + S 2 0 128 0 64 204864 3201 System.Reflection.RuntimeMethodBody + S 1 0 80 0 80 102440 1280 System.Signature + ST 143 86 11440 6880 80 14648920 183111 Object64 + S 2 0 222 0 111 204911 1846 System.Byte[] + S 1 0 96 0 96 102448 1067 System.Reflection.RuntimeParameterInfo + S 1 0 112 0 112 102456 914 System.Reflection.ParameterInfo[] + ST 280 272 40320 39168 144 28692164 199251 Object128 + S 2 0 58224 0 29112 235289 8 EventMetadata[] + ST 1 1 8388632 8388640 8388632 8388632 1 Object0[] + T 0 1 0 336 336 0 0 System.Reflection.RuntimeFieldInfo[] + T 0 1 0 48 48 0 0 System.Text.StringBuilder +``` + +- The **Tag** column shows if Allocation**T**ick and/or Allocation**S**ampled events where received for instances of a given type +- The **S**-prefixed colums refer to data from AllocationSampled events payload +- The **T**-prefixed colums refer to data from AllocationTick events payload +- The final **Upscaled**XXX columns are computed from AllocationSampled events payload + +In this special case, the same number of 200000 instances were created and should be checked in the **UpscaledCount** column. 
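+
+For reference, here is a minimal sketch of how the **UpscaledSize** and **UpscaledCount** columns can be derived from the AllocationSampled payloads. It assumes the same constants as the profiler (p = 1/102400 and q = 1 - p) and follows the `s*q/p + u` estimator used in the profiler code; the `Upscaling` class and method names below are only illustrative:
+
+```csharp
+static class Upscaling
+{
+    // Per-byte sampling probability p and its complement q.
+    const double P = 1.0 / 102400.0;
+    const double Q = 1.0 - P;
+
+    // s: number of AllocationSampled events observed for a given (type, instance size) bucket.
+    // u: sum over those samples of (ObjectSize - SampledByteOffset), i.e. the "remainder" bytes.
+    public static long UpscaledSize(int s, long u) => (long)(s * (Q / P) + u);
+
+    // unitSize: instance size of the type, used to turn the upscaled byte estimate into a count.
+    public static long UpscaledCount(int s, long u, long unitSize) => UpscaledSize(s, u) / unitSize;
+}
+```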
+ +In a second case, 2 threads allocate 200000 instances of objects with x1/x2/x3 size ratio to see how the relative size distribution is conserved: + +``` +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 47 67 1128 1608 24 4813364 200556 Object24 + ST 65 48 2080 1536 32 6657040 208032 Object32 + ST 108 94 5184 4512 48 11061792 230454 Object48 + ST 132 145 8448 9280 64 13521024 211266 Object64 + ST 155 87 11160 6264 72 15877580 220521 Object72 + ST 191 192 18336 18432 96 19567569 203828 Object96 + ST 2 2 16777264 16777280 8388632 16777264 2 Object0[] +``` + + +A dedicated `AllocationsRunEventSource` has been created to allow monitoring multiple allocation runs and compute percentiles: +``` +> starts 10 iterations allocating 1000000 instances +0| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 246 224 5904 5376 24 25193352 1049723 Allocate.WithFinalizer + ST 5 7 320 448 64 512160 8002 System.RuntimeFieldInfoStub + ST 702 719 50544 51768 72 71910074 998751 System.Int32[,] + ST 946 859 90816 82464 96 96915815 1009539 System.String + ST 1842 1887 362874 377400 197 188802295 958387 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] +1| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 283 224 6792 5376 24 28982596 1207608 Allocate.WithFinalizer + ST 675 711 48600 51192 72 69144302 960337 System.Int32[,] + ST 974 867 93504 83232 96 99784359 1039420 System.String + ST 1861 1888 366617 377600 197 190749767 968272 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] +2| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 215 236 5160 5664 24 22018580 917440 Allocate.WithFinalizer + ST 1 1 64 64 64 102432 1600 System.RuntimeFieldInfoStub + ST 697 650 50184 46800 72 71397894 991637 System.Int32[,] + ST 927 917 88992 88032 96 94969302 989263 System.String + ST 1895 1886 373315 377200 197 194234717 985963 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] + T 0 1 0 288 288 0 0 System.GCMemoryInfoData +3| +... +8| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 244 213 5856 5112 24 24988528 1041188 Allocate.WithFinalizer + ST 710 681 51120 49032 72 72729562 1010132 System.Int32[,] + ST 974 918 93504 88128 96 99784359 1039420 System.String + ST 1920 1875 378240 375000 197 196797180 998970 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] +9| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 236 219 5664 5256 24 24169232 1007051 Allocate.WithFinalizer + ST 698 682 50256 49104 72 71500330 993060 System.Int32[,] + ST 940 913 90240 87648 96 96301127 1003136 System.String + ST 1982 1874 390454 374800 197 203152089 1031228 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] + +< run stops +``` + +**TODO: I guess the Pxx should be computed on the ***UpscaledCount*** column. 
TO BE CONFIRMED.**
+
+
+Feel free to allocate the patterns you want in other methods of the **_Allocate_** project and use the _DynamicAllocationSampling_ events listener to get a summarized view of the different allocation events.
\ No newline at end of file
diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/README.md b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/README.md
new file mode 100644
index 00000000000000..e2c372e39fc3b9
--- /dev/null
+++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/README.md
@@ -0,0 +1,50 @@
+# Test Results
+
+This folder has the results of the manual testing done for this feature. It is here so reviewers can see it but is planned to be deleted before the PR is merged.
+
+## Statistical distribution measures
+The manual folder contains code to allocate and count objects in different runs.
+
+## Perf benchmarking
+The performance impact of the PR has been measured against a baseline.
+Each branch is built on Windows for x64 with:
+    .\build.cmd -s clr+libs -c release
+    src\tests\build.cmd generatelayoutonly Release
+
+## Baseline
+commit d1f0e2930f86e8771ccbefa96aead6f960ecc3f4 (HEAD)
+Author: Stephen Toub
+Date: Sat Feb 3 18:52:31 2024 -0500
+
+This is what is used for all "Baseline" measurements because the changes in this PR started from here.
+
+## PR
+Latest version of the modified CoreCLR
+
+## Tool
+The GCPerfSim module from the Performance repository has been run 10 times to allocate 500 GB of mixed-size objects on 4 threads with a 50 MB live object size:
+\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe C:\git\benchmarks\artifacts\bin\GCPerfSim\release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time
+
+Here is the command line used to measure the impact of computing and emitting the events:
+dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 --
+
+The goal is to emphasize the impact of allocations on performance and GC collection overhead.
+
+## Results
+The two implementations are very close in terms of impact.
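+
+Each GCPerfSimx10_*.txt file starts with the list of per-run durations in seconds. As a minimal sketch, the median and average figures can be recomputed from those values; the durations below are the ones from GCPerfSimx10_Baseline+AllocationTick.txt:
+
+```csharp
+using System;
+using System.Linq;
+
+class DurationStats
+{
+    static void Main()
+    {
+        // "Duration in seconds" values copied from GCPerfSimx10_Baseline+AllocationTick.txt
+        double[] runs = { 23.1662995, 22.2750725, 22.8078224, 23.3056539, 23.8455668,
+                          23.7292667, 22.508404, 22.4228874, 22.1675368, 21.5968574 };
+        double[] sorted = runs.OrderBy(d => d).ToArray();
+        double median = (sorted[4] + sorted[5]) / 2.0; // even count: mean of the two middle values
+        double average = runs.Average();
+        // Reproduces the 22.6581132 (median) and 22.78253674 (average) lines of that file.
+        Console.WriteLine($"{median} (median), {average} (average)");
+    }
+}
+```
+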
+- GCPerfSimx10_Baseline.txt: .NET version before the PR + 19.6675793 (median) + 19.82903766 (average) + +- GCPerfSimx10_PullRequest.txt: PR without provider enabled + 19.7984609 (median) + 19.7717041 (average) + +It is expected that AllocationTick is more expensive because of the required Verbosity level that emits much more events than just AllocationTick: +- GCPerfSimx10_PullRequest+Events.txt: same but with AllocationSampled emitted + 21.0216025 (median) + 21.03864168 (average) + +- GCPerfSimx10_Baseline+AllocationTick.txt: same but with AllocationTick emitted + 22.6581132 (median) + 22.78253674 (average) diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline+AllocationTick.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline+AllocationTick.txt new file mode 100644 index 00000000000000..9f0bfade448dce --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline+AllocationTick.txt @@ -0,0 +1,286 @@ +Duration in seconds +------------------- +23.1662995 +22.2750725 +22.8078224 +23.3056539 +23.8455668 +23.7292667 +22.508404 +22.4228874 +22.1675368 +21.5968574 +---------------------- +22.6581132 (median) +22.78253674 (average) + + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 48684 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 23.1662995 +collection_counts: [50621, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 63121952 +final_heap_size_bytes: 58251120 +final_fragmentation_bytes: 3586800 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 53784 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.2750725 +collection_counts: [50648, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 54469184 +final_heap_size_bytes: 56118032 +final_fragmentation_bytes: 1796880 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 41684 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.8078224 +collection_counts: [50696, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58087672 +final_heap_size_bytes: 56061112 +final_fragmentation_bytes: 1904512 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 62368 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 23.3056539 +collection_counts: [50746, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55920168 +final_heap_size_bytes: 55982344 +final_fragmentation_bytes: 1887904 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 53412 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 23.8455668 +collection_counts: [50616, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 61739896 +final_heap_size_bytes: 55838944 +final_fragmentation_bytes: 1884488 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 65728 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 23.7292667 +collection_counts: [50523, 323, 3] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 62671224 +final_heap_size_bytes: 55978144 +final_fragmentation_bytes: 1898280 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 64628 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.508404 +collection_counts: [50640, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 64216560 +final_heap_size_bytes: 56097368 +final_fragmentation_bytes: 1903136 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 57576 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.4228874 +collection_counts: [50659, 11, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 62785608 +final_heap_size_bytes: 56379328 +final_fragmentation_bytes: 1724816 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 27968 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.1675368 +collection_counts: [50609, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 59109808 +final_heap_size_bytes: 55923952 +final_fragmentation_bytes: 1858248 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 61168 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.5968574 +collection_counts: [50721, 14, 3] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66074328 +final_heap_size_bytes: 55833128 +final_fragmentation_bytes: 1745464 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline.txt new file mode 100644 index 00000000000000..10f62f1b3d5188 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline.txt @@ -0,0 +1,284 @@ +Duration in seconds +------------------- +19.8961064 +19.206025 +20.0542862 +19.9611777 +20.4653831 +19.8848856 +19.9312506 +20.0551618 +19.671435 +20.231068 +---------------------- +19.93567794 (average) + + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference 
-testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 24520 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.8961064 +collection_counts: [50926, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 61251632 +final_heap_size_bytes: 58086336 +final_fragmentation_bytes: 3945408 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 41660 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.206025 +collection_counts: [50960, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55791536 +final_heap_size_bytes: 58300392 +final_fragmentation_bytes: 3948056 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 31336 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.0542862 +collection_counts: [50914, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 60374136 +final_heap_size_bytes: 57533224 +final_fragmentation_bytes: 3975448 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 42712 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.9611777 +collection_counts: [50897, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55580960 +final_heap_size_bytes: 58383224 +final_fragmentation_bytes: 3953624 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 33568 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.4653831 +collection_counts: [50857, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55607328 +final_heap_size_bytes: 58215040 +final_fragmentation_bytes: 3931776 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 33896 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.8848856 +collection_counts: [50934, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55872480 +final_heap_size_bytes: 58382160 +final_fragmentation_bytes: 3940656 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 41796 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.9312506 +collection_counts: [50931, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 64109136 +final_heap_size_bytes: 58342584 +final_fragmentation_bytes: 3973856 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 21784 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.0551618 +collection_counts: [50922, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 57598392 +final_heap_size_bytes: 57619760 +final_fragmentation_bytes: 3944216 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 44508 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.671435 +collection_counts: [50929, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55670080 +final_heap_size_bytes: 58372248 +final_fragmentation_bytes: 3927544 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 35560 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.231068 +collection_counts: [50914, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58203344 +final_heap_size_bytes: 58321568 +final_fragmentation_bytes: 3975280 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest+Events.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest+Events.txt new file mode 100644 index 00000000000000..6a2ad9dbdb931c --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest+Events.txt @@ -0,0 +1,286 @@ +Duration in seconds +------------------- +21.4792368 +20.2993439 +21.1766376 +22.1099492 +20.6253209 +20.4882028 +20.8665674 +21.4560576 +21.3212808 +20.5638198 +---------------------- +21.0216025 (median) +21.03864168 (average) + + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 
0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 66312 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.4792368 +collection_counts: [50717, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66072144 +final_heap_size_bytes: 56478584 +final_fragmentation_bytes: 1818416 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 57168 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.2993439 +collection_counts: [50742, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 54695160 +final_heap_size_bytes: 55915936 +final_fragmentation_bytes: 1919816 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 30336 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.1766376 +collection_counts: [50704, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55273496 +final_heap_size_bytes: 56467736 +final_fragmentation_bytes: 1811040 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 37628 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.1099492 +collection_counts: [50531, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 57894200 +final_heap_size_bytes: 58227536 +final_fragmentation_bytes: 3606488 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 56096 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.6253209 +collection_counts: [50731, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58559456 +final_heap_size_bytes: 56497384 +final_fragmentation_bytes: 1844328 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 75284 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.4882028 +collection_counts: [50703, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 56799888 +final_heap_size_bytes: 56426552 +final_fragmentation_bytes: 1773128 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 63320 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.8665674 +collection_counts: [50694, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 57237456 +final_heap_size_bytes: 55808024 +final_fragmentation_bytes: 1835496 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 56612 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.4560576 +collection_counts: [50648, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58474568 +final_heap_size_bytes: 58284656 +final_fragmentation_bytes: 3691816 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 15564 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.3212808 +collection_counts: [50689, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 59064240 +final_heap_size_bytes: 57605248 +final_fragmentation_bytes: 3663600 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 3776 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.5638198 +collection_counts: [50761, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66419896 +final_heap_size_bytes: 56421288 +final_fragmentation_bytes: 1830984 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest.txt new file mode 100644 index 00000000000000..6a8b99caa3f9a1 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest.txt @@ -0,0 +1,311 @@ +Duration in seconds +------------------- +19.1773522 +20.0924279 +20.1548909 +20.2996304 +19.9037176 +19.9528813 +19.6312431 +19.2613891 +19.6083818 +18.8774711 +---------------------- +19.69593854 (average) + + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread 
+Running 64-bit? True +PID: 8 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.3572435 +collection_counts: [50920, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 65455968 +final_heap_size_bytes: 58357064 +final_fragmentation_bytes: 3915584 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 30128 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.1773522 +collection_counts: [50956, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 63548608 +final_heap_size_bytes: 58392416 +final_fragmentation_bytes: 3950576 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 34652 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.0924279 +collection_counts: [50912, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 57182464 +final_heap_size_bytes: 57644400 +final_fragmentation_bytes: 3965360 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 17960 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.1548909 +collection_counts: [50934, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55941992 +final_heap_size_bytes: 57975680 +final_fragmentation_bytes: 3982776 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 41712 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.2996304 +collection_counts: [50950, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66670400 +final_heap_size_bytes: 58314976 +final_fragmentation_bytes: 3875168 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 28108 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.9037176 +collection_counts: [50942, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55632448 +final_heap_size_bytes: 58199336 +final_fragmentation_bytes: 3874504 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 40348 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.9528813 +collection_counts: [50921, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55041704 +final_heap_size_bytes: 58356168 +final_fragmentation_bytes: 3914816 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 39776 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.6312431 +collection_counts: [50923, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 65766616 +final_heap_size_bytes: 58419512 +final_fragmentation_bytes: 3982760 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 22944 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.2613891 +collection_counts: [50951, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66551888 +final_heap_size_bytes: 58370160 +final_fragmentation_bytes: 3928744 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 34784 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.6083818 +collection_counts: [50938, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66478920 +final_heap_size_bytes: 58405344 +final_fragmentation_bytes: 3980528 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root> +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root> +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root> +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 44316 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 18.8774711 +collection_counts: [50909, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58476048 +final_heap_size_bytes: 58355880 +final_fragmentation_bytes: 3991832 \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_Finalizer+Array+String.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_Finalizer+Array+String.txt new file mode 100644 index 00000000000000..8cd8ece0f1ee40 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_Finalizer+Array+String.txt @@ -0,0 +1,57 @@ +> starts 100 iterations allocating 1000000 instances of different types with different sizes (24 for WithFinalizer, 96 for string and 200 for byte[]) +... +< run stops + +Allocate.WithFinalizer +------------------------- + 1 -15.9 % + 2 -12.5 % + 3 -12.5 % + 4 -12.5 % + 5 -12.1 % + ... + 49 -0.6 % + 50 -0.6 % + 51 -0.6 % + ... + 96 11.8 % + 97 12.7 % + 98 12.7 % + 99 14.8 % + 100 17.8 % + +System.String +------------------------- + 1 -10.6 % + 2 -7.1 % + 3 -6.4 % + 4 -6.3 % + 5 -6.1 % + ... + 49 0.3 % + 50 0.3 % + 51 0.4 % + ... + 96 6.4 % + 97 6.4 % + 98 6.6 % + 99 6.6 % + 100 7.2 % + +System.Byte[] (200 bytes) +------------------------- + 1 -6.8 % + 2 -6.7 % + 3 -5.4 % + 4 -5.4 % + 5 -5.0 % + ... + 49 -0.2 % + 50 -0.1 % + 51 -0.1 % + ... + 96 3.4 % + 97 3.6 % + 98 4.7 % + 99 5.1 % + 100 5.3 % \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_RatioAllocations.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_RatioAllocations.txt new file mode 100644 index 00000000000000..091819cb25ea73 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_RatioAllocations.txt @@ -0,0 +1,113 @@ + +> starts 100 iterations allocating 1000000 instances of class with proportional sizes (24 bytes, 32 bytes, 48 bytes, 64 bytes, 72 bytes and 96 bytes) +... +< run stops + +Object24 +------------------------- + 1 -17.6 % + 2 -15.9 % + 3 -15.5 % + 4 -13.0 % + 5 -12.9 % + ... + 49 -0.6 % + 50 -0.6 % + 51 -0.6 % + ... + 96 10.9 % + 97 11.4 % + 98 11.8 % + 99 13.9 % + 100 15.6 % + +Object32 +------------------------- + 1 -17.7 % + 2 -10.4 % + 3 -9.1 % + 4 -8.5 % + 5 -8.5 % + ... + 49 -0.5 % + 50 -0.5 % + 51 -0.5 % + ... + 96 10.7 % + 97 11.4 % + 98 12.0 % + 99 13.9 % + 100 15.2 % + +Object48 +------------------------- + 1 -13.6 % + 2 -10.0 % + 3 -10.0 % + 4 -9.5 % + 5 -9.5 % + ... + 49 -0.6 % + 50 -0.4 % + 51 -0.4 % + ... + 96 9.0 % + 97 10.3 % + 98 10.3 % + 99 11.0 % + 100 12.5 % + +Object64 +------------------------- + 1 -10.1 % + 2 -9.4 % + 3 -8.8 % + 4 -8.3 % + 5 -7.8 % + ... + 49 -0.3 % + 50 -0.1 % + 51 -0.1 % + ... 
+ 96 5.8 % + 97 6.1 % + 98 6.4 % + 99 8.8 % + 100 10.9 % + +Object72 +------------------------- + 1 -10.5 % + 2 -8.9 % + 3 -8.4 % + 4 -7.8 % + 5 -6.4 % + ... + 49 0.2 % + 50 0.2 % + 51 0.2 % + ... + 96 6.8 % + 97 6.8 % + 98 7.7 % + 99 8.6 % + 100 10.0 % + +Object96 +------------------------- + 1 -8.2 % + 2 -7.1 % + 3 -6.4 % + 4 -6.1 % + 5 -5.3 % + ... + 49 -0.2 % + 50 -0.2 % + 51 -0.2 % + ... + 96 4.9 % + 97 5.0 % + 98 5.5 % + 99 5.9 % + 100 7.6 % + diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/10x100000_RatioArrayAllocations.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/10x100000_RatioArrayAllocations.txt new file mode 100644 index 00000000000000..d1337607cde42b --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/10x100000_RatioArrayAllocations.txt @@ -0,0 +1,70 @@ + +> starts 10 iterations allocating 100000 instances of System.Byte[] (1K, 10K, 100K, 1M, 10M) +... +< run stops + +System.Byte[] (1048 bytes) +------------------------- + 1 -6.6 % + 2 -4.5 % + 3 -2.8 % + 4 -0.4 % + 5 1.1 % + 6 1.4 % + 7 2.6 % + 8 2.6 % + 9 2.7 % + 10 2.8 % + +System.Byte[] (10264 bytes) +------------------------- + 1 -3.6 % + 2 -0.5 % + 3 -0.4 % + 4 -0.4 % + 5 -0.3 % + 6 0.2 % + 7 0.5 % + 8 1.1 % + 9 1.2 % + 10 2.0 % + +System.Byte[] (102424 bytes) +------------------------- + 1 -0.5 % + 2 -0.2 % + 3 -0.2 % + 4 -0.0 % + 5 -0.0 % + 6 0.0 % + 7 0.1 % + 8 0.2 % + 9 0.2 % + 10 0.4 % + +System.Byte[] (1024024 bytes) +------------------------- + 1 -0.0 % + 2 -0.0 % + 3 -0.0 % + 4 -0.0 % + 5 -0.0 % + 6 0.0 % + 7 0.0 % + 8 0.0 % + 9 0.0 % + 10 0.0 % + +System.Byte[] (10240024 bytes) +------------------------- + 1 -0.0 % + 2 -0.0 % + 3 -0.0 % + 4 -0.0 % + 5 0.0 % + 6 0.0 % + 7 0.0 % + 8 0.0 % + 9 0.0 % + 10 0.0 % + diff --git a/src/tests/tracing/eventpipe/simpleruntimeeventvalidation/simpleruntimeeventvalidation.cs b/src/tests/tracing/eventpipe/simpleruntimeeventvalidation/simpleruntimeeventvalidation.cs index 6472a2f995cc9f..d6be259e8704ef 100644 --- a/src/tests/tracing/eventpipe/simpleruntimeeventvalidation/simpleruntimeeventvalidation.cs +++ b/src/tests/tracing/eventpipe/simpleruntimeeventvalidation/simpleruntimeeventvalidation.cs @@ -2,15 +2,17 @@ // The .NET Foundation licenses this file to you under the MIT license. 
using System; +using System.Collections.Generic; using System.Diagnostics.Tracing; using System.IO; using System.Linq; +using System.Text; using System.Threading; using System.Threading.Tasks; -using System.Collections.Generic; using Microsoft.Diagnostics.Tracing; -using Tracing.Tests.Common; +using Microsoft.Diagnostics.Tracing.Parsers.Clr; using Microsoft.Diagnostics.NETCore.Client; +using Tracing.Tests.Common; using Xunit; namespace Tracing.Tests.SimpleRuntimeEventValidation @@ -24,28 +26,30 @@ public static int TestEntryPoint() var ret = IpcTraceTest.RunAndValidateEventCounts( // Validation is done with _DoesTraceContainEvents new Dictionary<string, ExpectedEventCount>(){{ "Microsoft-Windows-DotNETRuntime", -1 }}, - _eventGeneratingActionForGC, - // GCKeyword (0x1): 0b1, GCAllocationTick requries Verbose level + _eventGeneratingActionForGC, + // GCKeyword (0x1): 0b1, GCAllocationTick requires Verbose level new List<EventPipeProvider>(){new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Verbose, 0b1)}, 1024, _DoesTraceContainGCEvents, enableRundownProvider:false); + if (ret != 100) + return ret; // Run the 2nd test scenario only if the first one passes - if(ret== 100) + if (ret == 100) { ret = IpcTraceTest.RunAndValidateEventCounts( - new Dictionary<string, ExpectedEventCount>(){{ "Microsoft-DotNETCore-EventPipe", 1 }}, - _eventGeneratingActionForExceptions, + new Dictionary<string, ExpectedEventCount>(){{ "Microsoft-DotNETCore-EventPipe", 1 }}, + _eventGeneratingActionForExceptions, // ExceptionKeyword (0x8000): 0b1000_0000_0000_0000 - new List<EventPipeProvider>(){new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Warning, 0b1000_0000_0000_0000)}, + new List<EventPipeProvider>(){new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Warning, 0b1000_0000_0000_0000)}, 1024, _DoesTraceContainExceptionEvents, enableRundownProvider:false); - if(ret == 100) + if (ret == 100) { - ret = IpcTraceTest.RunAndValidateEventCounts( - new Dictionary<string, ExpectedEventCount>(){{ "Microsoft-Windows-DotNETRuntime", -1}}, - _eventGeneratingActionForFinalizers, - new List<EventPipeProvider>(){new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Informational, 0b1)}, - 1024, _DoesTraceContainFinalizerEvents, enableRundownProvider:false); + ret = IpcTraceTest.RunAndValidateEventCounts( + new Dictionary<string, ExpectedEventCount>() { { "Microsoft-Windows-DotNETRuntime", -1 } }, + _eventGeneratingActionForFinalizers, + new List<EventPipeProvider>() { new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Informational, 0b1) }, + 1024, _DoesTraceContainFinalizerEvents, enableRundownProvider: false); } } @@ -55,7 +59,7 @@ public static int TestEntryPoint() return 100; } - private static Action _eventGeneratingActionForGC = () => + private static Action _eventGeneratingActionForGC = () => { for (int i = 0; i < 50; i++) { @@ -70,7 +74,7 @@ public static int TestEntryPoint() } }; - private static Action _eventGeneratingActionForExceptions = () => + private static Action _eventGeneratingActionForExceptions = () => { for (int i = 0; i < 10; i++) { @@ -110,7 +114,7 @@ public static int TestEntryPoint() int GCRestartEEStartEvents = 0; int GCRestartEEStopEvents = 0; source.Clr.GCRestartEEStart += (eventData) => GCRestartEEStartEvents += 1; - source.Clr.GCRestartEEStop += (eventData) => GCRestartEEStopEvents += 1; + source.Clr.GCRestartEEStop += (eventData) => GCRestartEEStopEvents += 1; int GCSuspendEEEvents = 0; int GCSuspendEEEndEvents = 0; @@ -148,7 +152,7 @@ public static int TestEntryPoint() private static Func<EventPipeEventSource, Func<int>> _DoesTraceContainExceptionEvents = (source) => { int ExStartEvents = 0; - source.Clr.ExceptionStart += (eventData) => + source.Clr.ExceptionStart += 
(eventData) => { if(eventData.ToString().IndexOf("System.ArgumentNullException")>=0) ExStartEvents += 1;