diff --git a/docs/design/features/RandomizedAllocationSampling.md b/docs/design/features/RandomizedAllocationSampling.md new file mode 100644 index 00000000000000..b27ff7cd208e46 --- /dev/null +++ b/docs/design/features/RandomizedAllocationSampling.md @@ -0,0 +1,332 @@
+# Randomized Allocation Sampling
+
+Christophe Nasarre (@chrisnas), Noah Falk (@noahfalk) - 2024
+
+## Introduction
+
+.NET developers want to understand the GC allocation behavior of their programs both for general observability and specifically to better understand performance costs. Although the runtime has a very high performance GC, reducing the number of bytes allocated in a scenario can have a notable impact on the total execution time and frequency of GC pauses. Some ways developers understand these costs are by measuring allocated bytes in:
+1. Microbenchmarks such as Benchmark.DotNet
+2. .NET APIs such as [GC.GetAllocatedBytesForCurrentThread()](https://learn.microsoft.com/dotnet/api/system.gc.getallocatedbytesforcurrentthread)
+3. Memory profiling tools such as VS profiler, PerfView, and dotTrace
+4. Metrics or other production telemetry
+
+Analysis of allocation behavior often starts simply, using the total bytes allocated while executing a block of code or during some time duration. However, for any non-trivial scenario, gaining a deeper understanding requires attributing allocations to specific lines of source code, callstacks, types, and object sizes. .NET's current state-of-the-art technique for doing this is using a profiling tool to sample using the [AllocationTick](https://learn.microsoft.com/en-us/dotnet/fundamentals/diagnostics/runtime-garbage-collection-events#gcallocationtick_v3-event) event. When enabled, this event triggers approximately every time 100KB has been allocated. However, this sampling is not random. It has a fixed starting point and stride, which can lead to significant sampling error for allocation patterns that are periodic. This has been observed in practice, so it isn't merely a theoretical concern. The new randomized allocation sampling feature is intended to address the shortcomings of AllocationTick and offer more rigorous estimations of allocation behavior and probabilistic error bounds. We do this by creating a new `AllocationSampled` event that profilers can opt into via any of our standard event tracing technologies (ETW, EventPipe, LTTng, EventListener). The new event is completely independent of AllocationTick, and we expect profilers will prefer to use the AllocationSampled event on runtime versions where it is available.
+
+The initial part of this document describes the conceptual sampling model and how we suggest the data be interpreted by profilers. The latter portion describes how the sampling model is implemented efficiently in runtime code.
+
+## The sampling model
+
+When the new AllocationSampled event is enabled, each managed thread starts sampling independently of one another. For a given thread there will be a sequence of allocated objects Object_1, Object_2, etc. that may continue indefinitely. Each object has a corresponding .NET type and size. The size of an object includes the object header, method table, object fields, and trailing padding that aligns the size to be a multiple of the pointer size. It does not include any additional memory the GC may optionally allocate for more than pointer-sized alignment, filling gaps that are impossible/inefficient to use for objects, or other GC bookkeeping data structures. Also note that .NET does have a non-GC heap where some objects that stay alive for the process lifetime are allocated. Those non-GC heap objects are ignored by this feature.
+
+When each new object is allocated, conceptually the runtime starts doing independent [Bernoulli Trials](https://en.wikipedia.org/wiki/Bernoulli_trial) (weighted coin flips) for every byte in the object. Each trial has probability p = 1/102,400 of being considered a success. As soon as one successful trial is observed, no more trials are performed for that object and an AllocationSampled event is emitted. This event contains the object type, its size, and the 0-based offset of the byte where the successful trial occurred. This means that for a given object, if an event was generated, `offset` failed trials occurred followed by a successful trial, and if no event was generated, `size` failed trials occurred. This process continues indefinitely for each newly allocated object.
+
+This sampling process is closely related to the [Bernoulli process](https://en.wikipedia.org/wiki/Bernoulli_process) and is a well-studied area of statistics. Skipping ahead to the end of an object once a successful sample has been produced does require some accommodations in the analysis, but many key results are still applicable.
+
+## Using the feature
+
+### Enabling sample events
+
+The allocation sampled events are enabled on the `Microsoft-Windows-DotNetRuntime` provider using keyword `0x80000000000` at informational level or higher. For more details on how to do this using different event tracing technologies see [here](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/eventsource-collect-and-view-traces).
+
+### Interpreting sample events
+
+Although diagnostic tools are free to interpret the data in whatever way they choose, we have some recommendations for analysis that we expect are useful and statistically sound.
+
+#### Definitions
+
+For all of this section, assume that we enabled the AllocationSampled events and observed that `s` such sample events were generated from a specific thread - `event_1`, `event_2`, ... `event_s`. Each `event_i` contains corresponding fields `type_i`, `size_i`, and `offset_i`. Let `u_i = size_i - offset_i`. `u_i` represents the successful trial byte + the number of bytes which remained after it in the same object. Let `u` = the sum of all the `u_i`, `i` = 1 to `s`. `p` is the constant 1/102400, the probability that each trial is successful. `q` is the complement 1 - 1/102400.
+
+#### Estimation strategies
+
+We have explored two different mathematical approaches for [estimating](https://en.wikipedia.org/wiki/Estimator) the number of bytes that were allocated given a set of observed sample events. Both approaches are unbiased, which means that if we repeated the same sampling procedure many times we would expect the average of the estimates to match the number of bytes allocated. Where the approaches differ is in the particular distribution of the estimates.
+
+#### Estimation Approach 1: Weighted samples
+
+We believe this approach gives estimates with lower [Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error), but the exact shape of the distribution is hard to calculate, so we don't know a good way to produce [confidence intervals](https://en.wikipedia.org/wiki/Confidence_interval) based on small numbers of samples. The distribution does approach a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) as the number of samples increases ([Central Limit Theorem](https://en.wikipedia.org/wiki/Central_limit_theorem)), but we haven't done any analysis attempting to define how rapidly that convergence occurs.
+
+To estimate the number of bytes using this technique, let `estimate_i = size_i/(1 - q^size_i)` for each sample `i`. Then sum `estimate_i` over all samples to get a total estimate of the allocated bytes. With sufficiently many samples, the estimate distribution should converge to a normal distribution with variance at most `N*q/p` for `N` total bytes of allocated objects.
+
+##### Statistics stuff
+
+Understanding this part isn't necessary to use the estimation formula above but may be helpful.
+
+Proving the weighted sample estimator is unbiased:
+Consider the sequence of all objects allocated on a thread. Let `X_j` be a random indicator variable that has value `size_j/(1 - q^size_j)` if the `j`th object is sampled, otherwise zero. Our estimation formula above is the sum of all `X_j` because only the sampled objects will contribute a non-zero term. Based on our sampling procedure, the probability for an object to be sampled is `1-q^size_j`, which means `E(X_j) = size_j/(1 - q^size_j) * Pr(object j is sampled) = size_j/(1 - q^size_j) * (1 - q^size_j) = size_j`. By linearity of expectation, the expected value of the sum is the sum of the expected values = sum of `size_j` for all `j` = total size of allocated objects.
+
+The variance for this estimation is the sum of variances for each `X_j` term, `(size_j^2)*(q^size_j)/(1-q^size_j)`. If we assume there are `N` total bytes of objects divided up into `N/n` objects of size `n`, then the total variance for that set of objects would be `(N/n)*n^2*q^n/(1-q^n) = N*n*q^n/(1-q^n)`. That expression is maximized when `n=1`, so the maximum variance for any collection of objects with total size `N` is `N*1*q^1/(1-q^1) = N*q/(1-q) = N*q/p`.
+
+#### Estimation Approach 2: Estimating failed trials
+
+This is an alternative estimate that has a more predictable distribution, but potentially higher [Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error). You could use this approach to produce both estimates and confidence intervals, or use the weighted sample formula to produce estimates and use this one solely as a conservative confidence interval for the estimate.
+
+The estimation formula is `sq/p + u`.
+
+This estimate is based on the [Negative Binomial distribution](https://en.wikipedia.org/wiki/Negative_binomial_distribution) with `s` successes and `p` chance of success. The `sq/p` term is the mean of this distribution and represents the expected number of failed trials necessary to observe `s` successful trials. The `u` term then adds in the number of successful trials (1 per sample) and the number of bytes for which no trials were performed (`u_i-1` per sample).
+
+Here is an approach to calculate a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) estimate based on this distribution:
+
+1. Decide on some probability `C` that the actual number of allocated bytes `N` should fall within the interval. You can pick a probability arbitrarily close to 1; however, the higher the probability, the wider the estimated interval will be. For the remaining `1-C` probability that `N` is not within the interval, we will pick the upper and lower bounds so that there is a `(1-C)/2` chance that `N` is below the interval and a `(1-C)/2` chance that `N` is above the interval. We think `C=0.95` would be a reasonable choice for many tools, which means there would be a 2.5% chance the actual value is below the lower bound, a 95% chance it is between the lower and upper bound, and a 2.5% chance it is above the upper bound.
+
+2. Implement some method to calculate the Negative Binomial [CDF](https://en.wikipedia.org/wiki/Cumulative_distribution_function). Unfortunately there is no trivial formula for this, but there are a couple of potential approaches:
+   a. The Negative Binomial Distribution has a CDF defined based on the [regularized incomplete beta function](https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function). There are various numerical libraries such as scipy in Python that will calculate this for you. Alternately you could directly implement numerical approximation techniques to evaluate the function, either approximating the integral form or approximating the continued fraction expansion.
+   b. The Camp-Paulson approximation described in [Bartko (66)](https://www.stat.cmu.edu/technometrics/59-69/VOL-08-02/v0802345.pdf). We validated that for p=0.00001 this approximation was within ~0.01 of the true CDF for any number of failures at s=1, within ~0.001 of the true CDF at s=5, and continues to get more accurate as the sample count increases.
+
+3. Do a binary search on the CDF to locate the input number of failures for which `CDF(failures, s, p)` is closest to `(1-C)/2` and `C + (1-C)/2`. Assuming that `CDF(low_failures, s, p) = (1-C)/2` and `CDF(high_failures, s, p) = C + (1-C)/2`, the confidence interval for `N` is `[low_failures+u, high_failures+u]`. (A sketch of one way to implement this appears after the table below.)
+
+For example, if we select `C=0.95`, observe 8 samples, and `u=10,908`, then we'd use binary search to find `CDF(353666, 8, 1/102400) ~= 0.025` and `CDF(1476870, 8, 1/102400) ~= 0.975`. Our interval estimate for the number of bytes allocated would be `[353666 + 10908, 1476870 + 10908]`.
+
+To get a rough idea of the error in proportion to the number of samples, here is a table of calculated Negative Binomial failed trials for the 0.025 and 0.975 thresholds of the CDF:
+
+| # of samples (successes) | failures at CDF = 0.025 | failures at CDF = 0.975 |
+| ---------------------------| ------------------------| --------------------------- |
+| 1 | 2591 | 377738 |
+| 2 | 24800 | 570531 |
+| 3 | 63349 | 739802 |
+| 4 | 111599 | 897761 |
+| 5 | 166241 | 1048730 |
+| 6 | 225469 | 1194827 |
+| 7 | 288185 | 1337279 |
+| 8 | 353666 | 1476870 |
+| 9 | 421407 | 1614137 |
+| 10 | 491039 | 1749469 |
+| 20 | 1250954 | 3038270 |
+| 30 | 2072639 | 4264804 |
+| 40 | 2926207 | 5459335 |
+| 50 | 3800118 | 6633475 |
+| 100 | 8331581 | 12342053 |
+| 200 | 17739679 | 23413825 |
+| 300 | 27341465 | 34291862 |
+| 400 | 37043463 | 45069676 |
+| 500 | 46809487 | 55783459 |
+| 1000 | 96149867 | 108842093 |
+| 2000 | 195919830 | 213870137 |
+| 3000 | 296301551 | 318286418 |
+| 4000 | 396999923 | 422386047 |
+| 5000 | 497900649 | 526283322 |
+| 10000 | 1004017229 | 1044156743 |
+
+Notice that if we compare the expected total number of trials (102400 * # of samples) to the estimated ranges, at 10 samples the error bars extend more than 50% in each direction, showing that predictions based on so few samples are very imprecise. However, at 1,000 samples the error is ~6% in each direction and at 10,000 samples ~2% in each direction.
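+
+To make steps 2 and 3 above concrete, here is a minimal, illustrative sketch (not part of the runtime, and deliberately avoiding both the incomplete beta function and the Camp-Paulson approximation). It evaluates the Negative Binomial CDF exactly via the identity `P(failures <= f) = P(at least s successes in the first s+f trials)`, summing the `s` Binomial terms in log space with `lgamma`, and then binary searches for the bounds. All names are hypothetical, and the bounds it finds may differ by a count or two from the table above depending on how the boundary is rounded:
+
+```
+// Illustrative only - not runtime code.
+#include <cmath>
+#include <cstdint>
+#include <cstdio>
+
+// P(failures <= f) for a Negative Binomial with s successes and success probability p.
+// Uses CDF(f; s, p) = 1 - sum_{j=0}^{s-1} C(s+f, j) * p^j * (1-p)^(s+f-j),
+// evaluating each term in log space so the binomial coefficients don't overflow.
+double NegBinomialCdf(int64_t f, int s, double p)
+{
+    double n = (double)f + s;
+    double tail = 0.0;
+    for (int j = 0; j < s; j++)
+    {
+        double logTerm = std::lgamma(n + 1) - std::lgamma(j + 1) - std::lgamma(n - j + 1)
+                       + j * std::log(p) + (n - j) * std::log1p(-p);
+        tail += std::exp(logTerm);
+    }
+    return 1.0 - tail;
+}
+
+// Smallest failure count whose CDF is >= target (the CDF is monotonic in f).
+int64_t InvertCdf(double target, int s, double p)
+{
+    int64_t lo = 0, hi = (int64_t)(20.0 * s / p); // generous upper bound
+    while (lo < hi)
+    {
+        int64_t mid = lo + (hi - lo) / 2;
+        if (NegBinomialCdf(mid, s, p) < target) lo = mid + 1; else hi = mid;
+    }
+    return lo;
+}
+
+int main()
+{
+    const double p = 1.0 / 102400;
+    const double C = 0.95;
+    const int s = 8;           // observed samples, as in the example above
+    const int64_t u = 10908;   // sum of u_i over the samples
+
+    int64_t lowFailures  = InvertCdf((1 - C) / 2, s, p);     // ~353,666
+    int64_t highFailures = InvertCdf(C + (1 - C) / 2, s, p); // ~1,476,870
+    printf("estimated allocated bytes in [%lld, %lld]\n",
+        (long long)(lowFailures + u), (long long)(highFailures + u));
+    return 0;
+}
+```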
+
+The variance for the Negative Binomial Distribution is `sq/p^2`. In the limit where all allocated objects have size 1 byte, `E(s)=Np`, which gives an expected variance of `Nq/p`, the same as with the weighted sample approach. However, as object sizes increase, the variance of approach 1 decreases more rapidly than in this approach.
+
+#### Compensating for bytes allocated on a thread in between events
+
+It is likely you want to estimate allocations starting and ending at arbitrary points in time that do not correspond exactly with the moment a sampling event was emitted. This means the initial sampling event covered more time than the allocation period we are interested in, and the allocations at the end aren't included in any sampling event. You can conservatively adjust the error bounds to account for the uncertainty in the starting and ending allocations. If the starting point is not aligned with a sampling event, calculate the lower bound of allocated bytes as if there was one fewer sample received. If the ending point is not aligned with a sampling event, calculate the upper bound as if there was one more sample received.
+
+#### Estimating the total number of bytes allocated on all threads
+
+The per-thread estimations can be repeated for all threads and summed up.
+
+#### Estimating the number of bytes allocated for objects of a specific type, size, or other characteristic
+
+Select from the sampling events only those events which occurred in objects matching your criteria. For example, if you want to estimate the number of bytes allocated for Foo-typed objects, select the samples in Foo-typed objects. Using this reduced set of samples, apply the same estimation technique as above. The error on this estimation will also be based on the number of samples in your filtered subset. If there were 1000 initial samples but only 3 of those samples were in Foo-typed objects, that might generate an estimate of 310K bytes of Foo objects, but beware that the potential sampling error for such a small number of samples is very large.
+
+## Implementation design
+
+Overall, the implementation needs to do a few steps:
+1. Determine if sampling events are enabled. If not, there is nothing else to do; if so, we need to do steps (2) and (3).
+2. Use a random number generator to simulate random trials for each allocated byte and determine which objects contain the successful trials.
+3. When a successful trial occurs, emit the AllocationSampled event.
+
+Steps (1) and (3) are straightforward, but step (2) is non-trivial to do correctly and performantly. For step (1) we use the existing macro ETW_TRACING_CATEGORY_ENABLED(), which despite its name works for all our event tracing technologies. For step (3) we defined a method FireAllocationSampled() in gchelpers.cpp, and the code to emit the event is in there. Like all runtime events, the definition for the event itself is in ClrEtwAll.man. All the remaining discussion is about how we accomplish step (2).
+
+Our conceptual sampling model involves doing Bernoulli trials for every byte of an object. In theory we could implement that very literally. Each object allocation would run a for loop for n iterations for an n-byte object and generate random coin flips with a pseudo-random number generator (PRNG). However, doing this would be incredibly slow. A good way to understand the actual implementation is to imagine we started with this simple slow approach and then did several iterative transformations to make it run faster while maintaining the same output. Imagine that we have some function `bool GetNextTrialResult(CLRRandom* pRandom)` that takes a PRNG and should randomly return true with probability 1 in 102,400. It might be implemented:
+
+```
+bool GetNextTrialResult(CLRRandom* pRandom)
+{
+    return pRandom->NextDouble() < 1.0/102400;
+}
+```
+
+We don't have to generate random numbers at the instant we need them, however; we are allowed to cache a batch of them at a time and dispense them later. For simplicity, treat all the apparent global variables in these examples as being thread-local. In pseudo-code that looks like:
+
+```
+List _cachedTrials = PopulateTrials(pRandom);
+
+List PopulateTrials(CLRRandom* pRandom)
+{
+    List trials = new List();
+    for(int i = 0; i < 100; i++)
+    {
+        trials.Push(pRandom->NextDouble() < 1.0/102400);
+    }
+    return trials;
+}
+
+
+bool GetNextTrialResult(CLRRandom* pRandom)
+{
+    bool ret = _cachedTrials.Pop();
+
+    // if we are out of trials, cache some more for next time
+    if(_cachedTrials.Count == 0)
+    {
+        _cachedTrials = PopulateTrials(pRandom);
+    }
+
+    return ret;
+}
+```
+
+Notice that almost every entry in the cached list will be false, so this is an inefficient way to store it. Rather than storing a large number of false bits, we could store a single number that represents a run of zero or more contiguous false bools followed by a single true bool. There is also no requirement that our cached batches of trials are the same size, so we could cache exactly one run of false results. In pseudo-code that looks like:
+
+```
+BigInteger _cachedFailedTrials = PopulateTrials(pRandom);
+
+BigInteger PopulateTrials(CLRRandom* pRandom)
+{
+    BigInteger failedTrials = 0;
+    while(pRandom->NextDouble() >= 1.0/102400)
+    {
+        failedTrials++;
+    }
+    return failedTrials;
+}
+
+bool GetNextTrialResult(CLRRandom* pRandom)
+{
+    bool ret = (_cachedFailedTrials == 0);
+    _cachedFailedTrials--;
+
+    // if we are out of trials, cache some more for next time
+    if(_cachedFailedTrials < 0)
+    {
+        _cachedFailedTrials = PopulateTrials(pRandom);
+    }
+
+    return ret;
+}
+```
+
+Rather than generating `_cachedFailedTrials` by doing many independent queries to a random number generator, we can use some math to speed this up. The probability that `_cachedFailedTrials` has some particular value `X` is given by the [Geometric distribution](https://en.wikipedia.org/wiki/Geometric_distribution). We can use [Inverse Transform Sampling](https://en.wikipedia.org/wiki/Inverse_transform_sampling) to generate random values for this distribution directly. The CDF for the Geometric distribution is `1-(1-p)^(floor(x)+1)`, which means the inverse is `floor(ln(1-y)/ln(1-p))`.
+
+We've been using BigInteger so far because mathematically there is a non-zero probability of getting an arbitrarily large number of failed trials in a row. In practice, however, our PRNG's outputs are constrained to be a floating point number with value k/MAX_INT for an integer value of k between 0 and MAX_INT-1. The largest value PopulateTrials() can return under these constraints is ~2.148M, which means a 32-bit integer can easily accommodate the value. The perfect mathematical model of the Geometric distribution has a 0.00000005% chance of getting a larger run of failed trials, but our PRNG rounds that incredibly unlikely case to zero probability.
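+
+As a quick numerical sanity check on the inverse transform formula (again just a sketch, not runtime code; `std::mt19937_64` stands in for CLRRandom), drawing run lengths with `floor(ln(1-y)/ln(1-p))` should reproduce the Geometric distribution's mean of `q/p`, roughly 102,399 failed trials per success:
+
+```
+// Illustrative only - not runtime code.
+#include <cmath>
+#include <cstdio>
+#include <random>
+
+int main()
+{
+    const double p = 1.0 / 102400;
+    std::mt19937_64 rng(12345);
+    std::uniform_real_distribution<double> uniform(0.0, 1.0); // stand-in for CLRRandom::NextDouble()
+
+    const int draws = 1000000;
+    double sum = 0.0;
+    for (int i = 0; i < draws; i++)
+    {
+        // inverse transform of the Geometric CDF: number of failed trials before a success
+        sum += std::floor(std::log(1.0 - uniform(rng)) / std::log(1.0 - p));
+    }
+    printf("sample mean = %.0f, expected q/p = %.0f\n", sum / draws, (1.0 - p) / p);
+    return 0;
+}
+```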
+
+Both of these changes combined give the pseudo-code:
+
+```
+int _cachedFailedTrials = CalculateGeometricRandom(pRandom);
+
+// Previously this method was called PopulateTrials()
+// Use Inverse Transform Sampling to calculate a random value from the Geometric distribution
+int CalculateGeometricRandom(CLRRandom* pRandom)
+{
+    return floor(log(1-pRandom->NextDouble())/log(1-1.0/102400));
+}
+
+bool GetNextTrialResult(CLRRandom* pRandom)
+{
+    bool ret = (_cachedFailedTrials == 0);
+    _cachedFailedTrials--;
+
+    // if we are out of trials, cache some more for next time
+    if(_cachedFailedTrials < 0)
+    {
+        _cachedFailedTrials = CalculateGeometricRandom(pRandom);
+    }
+
+    return ret;
+}
+```
+
+When allocating an object, we need to do many trials at once, one for each byte. A naive implementation of that would look like:
+
+```
+bool DoesAnyTrialSucceed(CLRRandom* pRandom, int countOfTrials)
+{
+    for(int i = 0; i < countOfTrials; i++)
+    {
+        if(GetNextTrialResult(pRandom)) return true;
+    }
+    return false;
+}
+```
+
+However, the `_cachedFailedTrials` representation lets us speed this up by checking whether the number of failed trials in the cache covers the number of trials we need to perform, without iterating through them one at a time:
+
+```
+bool DoesAnyTrialSucceed(CLRRandom* pRandom, int countOfTrials)
+{
+    bool ret = _cachedFailedTrials < countOfTrials;
+    _cachedFailedTrials -= countOfTrials;
+
+    // a trial succeeded somewhere in this object; the remaining cached results cover
+    // bytes we skip over, so generate a fresh batch for the next object
+    if(ret)
+    {
+        _cachedFailedTrials = CalculateGeometricRandom(pRandom);
+    }
+
+    return ret;
+}
+```
+
+
+We are getting closer to mapping our pseudo-code implementation to the real CLR code. The current CLR implementation for memory allocation has the GC hand out blocks of memory 8KB in size which the runtime is allowed to sub-allocate from. The GC gives out an `alloc_context` to each thread, which has `alloc_ptr` and `alloc_limit` fields. These fields define the memory range [alloc_ptr, alloc_limit) which can be used to sub-allocate objects. The runtime has optimized assembly code helper functions to increment `alloc_ptr` directly for objects that are small enough to fit in the current range and don't require any special handling. For all other objects, the runtime invokes a slower allocation path that ultimately calls the GC's Alloc() function. If the alloc_context is exhausted, calling GC Alloc() also allocates a new 8KB block for future fast object allocations to use. In order to allocate objects, we could naively do this:
+
+```
+void* FastAssemblyAllocate(int objectSize)
+{
+    Thread* pThread = GetThread();
+    CLRRandom* pRandom = pThread->GetRandom();
+    alloc_context* pAllocContext = pThread->GetAllocContext();
+    void* alloc_end = pAllocContext->alloc_ptr + objectSize;
+    if(IsSamplingEnabled() && DoesAnyTrialSucceed(pRandom, objectSize))
+    {
+        PublishSamplingEvent();
+    }
+    if(pAllocContext->alloc_limit < alloc_end)
+    {
+        return SlowAlloc(objectSize);
+    }
+    else
+    {
+        void* objectAddr = pAllocContext->alloc_ptr;
+        pAllocContext->alloc_ptr = alloc_end;
+        *objectAddr = methodTable;
+        return objectAddr;
+    }
+}
+```
+
+Although orders of magnitude faster than where we started, this is still too slow. We don't want to put extra conditional checks for IsSamplingEnabled() and DoesAnyTrialSucceed() in the fast path of every allocation. Instead, we want to combine the two if conditions down to a single compare and jump, then handle publishing a sample event as part of the slow allocation path. Note that the value of the expression `alloc_ptr + _cachedFailedTrials` doesn't change across repeated calls to FastAssemblyAllocate() as long as we don't go down the SlowAlloc path or the PublishSamplingEvent() path. Each invocation increments `alloc_ptr` by `objectSize` and decrements `_cachedFailedTrials` by the same amount, leaving the sum unchanged. Let's define that sum: `alloc_ptr + _cachedFailedTrials = sampling_limit`. You can imagine that if we started allocating objects contiguously from `alloc_ptr`, `sampling_limit` represents the point in the memory range where whatever object overlaps it contains the successful trial and emits the sampling event. A little more rigorously, `DoesAnyTrialSucceed()` returns true when `_cachedFailedTrials < objectSize`. Adding `alloc_ptr` to each side shows this is the same as the condition `sampling_limit < alloc_end`:
+
+```
+_cachedFailedTrials < objectSize
+_cachedFailedTrials + alloc_ptr < objectSize + alloc_ptr
+sampling_limit < alloc_end
+```
+
+Lastly, to combine the two if conditionals, we can define a new field `combined_limit = min(sampling_limit, alloc_limit)`. If sampling events aren't enabled, then we define `combined_limit = alloc_limit`. This means that a single check `if(combined_limit < alloc_end)` detects when the object either exceeds `alloc_limit` or overlaps `sampling_limit`. The runtime actually has a bunch of different fast paths depending on the type of the object being allocated and the CPU architecture, but converted to pseudo-code they all look approximately like this:
+
+```
+void* FastAssemblyAllocate(int objectSize)
+{
+    Thread* pThread = GetThread();
+    alloc_context* pAllocContext = pThread->GetAllocContext();
+    void* alloc_end = pAllocContext->alloc_ptr + objectSize;
+    if(combined_limit < alloc_end)
+    {
+        return SlowAlloc(objectSize);
+    }
+    else
+    {
+        void* objectAddr = pAllocContext->alloc_ptr;
+        pAllocContext->alloc_ptr = alloc_end;
+        *objectAddr = methodTable;
+        return objectAddr;
+    }
+}
+```
+
+The only change we've made in the assembly helpers is doing a comparison against combined_limit instead of alloc_limit, which should have no performance impact. Look at [JIT_TrialAllocSFastMP_InlineGetThread](https://github.com/dotnet/runtime/blob/5c8bb402e6a8274e8135dd00eda2248b4f57102f/src/coreclr/vm/amd64/JitHelpers_InlineGetThread.asm#L38) for an example of what one of these helpers looks like in assembly code.
+
+The pseudo-code and concepts we've been describing here are now close to matching the runtime code, but there are still some important differences to call out to map it more exactly:
+
+1. In the real runtime code, the assembly helpers call a variety of different C++ helpers depending on object type, and all of those helpers in turn call into [Alloc()](https://github.com/dotnet/runtime/blob/5c8bb402e6a8274e8135dd00eda2248b4f57102f/src/coreclr/vm/gchelpers.cpp#L201). Here we've omitted the different per-type intermediate functions and represented all of them as the SlowAlloc() function in the pseudo-code.
+
+2. The combined_limit field is a member of ee_alloc_context rather than alloc_context. This was done to avoid creating a breaking change in the EE<->GC interface. The ee_alloc_context contains an alloc_context within it as well as any additional fields we want to add that are only visible to the EE.
+
+3. In order to reduce the number of per-thread fields being managed, the real implementation doesn't have an explicit `sampling_limit`. Instead this only exists as the transient calculation of `alloc_ptr + CalculateGeometricRandom()` that is used when computing an updated value for `combined_limit`. Whenever `combined_limit < alloc_limit`, it is implied that `sampling_limit = combined_limit` and `_cachedFailedTrials = combined_limit - alloc_ptr`. However, if `combined_limit == alloc_limit`, that represents one of two possible states:
+- Sampling is disabled
+- Sampling is enabled and we have a batch of cached failed trials with size `alloc_limit - alloc_ptr`. In the examples above, our batches were N failures followed by a success, but this is just N failures without any success at the end. This means no objects allocated in the current AC are going to be sampled, and whenever we allocate the N+1st byte we'll need to generate a new batch of trial results to determine whether that byte was sampled.
+If it turns out to be easier to track `sampling_limit` with an explicit field when sampling is enabled, we could do that; it just requires an extra pointer per thread. As memory overhead it's not much, but it will probably land in the L1 cache and wind up evicting some other field on the Thread object that now no longer fits in the cache line. The current implementation tries to minimize this cache impact. We never did any perf testing on alternative implementations that do track sampling_limit explicitly, so it's possible the difference isn't that meaningful.
+
+4. When we generated batches of trial results in the examples above, we always used all the results before generating a new batch; however, the real implementation sometimes discards part of a batch. Implicitly this happens when we calculate a value for `sampling_limit=alloc_ptr+CalculateGeometricRandom()`, determine that `alloc_limit` is smaller than `sampling_limit`, and then set `combined_limit=alloc_limit`. Discarding also happens any time we recompute the `sampling_limit` based on a new random value without having fully allocated bytes up to `combined_limit`. It may seem suspicious that we can do this and still generate the correct distribution of samples, but it is OK if done properly. Bernoulli trials are independent of one another, so it is legal to discard trials from our cache as long as the decision to discard a given trial result is independent of what that trial result is. For example, in the very first pseudo-code sample with the List, it would be legal to generate 100 boolean trials and then arbitrarily truncate the list to size 50. The first 50 values in the list are still valid Bernoulli trials with the original probability p=1/102,400 of being true, as will be all the future ones from the batches that are populated later. However, if we scanned the list and conditionally discarded any trials that we observed had a success result, that would be problematic. This type of selective removal changes the probability distribution for the items that remain.
+
+5. The GC Alloc() function isn't the only time that the GC updates alloc_ptr and alloc_limit. They also get updated during a GC in the callback inside of GCToEEInterface::GcEnumAllocContexts(). This is another place where combined_limit needs to be updated to ensure it stays synchronized with alloc_ptr and alloc_limit.
+
+
+## Thanks
+
+Thanks to Christophe Nasarre (@chrisnas) at DataDog for implementing this feature and to Mikelle Rogers for investigating the Camp-Paulson approximation.
\ No newline at end of file diff --git a/src/coreclr/debug/daccess/dacdbiimpl.cpp b/src/coreclr/debug/daccess/dacdbiimpl.cpp index a6dda591278557..d49cfecca6379c 100644 --- a/src/coreclr/debug/daccess/dacdbiimpl.cpp +++ b/src/coreclr/debug/daccess/dacdbiimpl.cpp @@ -6551,10 +6551,10 @@ HRESULT DacHeapWalker::Init(CORDB_ADDRESS start, CORDB_ADDRESS end) j++; } } - if ((&g_global_alloc_context)->alloc_ptr != nullptr) + if (g_global_alloc_context->alloc_ptr != nullptr) { - mAllocInfo[j].Ptr = (CORDB_ADDRESS)(&g_global_alloc_context)->alloc_ptr; - mAllocInfo[j].Limit = (CORDB_ADDRESS)(&g_global_alloc_context)->alloc_limit; + mAllocInfo[j].Ptr = (CORDB_ADDRESS)g_global_alloc_context->alloc_ptr; + mAllocInfo[j].Limit = (CORDB_ADDRESS)g_global_alloc_context->alloc_limit; } mThreadCount = j; diff --git a/src/coreclr/debug/daccess/request.cpp b/src/coreclr/debug/daccess/request.cpp index 2dc737db2e7007..69c68309099d08 100644 --- a/src/coreclr/debug/daccess/request.cpp +++ b/src/coreclr/debug/daccess/request.cpp @@ -5493,8 +5493,8 @@ HRESULT ClrDataAccess::GetGlobalAllocationContext( } SOSDacEnter(); - *allocPtr = (CLRDATA_ADDRESS)((&g_global_alloc_context)->alloc_ptr); - *allocLimit = (CLRDATA_ADDRESS)((&g_global_alloc_context)->alloc_limit); + *allocPtr = (CLRDATA_ADDRESS)(g_global_alloc_context->alloc_ptr); + *allocLimit = (CLRDATA_ADDRESS)(g_global_alloc_context->alloc_limit); SOSDacLeave(); return hr; } diff --git a/src/coreclr/gc/gc.cpp b/src/coreclr/gc/gc.cpp index 66e9efcaa15872..5d22871191ca42 100644 --- a/src/coreclr/gc/gc.cpp +++ b/src/coreclr/gc/gc.cpp @@ -44127,7 +44127,7 @@ size_t gc_heap::decommit_region (heap_segment* region, int bucket, int h_number) { #ifdef MULTIPLE_HEAPS // In return_free_region, we set heap_segment_heap (region) to nullptr so we cannot use it here. - // but since all heaps share the same mark array we simply pick the 0th heap to use.  + // but since all heaps share the same mark array we simply pick the 0th heap to use. gc_heap* hp = g_heaps [0]; #else gc_heap* hp = pGenGCHeap; @@ -49370,7 +49370,6 @@ bool GCHeap::StressHeap(gc_alloc_context * context) } \ } while (false) -#ifdef FEATURE_64BIT_ALIGNMENT // Allocate small object with an alignment requirement of 8-bytes. 
Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t flags) { @@ -49436,7 +49435,6 @@ Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t return newAlloc; } -#endif // FEATURE_64BIT_ALIGNMENT Object* GCHeap::Alloc(gc_alloc_context* context, size_t size, uint32_t flags REQD_ALIGN_DCL) @@ -49497,15 +49495,11 @@ GCHeap::Alloc(gc_alloc_context* context, size_t size, uint32_t flags REQD_ALIGN_ } else { -#ifdef FEATURE_64BIT_ALIGNMENT if (flags & GC_ALLOC_ALIGN8) { newAlloc = AllocAlign8 (acontext, hp, size, flags); } else -#else - assert ((flags & GC_ALLOC_ALIGN8) == 0); -#endif { newAlloc = (Object*) hp->allocate (size + ComputeMaxStructAlignPad(requiredAlignment), acontext, flags); } diff --git a/src/coreclr/gc/gcpriv.h b/src/coreclr/gc/gcpriv.h index 1005d002029379..ed26fd10fc1b81 100644 --- a/src/coreclr/gc/gcpriv.h +++ b/src/coreclr/gc/gcpriv.h @@ -1465,9 +1465,7 @@ class gc_heap friend struct ::alloc_context; friend void ProfScanRootsHelper(Object** object, ScanContext *pSC, uint32_t dwFlags); friend void GCProfileWalkHeapWorker(BOOL fProfilerPinned, BOOL fShouldWalkHeapRootsForEtw, BOOL fShouldWalkHeapObjectsForEtw); -#ifdef FEATURE_64BIT_ALIGNMENT friend Object* AllocAlign8(alloc_context* acontext, gc_heap* hp, size_t size, uint32_t flags); -#endif //FEATURE_64BIT_ALIGNMENT friend class t_join; friend class gc_mechanisms; friend class seg_free_spaces; diff --git a/src/coreclr/inc/dacvars.h b/src/coreclr/inc/dacvars.h index 03995176313c24..18fb2c382313b9 100644 --- a/src/coreclr/inc/dacvars.h +++ b/src/coreclr/inc/dacvars.h @@ -140,7 +140,7 @@ DEFINE_DACVAR(ProfControlBlock, dac__g_profControlBlock, ::g_profControlBlock) DEFINE_DACVAR(PTR_DWORD, dac__g_card_table, ::g_card_table) DEFINE_DACVAR(PTR_BYTE, dac__g_lowest_address, ::g_lowest_address) DEFINE_DACVAR(PTR_BYTE, dac__g_highest_address, ::g_highest_address) -DEFINE_DACVAR(gc_alloc_context, dac__g_global_alloc_context, ::g_global_alloc_context) +DEFINE_DACVAR(UNKNOWN_POINTER_TYPE, dac__g_global_alloc_context, ::g_global_alloc_context) DEFINE_DACVAR(IGCHeap, dac__g_pGCHeap, ::g_pGCHeap) diff --git a/src/coreclr/inc/eventtracebase.h b/src/coreclr/inc/eventtracebase.h index 316104f649a1d8..ca3a559aa235da 100644 --- a/src/coreclr/inc/eventtracebase.h +++ b/src/coreclr/inc/eventtracebase.h @@ -1333,17 +1333,19 @@ namespace ETW #define ETWLoaderStaticLoad 0 // Static reference load #define ETWLoaderDynamicLoad 1 // Dynamic assembly load +#if defined(FEATURE_EVENT_TRACE) +EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context; +EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PRIVATE_PROVIDER_DOTNET_Context; +EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_RUNDOWN_PROVIDER_DOTNET_Context; +EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_STRESS_PROVIDER_DOTNET_Context; +#endif // FEATURE_EVENT_TRACE + #if defined(FEATURE_EVENT_TRACE) && !defined(HOST_UNIX) // // The ONE and only ONE global instantiation of this class // extern ETW::CEtwTracer * g_pEtwTracer; -EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context; -EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PRIVATE_PROVIDER_DOTNET_Context; -EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_RUNDOWN_PROVIDER_DOTNET_Context; -EXTERN_C DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_STRESS_PROVIDER_DOTNET_Context; - // // Special Handling of Startup events // diff --git 
a/src/coreclr/nativeaot/Runtime/AsmOffsets.h b/src/coreclr/nativeaot/Runtime/AsmOffsets.h index 32abd406175e76..36efed6a3d4951 100644 --- a/src/coreclr/nativeaot/Runtime/AsmOffsets.h +++ b/src/coreclr/nativeaot/Runtime/AsmOffsets.h @@ -46,15 +46,16 @@ ASM_OFFSET( 0, 0, MethodTable, m_uFlags) ASM_OFFSET( 4, 4, MethodTable, m_uBaseSize) ASM_OFFSET( 14, 18, MethodTable, m_VTable) -ASM_OFFSET( 0, 0, Thread, m_rgbAllocContextBuffer) -ASM_OFFSET( 28, 38, Thread, m_ThreadStateFlags) -ASM_OFFSET( 2c, 40, Thread, m_pTransitionFrame) -ASM_OFFSET( 30, 48, Thread, m_pDeferredTransitionFrame) -ASM_OFFSET( 40, 68, Thread, m_ppvHijackedReturnAddressLocation) -ASM_OFFSET( 44, 70, Thread, m_pvHijackedReturnAddress) -ASM_OFFSET( 48, 78, Thread, m_uHijackedReturnValueFlags) -ASM_OFFSET( 4c, 80, Thread, m_pExInfoStackHead) -ASM_OFFSET( 50, 88, Thread, m_threadAbortException) +ASM_OFFSET( 0, 0, Thread, m_combined_limit) +ASM_OFFSET( 4, 8, Thread, m_rgbAllocContextBuffer) +ASM_OFFSET( 2c, 40, Thread, m_ThreadStateFlags) +ASM_OFFSET( 30, 48, Thread, m_pTransitionFrame) +ASM_OFFSET( 34, 50, Thread, m_pDeferredTransitionFrame) +ASM_OFFSET( 44, 70, Thread, m_ppvHijackedReturnAddressLocation) +ASM_OFFSET( 48, 78, Thread, m_pvHijackedReturnAddress) +ASM_OFFSET( 4c, 80, Thread, m_uHijackedReturnValueFlags) +ASM_OFFSET( 50, 88, Thread, m_pExInfoStackHead) +ASM_OFFSET( 54, 90, Thread, m_threadAbortException) ASM_SIZEOF( 14, 20, EHEnum) diff --git a/src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt b/src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt index f9b390e18d117a..fa3f5d0f8112c0 100644 --- a/src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt +++ b/src/coreclr/nativeaot/Runtime/Full/CMakeLists.txt @@ -127,4 +127,4 @@ if (CLR_CMAKE_TARGET_ARCH_AMD64) if (CLR_CMAKE_TARGET_WIN32) install_static_library(Runtime.VxsortEnabled.GuardCF aotsdk nativeaot) endif (CLR_CMAKE_TARGET_WIN32) -endif (CLR_CMAKE_TARGET_ARCH_AMD64) \ No newline at end of file +endif (CLR_CMAKE_TARGET_ARCH_AMD64) diff --git a/src/coreclr/nativeaot/Runtime/GCHelpers.cpp b/src/coreclr/nativeaot/Runtime/GCHelpers.cpp index b038d9d33541bd..a5952315900bfa 100644 --- a/src/coreclr/nativeaot/Runtime/GCHelpers.cpp +++ b/src/coreclr/nativeaot/Runtime/GCHelpers.cpp @@ -29,6 +29,12 @@ #include "gcdesc.h" +#ifdef FEATURE_EVENT_TRACE + #include "clretwallmain.h" +#else // FEATURE_EVENT_TRACE + #include "etmdummy.h" +#endif // FEATURE_EVENT_TRACE + #define RH_LARGE_OBJECT_SIZE 85000 MethodTable g_FreeObjectEEType; @@ -471,6 +477,32 @@ EXTERN_C int64_t QCALLTYPE RhGetTotalAllocatedBytesPrecise() return allocated; } +inline void FireAllocationSampled(GC_ALLOC_FLAGS flags, size_t size, size_t samplingBudgetOffset, Object* orObject) +{ + void* typeId = GetLastAllocEEType(); + // Note: like for AllocationTick, the type name cannot be retrieved + WCHAR* name = nullptr; + + if (typeId != nullptr) + { + unsigned int allocKind = + (flags & GC_ALLOC_PINNED_OBJECT_HEAP) ? 2 : + (flags & GC_ALLOC_LARGE_OBJECT_HEAP) ? 
1 : + 0; // SOH + unsigned int heapIndex = 0; +#ifdef BACKGROUND_GC + gc_heap* hp = gc_heap::heap_of((BYTE*)orObject); + heapIndex = hp->heap_number; +#endif + FireEtwAllocationSampled(allocKind, GetClrInstanceId(), typeId, name, heapIndex, (BYTE*)orObject, size, samplingBudgetOffset); + } +} + +inline size_t AlignUp(size_t value, uint32_t alignment) +{ + return (value + alignment - 1) & ~(size_t)(alignment - 1); +} + static Object* GcAllocInternal(MethodTable* pEEType, uint32_t uFlags, uintptr_t numElements, Thread* pThread) { ASSERT(!pThread->IsDoNotTriggerGcSet()); @@ -539,10 +571,66 @@ static Object* GcAllocInternal(MethodTable* pEEType, uint32_t uFlags, uintptr_t // Save the MethodTable for instrumentation purposes. tls_pLastAllocationEEType = pEEType; + // check for dynamic allocation sampling + gc_alloc_context* acontext = pThread->GetAllocContext(); + bool isSampled = false; + size_t availableSpace = 0; + size_t aligned_size = 0; + size_t samplingBudget = 0; + + bool isRandomizedSamplingEnabled = Thread::IsRandomizedSamplingEnabled(); + if (isRandomizedSamplingEnabled) + { + // object allocations are always padded up to pointer size + aligned_size = AlignUp(cbSize, sizeof(uintptr_t)); + + // The number bytes we can allocate before we need to emit a sampling event. + // This calculation is only valid if combined_limit < alloc_limit. + samplingBudget = (size_t)(*pThread->GetCombinedLimit() - acontext->alloc_ptr); + + // The number of bytes available in the current allocation context + availableSpace = (size_t)(acontext->alloc_limit - acontext->alloc_ptr); + + // Check to see if the allocated object overlaps a sampled byte + // in this AC. This happens when both: + // 1) The AC contains a sampled byte (combined_limit < alloc_limit) + // 2) The object is large enough to overlap it (samplingBudget < aligned_size) + // + // Note that the AC could have no remaining space for allocations (alloc_ptr = + // alloc_limit = combined_limit). When a thread hasn't done any SOH allocations + // yet it also starts in an empty state where alloc_ptr = alloc_limit = + // combined_limit = nullptr. The (1) check handles both of these situations + // properly as an empty AC can not have a sampled byte inside of it. + isSampled = + (*pThread->GetCombinedLimit() < acontext->alloc_limit) && + (samplingBudget < aligned_size); + + // if the object overflows the AC, we need to sample the remaining bytes + // the sampling budget only included at most the bytes inside the AC + if (aligned_size > availableSpace && !isSampled) + { + samplingBudget = pThread->ComputeGeometricRandom() + availableSpace; + isSampled = (samplingBudget < aligned_size); + } + } + Object* pObject = GCHeapUtilities::GetGCHeap()->Alloc(pThread->GetAllocContext(), cbSize, uFlags); if (pObject == NULL) return NULL; + if (isSampled) + { + FireAllocationSampled((GC_ALLOC_FLAGS)uFlags, aligned_size, samplingBudget, pObject); + } + + // There are a variety of conditions that may have invalidated the previous combined_limit value + // such as not allocating the object in the AC memory region (UOH allocations), moving the AC, adding + // extra alignment padding, allocating a new AC, or allocating an object that consumed the sampling budget. + // Rather than test for all the different invalidation conditions individually we conservatively always + // recompute it. If sampling isn't enabled this inlined function is just trivially setting + // combined_limit=alloc_limit. 
+ pThread->UpdateCombinedLimit(isRandomizedSamplingEnabled); + pObject->set_EEType(pEEType); if (pEEType->HasComponentSize()) { @@ -555,7 +643,6 @@ static Object* GcAllocInternal(MethodTable* pEEType, uint32_t uFlags, uintptr_t #ifdef _DEBUG // We assume that the allocation quantum is never big enough for LARGE_OBJECT_SIZE. - gc_alloc_context* acontext = pThread->GetAllocContext(); ASSERT(acontext->alloc_limit - acontext->alloc_ptr <= RH_LARGE_OBJECT_SIZE); #endif diff --git a/src/coreclr/nativeaot/Runtime/amd64/AllocFast.S b/src/coreclr/nativeaot/Runtime/amd64/AllocFast.S index 6cb85bcc507a09..e6891cb26d61a2 100644 --- a/src/coreclr/nativeaot/Runtime/amd64/AllocFast.S +++ b/src/coreclr/nativeaot/Runtime/amd64/AllocFast.S @@ -28,7 +28,7 @@ NESTED_ENTRY RhpNewFast, _TEXT, NoHandler mov rsi, [rax + OFFSETOF__Thread__m_alloc_context__alloc_ptr] add rdx, rsi - cmp rdx, [rax + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rdx, [rax + OFFSETOF__Thread__m_combined_limit] ja LOCAL_LABEL(RhpNewFast_RarePath) // set the new alloc pointer @@ -143,7 +143,7 @@ NESTED_ENTRY RhNewString, _TEXT, NoHandler // rcx == Thread* // rdx == string size // r12 == element count - cmp rax, [rcx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rax, [rcx + OFFSETOF__Thread__m_combined_limit] ja LOCAL_LABEL(RhNewString_RarePath) mov [rcx + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax @@ -226,7 +226,7 @@ NESTED_ENTRY RhpNewArray, _TEXT, NoHandler // rcx == Thread* // rdx == array size // r12 == element count - cmp rax, [rcx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rax, [rcx + OFFSETOF__Thread__m_combined_limit] ja LOCAL_LABEL(RhpNewArray_RarePath) mov [rcx + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax diff --git a/src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm b/src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm index 37be558c3cef1d..ad3dd89821a97c 100644 --- a/src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm +++ b/src/coreclr/nativeaot/Runtime/amd64/AllocFast.asm @@ -25,7 +25,7 @@ LEAF_ENTRY RhpNewFast, _TEXT mov rax, [rdx + OFFSETOF__Thread__m_alloc_context__alloc_ptr] add r8, rax - cmp r8, [rdx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp r8, [rdx + OFFSETOF__Thread__m_combined_limit] ja RhpNewFast_RarePath ;; set the new alloc pointer @@ -118,7 +118,7 @@ LEAF_ENTRY RhNewString, _TEXT ; rdx == element count ; r8 == array size ; r10 == thread - cmp rax, [r10 + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rax, [r10 + OFFSETOF__Thread__m_combined_limit] ja RhpNewArrayRare mov [r10 + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax @@ -179,7 +179,7 @@ LEAF_ENTRY RhpNewArray, _TEXT ; rdx == element count ; r8 == array size ; r10 == thread - cmp rax, [r10 + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp rax, [r10 + OFFSETOF__Thread__m_combined_limit] ja RhpNewArrayRare mov [r10 + OFFSETOF__Thread__m_alloc_context__alloc_ptr], rax diff --git a/src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc b/src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc index 33089b6643d382..96d3be1ee31a8a 100644 --- a/src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc +++ b/src/coreclr/nativeaot/Runtime/amd64/AsmMacros.inc @@ -337,8 +337,6 @@ TSF_DoNotTriggerGc equ 10h ;; Rename fields of nested structs ;; OFFSETOF__Thread__m_alloc_context__alloc_ptr equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit - ;; GC 
type flags diff --git a/src/coreclr/nativeaot/Runtime/arm/AllocFast.S b/src/coreclr/nativeaot/Runtime/arm/AllocFast.S index 31b54d1bca313a..501923cc77f204 100644 --- a/src/coreclr/nativeaot/Runtime/arm/AllocFast.S +++ b/src/coreclr/nativeaot/Runtime/arm/AllocFast.S @@ -26,7 +26,7 @@ LEAF_ENTRY RhpNewFast, _TEXT ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_ptr] add r2, r3 - ldr r1, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r1, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r2, r1 bhi LOCAL_LABEL(RhpNewFast_RarePath) @@ -132,7 +132,7 @@ LEAF_ENTRY RhNewString, _TEXT adds r6, r12 bcs LOCAL_LABEL(RhNewString_RarePath) // if we get a carry here, the string is too large to fit below 4 GB - ldr r12, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r12, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r6, r12 bhi LOCAL_LABEL(RhNewString_RarePath) @@ -213,7 +213,7 @@ LOCAL_LABEL(ArrayAlignSize): adds r6, r12 bcs LOCAL_LABEL(RhpNewArray_RarePath) // if we get a carry here, the array is too large to fit below 4 GB - ldr r12, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r12, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r6, r12 bhi LOCAL_LABEL(RhpNewArray_RarePath) @@ -349,7 +349,7 @@ LEAF_ENTRY RhpNewFastAlign8, _TEXT // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add r2, r3 - ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r3, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r2, r3 bhi LOCAL_LABEL(Alloc8Failed) @@ -412,7 +412,7 @@ LEAF_ENTRY RhpNewFastMisalign, _TEXT // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add r2, r3 - ldr r3, [r0, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr r3, [r0, #OFFSETOF__Thread__m_combined_limit] cmp r2, r3 bhi LOCAL_LABEL(BoxAlloc8Failed) diff --git a/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S b/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S index 966b052a2b9f9e..6cd6f044965b8d 100644 --- a/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S +++ b/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S @@ -11,8 +11,6 @@ GC_ALLOC_FINALIZE = 1 // Rename fields of nested structs // OFFSETOF__Thread__m_alloc_context__alloc_ptr = OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit = OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit - // Allocate non-array, non-finalizable object. If the allocation doesn't fit into the current thread's @@ -44,7 +42,7 @@ OFFSETOF__Thread__m_alloc_context__alloc_limit = OFFSETOF__Thread__m_rgbAll // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add x2, x2, x12 - ldr x13, [x1, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x13, [x1, #OFFSETOF__Thread__m_combined_limit] cmp x2, x13 bhi LOCAL_LABEL(RhpNewFast_RarePath) @@ -139,7 +137,7 @@ LOCAL_LABEL(NewOutOfMemory): // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. 
add x2, x2, x12 - ldr x12, [x3, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x12, [x3, #OFFSETOF__Thread__m_combined_limit] cmp x2, x12 bhi LOCAL_LABEL(RhNewString_Rare) @@ -207,7 +205,7 @@ LOCAL_LABEL(RhNewString_Rare): // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add x2, x2, x12 - ldr x12, [x3, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x12, [x3, #OFFSETOF__Thread__m_combined_limit] cmp x2, x12 bhi LOCAL_LABEL(RhpNewArray_Rare) diff --git a/src/coreclr/nativeaot/Runtime/arm64/AllocFast.asm b/src/coreclr/nativeaot/Runtime/arm64/AllocFast.asm index e6849b87312669..54176ad2920e6f 100644 --- a/src/coreclr/nativeaot/Runtime/arm64/AllocFast.asm +++ b/src/coreclr/nativeaot/Runtime/arm64/AllocFast.asm @@ -30,7 +30,7 @@ ;; Determine whether the end of the object would lie outside of the current allocation context. If so, ;; we abandon the attempt to allocate the object directly and fall back to the slow helper. add x2, x2, x12 - ldr x13, [x1, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x13, [x1, #OFFSETOF__Thread__m_combined_limit] cmp x2, x13 bhi RhpNewFast_RarePath @@ -118,7 +118,7 @@ NewOutOfMemory ;; Determine whether the end of the object would lie outside of the current allocation context. If so, ;; we abandon the attempt to allocate the object directly and fall back to the slow helper. add x2, x2, x12 - ldr x12, [x3, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x12, [x3, #OFFSETOF__Thread__m_combined_limit] cmp x2, x12 bhi RhpNewArrayRare @@ -179,7 +179,7 @@ StringSizeOverflow ;; Determine whether the end of the object would lie outside of the current allocation context. If so, ;; we abandon the attempt to allocate the object directly and fall back to the slow helper. 
add x2, x2, x12 - ldr x12, [x3, #OFFSETOF__Thread__m_alloc_context__alloc_limit] + ldr x12, [x3, #OFFSETOF__Thread__m_combined_limit] cmp x2, x12 bhi RhpNewArrayRare diff --git a/src/coreclr/nativeaot/Runtime/arm64/AsmMacros.h b/src/coreclr/nativeaot/Runtime/arm64/AsmMacros.h index 94a559df719e02..8bce14dd02a3e4 100644 --- a/src/coreclr/nativeaot/Runtime/arm64/AsmMacros.h +++ b/src/coreclr/nativeaot/Runtime/arm64/AsmMacros.h @@ -88,7 +88,6 @@ STATUS_REDHAWK_THREAD_ABORT equ 0x43 ;; Rename fields of nested structs ;; OFFSETOF__Thread__m_alloc_context__alloc_ptr equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit ;; ;; IMPORTS diff --git a/src/coreclr/nativeaot/Runtime/disabledeventtrace.cpp b/src/coreclr/nativeaot/Runtime/disabledeventtrace.cpp index f0944fdf295179..886c9bb5cbb091 100644 --- a/src/coreclr/nativeaot/Runtime/disabledeventtrace.cpp +++ b/src/coreclr/nativeaot/Runtime/disabledeventtrace.cpp @@ -13,6 +13,8 @@ void EventTracing_Initialize() { } void ETW::GCLog::FireGcStart(ETW_GC_INFO * pGcInfo) { } +bool IsRuntimeProviderEnabled(uint8_t level, uint64_t keyword) { return false; } + #ifdef FEATURE_ETW BOOL ETW::GCLog::ShouldTrackMovementForEtw() { return FALSE; } diff --git a/src/coreclr/nativeaot/Runtime/eventpipe/gen-eventing-event-inc.lst b/src/coreclr/nativeaot/Runtime/eventpipe/gen-eventing-event-inc.lst index 901af659ff84b6..77c9d8cb15a3da 100644 --- a/src/coreclr/nativeaot/Runtime/eventpipe/gen-eventing-event-inc.lst +++ b/src/coreclr/nativeaot/Runtime/eventpipe/gen-eventing-event-inc.lst @@ -1,5 +1,6 @@ # Native runtime events supported by aot runtime. +AllocationSampled BGC1stConEnd BGC1stNonConEnd BGC1stSweepEnd diff --git a/src/coreclr/nativeaot/Runtime/eventtrace.cpp b/src/coreclr/nativeaot/Runtime/eventtrace.cpp index 8b3d134f5c4f24..a7d72b55fca53c 100644 --- a/src/coreclr/nativeaot/Runtime/eventtrace.cpp +++ b/src/coreclr/nativeaot/Runtime/eventtrace.cpp @@ -39,6 +39,23 @@ DOTNET_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_PRIVATE_PROVIDER_DOTNET_Con volatile LONGLONG ETW::GCLog::s_l64LastClientSequenceNumber = 0; +bool IsRuntimeProviderEnabled(uint8_t level, uint64_t keyword) +{ + // EventPipe is always taken into account + bool isEnabled = DotNETRuntimeProvider_IsEnabled(level, keyword); + +#ifdef FEATURE_ETW + // ETW is also taken into account on Windows + isEnabled |= ( + ETW_TRACING_INITIALIZED(MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_Context.RegistrationHandle) && + ETW_CATEGORY_ENABLED(MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_Context, level, keyword) + ); +#endif // FEATURE_ETW + + return isEnabled; +} + + //--------------------------------------------------------------------------------------- // // Helper to fire the GCStart event. 
Figures out which version of GCStart to fire, and @@ -245,4 +262,4 @@ void EventPipeEtwCallbackDotNETRuntimePrivate( _Inout_opt_ PVOID CallbackContext) { EtwCallbackCommon(DotNETRuntimePrivate, ControlCode, Level, MatchAnyKeyword, FilterData, true); -} \ No newline at end of file +} diff --git a/src/coreclr/nativeaot/Runtime/eventtrace.h b/src/coreclr/nativeaot/Runtime/eventtrace.h index 72f0ffa0f7a1fc..2483b692ee02ae 100644 --- a/src/coreclr/nativeaot/Runtime/eventtrace.h +++ b/src/coreclr/nativeaot/Runtime/eventtrace.h @@ -50,6 +50,8 @@ struct ProfilingScanContext : ScanContext }; #endif // defined(FEATURE_EVENT_TRACE) +bool IsRuntimeProviderEnabled(uint8_t level, uint64_t keyword); + namespace ETW { // Class to wrap all GC logic for ETW diff --git a/src/coreclr/nativeaot/Runtime/eventtracebase.h b/src/coreclr/nativeaot/Runtime/eventtracebase.h index 241c795c0d02fc..f0c1a6a99cfa12 100644 --- a/src/coreclr/nativeaot/Runtime/eventtracebase.h +++ b/src/coreclr/nativeaot/Runtime/eventtracebase.h @@ -102,6 +102,7 @@ struct ProfilingScanContext; #define CLR_GCHEAPSURVIVALANDMOVEMENT_KEYWORD 0x400000 #define CLR_MANAGEDHEAPCOLLECT_KEYWORD 0x800000 #define CLR_GCHEAPANDTYPENAMES_KEYWORD 0x1000000 +#define CLR_ALLOCATIONSAMPLING_KEYWORD 0x80000000000 // // Using KEYWORDZERO means when checking the events category ignore the keyword diff --git a/src/coreclr/nativeaot/Runtime/gcenv.ee.cpp b/src/coreclr/nativeaot/Runtime/gcenv.ee.cpp index f041e499c11d4b..0fdf4642f22a34 100644 --- a/src/coreclr/nativeaot/Runtime/gcenv.ee.cpp +++ b/src/coreclr/nativeaot/Runtime/gcenv.ee.cpp @@ -132,11 +132,41 @@ void GCToEEInterface::GcScanRoots(ScanFunc* fn, int condemned, int max_gen, Scan sc->thread_under_crawl = NULL; } +void InvokeGCAllocCallback(Thread* pThread, enum_alloc_context_func* fn, void* param) +{ + // NOTE: Its possible that alloc_ptr = alloc_limit = combined_limit = NULL at this point + gc_alloc_context* pAllocContext = pThread->GetAllocContext(); + + // The allocation context might be modified by the callback, so we need to save + // the remaining sampling budget and restore it after the callback if needed. + size_t currentSamplingBudget = (size_t)(*pThread->GetCombinedLimit() - pAllocContext->alloc_ptr); + size_t currentSize = (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr); + + fn(pAllocContext, param); + + // If the GC changed the size of the allocation context, we need to recompute the sampling limit + // This includes the case where the AC was initially zero-sized/uninitialized. + // Functionally we'd get valid results if we called UpdateCombinedLimit() unconditionally but its + // empirically a little more performant to only call it when the AC size has changed. + if (currentSize != (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr)) + { + pThread->UpdateCombinedLimit(); + } + else + { + // Restore the remaining sampling budget as the size is the same. 
+ *pThread->GetCombinedLimit() = pAllocContext->alloc_ptr + currentSamplingBudget; + } +} + void GCToEEInterface::GcEnumAllocContexts(enum_alloc_context_func* fn, void* param) { FOREACH_THREAD(thread) { - (*fn) (thread->GetAllocContext(), param); + //(*fn) (thread->GetAllocContext(), param); + + // update the combined limit is needed + InvokeGCAllocCallback(thread, fn, param); } END_FOREACH_THREAD } diff --git a/src/coreclr/nativeaot/Runtime/i386/AllocFast.asm b/src/coreclr/nativeaot/Runtime/i386/AllocFast.asm index 8d28e94c944177..4ddfab93ed1dbe 100644 --- a/src/coreclr/nativeaot/Runtime/i386/AllocFast.asm +++ b/src/coreclr/nativeaot/Runtime/i386/AllocFast.asm @@ -29,7 +29,7 @@ FASTCALL_FUNC RhpNewFast, 4 ;; add eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_ptr] - cmp eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp eax, [edx + OFFSETOF__Thread__m_combined_limit] ja AllocFailed ;; set the new alloc pointer @@ -165,7 +165,7 @@ FASTCALL_FUNC RhNewString, 8 mov ecx, eax add eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_ptr] jc StringAllocContextOverflow - cmp eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp eax, [edx + OFFSETOF__Thread__m_combined_limit] ja StringAllocContextOverflow ; ECX == allocation size @@ -282,7 +282,7 @@ ArrayAlignSize: mov ecx, eax add eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_ptr] jc ArrayAllocContextOverflow - cmp eax, [edx + OFFSETOF__Thread__m_alloc_context__alloc_limit] + cmp eax, [edx + OFFSETOF__Thread__m_combined_limit] ja ArrayAllocContextOverflow ; ECX == array size diff --git a/src/coreclr/nativeaot/Runtime/i386/AsmMacros.inc b/src/coreclr/nativeaot/Runtime/i386/AsmMacros.inc index 896bf8e67dab53..f22b8f0bb5b814 100644 --- a/src/coreclr/nativeaot/Runtime/i386/AsmMacros.inc +++ b/src/coreclr/nativeaot/Runtime/i386/AsmMacros.inc @@ -141,7 +141,6 @@ STATUS_REDHAWK_THREAD_ABORT equ 43h ;; Rename fields of nested structs ;; OFFSETOF__Thread__m_alloc_context__alloc_ptr equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit equ OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit ;; ;; CONSTANTS -- SYMBOLS diff --git a/src/coreclr/nativeaot/Runtime/inc/rhbinder.h b/src/coreclr/nativeaot/Runtime/inc/rhbinder.h index db238e24acbc16..6cf67845d86d30 100644 --- a/src/coreclr/nativeaot/Runtime/inc/rhbinder.h +++ b/src/coreclr/nativeaot/Runtime/inc/rhbinder.h @@ -496,15 +496,15 @@ struct PInvokeTransitionFrame #define PInvokeTransitionFrame_MAX_SIZE (sizeof(PInvokeTransitionFrame) + (POINTER_SIZE * PInvokeTransitionFrame_SaveRegs_count)) #ifdef TARGET_AMD64 -#define OFFSETOF__Thread__m_pTransitionFrame 0x40 +#define OFFSETOF__Thread__m_pTransitionFrame 0x48 #elif defined(TARGET_ARM64) -#define OFFSETOF__Thread__m_pTransitionFrame 0x40 +#define OFFSETOF__Thread__m_pTransitionFrame 0x48 #elif defined(TARGET_LOONGARCH64) -#define OFFSETOF__Thread__m_pTransitionFrame 0x40 +#define OFFSETOF__Thread__m_pTransitionFrame 0x48 #elif defined(TARGET_X86) -#define OFFSETOF__Thread__m_pTransitionFrame 0x2c +#define OFFSETOF__Thread__m_pTransitionFrame 0x30 #elif defined(TARGET_ARM) -#define OFFSETOF__Thread__m_pTransitionFrame 0x2c +#define OFFSETOF__Thread__m_pTransitionFrame 0x30 #endif typedef DPTR(MethodTable) PTR_EEType; diff --git a/src/coreclr/nativeaot/Runtime/loongarch64/AllocFast.S b/src/coreclr/nativeaot/Runtime/loongarch64/AllocFast.S index dc344183e927ba..6974bebfb829bf 100644 --- 
a/src/coreclr/nativeaot/Runtime/loongarch64/AllocFast.S +++ b/src/coreclr/nativeaot/Runtime/loongarch64/AllocFast.S @@ -11,9 +11,7 @@ GC_ALLOC_FINALIZE = 1 // Rename fields of nested structs // OFFSETOF__Thread__m_alloc_context__alloc_ptr = OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -OFFSETOF__Thread__m_alloc_context__alloc_limit = OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit - - +// OFFSETOF__Thread__m_combined_limit is the sampling limit of the allocation context (or the end of it if no sampling - former alloc_limit) // Allocate non-array, non-finalizable object. If the allocation doesn't fit into the current thread's // allocation context then automatically fallback to the slow allocation path. @@ -44,7 +42,7 @@ OFFSETOF__Thread__m_alloc_context__alloc_limit = OFFSETOF__Thread__m_rgbAll // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add.d $a2, $a2, $t3 - ld.d $t4, $a1, OFFSETOF__Thread__m_alloc_context__alloc_limit + ld.d $t4, $a1, OFFSETOF__Thread__m_combined_limit bltu $t4, $a2, RhpNewFast_RarePath // Update the alloc pointer to account for the allocation. @@ -137,7 +135,7 @@ NewOutOfMemory: // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add.d $a2, $a2, $t3 - ld.d $t3, $a3, OFFSETOF__Thread__m_alloc_context__alloc_limit + ld.d $t3, $a3, OFFSETOF__Thread__m_combined_limit bltu $t3, $a2, RhNewString_Rare // Reload new object address into r12. @@ -199,7 +197,7 @@ RhNewString_Rare: // Determine whether the end of the object would lie outside of the current allocation context. If so, // we abandon the attempt to allocate the object directly and fall back to the slow helper. add.d $a2, $a2, $t3 - ld.d $t3, $a3, OFFSETOF__Thread__m_alloc_context__alloc_limit + ld.d $t3, $a3, OFFSETOF__Thread__m_combined_limit bltu $t3, $a2, RhpNewArray_Rare // Reload new object address into t3. 
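The `InvokeGCAllocCallback` helper above preserves the thread's remaining sampling budget across the GC's allocation-context enumeration callback (a matching helper appears later in this change for CoreCLR's gcenv.ee.cpp). A minimal standalone sketch of that invariant, using hypothetical stand-in types rather than the runtime's real `Thread`/`gc_alloc_context`:

```cpp
// Sketch only: stand-in types, not the runtime's actual structures.
#include <cassert>
#include <cstddef>
#include <cstdint>

struct AllocContext { uint8_t* alloc_ptr; uint8_t* alloc_limit; };

struct SketchThread {
    AllocContext ac;
    uint8_t* combined_limit;   // min(alloc_limit, sampling limit)
};

// The callback may move or resize the allocation context, but the number of bytes
// the thread may still allocate before the next sample (the "budget") should survive
// the call whenever the AC size is unchanged.
template <typename Fn>
void invoke_with_budget_preserved(SketchThread& t, Fn&& callback) {
    size_t budget = (size_t)(t.combined_limit - t.ac.alloc_ptr);
    size_t size   = (size_t)(t.ac.alloc_limit - t.ac.alloc_ptr);

    callback(t.ac);

    if (size != (size_t)(t.ac.alloc_limit - t.ac.alloc_ptr)) {
        // AC was resized/replaced: a fresh sampling limit would be computed here.
        t.combined_limit = t.ac.alloc_limit;   // placeholder for UpdateCombinedLimit()
    } else {
        // Same-sized AC (possibly relocated): re-anchor the old budget.
        t.combined_limit = t.ac.alloc_ptr + budget;
    }
}

int main() {
    uint8_t buf1[256], buf2[256];
    SketchThread t{{buf1, buf1 + 256}, buf1 + 100};   // 100 bytes of budget left

    // Callback that relocates the AC without changing its size.
    invoke_with_budget_preserved(t, [&](AllocContext& ac) {
        ac.alloc_ptr = buf2;
        ac.alloc_limit = buf2 + 256;
    });

    assert((size_t)(t.combined_limit - t.ac.alloc_ptr) == 100);   // budget preserved
    return 0;
}
```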
diff --git a/src/coreclr/nativeaot/Runtime/thread.cpp b/src/coreclr/nativeaot/Runtime/thread.cpp index b796b052182260..2ba31535d1703a 100644 --- a/src/coreclr/nativeaot/Runtime/thread.cpp +++ b/src/coreclr/nativeaot/Runtime/thread.cpp @@ -28,6 +28,9 @@ #include "RhConfig.h" #include "GcEnum.h" +#include "eventtracebase.h" +#include "eventtrace.h" + #ifndef DACCESS_COMPILE static int (*g_RuntimeInitializationCallback)(); @@ -193,10 +196,13 @@ void * Thread::GetCurrentThreadPInvokeReturnAddress() } #endif // !DACCESS_COMPILE -#if defined(FEATURE_GC_STRESS) & !defined(DACCESS_COMPILE) void Thread::SetRandomSeed(uint32_t seed) { +#ifndef FEATURE_GC_STRESS ASSERT(!IsStateSet(TSF_IsRandSeedSet)); +#endif + + m_rng.InitSeed(seed); m_uRand = seed; SetState(TSF_IsRandSeedSet); } @@ -243,7 +249,6 @@ bool Thread::IsRandInited() { return IsStateSet(TSF_IsRandSeedSet); } -#endif // FEATURE_GC_STRESS & !DACCESS_COMPILE PTR_ExInfo Thread::GetCurExInfo() { @@ -300,11 +305,19 @@ void Thread::Construct() ASSERT(m_pGCFrameRegistrations == NULL); ASSERT(m_threadAbortException == NULL); + ASSERT(m_combined_limit == NULL); #ifdef FEATURE_SUSPEND_REDIRECTION ASSERT(m_redirectionContextBuffer == NULL); #endif //FEATURE_SUSPEND_REDIRECTION ASSERT(m_interruptedContext == NULL); + + if (!IsStateSet(TSF_IsRandSeedSet)) + { + // Initialize the random number generator seed + uint32_t seed = (uint32_t)PalGetTickCount64(); + SetRandomSeed(seed); + } } bool Thread::IsInitialized() @@ -347,12 +360,15 @@ uint64_t Thread::GetDeadThreadsNonAllocBytes() #endif } +uint32_t SamplingDistributionMean = (100 * 1024); + void Thread::Detach() { // clean up the alloc context gc_alloc_context* context = GetAllocContext(); s_DeadThreadsNonAllocBytes += context->alloc_limit - context->alloc_ptr; GCHeapUtilities::GetGCHeap()->FixAllocContext(context, NULL, NULL); + m_combined_limit = NULL; SetDetached(); } @@ -1321,6 +1337,45 @@ FCIMPL1(void, RhpReversePInvokeReturn, ReversePInvokeFrame * pFrame) } FCIMPLEND + +bool Thread::IsRandomizedSamplingEnabled() +{ + return IsRuntimeProviderEnabled(TRACE_LEVEL_INFORMATION, CLR_ALLOCATIONSAMPLING_KEYWORD); +} + +int Thread::ComputeGeometricRandom() +{ + const double maxValue = 0xFFFFFFFF; + + // compute a random sample from the Geometric distribution + double probability = (maxValue - (double)m_rng.next()) / maxValue; + int threshold = (int)(-log(1 - probability) * SamplingDistributionMean); + return threshold; +} + +void Thread::UpdateCombinedLimit(bool samplingEnabled) +{ + gc_alloc_context* alloc_context = GetAllocContext(); + if (!samplingEnabled) + { + m_combined_limit = alloc_context->alloc_limit; + } + else + { + // compute the next sampling limit based on a geometric distribution + uint8_t* sampling_limit = alloc_context->alloc_ptr + ComputeGeometricRandom(); + + // if the sampling limit is larger than the allocation context, no sampling will occur in this AC + m_combined_limit = (sampling_limit < alloc_context->alloc_limit) ? sampling_limit : alloc_context->alloc_limit; + } +} + +// Regenerate the randomized sampling limit and update the m_combined_limit field. 
+void Thread::UpdateCombinedLimit() +{ + UpdateCombinedLimit(IsRandomizedSamplingEnabled()); +} + #ifdef USE_PORTABLE_HELPERS FCIMPL1(void, RhpPInvoke2, PInvokeTransitionFrame* pFrame) diff --git a/src/coreclr/nativeaot/Runtime/thread.h b/src/coreclr/nativeaot/Runtime/thread.h index 4c0a21e9f9ab7f..f26cd3b3413813 100644 --- a/src/coreclr/nativeaot/Runtime/thread.h +++ b/src/coreclr/nativeaot/Runtime/thread.h @@ -6,6 +6,7 @@ #include "StackFrameIterator.h" #include "slist.h" // DefaultSListTraits +#include "xoshiro128plusplus.h" struct gc_alloc_context; class RuntimeInstance; @@ -83,8 +84,33 @@ struct InlinedThreadStaticRoot TypeManager* m_typeManager; }; +extern uint32_t SamplingDistributionMean; + struct RuntimeThreadLocals { + // Any allocation that would overlap combined_limit needs to be handled by the allocation slow path. + // combined_limit is the minimum of: + // - gc_alloc_context.alloc_limit (the end of the current AC) + // - the sampling_limit + // + // In the simple case that randomized sampling is disabled, combined_limit is always equal to alloc_limit. + // + // There are two different useful interpretations for the sampling_limit. One is to treat the sampling_limit + // as an address and when we allocate an object that overlaps that address we should emit a sampling event. + // The other is that we can treat (sampling_limit - alloc_ptr) as a budget of how many bytes we can allocate + // before emitting a sampling event. If we always allocated objects contiguously in the AC and incremented + // alloc_ptr by the size of the object, these two interpretations would be equivalent. However, when objects + // don't fit in the AC we allocate them in some other address range. The budget interpretation is more + // flexible to handle those cases. + // + // The sampling limit isn't stored in any separate field explicitly, instead it is implied: + // - if combined_limit == alloc_limit there is no sampled byte in the AC. In the budget interpretation + // we can allocate (alloc_limit - alloc_ptr) unsampled bytes. We'll need a new random number after + // that to determine whether future allocated bytes should be sampled. + // This occurs either because the sampling feature is disabled, or because the randomized selection + // of sampled bytes didn't select a byte in this AC. + // - if combined_limit < alloc_limit there is a sample limit in the AC. sample_limit = combined_limit. + uint8_t* m_combined_limit; uint8_t m_rgbAllocContextBuffer[SIZEOF_ALLOC_CONTEXT]; uint32_t volatile m_ThreadStateFlags; // see Thread::ThreadStateFlags enum PInvokeTransitionFrame* m_pTransitionFrame; @@ -99,6 +125,7 @@ struct RuntimeThreadLocals #endif // FEATURE_HIJACK PTR_ExInfo m_pExInfoStackHead; Object* m_threadAbortException; // ThreadAbortException instance -set only during thread abort + #ifdef TARGET_X86 PCODE m_LastRedirectIP; uint64_t m_SpinCount; @@ -115,9 +142,9 @@ struct RuntimeThreadLocals uint8_t* m_redirectionContextBuffer; // storage for redirection context, allocated on demand #endif //FEATURE_SUSPEND_REDIRECTION -#ifdef FEATURE_GC_STRESS uint32_t m_uRand; // current per-thread random number -#endif // FEATURE_GC_STRESS + // TODO: replace m_uRand with m_rng + sxoshiro128pp m_rng; // random number generator }; struct ReversePInvokeFrame @@ -144,9 +171,7 @@ class Thread : private RuntimeThreadLocals TSF_DoNotTriggerGc = 0x00000010, // Do not allow hijacking of this thread, also intended to // ...be checked during allocations in debug builds. 
TSF_IsGcSpecialThread = 0x00000020, // Set to indicate a GC worker thread used for background GC -#ifdef FEATURE_GC_STRESS - TSF_IsRandSeedSet = 0x00000040, // set to indicate the random number generator for GCStress was inited -#endif // FEATURE_GC_STRESS + TSF_IsRandSeedSet = 0x00000040, // set to indicate the random number generator was inited (used by GCSTRESS and AllocationSampled) #ifdef FEATURE_SUSPEND_REDIRECTION TSF_Redirected = 0x00000080, // Set to indicate the thread is redirected and will inevitably @@ -216,6 +241,12 @@ class Thread : private RuntimeThreadLocals bool IsInitialized(); gc_alloc_context * GetAllocContext(); + static bool IsRandomizedSamplingEnabled(); + uint8_t** GetCombinedLimit(); + int ComputeGeometricRandom(); + void UpdateCombinedLimit(); + // TODO: probably private + void UpdateCombinedLimit(bool samplingEnabled); uint64_t GetPalThreadIdForLogging(); @@ -256,11 +287,9 @@ class Thread : private RuntimeThreadLocals #ifndef DACCESS_COMPILE void SetThreadStressLog(void * ptsl); #endif // DACCESS_COMPILE -#ifdef FEATURE_GC_STRESS void SetRandomSeed(uint32_t seed); uint32_t NextRand(); bool IsRandInited(); -#endif // FEATURE_GC_STRESS PTR_ExInfo GetCurExInfo(); bool IsCurrentThreadInCooperativeMode(); diff --git a/src/coreclr/nativeaot/Runtime/thread.inl b/src/coreclr/nativeaot/Runtime/thread.inl index 2daffd06922134..fb148d5e8c6faa 100644 --- a/src/coreclr/nativeaot/Runtime/thread.inl +++ b/src/coreclr/nativeaot/Runtime/thread.inl @@ -1,6 +1,16 @@ // Licensed to the .NET Foundation under one or more agreements. // The .NET Foundation licenses this file to you under the MIT license. +#ifndef __thread_inl__ +#define __thread_inl__ + +// TODO: try to find out where the events symbols are defined +//#include "eventtracebase.h" +//#include "ClrEtwAll.h" + +#include "thread.h" + + #ifndef DACCESS_COMPILE // Set the m_pDeferredTransitionFrame field for GC allocation helpers that setup transition frame // in assembly code. Do not use anywhere else. 
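The comment block above describes `m_combined_limit` as the minimum of the AC's end and the next sampled byte, with the "budget" interpretation covering objects that don't fit in the AC. A small, self-contained sketch of that computation (hypothetical names; the runtime versions are `Thread::UpdateCombinedLimit` here and `ee_alloc_context::UpdateCombinedLimit` on the CoreCLR side):

```cpp
// Sketch of how the combined limit is derived; not the runtime's actual code.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>

constexpr double kSamplingMeanBytes = 100 * 1024;   // matches SamplingDistributionMean

// Distance (in bytes) to the next sampled byte: inverse-CDF of an exponential with
// mean 100 KiB, which approximates a geometric distribution with p = 1/102400.
static size_t next_sampling_budget(std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double u = uniform(rng);
    return (size_t)(-std::log(1.0 - u) * kSamplingMeanBytes);
}

struct SketchAllocContext {
    uint8_t* alloc_ptr;
    uint8_t* alloc_limit;
    uint8_t* combined_limit;

    void update_combined_limit(bool sampling_enabled, std::mt19937& rng) {
        if (!sampling_enabled) {
            combined_limit = alloc_limit;          // fast path behaves as before
            return;
        }
        uint8_t* sampling_limit = alloc_ptr + next_sampling_budget(rng);
        // If the sampled byte lands beyond this AC, no sampling happens in it;
        // a fresh budget is drawn for the next AC or an overflowing allocation.
        combined_limit = std::min(sampling_limit, alloc_limit);
    }
};

int main() {
    std::mt19937 rng(1);
    uint8_t buffer[64 * 1024];
    SketchAllocContext ac{buffer, buffer + sizeof(buffer), nullptr};
    ac.update_combined_limit(/*sampling_enabled*/ true, rng);
    // combined_limit always stays within [alloc_ptr, alloc_limit]; most 64 KiB ACs
    // will contain no sampled byte at a 100 KiB mean gap.
    return ac.combined_limit <= ac.alloc_limit ? 0 : 1;
}
```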
@@ -64,6 +74,12 @@ inline gc_alloc_context* Thread::GetAllocContext() return (gc_alloc_context*)m_rgbAllocContextBuffer; } +inline uint8_t** Thread::GetCombinedLimit() +{ + return &m_combined_limit; +} + + inline bool Thread::IsStateSet(ThreadStateFlags flags) { return ((m_ThreadStateFlags & flags) == (uint32_t)flags); @@ -156,3 +172,5 @@ FORCEINLINE bool Thread::InlineTryFastReversePInvoke(ReversePInvokeFrame* pFrame return true; } + +#endif // __thread_inl__ diff --git a/src/coreclr/nativeaot/Runtime/threadstore.cpp b/src/coreclr/nativeaot/Runtime/threadstore.cpp index fb6255ba118a8e..10687f08ae1eeb 100644 --- a/src/coreclr/nativeaot/Runtime/threadstore.cpp +++ b/src/coreclr/nativeaot/Runtime/threadstore.cpp @@ -127,7 +127,7 @@ void ThreadStore::AttachCurrentThread(bool fAcquireThreadStoreLock) // Init the thread buffer // pAttachingThread->Construct(); - ASSERT(pAttachingThread->m_ThreadStateFlags == Thread::TSF_Unknown); + ASSERT(pAttachingThread->m_ThreadStateFlags == Thread::TSF_IsRandSeedSet); // fAcquireThreadStoreLock is false when threads are created/attached for GC purpose // in such case the lock is already held and GC takes care to ensure safe access to the threadstore @@ -138,7 +138,7 @@ void ThreadStore::AttachCurrentThread(bool fAcquireThreadStoreLock) // // Set thread state to be attached // - ASSERT(pAttachingThread->m_ThreadStateFlags == Thread::TSF_Unknown); + ASSERT(pAttachingThread->m_ThreadStateFlags == Thread::TSF_IsRandSeedSet); pAttachingThread->m_ThreadStateFlags = Thread::TSF_Attached; pTS->m_ThreadList.PushHead(pAttachingThread); diff --git a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc index f8ec8f5037b1b2..78d1a461d1628f 100644 --- a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc +++ b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosamd64.inc @@ -241,7 +241,6 @@ C_FUNC(\Name): // Rename fields of nested structs // #define OFFSETOF__Thread__m_alloc_context__alloc_ptr OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr -#define OFFSETOF__Thread__m_alloc_context__alloc_limit OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit // GC type flags #define GC_ALLOC_FINALIZE 1 diff --git a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm.inc b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm.inc index 68631819f7dee4..eea96fdd17d812 100644 --- a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm.inc +++ b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosarm.inc @@ -29,7 +29,6 @@ // Rename fields of nested structs #define OFFSETOF__Thread__m_alloc_context__alloc_ptr (OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_ptr) -#define OFFSETOF__Thread__m_alloc_context__alloc_limit (OFFSETOF__Thread__m_rgbAllocContextBuffer + OFFSETOF__gc_alloc_context__alloc_limit) // GC minimal sized object. We use this to switch between 4 and 8 byte alignment in the GC heap (see AllocFast.asm). 
#define SIZEOF__MinObject 12 diff --git a/src/coreclr/nativeaot/Runtime/xoshiro128plusplus.h b/src/coreclr/nativeaot/Runtime/xoshiro128plusplus.h new file mode 100644 index 00000000000000..ad275526a51155 --- /dev/null +++ b/src/coreclr/nativeaot/Runtime/xoshiro128plusplus.h @@ -0,0 +1,131 @@ +#pragma once + +/* Written in 2019 by David Blackman and Sebastiano Vigna (vigna@acm.org) + +To the extent possible under law, the author has dedicated all copyright +and related and neighboring rights to this software to the public domain +worldwide. + +Permission to use, copy, modify, and/or distribute this software for any +purpose with or without fee is hereby granted. + +THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES +WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR +ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES +WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN +ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR +IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ + +#include + +/* This is xoshiro128++ 1.0, one of our 32-bit all-purpose, rock-solid + generators. It has excellent speed, a state size (128 bits) that is + large enough for mild parallelism, and it passes all tests we are aware + of. + + For generating just single-precision (i.e., 32-bit) floating-point + numbers, xoshiro128+ is even faster. + + The state must be seeded so that it is not everywhere zero. */ + +// Note: the code has been changed to avoid static state in multi-threaded usage + +struct sxoshiro128pp +{ + static inline uint32_t rotl(const uint32_t x, int k) { + return (x << k) | (x >> (32 - k)); + } + + uint32_t s[4]; + + void InitSeed(uint32_t seed) + { + if (seed == 0) + { + seed = 997; + } + + s[0] = seed; + s[1] = seed; + s[2] = seed; + s[3] = seed; + jump(); + } + + uint32_t next(void) { + const uint32_t result = rotl(s[0] + s[3], 7) + s[0]; + + const uint32_t t = s[1] << 9; + + s[2] ^= s[0]; + s[3] ^= s[1]; + s[1] ^= s[2]; + s[0] ^= s[3]; + + s[2] ^= t; + + s[3] = rotl(s[3], 11); + + return result; + } + + + /* This is the jump function for the generator. It is equivalent + to 2^64 calls to next(); it can be used to generate 2^64 + non-overlapping subsequences for parallel computations. */ + + void jump(void) { + static const uint32_t JUMP[] = { 0x8764000b, 0xf542d2d3, 0x6fa035c3, 0x77f2db5b }; + + uint32_t s0 = 0; + uint32_t s1 = 0; + uint32_t s2 = 0; + uint32_t s3 = 0; + for (int i = 0; i < sizeof JUMP / sizeof * JUMP; i++) + for (int b = 0; b < 32; b++) { + if (JUMP[i] & UINT32_C(1) << b) { + s0 ^= s[0]; + s1 ^= s[1]; + s2 ^= s[2]; + s3 ^= s[3]; + } + next(); + } + + s[0] = s0; + s[1] = s1; + s[2] = s2; + s[3] = s3; + } + + + /* This is the long-jump function for the generator. It is equivalent to + 2^96 calls to next(); it can be used to generate 2^32 starting points, + from each of which jump() will generate 2^32 non-overlapping + subsequences for parallel distributed computations. 
*/ + + void long_jump(void) { + static const uint32_t LONG_JUMP[] = { 0xb523952e, 0x0b6f099f, 0xccf5a0ef, 0x1c580662 }; + + uint32_t s0 = 0; + uint32_t s1 = 0; + uint32_t s2 = 0; + uint32_t s3 = 0; + for (int i = 0; i < sizeof LONG_JUMP / sizeof * LONG_JUMP; i++) + for (int b = 0; b < 32; b++) { + if (LONG_JUMP[i] & UINT32_C(1) << b) { + s0 ^= s[0]; + s1 ^= s[1]; + s2 ^= s[2]; + s3 ^= s[3]; + } + next(); + } + + s[0] = s0; + s[1] = s1; + s[2] = s2; + s[3] = s3; + } +}; \ No newline at end of file diff --git a/src/coreclr/vm/ClrEtwAll.man b/src/coreclr/vm/ClrEtwAll.man index 265d7a07726cf6..8309a0eea51979 100644 --- a/src/coreclr/vm/ClrEtwAll.man +++ b/src/coreclr/vm/ClrEtwAll.man @@ -91,6 +91,8 @@ message="$(string.RuntimePublisher.ProfilerKeywordMessage)" symbol="CLR_PROFILER_KEYWORD" /> + @@ -461,7 +463,13 @@ - + + + + + @@ -998,7 +1006,7 @@ - + - + + + @@ -3566,7 +3598,7 @@ keywords ="ThreadingKeyword" opcode="Wait" task="ThreadPoolWorkerThread" symbol="ThreadPoolWorkerThreadWait" message="$(string.RuntimePublisher.ThreadPoolWorkerThreadEventMessage)"/> - + @@ -4257,6 +4289,12 @@ task="WaitHandleWait" symbol="WaitHandleWaitStop" message="$(string.RuntimePublisher.WaitHandleWaitStopEventMessage)"/> + + + @@ -4372,14 +4410,14 @@ - + - + @@ -7297,7 +7335,7 @@ keywords="PrivateFusionKeyword" opcode="NgenBind" task="CLRNgenBinder" symbol="NgenBindEvent" message="$(string.PrivatePublisher.NgenBinderMessage)"/> - + - + - + - + - + - + + @@ -8659,7 +8698,7 @@ - + @@ -8791,6 +8830,7 @@ + @@ -9155,6 +9195,7 @@ + @@ -9287,7 +9328,7 @@ - + diff --git a/src/coreclr/vm/amd64/JitHelpers_InlineGetThread.asm b/src/coreclr/vm/amd64/JitHelpers_InlineGetThread.asm new file mode 100644 index 00000000000000..b5ee78274d7f14 --- /dev/null +++ b/src/coreclr/vm/amd64/JitHelpers_InlineGetThread.asm @@ -0,0 +1,263 @@ +; Licensed to the .NET Foundation under one or more agreements. +; The .NET Foundation licenses this file to you under the MIT license. + +; *********************************************************************** +; File: JitHelpers_InlineGetThread.asm, see history in jithelp.asm +; +; Notes: These routinues will be patched at runtime with the location in +; the TLS to find the Thread* and are the fastest implementation +; of their specific functionality. +; *********************************************************************** + +include AsmMacros.inc +include asmconstants.inc + +; Min amount of stack space that a nested function should allocate. +MIN_SIZE equ 28h + +JIT_NEW equ ?JIT_New@@YAPEAVObject@@PEAUCORINFO_CLASS_STRUCT_@@@Z +CopyValueClassUnchecked equ ?CopyValueClassUnchecked@@YAXPEAX0PEAVMethodTable@@@Z +JIT_Box equ ?JIT_Box@@YAPEAVObject@@PEAUCORINFO_CLASS_STRUCT_@@PEAX@Z +g_pStringClass equ ?g_pStringClass@@3PEAVMethodTable@@EA +FramedAllocateString equ ?FramedAllocateString@@YAPEAVStringObject@@K@Z +JIT_NewArr1 equ ?JIT_NewArr1@@YAPEAVObject@@PEAUCORINFO_CLASS_STRUCT_@@_J@Z + +INVALIDGCVALUE equ 0CCCCCCCDh + +extern JIT_NEW:proc +extern CopyValueClassUnchecked:proc +extern JIT_Box:proc +extern g_pStringClass:QWORD +extern FramedAllocateString:proc +extern JIT_NewArr1:proc + +extern JIT_InternalThrow:proc + +; IN: rcx: MethodTable* +; OUT: rax: new object +LEAF_ENTRY JIT_TrialAllocSFastMP_InlineGetThread, _TEXT + mov edx, [rcx + OFFSET__MethodTable__m_BaseSize] + + ; m_BaseSize is guaranteed to be a multiple of 8. 
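A short usage sketch for the `sxoshiro128pp` struct added above, mirroring how the NativeAOT `Thread::ComputeGeometricRandom` turns the raw 32-bit output into a uniform value in [0, 1] (this assumes the header above is available on the include path as `xoshiro128plusplus.h`):

```cpp
// Usage sketch only; relies on the xoshiro128plusplus.h added in this change.
#include <cstdint>
#include <cstdio>
#include "xoshiro128plusplus.h"

int main() {
    sxoshiro128pp rng;
    rng.InitSeed(12345);          // per-thread seed; a zero seed is remapped internally

    const double maxValue = 0xFFFFFFFF;
    for (int i = 0; i < 4; i++) {
        uint32_t raw = rng.next();
        // Same mapping used by ComputeGeometricRandom before the -log() transform.
        double probability = (maxValue - (double)raw) / maxValue;
        printf("raw=%u p=%f\n", (unsigned)raw, probability);
    }
    return 0;
}
```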
+ + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add rdx, rax + + cmp rdx, r10 + ja AllocFailed + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], rdx + mov [rax], rcx + + ret + + AllocFailed: + jmp JIT_NEW +LEAF_END JIT_TrialAllocSFastMP_InlineGetThread, _TEXT + +; HCIMPL2(Object*, JIT_Box, CORINFO_CLASS_HANDLE type, void* unboxedData) +NESTED_ENTRY JIT_BoxFastMP_InlineGetThread, _TEXT + + ; m_BaseSize is guaranteed to be a multiple of 8. + mov r8d, [rcx + OFFSET__MethodTable__m_BaseSize] + + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add r8, rax + + cmp r8, r10 + ja AllocFailed + + test rdx, rdx + je NullRef + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], r8 + mov [rax], rcx + + ; Check whether the object contains pointers + test dword ptr [rcx + OFFSETOF__MethodTable__m_dwFlags], MethodTable__enum_flag_ContainsPointers + jnz ContainsPointers + + ; We have no pointers - emit a simple inline copy loop + ; Copy the contents from the end + mov ecx, [rcx + OFFSET__MethodTable__m_BaseSize] + sub ecx, 18h ; sizeof(ObjHeader) + sizeof(Object) + last slot + +align 16 + CopyLoop: + mov r8, [rdx+rcx] + mov [rax+rcx+8], r8 + sub ecx, 8 + jge CopyLoop + REPRET + + ContainsPointers: + ; Do call to CopyValueClassUnchecked(object, data, pMT) + push_vol_reg rax + alloc_stack 20h + END_PROLOGUE + + mov r8, rcx + lea rcx, [rax + 8] + call CopyValueClassUnchecked + + add rsp, 20h + pop rax + ret + + AllocFailed: + NullRef: + jmp JIT_Box +NESTED_END JIT_BoxFastMP_InlineGetThread, _TEXT + +LEAF_ENTRY AllocateStringFastMP_InlineGetThread, _TEXT + ; We were passed the number of characters in ECX + + ; we need to load the method table for string from the global + mov r9, [g_pStringClass] + + ; Instead of doing elaborate overflow checks, we just limit the number of elements + ; to (LARGE_OBJECT_SIZE - 256)/sizeof(WCHAR) or less. + ; This will avoid all overflow problems, as well as making sure + ; big string objects are correctly allocated in the big object heap. + + cmp ecx, (ASM_LARGE_OBJECT_SIZE - 256)/2 + jae OversizedString + + ; Calculate the final size to allocate. + ; We need to calculate baseSize + cnt*2, then round that up by adding 7 and anding ~7. + + lea edx, [STRING_BASE_SIZE + ecx*2 + 7] + and edx, -8 + + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add rdx, rax + + cmp rdx, r10 + ja AllocFailed + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], rdx + mov [rax], r9 + + mov [rax + OFFSETOF__StringObject__m_StringLength], ecx + + ret + + OversizedString: + AllocFailed: + jmp FramedAllocateString +LEAF_END AllocateStringFastMP_InlineGetThread, _TEXT + +; HCIMPL2(Object*, JIT_NewArr1VC_MP_InlineGetThread, CORINFO_CLASS_HANDLE arrayMT, INT_PTR size) +LEAF_ENTRY JIT_NewArr1VC_MP_InlineGetThread, _TEXT + ; We were passed a (shared) method table in RCX, which contains the element type. + + ; The element count is in RDX + + ; NOTE: if this code is ported for CORINFO_HELP_NEWSFAST_ALIGN8, it will need + ; to emulate the double-specific behavior of JIT_TrialAlloc::GenAllocArray. + + ; Do a conservative check here. This is to avoid overflow while doing the calculations. 
We don't + ; have to worry about "large" objects, since the allocation quantum is never big enough for + ; LARGE_OBJECT_SIZE. + + ; For Value Classes, this needs to be 2^16 - slack (2^32 / max component size), + ; The slack includes the size for the array header and round-up ; for alignment. Use 256 for the + ; slack value out of laziness. + + ; In both cases we do a final overflow check after adding to the alloc_ptr. + + cmp rdx, (65535 - 256) + jae OversizedArray + + movzx r8d, word ptr [rcx + OFFSETOF__MethodTable__m_dwFlags] ; component size is low 16 bits + imul r8d, edx + add r8d, dword ptr [rcx + OFFSET__MethodTable__m_BaseSize] + + ; round the size to a multiple of 8 + + add r8d, 7 + and r8d, -8 + + + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add r8, rax + jc AllocFailed + + cmp r8, r10 + ja AllocFailed + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], r8 + mov [rax], rcx + + mov dword ptr [rax + OFFSETOF__ArrayBase__m_NumComponents], edx + + ret + + OversizedArray: + AllocFailed: + jmp JIT_NewArr1 +LEAF_END JIT_NewArr1VC_MP_InlineGetThread, _TEXT + + +; HCIMPL2(Object*, JIT_NewArr1OBJ_MP_InlineGetThread, CORINFO_CLASS_HANDLE arrayMT, INT_PTR size) +LEAF_ENTRY JIT_NewArr1OBJ_MP_InlineGetThread, _TEXT + ; We were passed a (shared) method table in RCX, which contains the element type. + + ; The element count is in RDX + + ; NOTE: if this code is ported for CORINFO_HELP_NEWSFAST_ALIGN8, it will need + ; to emulate the double-specific behavior of JIT_TrialAlloc::GenAllocArray. + + ; Verifies that LARGE_OBJECT_SIZE fits in 32-bit. This allows us to do array size + ; arithmetic using 32-bit registers. + .erre ASM_LARGE_OBJECT_SIZE lt 100000000h + + cmp rdx, (ASM_LARGE_OBJECT_SIZE - 256)/8 ; sizeof(void*) + jae OversizedArray + + ; In this case we know the element size is sizeof(void *), or 8 for x64 + ; This helps us in two ways - we can shift instead of multiplying, and + ; there's no need to align the size either + + mov r8d, dword ptr [rcx + OFFSET__MethodTable__m_BaseSize] + lea r8d, [r8d + edx * 8] + + ; No need for rounding in this case - element size is 8, and m_BaseSize is guaranteed + ; to be a multiple of 8. 
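These manually written helpers all follow the same shape: load `combined_limit` instead of `alloc_limit`, bump `alloc_ptr`, and jump to the slow helper on overflow, so any allocation that reaches the sampled byte is forced onto the slow path where the event can be emitted. A C++ rendering of that fast-path control flow (a sketch of the pattern, not a drop-in replacement for the assembly):

```cpp
// Sketch of the allocation fast path encoded in the assembly above.
#include <cstdint>
#include <cstring>

struct SketchAllocContext {
    uint8_t* alloc_ptr;
    uint8_t* alloc_limit;      // end of the AC (not consulted on the fast path)
    uint8_t* combined_limit;   // min(alloc_limit, sampling limit)
};

// Returns nullptr when the slow helper (JIT_New / the *_Rare paths) must run,
// which is also the only place a sampling event can be fired.
inline void* try_fast_alloc(SketchAllocContext& ac, void* methodTable, size_t size) {
    uint8_t* result = ac.alloc_ptr;
    uint8_t* newPtr = result + size;
    if (newPtr > ac.combined_limit) {
        return nullptr;                  // AC exhausted *or* a sampled byte was hit
    }
    ac.alloc_ptr = newPtr;
    std::memcpy(result, &methodTable, sizeof(void*));   // store the MethodTable pointer
    return result;
}

int main() {
    alignas(8) uint8_t buffer[1024];
    SketchAllocContext ac{buffer, buffer + sizeof(buffer), buffer + 200};
    int dummyMT = 0;                                   // stands in for a MethodTable
    void* obj = try_fast_alloc(ac, &dummyMT, 64);      // fits under combined_limit
    void* big = try_fast_alloc(ac, &dummyMT, 512);     // crosses it -> slow path
    return (obj != nullptr && big == nullptr) ? 0 : 1;
}
```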
+ + INLINE_GETTHREAD r11 + mov r10, [r11 + OFFSET__Thread__m_alloc_context__combined_limit] + mov rax, [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr] + + add r8, rax + + cmp r8, r10 + ja AllocFailed + + mov [r11 + OFFSET__Thread__m_alloc_context__alloc_ptr], r8 + mov [rax], rcx + + mov dword ptr [rax + OFFSETOF__ArrayBase__m_NumComponents], edx + + ret + + OversizedArray: + AllocFailed: + jmp JIT_NewArr1 +LEAF_END JIT_NewArr1OBJ_MP_InlineGetThread, _TEXT + + + end + diff --git a/src/coreclr/vm/amd64/JitHelpers_Slow.asm b/src/coreclr/vm/amd64/JitHelpers_Slow.asm index 6d322248cdeeec..41a80794c97bbe 100644 --- a/src/coreclr/vm/amd64/JitHelpers_Slow.asm +++ b/src/coreclr/vm/amd64/JitHelpers_Slow.asm @@ -169,7 +169,7 @@ endif extern g_global_alloc_lock:dword -extern g_global_alloc_context:qword +extern g_global_ee_alloc_context:qword LEAF_ENTRY JIT_TrialAllocSFastSP, _TEXT @@ -180,15 +180,15 @@ LEAF_ENTRY JIT_TrialAllocSFastSP, _TEXT inc [g_global_alloc_lock] jnz JIT_NEW - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax cmp r8, r10 ja AllocFailed - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], rcx mov [g_global_alloc_lock], -1 @@ -208,8 +208,8 @@ NESTED_ENTRY JIT_BoxFastUP, _TEXT inc [g_global_alloc_lock] jnz JIT_Box - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax @@ -219,7 +219,7 @@ NESTED_ENTRY JIT_BoxFastUP, _TEXT test rdx, rdx je NullRef - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], rcx mov [g_global_alloc_lock], -1 @@ -287,15 +287,15 @@ LEAF_ENTRY AllocateStringFastUP, _TEXT inc [g_global_alloc_lock] jnz FramedAllocateString - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax cmp r8, r10 ja AllocFailed - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], r11 mov [g_global_alloc_lock], -1 @@ -343,8 +343,8 @@ LEAF_ENTRY JIT_NewArr1VC_UP, _TEXT inc [g_global_alloc_lock] jnz JIT_NewArr1 - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, 
[g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax jc AllocFailed @@ -352,7 +352,7 @@ LEAF_ENTRY JIT_NewArr1VC_UP, _TEXT cmp r8, r10 ja AllocFailed - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], rcx mov [g_global_alloc_lock], -1 @@ -396,15 +396,15 @@ LEAF_ENTRY JIT_NewArr1OBJ_UP, _TEXT inc [g_global_alloc_lock] jnz JIT_NewArr1 - mov rax, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr] ; alloc_ptr - mov r10, [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_limit] ; limit_ptr + mov rax, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr] ; alloc_ptr + mov r10, [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__combined_limit] ; combined_limit add r8, rax cmp r8, r10 ja AllocFailed - mov qword ptr [g_global_alloc_context + OFFSETOF__gc_alloc_context__alloc_ptr], r8 ; update the alloc ptr + mov qword ptr [g_global_ee_alloc_context + OFFSETOF__ee_alloc_context__alloc_ptr], r8 ; update the alloc ptr mov [rax], rcx mov [g_global_alloc_lock], -1 diff --git a/src/coreclr/vm/amd64/asmconstants.h b/src/coreclr/vm/amd64/asmconstants.h index 524e1fd40b7ae8..0eefdbbf9c5d97 100644 --- a/src/coreclr/vm/amd64/asmconstants.h +++ b/src/coreclr/vm/amd64/asmconstants.h @@ -111,11 +111,24 @@ ASMCONSTANTS_C_ASSERT(OFFSETOF__Thread__m_pFrame #define Thread_m_pFrame OFFSETOF__Thread__m_pFrame -#define OFFSETOF__gc_alloc_context__alloc_ptr 0x0 -ASMCONSTANT_OFFSETOF_ASSERT(gc_alloc_context, alloc_ptr); +// ---------------------------------- +// TODO: all these offsets are now invalid because the allocation context is now in a TLS instead of being relative to a Thread instance -#define OFFSETOF__gc_alloc_context__alloc_limit 0x8 -ASMCONSTANT_OFFSETOF_ASSERT(gc_alloc_context, alloc_limit); +#define OFFSET__Thread__m_alloc_context__alloc_ptr 0x50 +//ASMCONSTANTS_C_ASSERT(OFFSET__Thread__m_alloc_context__alloc_ptr == offsetof(Thread, m_alloc_context) + offsetof(ee_alloc_context, gc_alloc_context) + offsetof(gc_alloc_context, alloc_ptr)); + +#define OFFSET__Thread__m_alloc_context__combined_limit 0x48 +//ASMCONSTANTS_C_ASSERT(OFFSET__Thread__m_alloc_context__combined_limit == offsetof(Thread, m_alloc_context) + offsetof(ee_alloc_context, combined_limit)); + +#define OFFSETOF__ee_alloc_context__alloc_ptr 0x8 +//ASMCONSTANTS_C_ASSERT(OFFSETOF__ee_alloc_context__alloc_ptr == offsetof(ee_alloc_context, gc_alloc_context) + offsetof(gc_alloc_context, alloc_ptr)); + +// if we keep the ee_alloc_context idea, this should be the offset of the alloc_ptr (after the combined_limit field +#define OFFSETOF__gc_alloc_context__alloc_ptr 0x8 +// ---------------------------------- + +#define OFFSETOF__ee_alloc_context__combined_limit 0x0 +ASMCONSTANTS_C_ASSERT(OFFSETOF__ee_alloc_context__combined_limit == offsetof(ee_alloc_context, combined_limit)); #define OFFSETOF__ThreadExceptionState__m_pCurrentTracker 0x000 ASMCONSTANTS_C_ASSERT(OFFSETOF__ThreadExceptionState__m_pCurrentTracker diff --git a/src/coreclr/vm/common.h b/src/coreclr/vm/common.h index 92e9c5f1d58a6e..48630557f22aa2 100644 --- a/src/coreclr/vm/common.h +++ b/src/coreclr/vm/common.h @@ -159,6 +159,7 @@ typedef VPTR(class VirtualCallStubManager) PTR_VirtualCallStubManager; typedef VPTR(class 
VirtualCallStubManagerManager) PTR_VirtualCallStubManagerManager; typedef VPTR(class IGCHeap) PTR_IGCHeap; typedef VPTR(class ModuleBase) PTR_ModuleBase; +typedef DPTR(struct gc_alloc_context) PTR_gc_alloc_context; // // _UNCHECKED_OBJECTREF is for code that can't deal with DEBUG OBJECTREFs diff --git a/src/coreclr/vm/comutilnative.cpp b/src/coreclr/vm/comutilnative.cpp index a281ac7505d089..eca0a8b80803b0 100644 --- a/src/coreclr/vm/comutilnative.cpp +++ b/src/coreclr/vm/comutilnative.cpp @@ -848,7 +848,7 @@ FCIMPL0(INT64, GCInterface::GetAllocatedBytesForCurrentThread) INT64 currentAllocated = 0; Thread *pThread = GetThread(); - gc_alloc_context* ac = &t_runtime_thread_locals.alloc_context; + gc_alloc_context* ac = &t_runtime_thread_locals.alloc_context.gc_allocation_context; currentAllocated = ac->alloc_bytes + ac->alloc_bytes_uoh - (ac->alloc_limit - ac->alloc_ptr); return currentAllocated; diff --git a/src/coreclr/vm/gccover.cpp b/src/coreclr/vm/gccover.cpp index b7ae97613d507d..ab564c6ba17730 100644 --- a/src/coreclr/vm/gccover.cpp +++ b/src/coreclr/vm/gccover.cpp @@ -1834,7 +1834,7 @@ void DoGcStress (PCONTEXT regs, NativeCodeVersion nativeCodeVersion) // BUG(github #10318) - when not using allocation contexts, the alloc lock // must be acquired here. Until fixed, this assert prevents random heap corruption. assert(GCHeapUtilities::UseThreadAllocationContexts()); - GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context); + GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context.gc_allocation_context); // StressHeap can exit early w/o forcing a SuspendEE to trigger the instruction update // We can not rely on the return code to determine if the instruction update happened diff --git a/src/coreclr/vm/gcenv.ee.cpp b/src/coreclr/vm/gcenv.ee.cpp index 852c655cf9591e..038267e80ea100 100644 --- a/src/coreclr/vm/gcenv.ee.cpp +++ b/src/coreclr/vm/gcenv.ee.cpp @@ -443,7 +443,34 @@ gc_alloc_context * GCToEEInterface::GetAllocContext() return nullptr; } - return &t_runtime_thread_locals.alloc_context; + return &t_runtime_thread_locals.alloc_context.gc_allocation_context; +} + +void InvokeGCAllocCallback(ee_alloc_context* pEEAllocContext, enum_alloc_context_func* fn, void* param) +{ + // NOTE: Its possible that alloc_ptr = alloc_limit = combined_limit = NULL at this point + gc_alloc_context* pAllocContext = &pEEAllocContext->gc_allocation_context; + + // The allocation context might be modified by the callback, so we need to save + // the remaining sampling budget and restore it after the callback if needed. + size_t currentSamplingBudget = (size_t)(pEEAllocContext->combined_limit - pAllocContext->alloc_ptr); + size_t currentSize = (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr); + + fn(pAllocContext, param); + + // If the GC changed the size of the allocation context, we need to recompute the sampling limit + // This includes the case where the AC was initially zero-sized/uninitialized. + // Functionally we'd get valid results if we called UpdateCombinedLimit() unconditionally but its + // empirically a little more performant to only call it when the AC size has changed. + if (currentSize != (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr)) + { + pEEAllocContext->UpdateCombinedLimit(); + } + else + { + // Restore the remaining sampling budget as the size is the same. 
+ pEEAllocContext->combined_limit = pAllocContext->alloc_ptr + currentSamplingBudget; + } } void GCToEEInterface::GcEnumAllocContexts(enum_alloc_context_func* fn, void* param) @@ -460,16 +487,12 @@ void GCToEEInterface::GcEnumAllocContexts(enum_alloc_context_func* fn, void* par Thread * pThread = NULL; while ((pThread = ThreadStore::GetThreadList(pThread)) != NULL) { - gc_alloc_context* palloc_context = pThread->GetAllocContext(); - if (palloc_context != nullptr) - { - fn(palloc_context, param); - } + InvokeGCAllocCallback(pThread->GetEEAllocContext(), fn, param); } } else { - fn(&g_global_alloc_context, param); + InvokeGCAllocCallback(&g_global_ee_alloc_context, fn, param); } } diff --git a/src/coreclr/vm/gcheaputilities.cpp b/src/coreclr/vm/gcheaputilities.cpp index cd0259eef45d83..65d47130765044 100644 --- a/src/coreclr/vm/gcheaputilities.cpp +++ b/src/coreclr/vm/gcheaputilities.cpp @@ -41,7 +41,10 @@ bool g_sw_ww_enabled_for_gc_heap = false; #endif // FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP -GVAL_IMPL_INIT(gc_alloc_context, g_global_alloc_context, {}); +ee_alloc_context g_global_ee_alloc_context = {}; +GPTR_IMPL_INIT(gc_alloc_context, g_global_alloc_context, &(g_global_ee_alloc_context.gc_allocation_context)); + +thread_local ee_alloc_context::CLRRandomHolder ee_alloc_context::t_instance = CLRRandomHolder(); enum GC_LOAD_STATUS { GC_LOAD_STATUS_BEFORE_START, diff --git a/src/coreclr/vm/gcheaputilities.h b/src/coreclr/vm/gcheaputilities.h index c652cc52bf417c..b558a1a7f18712 100644 --- a/src/coreclr/vm/gcheaputilities.h +++ b/src/coreclr/vm/gcheaputilities.h @@ -4,7 +4,12 @@ #ifndef _GCHEAPUTILITIES_H_ #define _GCHEAPUTILITIES_H_ +#include "eventtracebase.h" #include "gcinterface.h" +#include "math.h" + +// TODO: trying to use Thread members but compilation errors +// #include "threads.h" // The singular heap instance. GPTR_DECL(IGCHeap, g_pGCHeap); @@ -12,6 +17,113 @@ GPTR_DECL(IGCHeap, g_pGCHeap); #ifndef DACCESS_COMPILE extern "C" { #endif // !DACCESS_COMPILE + + +const DWORD SamplingDistributionMean = (100 * 1024); + +// This struct adds some state that is only visible to the EE onto the standard gc_alloc_context +typedef struct _ee_alloc_context +{ + // Any allocation that would overlap combined_limit needs to be handled by the allocation slow path. + // combined_limit is the minimum of: + // - gc_alloc_context.alloc_limit (the end of the current AC) + // - the sampling_limit + // + // In the simple case that randomized sampling is disabled, combined_limit is always equal to alloc_limit. + // + // There are two different useful interpretations for the sampling_limit. One is to treat the sampling_limit + // as an address and when we allocate an object that overlaps that address we should emit a sampling event. + // The other is that we can treat (sampling_limit - alloc_ptr) as a budget of how many bytes we can allocate + // before emitting a sampling event. If we always allocated objects contiguously in the AC and incremented + // alloc_ptr by the size of the object, these two interpretations would be equivalent. However, when objects + // don't fit in the AC we allocate them in some other address range. The budget interpretation is more + // flexible to handle those cases. + // + // The sampling limit isn't stored in any separate field explicitly, instead it is implied: + // - if combined_limit == alloc_limit there is no sampled byte in the AC. In the budget interpretation + // we can allocate (alloc_limit - alloc_ptr) unsampled bytes. 
We'll need a new random number after + // that to determine whether future allocated bytes should be sampled. + // This occurs either because the sampling feature is disabled, or because the randomized selection + // of sampled bytes didn't select a byte in this AC. + // - if combined_limit < alloc_limit there is a sample limit in the AC. sample_limit = combined_limit. + uint8_t* combined_limit; + gc_alloc_context gc_allocation_context; + + public: + void init() + { + LIMITED_METHOD_CONTRACT; + combined_limit = nullptr; + gc_allocation_context.init(); + } + + static inline bool IsRandomizedSamplingEnabled() + { +#ifdef FEATURE_EVENT_TRACE + return ETW_TRACING_CATEGORY_ENABLED(MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context, + TRACE_LEVEL_INFORMATION, + CLR_ALLOCATIONSAMPLING_KEYWORD); +#else + return false; +#endif // FEATURE_EVENT_TRACE + } + + // Regenerate the randomized sampling limit and update the combined_limit field. + inline void UpdateCombinedLimit() + { + UpdateCombinedLimit(IsRandomizedSamplingEnabled()); + } + + inline void UpdateCombinedLimit(bool samplingEnabled) + { + if (!samplingEnabled) + { + combined_limit = gc_allocation_context.alloc_limit; + } + else + { + // compute the next sampling limit based on a geometric distribution + uint8_t* sampling_limit = gc_allocation_context.alloc_ptr + ComputeGeometricRandom(); + + // if the sampling limit is larger than the allocation context, no sampling will occur in this AC + combined_limit = Min(sampling_limit, gc_allocation_context.alloc_limit); + } + } + + static inline int ComputeGeometricRandom() + { + // compute a random sample from the Geometric distribution + double probability = GetRandomizer()->NextDouble(); + int threshold = (int)(-log(1 - probability) * SamplingDistributionMean); + return threshold; + } + +// per thread lazily allocated randomizer + struct CLRRandomHolder + { + CLRRandom* _p; + + CLRRandomHolder() + { + _p = new CLRRandom(); + _p->Init(); + } + + ~CLRRandomHolder() + { + delete _p; + } + }; + + static thread_local CLRRandomHolder t_instance; + +public: + static inline CLRRandom* GetRandomizer() + { + return t_instance._p; + } +} ee_alloc_context; + GPTR_DECL(uint8_t,g_lowest_address); GPTR_DECL(uint8_t,g_highest_address); GPTR_DECL(uint32_t,g_card_table); @@ -21,7 +133,11 @@ GVAL_DECL(GCHeapType, g_heap_type); // for all allocations. In order to avoid extra indirections in assembly // allocation helpers, the EE owns the global allocation context and the // GC will update it when it needs to. 
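As a sanity check on `ComputeGeometricRandom`, the expected gap between sampled bytes should come out near `SamplingDistributionMean` (100 KiB), consistent with a per-byte sampling probability of 1/102,400. A quick simulation of the same inverse-CDF transform, using standard `<random>` in place of the runtime's `CLRRandom`/xoshiro state:

```cpp
// Simulation sketch: checks that the sampling gap has the intended mean.
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double mean = 100 * 1024;          // SamplingDistributionMean
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    const int draws = 1000000;
    double total = 0;
    for (int i = 0; i < draws; i++) {
        double p = uniform(rng);
        total += -std::log(1.0 - p) * mean;  // same transform as ComputeGeometricRandom
    }
    // Expect roughly 102400 bytes (within a fraction of a percent for this many draws).
    printf("average gap: %.1f bytes\n", total / draws);
    return 0;
}
```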
-GVAL_DECL(gc_alloc_context, g_global_alloc_context); +extern "C" ee_alloc_context g_global_ee_alloc_context; + +// This is a pointer into the g_global_ee_alloc_context for the GC visible +// subset of the data +GPTR_DECL(gc_alloc_context, g_global_alloc_context); #ifndef DACCESS_COMPILE } #endif // !DACCESS_COMPILE diff --git a/src/coreclr/vm/gchelpers.cpp b/src/coreclr/vm/gchelpers.cpp index 335bd3cb25caba..ce9a3bec72dfa7 100644 --- a/src/coreclr/vm/gchelpers.cpp +++ b/src/coreclr/vm/gchelpers.cpp @@ -40,7 +40,7 @@ // //======================================================================== -inline gc_alloc_context* GetThreadAllocContext() +inline ee_alloc_context* GetThreadAllocContext() { WRAPPER_NO_CONTRACT; @@ -183,6 +183,116 @@ inline void CheckObjectSize(size_t alloc_size) } } +inline void FireAllocationSampled(GC_ALLOC_FLAGS flags, size_t size, size_t samplingBudgetOffset, Object* orObject) +{ + // Note: this code is duplicated from GCToCLREventSink::FireGCAllocationTick_V4 + void* typeId = nullptr; + const WCHAR* name = nullptr; + InlineSString strTypeName; + EX_TRY + { + TypeHandle th = GetThread()->GetTHAllocContextObj(); + + if (th != 0) + { + th.GetName(strTypeName); + name = strTypeName.GetUnicode(); + typeId = th.GetMethodTable(); + } + } + EX_CATCH{} + EX_END_CATCH(SwallowAllExceptions) + // end of duplication + + if (typeId != nullptr) + { + unsigned int allocKind = + (flags & GC_ALLOC_PINNED_OBJECT_HEAP) ? 2 : + (flags & GC_ALLOC_LARGE_OBJECT_HEAP) ? 1 : + 0; // SOH + unsigned int heapIndex = 0; +#ifdef BACKGROUND_GC + gc_heap* hp = gc_heap::heap_of((BYTE*)orObject); + heapIndex = hp->heap_number; +#endif + FireEtwAllocationSampled(allocKind, GetClrInstanceId(), typeId, name, heapIndex, (BYTE*)orObject, size, samplingBudgetOffset); + } +} + +inline Object* Alloc(ee_alloc_context* pEEAllocContext, size_t size, GC_ALLOC_FLAGS flags) +{ + CONTRACTL { + THROWS; + GC_TRIGGERS; + MODE_COOPERATIVE; // returns an objref without pinning it => cooperative + } CONTRACTL_END; + + Object* retVal = nullptr; + gc_alloc_context* pAllocContext = &pEEAllocContext->gc_allocation_context; + auto pCurrentThread = GetThread(); + + bool isSampled = false; + size_t availableSpace = 0; + size_t aligned_size = 0; + size_t samplingBudget = 0; + bool isRandomizedSamplingEnabled = ee_alloc_context::IsRandomizedSamplingEnabled(); + if (isRandomizedSamplingEnabled) + { + // object allocations are always padded up to pointer size + aligned_size = AlignUp(size, sizeof(uintptr_t)); + + // The number bytes we can allocate before we need to emit a sampling event. + // This calculation is only valid if combined_limit < alloc_limit. + samplingBudget = (size_t)(pEEAllocContext->combined_limit - pAllocContext->alloc_ptr); + + // The number of bytes available in the current allocation context + availableSpace = (size_t)(pAllocContext->alloc_limit - pAllocContext->alloc_ptr); + + // Check to see if the allocated object overlaps a sampled byte + // in this AC. This happens when both: + // 1) The AC contains a sampled byte (combined_limit < alloc_limit) + // 2) The object is large enough to overlap it (samplingBudget < aligned_size) + // + // Note that the AC could have no remaining space for allocations (alloc_ptr = + // alloc_limit = combined_limit). When a thread hasn't done any SOH allocations + // yet it also starts in an empty state where alloc_ptr = alloc_limit = + // combined_limit = nullptr. 
The (1) check handles both of these situations + // properly as an empty AC can not have a sampled byte inside of it. + isSampled = + (pEEAllocContext->combined_limit < pAllocContext->alloc_limit) && + (samplingBudget < aligned_size); + + // if the object overflows the AC, we need to sample the remaining bytes + // the sampling budget only included at most the bytes inside the AC + if (aligned_size > availableSpace && !isSampled) + { + samplingBudget = ee_alloc_context::ComputeGeometricRandom() + availableSpace; + isSampled = (samplingBudget < aligned_size); + } + } + + GCStress::MaybeTrigger(pAllocContext); + + // for SOH, if there is enough space in the current allocation context, then + // the allocation will be done in place (like in the fast path), + // otherwise a new allocation context will be provided + retVal = GCHeapUtilities::GetGCHeap()->Alloc(pAllocContext, size, flags); + + if (isSampled) + { + FireAllocationSampled(flags, aligned_size, samplingBudget, retVal); + } + + // There are a variety of conditions that may have invalidated the previous combined_limit value + // such as not allocating the object in the AC memory region (UOH allocations), moving the AC, adding + // extra alignment padding, allocating a new AC, or allocating an object that consumed the sampling budget. + // Rather than test for all the different invalidation conditions individually we conservatively always + // recompute it. If sampling isn't enabled this inlined function is just trivially setting + // combined_limit=alloc_limit. + pEEAllocContext->UpdateCombinedLimit(isRandomizedSamplingEnabled); + + return retVal; +} // There are only two ways to allocate an object. // * Call optimized helpers that were generated on the fly. This is how JIT compiled code does most @@ -222,16 +332,12 @@ inline Object* Alloc(size_t size, GC_ALLOC_FLAGS flags) if (GCHeapUtilities::UseThreadAllocationContexts()) { - gc_alloc_context *threadContext = GetThreadAllocContext(); - GCStress::MaybeTrigger(threadContext); - retVal = GCHeapUtilities::GetGCHeap()->Alloc(threadContext, size, flags); + retVal = Alloc(GetThreadAllocContext(), size, flags); } else { GlobalAllocLockHolder holder(&g_global_alloc_lock); - gc_alloc_context *globalContext = &g_global_alloc_context; - GCStress::MaybeTrigger(globalContext); - retVal = GCHeapUtilities::GetGCHeap()->Alloc(globalContext, size, flags); + retVal = Alloc(&g_global_ee_alloc_context, size, flags); } @@ -424,70 +530,26 @@ OBJECTREF AllocateSzArray(MethodTable* pArrayMT, INT32 cElements, GC_ALLOC_FLAGS } else { -#ifndef FEATURE_64BIT_ALIGNMENT - if ((DATA_ALIGNMENT < sizeof(double)) && (pArrayMT->GetArrayElementType() == ELEMENT_TYPE_R8) && - (totalSize < GCHeapUtilities::GetGCHeap()->GetLOHThreshold() - MIN_OBJECT_SIZE)) +#ifdef FEATURE_DOUBLE_ALIGNMENT_HINT + if (pArrayMT->GetArrayElementType() == ELEMENT_TYPE_R8) { - // Creation of an array of doubles, not in the large object heap. - // We want to align the doubles to 8 byte boundaries, but the GC gives us pointers aligned - // to 4 bytes only (on 32 bit platforms). To align, we ask for 12 bytes more to fill with a - // dummy object. - // If the GC gives us a 8 byte aligned address, we use it for the array and place the dummy - // object after the array, otherwise we put the dummy object first, shifting the base of - // the array to an 8 byte aligned address. Also, we need to make sure that the syncblock of the - // second object is zeroed. GC won't take care of zeroing it out with GC_ALLOC_ZEROING_OPTIONAL. 
- // - // Note: on 64 bit platforms, the GC always returns 8 byte aligned addresses, and we don't - // execute this code because DATA_ALIGNMENT < sizeof(double) is false. - - _ASSERTE(DATA_ALIGNMENT == sizeof(double) / 2); - _ASSERTE((MIN_OBJECT_SIZE % sizeof(double)) == DATA_ALIGNMENT); // used to change alignment - _ASSERTE(pArrayMT->GetComponentSize() == sizeof(double)); - _ASSERTE(g_pObjectClass->GetBaseSize() == MIN_OBJECT_SIZE); - _ASSERTE(totalSize < totalSize + MIN_OBJECT_SIZE); - orArray = (ArrayBase*)Alloc(totalSize + MIN_OBJECT_SIZE, flags); - - Object* orDummyObject; - if (((size_t)orArray % sizeof(double)) != 0) - { - orDummyObject = orArray; - orArray = (ArrayBase*)((size_t)orArray + MIN_OBJECT_SIZE); - if (flags & GC_ALLOC_ZEROING_OPTIONAL) - { - // clean the syncblock of the aligned array. - *(((void**)orArray)-1) = 0; - } - } - else - { - orDummyObject = (Object*)((size_t)orArray + totalSize); - if (flags & GC_ALLOC_ZEROING_OPTIONAL) - { - // clean the syncblock of the dummy object. - *(((void**)orDummyObject)-1) = 0; - } - } - _ASSERTE(((size_t)orArray % sizeof(double)) == 0); - orDummyObject->SetMethodTable(g_pObjectClass); + flags |= GC_ALLOC_ALIGN8; } - else -#endif // FEATURE_64BIT_ALIGNMENT - { -#ifdef FEATURE_64BIT_ALIGNMENT - MethodTable* pElementMT = pArrayMT->GetArrayElementTypeHandle().GetMethodTable(); - if (pElementMT->RequiresAlign8() && pElementMT->IsValueType()) - { - // This platform requires that certain fields are 8-byte aligned (and the runtime doesn't provide - // this guarantee implicitly, e.g. on 32-bit platforms). Since it's the array payload, not the - // header that requires alignment we need to be careful. However it just so happens that all the - // cases we care about (single and multi-dim arrays of value types) have an even number of DWORDs - // in their headers so the alignment requirements for the header and the payload are the same. - _ASSERTE(((pArrayMT->GetBaseSize() - SIZEOF_OBJHEADER) & 7) == 0); - flags |= GC_ALLOC_ALIGN8; - } #endif - orArray = (ArrayBase*)Alloc(totalSize, flags); +#ifdef FEATURE_64BIT_ALIGNMENT + MethodTable* pElementMT = pArrayMT->GetArrayElementTypeHandle().GetMethodTable(); + if (pElementMT->RequiresAlign8() && pElementMT->IsValueType()) + { + // This platform requires that certain fields are 8-byte aligned (and the runtime doesn't provide + // this guarantee implicitly, e.g. on 32-bit platforms). Since it's the array payload, not the + // header that requires alignment we need to be careful. However it just so happens that all the + // cases we care about (single and multi-dim arrays of value types) have an even number of DWORDs + // in their headers so the alignment requirements for the header and the payload are the same. + _ASSERTE(((pArrayMT->GetBaseSize() - SIZEOF_OBJHEADER) & 7) == 0); + flags |= GC_ALLOC_ALIGN8; } +#endif + orArray = (ArrayBase*)Alloc(totalSize, flags); orArray->SetMethodTable(pArrayMT); } diff --git a/src/coreclr/vm/gcstress.h b/src/coreclr/vm/gcstress.h index 23b11d9989fcf6..a5626da1b6961c 100644 --- a/src/coreclr/vm/gcstress.h +++ b/src/coreclr/vm/gcstress.h @@ -298,7 +298,7 @@ namespace _GCStress // BUG(github #10318) - when not using allocation contexts, the alloc lock // must be acquired here. Until fixed, this assert prevents random heap corruption. 
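The sampling decision in the new `Alloc` overload above is the subtle part of this change: an object is sampled either because it overlaps the sampled byte inside the current AC, or because it spills past the AC and a freshly drawn budget lands inside the spilled portion. A condensed sketch of just that predicate (hypothetical helper names; `draw_budget` stands in for `ee_alloc_context::ComputeGeometricRandom`):

```cpp
// Sketch of the isSampled computation in gchelpers.cpp's Alloc(); not runtime code.
#include <cstddef>
#include <cstdint>

struct SketchEEAllocContext {
    uint8_t* alloc_ptr;
    uint8_t* alloc_limit;
    uint8_t* combined_limit;   // <= alloc_limit; equal when no sample is pending here
};

bool is_allocation_sampled(const SketchEEAllocContext& ac, size_t aligned_size,
                           size_t (*draw_budget)()) {
    size_t budget    = (size_t)(ac.combined_limit - ac.alloc_ptr);
    size_t available = (size_t)(ac.alloc_limit - ac.alloc_ptr);

    // Case 1: this AC contains a sampled byte and the object is large enough to cover it.
    bool sampled = (ac.combined_limit < ac.alloc_limit) && (budget < aligned_size);

    // Case 2: the object does not fit in the AC. The bytes beyond it get a fresh budget;
    // the object is sampled if that budget falls inside the object.
    if (!sampled && aligned_size > available) {
        sampled = (available + draw_budget()) < aligned_size;
    }
    return sampled;
}

static size_t fixed_budget() { return 16; }

int main() {
    uint8_t heap[256];
    // Sampled byte 40 bytes ahead; a 64-byte object overlaps it.
    SketchEEAllocContext ac{heap, heap + 256, heap + 40};
    return is_allocation_sampled(ac, 64, fixed_budget) ? 0 : 1;
}
```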
_ASSERTE(GCHeapUtilities::UseThreadAllocationContexts()); - GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context); + GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context.gc_allocation_context); } FORCEINLINE diff --git a/src/coreclr/vm/gctoclreventsink.cpp b/src/coreclr/vm/gctoclreventsink.cpp index fff929d51567a5..ce75e4cc661830 100644 --- a/src/coreclr/vm/gctoclreventsink.cpp +++ b/src/coreclr/vm/gctoclreventsink.cpp @@ -162,6 +162,16 @@ void GCToCLREventSink::FireGCAllocationTick_V4(uint64_t allocationAmount, { LIMITED_METHOD_CONTRACT; +#ifdef FEATURE_EVENT_TRACE + if (ETW_TRACING_CATEGORY_ENABLED(MICROSOFT_WINDOWS_DOTNETRUNTIME_PROVIDER_DOTNET_Context, + TRACE_LEVEL_INFORMATION, + CLR_ALLOCATIONSAMPLING_KEYWORD)) + { + // skip AllocationTick if AllocationSampled is emitted + return; + } +#endif // FEATURE_EVENT_TRACE + void * typeId = nullptr; const WCHAR * name = nullptr; InlineSString strTypeName; diff --git a/src/coreclr/vm/i386/jitinterfacex86.cpp b/src/coreclr/vm/i386/jitinterfacex86.cpp index 3807b00a8ca6e1..ecad50a00a8644 100644 --- a/src/coreclr/vm/i386/jitinterfacex86.cpp +++ b/src/coreclr/vm/i386/jitinterfacex86.cpp @@ -237,8 +237,9 @@ void JIT_TrialAlloc::EmitCore(CPUSTUBLINKER *psl, CodeLabel *noLock, CodeLabel * if (flags & (ALIGN8 | SIZE_IN_EAX | ALIGN8OBJ)) { - // MOV EBX, [edx]gc_alloc_context.alloc_ptr - psl->X86EmitOffsetModRM(0x8B, kEBX, kEDX, offsetof(gc_alloc_context, alloc_ptr)); + // MOV EBX, [edx]alloc_context.gc_allocation_context.alloc_ptr + psl->X86EmitOffsetModRM(0x8B, kEBX, kEDX, offsetof(ee_alloc_context, gc_allocation_context) + offsetof(gc_alloc_context, alloc_ptr)); + // add EAX, EBX psl->Emit16(0xC303); if (flags & ALIGN8) @@ -246,20 +247,20 @@ void JIT_TrialAlloc::EmitCore(CPUSTUBLINKER *psl, CodeLabel *noLock, CodeLabel * } else { - // add eax, [edx]gc_alloc_context.alloc_ptr - psl->X86EmitOffsetModRM(0x03, kEAX, kEDX, offsetof(gc_alloc_context, alloc_ptr)); + // add eax, [edx]alloc_context.gc_allocation_context.alloc_ptr + psl->X86EmitOffsetModRM(0x03, kEAX, kEDX, offsetof(ee_alloc_context, gc_allocation_context) + offsetof(gc_alloc_context, alloc_ptr)); } - // cmp eax, [edx]gc_alloc_context.alloc_limit - psl->X86EmitOffsetModRM(0x3b, kEAX, kEDX, offsetof(gc_alloc_context, alloc_limit)); + // cmp eax, [edx]alloc_context.combined_limit + psl->X86EmitOffsetModRM(0x3b, kEAX, kEDX, offsetof(ee_alloc_context, combined_limit)); // ja noAlloc psl->X86EmitCondJump(noAlloc, X86CondCode::kJA); // Fill in the allocation and get out. 
- // mov [edx]gc_alloc_context.alloc_ptr, eax - psl->X86EmitIndexRegStore(kEDX, offsetof(gc_alloc_context, alloc_ptr), kEAX); + // mov [edx]alloc_context.gc_allocation_context.alloc_ptr, eax + psl->X86EmitIndexRegStore(kEDX, offsetof(ee_alloc_context, gc_allocation_context) + offsetof(gc_alloc_context, alloc_ptr), kEAX); if (flags & (ALIGN8 | SIZE_IN_EAX | ALIGN8OBJ)) { diff --git a/src/coreclr/vm/i386/stublinkerx86.cpp b/src/coreclr/vm/i386/stublinkerx86.cpp index cfe9eec74af2e5..2a3cbc765dfc52 100644 --- a/src/coreclr/vm/i386/stublinkerx86.cpp +++ b/src/coreclr/vm/i386/stublinkerx86.cpp @@ -2432,7 +2432,7 @@ VOID StubLinkerCPU::X86EmitCurrentThreadFetch(X86Reg dstreg, unsigned preservedR #ifdef TARGET_UNIX namespace { - gc_alloc_context* STDCALL GetAllocContextHelper() + ee_alloc_context* STDCALL GetAllocContextHelper() { return &t_runtime_thread_locals.alloc_context; } diff --git a/src/coreclr/vm/jithelpers.cpp b/src/coreclr/vm/jithelpers.cpp index 1bfeaf2b039289..b039b76a55c046 100644 --- a/src/coreclr/vm/jithelpers.cpp +++ b/src/coreclr/vm/jithelpers.cpp @@ -1668,7 +1668,8 @@ HCIMPL1_RAW(Object*, JIT_NewS_MP_FastPortable, CORINFO_CLASS_HANDLE typeHnd_) } CONTRACTL_END; _ASSERTE(GCHeapUtilities::UseThreadAllocationContexts()); - gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context *eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context *allocContext = &eeAllocContext->gc_allocation_context; TypeHandle typeHandle(typeHnd_); _ASSERTE(!typeHandle.IsTypeDesc()); // heap objects must have method tables @@ -1678,13 +1679,15 @@ HCIMPL1_RAW(Object*, JIT_NewS_MP_FastPortable, CORINFO_CLASS_HANDLE typeHnd_) _ASSERTE(size % DATA_ALIGNMENT == 0); BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (size > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (size > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL1(JIT_New, typeHnd_); } + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + size; _ASSERTE(allocPtr != nullptr); @@ -1785,7 +1788,8 @@ HCIMPL1_RAW(StringObject*, AllocateString_MP_FastPortable, DWORD stringLength) return HCCALL1(FramedAllocateString, stringLength); } - gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context *eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context *allocContext = &eeAllocContext->gc_allocation_context; SIZE_T totalSize = StringObject::GetSize(stringLength); @@ -1798,12 +1802,15 @@ HCIMPL1_RAW(StringObject*, AllocateString_MP_FastPortable, DWORD stringLength) totalSize = alignedTotalSize; BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (totalSize > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (totalSize > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL1(FramedAllocateString, stringLength); } + + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + totalSize; _ASSERTE(allocPtr != nullptr); @@ -1901,7 +1908,8 @@ HCIMPL2_RAW(Object*, JIT_NewArr1VC_MP_FastPortable, CORINFO_CLASS_HANDLE arrayMT return HCCALL2(JIT_NewArr1, arrayMT, size); } - 
gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context* eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context* allocContext = &eeAllocContext->gc_allocation_context; MethodTable *pArrayMT = (MethodTable *)arrayMT; @@ -1919,12 +1927,15 @@ HCIMPL2_RAW(Object*, JIT_NewArr1VC_MP_FastPortable, CORINFO_CLASS_HANDLE arrayMT totalSize = alignedTotalSize; BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (totalSize > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (totalSize > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL2(JIT_NewArr1, arrayMT, size); } + + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + totalSize; _ASSERTE(allocPtr != nullptr); @@ -1970,14 +1981,18 @@ HCIMPL2_RAW(Object*, JIT_NewArr1OBJ_MP_FastPortable, CORINFO_CLASS_HANDLE arrayM _ASSERTE(ALIGN_UP(totalSize, DATA_ALIGNMENT) == totalSize); - gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context* eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context* allocContext = &eeAllocContext->gc_allocation_context; BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (totalSize > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (totalSize > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL2(JIT_NewArr1, arrayMT, size); } + + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + totalSize; _ASSERTE(allocPtr != nullptr); @@ -2120,7 +2135,8 @@ HCIMPL2_RAW(Object*, JIT_Box_MP_FastPortable, CORINFO_CLASS_HANDLE type, void* u } _ASSERTE(GCHeapUtilities::UseThreadAllocationContexts()); - gc_alloc_context *allocContext = &t_runtime_thread_locals.alloc_context; + ee_alloc_context* eeAllocContext = &t_runtime_thread_locals.alloc_context; + gc_alloc_context* allocContext = &eeAllocContext->gc_allocation_context; TypeHandle typeHandle(type); _ASSERTE(!typeHandle.IsTypeDesc()); // heap objects must have method tables @@ -2139,13 +2155,15 @@ HCIMPL2_RAW(Object*, JIT_Box_MP_FastPortable, CORINFO_CLASS_HANDLE type, void* u _ASSERTE(size % DATA_ALIGNMENT == 0); BYTE *allocPtr = allocContext->alloc_ptr; - _ASSERTE(allocPtr <= allocContext->alloc_limit); - if (size > static_cast(allocContext->alloc_limit - allocPtr)) + _ASSERTE(allocPtr <= eeAllocContext->combined_limit); + if ((allocPtr == nullptr) || (size > static_cast(eeAllocContext->combined_limit - allocPtr))) { // Tail call to the slow helper return HCCALL2(JIT_Box, type, unboxedData); } + _ASSERTE(eeAllocContext->combined_limit <= allocContext->alloc_limit); + allocContext->alloc_ptr = allocPtr + size; _ASSERTE(allocPtr != nullptr); diff --git a/src/coreclr/vm/threads.cpp b/src/coreclr/vm/threads.cpp index f98a5cf58a2251..723aa1e90fd4b5 100644 --- a/src/coreclr/vm/threads.cpp +++ b/src/coreclr/vm/threads.cpp @@ -2766,8 +2766,8 @@ void Thread::CooperativeCleanup() // If the GC heap is initialized, we need to fix the alloc context for this detaching thread. 
// GetTotalAllocatedBytes reads dead_threads_non_alloc_bytes, but will suspend EE, being in COOP mode we cannot race with that // however, there could be other threads terminating and doing the same Add. - InterlockedExchangeAdd64((LONG64*)&dead_threads_non_alloc_bytes, t_runtime_thread_locals.alloc_context.alloc_limit - t_runtime_thread_locals.alloc_context.alloc_ptr); - GCHeapUtilities::GetGCHeap()->FixAllocContext(&t_runtime_thread_locals.alloc_context, NULL, NULL); + InterlockedExchangeAdd64((LONG64*)&dead_threads_non_alloc_bytes, t_runtime_thread_locals.alloc_context.gc_allocation_context.alloc_limit - t_runtime_thread_locals.alloc_context.gc_allocation_context.alloc_ptr); + GCHeapUtilities::GetGCHeap()->FixAllocContext(&t_runtime_thread_locals.alloc_context.gc_allocation_context, NULL, NULL); t_runtime_thread_locals.alloc_context.init(); // re-initialize the context. // Clear out the alloc context pointer for this thread. When TLS is gone, this pointer will point into freed memory. diff --git a/src/coreclr/vm/threads.h b/src/coreclr/vm/threads.h index 429031cf5493a1..5155097f2c9a3c 100644 --- a/src/coreclr/vm/threads.h +++ b/src/coreclr/vm/threads.h @@ -453,7 +453,7 @@ struct RuntimeThreadLocals { // on MP systems, each thread has its own allocation chunk so we can avoid // lock prefixes and expensive MP cache snooping stuff - gc_alloc_context alloc_context; + ee_alloc_context alloc_context; }; #ifdef _MSC_VER @@ -971,7 +971,14 @@ class Thread public: inline void InitRuntimeThreadLocals() { LIMITED_METHOD_CONTRACT; m_pRuntimeThreadLocals = PTR_RuntimeThreadLocals(&t_runtime_thread_locals); } - inline PTR_gc_alloc_context GetAllocContext() { LIMITED_METHOD_CONTRACT; return PTR_gc_alloc_context(&m_pRuntimeThreadLocals->alloc_context); } + inline ee_alloc_context *GetEEAllocContext() { LIMITED_METHOD_CONTRACT; return &m_pRuntimeThreadLocals->alloc_context; } + inline PTR_gc_alloc_context GetAllocContext() + { + LIMITED_METHOD_CONTRACT; + return (m_pRuntimeThreadLocals == nullptr) + ? nullptr + : PTR_gc_alloc_context(&m_pRuntimeThreadLocals->alloc_context.gc_allocation_context); + } // This is the type handle of the first object in the alloc context at the time // we fire the AllocationTick event. It's only for tooling purpose. @@ -3723,6 +3730,13 @@ class Thread // See ThreadStore::TriggerGCForDeadThreadsIfNecessary() bool m_fHasDeadThreadBeenConsideredForGCTrigger; + // lazily allocated + CLRRandom* m_pRandom; + + public: + // TODO: where to delete the allocated CLRRandom object? + CLRRandom* GetRandom() { if (m_pRandom == nullptr) { m_pRandom = new CLRRandom(); m_pRandom->Init(); } return m_pRandom; } + #ifdef FEATURE_COMINTEROP private: // Cookie returned from CoRegisterInitializeSpy diff --git a/src/coreclr/vm/threadsuspend.cpp b/src/coreclr/vm/threadsuspend.cpp index 9cdb8689984339..9649599df5181a 100644 --- a/src/coreclr/vm/threadsuspend.cpp +++ b/src/coreclr/vm/threadsuspend.cpp @@ -2363,7 +2363,7 @@ void Thread::PerformPreemptiveGC() // BUG(github #10318) - when not using allocation contexts, the alloc lock // must be acquired here. Until fixed, this assert prevents random heap corruption. 
_ASSERTE(GCHeapUtilities::UseThreadAllocationContexts()); - GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context); + GCHeapUtilities::GetGCHeap()->StressHeap(&t_runtime_thread_locals.alloc_context.gc_allocation_context); m_bGCStressing = FALSE; } m_GCOnTransitionsOK = TRUE; diff --git a/src/native/managed/cdacreader/src/Data/RuntimeThreadLocals.cs b/src/native/managed/cdacreader/src/Data/RuntimeThreadLocals.cs index 2d7f92cb4cb247..13634724422117 100644 --- a/src/native/managed/cdacreader/src/Data/RuntimeThreadLocals.cs +++ b/src/native/managed/cdacreader/src/Data/RuntimeThreadLocals.cs @@ -11,6 +11,15 @@ static RuntimeThreadLocals IData.Create(Target target, Targ public RuntimeThreadLocals(Target target, TargetPointer address) { Target.TypeInfo type = target.GetTypeInfo(DataType.RuntimeThreadLocals); + + // TODO: Before the GCAllocationContext, there is a pointer to the "combined limit" used to randomly sample allocations. + // How to get the size of a pointer here so the offset should be correct? + //ex: + // AllocContext = target.ProcessedData.GetOrAdd( + // address + + // (ulong)type.Fields[nameof(AllocContext)].Offset + // + sizeof(pointer) + // ); AllocContext = target.ProcessedData.GetOrAdd(address + (ulong)type.Fields[nameof(AllocContext)].Offset); } diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.cs new file mode 100644 index 00000000000000..1856bfc082cff9 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.cs @@ -0,0 +1,174 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. 
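+//
+// This test enables the Microsoft-Windows-DotNETRuntime provider over EventPipe with the
+// AllocationSamplingKeyword (0x80000000000) at Informational level, allocates 2,000,000
+// Object128 instances, and validates that at least one AllocationSampled event
+// (event ID 303) whose type name is "Tracing.Tests.Object128" was received.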
+ +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Diagnostics.Tracing; +using System.IO; +using System.Linq; +using System.Text; +using System.Threading; +using System.Threading.Tasks; +using Microsoft.Diagnostics.Tracing; +using Microsoft.Diagnostics.Tracing.Parsers.Clr; +using Microsoft.Diagnostics.NETCore.Client; +using Tracing.Tests.Common; +using Xunit; + +namespace Tracing.Tests +{ + public class AllocationSamplingValidation + { + [Fact] + public static int TestEntryPoint() + { + // check that AllocationSampled events are generated and size + type name are correct + var ret = IpcTraceTest.RunAndValidateEventCounts( + new Dictionary() { { "Microsoft-Windows-DotNETRuntime", -1 } }, + _eventGeneratingActionForAllocations, + // AllocationSamplingKeyword (0x80000000000): 0b1000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000 + new List() { new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Informational, 0x80000000000) }, + 1024, _DoesTraceContainEnoughAllocationSampledEvents, enableRundownProvider: false); + if (ret != 100) + return ret; + + return 100; + } + + const int InstanceCount = 2000000; + const int MinExpectedEvents = 1; + static List _objects128s = new List(InstanceCount); + + // allocate objects to trigger dynamic allocation sampling events + private static Action _eventGeneratingActionForAllocations = () => + { + _objects128s.Clear(); + for (int i = 0; i < InstanceCount; i++) + { + if ((i != 0) && (i % (InstanceCount/5) == 0)) + Logger.logger.Log($"Allocated {i} instances..."); + + Object128 obj = new Object128(); + _objects128s.Add(obj); + } + + Logger.logger.Log($"{_objects128s.Count} instances allocated"); + }; + + private static Func> _DoesTraceContainEnoughAllocationSampledEvents = (source) => + { + int AllocationSampledEvents = 0; + int Object128Count = 0; + source.Dynamic.All += (eventData) => + { + if (eventData.ID == (TraceEventID)303) // AllocationSampled is not defined in TraceEvent yet + { + AllocationSampledEvents++; + + AllocationSampledData payload = new AllocationSampledData(eventData, source.PointerSize); + // uncomment to see the allocation events payload + //Logger.logger.Log($"{payload.HeapIndex} - {payload.AllocationKind} | ({payload.ObjectSize}) {payload.TypeName} = 0x{payload.Address}"); + if (payload.TypeName == "Tracing.Tests.Object128") + { + Object128Count++; + } + } + }; + return () => { + Logger.logger.Log("AllocationSampled counts validation"); + Logger.logger.Log("Nb events: " + AllocationSampledEvents); + Logger.logger.Log("Nb object128: " + Object128Count); + return (AllocationSampledEvents >= MinExpectedEvents) && (Object128Count != 0) ? 
100 : -1; + }; + }; + } + + internal class Object0 + { + } + + internal class Object128 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + private readonly UInt64 _x3; + private readonly UInt64 _x4; + private readonly UInt64 _x5; + private readonly UInt64 _x6; + private readonly UInt64 _x7; + private readonly UInt64 _x8; + private readonly UInt64 _x9; + private readonly UInt64 _x10; + private readonly UInt64 _x11; + private readonly UInt64 _x12; + private readonly UInt64 _x13; + private readonly UInt64 _x14; + private readonly UInt64 _x15; + private readonly UInt64 _x16; + } + + // AllocationSampled is not defined in TraceEvent yet + // + // + // + // + // + // + // + // + // + // + class AllocationSampledData + { + const int EndOfStringCharLength = 2; + private TraceEvent _payload; + private int _pointerSize; + public AllocationSampledData(TraceEvent payload, int pointerSize) + { + _payload = payload; + _pointerSize = pointerSize; + TypeName = "?"; + + ComputeFields(); + } + + public GCAllocationKind AllocationKind; + public int ClrInstanceID; + public UInt64 TypeID; + public string TypeName; + public int HeapIndex; + public UInt64 Address; + public long ObjectSize; + public long SampledByteOffset; + + private void ComputeFields() + { + int offsetBeforeString = 4 + 2 + _pointerSize; + + Span data = _payload.EventData().AsSpan(); + AllocationKind = (GCAllocationKind)BitConverter.ToInt32(data.Slice(0, 4)); + ClrInstanceID = BitConverter.ToInt16(data.Slice(4, 2)); + if (_pointerSize == 4) + { + TypeID = BitConverter.ToUInt32(data.Slice(6, _pointerSize)); + } + else + { + TypeID = BitConverter.ToUInt64(data.Slice(6, _pointerSize)); + } + TypeName = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, _payload.EventDataLength - offsetBeforeString - EndOfStringCharLength - 4 - _pointerSize - 8 - 8)); + HeapIndex = BitConverter.ToInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength, 4)); + if (_pointerSize == 4) + { + Address = BitConverter.ToUInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); + } + else + { + Address = BitConverter.ToUInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); + } + ObjectSize = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + _pointerSize, 8)); + SampledByteOffset = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + _pointerSize + 8, 8)); + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.csproj b/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.csproj new file mode 100644 index 00000000000000..040aac14727f59 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/allocationsampling.csproj @@ -0,0 +1,26 @@ + + + + true + .NETCoreApp + true + true + + true + true + + + true + + + + + guard + + + + + + + + diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Allocate.csproj b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Allocate.csproj new file mode 100644 index 00000000000000..01e8ecfa42a698 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Allocate.csproj @@ -0,0 +1,9 @@ + + + + true + Exe + .NETCoreApp + + + diff --git 
a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateArraysOfDoubles.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateArraysOfDoubles.cs new file mode 100644 index 00000000000000..ee18309c547acf --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateArraysOfDoubles.cs @@ -0,0 +1,21 @@ +using System; +using System.Collections.Generic; +using System.Linq; + +namespace Allocate +{ + public class AllocateArraysOfDoubles : IAllocations + { + public void Allocate(int count) + { + List arrays = new List(count); + + for (int i = 0; i < count; i++) + { + arrays.Add(new double[1] { i }); + } + + Console.WriteLine($"Sum {arrays.Count} arrays of one double = {arrays.Sum(doubles => doubles[0])}"); + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateDifferentTypes.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateDifferentTypes.cs new file mode 100644 index 00000000000000..8dfaecb0cf3509 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateDifferentTypes.cs @@ -0,0 +1,49 @@ +using System; +using System.Collections.Generic; + +namespace Allocate +{ + public class AllocateDifferentTypes : IAllocations + { + public void Allocate(int count) + { + List objects = new List(count); + + for (int i = 0; i < count; i++) + { + objects.Add(new string('c', 37)); + objects.Add(new WithFinalizer(i)); + objects.Add(new byte[173]); + int[,] matrix = { { 1, 2 }, { 3, 4 }, { 5, 6 }, { 7, 8 } }; + objects.Add(matrix); + } + + Console.WriteLine($"{objects.Count} objects"); + } + } + + public class WithFinalizer + { + private static int _counter; + + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + + public static int Counter => _counter; + + public WithFinalizer(int id) + { + _counter++; + + _x1 = (UInt16)(id % 10); + _x2 = (UInt16)(id % 100); + _x3 = (UInt16)(id % 1000); + } + + ~WithFinalizer() + { + _counter--; + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateRatioSizedArrays.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateRatioSizedArrays.cs new file mode 100644 index 00000000000000..5af08b3991593f --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateRatioSizedArrays.cs @@ -0,0 +1,44 @@ +using System; +using System.Collections.Generic; + +namespace Allocate +{ + public class AllocateRatioSizedArrays : IAllocations + { + public void Allocate(int count) + { + // We can't keep the objects in memory, just keep their size + List sizes= new List(count * 5); + + var gcCount = GC.CollectionCount(0); + + for (int i = 0; i < count; i++) + { + var bytes1 = new byte[1024]; + bytes1[1] = 1; + sizes.Add(bytes1.Length); + var bytes2 = new byte[10240]; + bytes2[2] = 2; + sizes.Add(bytes2.Length); + var bytes3 = new byte[102400]; + bytes3[3] = 3; + sizes.Add(bytes3.Length); + var bytes4 = new byte[1024000]; + bytes4[4] = 4; + sizes.Add(bytes4.Length); + var bytes5 = new byte[10240000]; + bytes5[5] = 5; + sizes.Add(bytes5.Length); + } + + Console.WriteLine($"+ {GC.CollectionCount(0) - gcCount} collections"); + + long totalAllocated = 0; + foreach (int size in sizes) + { + totalAllocated += size; + } + Console.WriteLine($"{sizes.Count} arrays for {totalAllocated / 1024} KB"); + } + } +} diff --git 
a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateSmallAndBig.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateSmallAndBig.cs new file mode 100644 index 00000000000000..5f8660be6a74d3 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateSmallAndBig.cs @@ -0,0 +1,180 @@ +#pragma warning disable CS0169 // Remove unused private members +#pragma warning disable IDE0049 // Simplify Names + +using System; +using System.Collections.Generic; + +namespace Allocate +{ + public class AllocateSmallAndBig : IAllocations + { + public void Allocate(int count) + { + Dictionary allocations = Initialize(); + List objects = new List(1024 * 1024); + + AllocateSmallThenBig(count/2, objects, allocations); + Console.WriteLine(); + AllocateBigThenSmall(count/2, objects, allocations); + Console.WriteLine(); + } + + private void AllocateSmallThenBig(int count, List objects, Dictionary allocations) + { + for (int i = 0; i < count; i++) + { + // allocate from smaller to larger + objects.Add(new Object24()); + objects.Add(new Object32()); + objects.Add(new Object48()); + objects.Add(new Object80()); + objects.Add(new Object144()); + } + + allocations[nameof(Object24)].Count = count; + allocations[nameof(Object24)].Size = count * 24; + allocations[nameof(Object32)].Count = count; + allocations[nameof(Object32)].Size = count * 32; + allocations[nameof(Object48)].Count = count; + allocations[nameof(Object48)].Size = count * 48; + allocations[nameof(Object80)].Count = count; + allocations[nameof(Object80)].Size = count * 80; + allocations[nameof(Object144)].Count = count; + allocations[nameof(Object144)].Size = count * 144; + + DumpAllocations(allocations); + Clear(allocations); + objects.Clear(); + } + + private void AllocateBigThenSmall(int count, List objects, Dictionary allocations) + { + for (int i = 0; i < count; i++) + { + // allocate from larger to smaller + objects.Add(new Object144()); + objects.Add(new Object80()); + objects.Add(new Object48()); + objects.Add(new Object32()); + objects.Add(new Object24()); + } + + allocations[nameof(Object24)].Count = count; + allocations[nameof(Object24)].Size = count * 24; + allocations[nameof(Object32)].Count = count; + allocations[nameof(Object32)].Size = count * 32; + allocations[nameof(Object48)].Count = count; + allocations[nameof(Object48)].Size = count * 48; + allocations[nameof(Object80)].Count = count; + allocations[nameof(Object80)].Size = count * 80; + allocations[nameof(Object144)].Count = count; + allocations[nameof(Object144)].Size = count * 144; + + DumpAllocations(allocations); + Clear(allocations); + objects.Clear(); + } + + private Dictionary Initialize() + { + var allocations = new Dictionary(16); + allocations[nameof(Object24)] = new AllocStats(); + allocations[nameof(Object32)] = new AllocStats(); + allocations[nameof(Object48)] = new AllocStats(); + allocations[nameof(Object80)] = new AllocStats(); + allocations[nameof(Object144)] = new AllocStats(); + + Clear(allocations); + return allocations; + } + + private void Clear(Dictionary allocations) + { + allocations[nameof(Object24)].Count = 0; + allocations[nameof(Object24)].Size = 0; + allocations[nameof(Object32)].Count = 0; + allocations[nameof(Object32)].Size = 0; + allocations[nameof(Object48)].Count = 0; + allocations[nameof(Object48)].Size = 0; + allocations[nameof(Object80)].Count = 0; + allocations[nameof(Object80)].Size = 0; + allocations[nameof(Object144)].Count = 0; + 
allocations[nameof(Object144)].Size = 0; + } + + private void DumpAllocations(Dictionary objects) + { + Console.WriteLine("Allocations start"); + foreach (var allocation in objects) + { + Console.WriteLine($"{allocation.Key}={allocation.Value.Count},{allocation.Value.Size}"); + } + + Console.WriteLine("Allocations end"); + } + + internal class AllocStats + { + public int Count { get; set; } + public long Size { get; set; } + } + + internal class Object0 + { + } + + internal class Object24 : Object0 + { + private readonly UInt32 _x1; + private readonly UInt32 _x2; + } + + internal class Object32 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + } + + internal class Object48 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + private readonly UInt64 _x3; + private readonly UInt64 _x4; + } + + internal class Object80 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + private readonly UInt64 _x3; + private readonly UInt64 _x4; + private readonly UInt64 _x5; + private readonly UInt64 _x6; + private readonly UInt64 _x7; + private readonly UInt64 _x8; + } + + internal class Object144 : Object0 + { + private readonly UInt64 _x1; + private readonly UInt64 _x2; + private readonly UInt64 _x3; + private readonly UInt64 _x4; + private readonly UInt64 _x5; + private readonly UInt64 _x6; + private readonly UInt64 _x7; + private readonly UInt64 _x8; + private readonly UInt64 _x9; + private readonly UInt64 _x10; + private readonly UInt64 _x11; + private readonly UInt64 _x12; + private readonly UInt64 _x13; + private readonly UInt64 _x14; + private readonly UInt64 _x15; + private readonly UInt64 _x16; + } + } +} +#pragma warning restore IDE0049 // Simplify Names +#pragma warning restore CS0169 // Remove unused private members \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocationsRunEventSource.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocationsRunEventSource.cs new file mode 100644 index 00000000000000..ee21414d2c2ddb --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocationsRunEventSource.cs @@ -0,0 +1,34 @@ +using System.Diagnostics.Tracing; + +namespace Allocate +{ + [EventSource(Name = "Allocations-Run")] + public class AllocationsRunEventSource : EventSource + { + public static readonly AllocationsRunEventSource Log = new AllocationsRunEventSource(); + + [Event(600, Level = EventLevel.Informational)] + public void StartRun(int iterationsCount, int allocationCount, string listOfTypes) + { + WriteEvent(eventId: 600, iterationsCount, allocationCount, listOfTypes); + } + + [Event(601, Level = EventLevel.Informational)] + public void StopRun() + { + WriteEvent(eventId: 601); + } + + [Event(602, Level = EventLevel.Informational)] + public void StartIteration(int iteration) + { + WriteEvent(eventId: 602, iteration); + } + + [Event(603, Level = EventLevel.Informational)] + public void StopIteration(int iteration) + { + WriteEvent(eventId: 603, iteration); + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/IAllocations.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/IAllocations.cs new file mode 100644 index 00000000000000..3ee00f39adcdfe --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/IAllocations.cs @@ -0,0 +1,8 @@ + +namespace Allocate +{ + public interface 
IAllocations + { + public void Allocate(int count); + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Program.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Program.cs new file mode 100644 index 00000000000000..f7220a11289752 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/Program.cs @@ -0,0 +1,133 @@ +using System; +using System.Diagnostics; + +namespace Allocate +{ + public enum Scenario + { + SmallAndBig = 1, + PerThread = 2, + ArrayOfDouble = 3, + FinalizerAndArraysAndStrings = 4, + RatioSizedArrays = 5, + } + + + internal class Program + { + static void Main(string[] args) + { + if (args.Length < 1) + { + Console.WriteLine("Usage: Allocate --scenario (1|2|3|4|5) [--iterations (number of iterations)] [--allocations (allocations count)]"); + Console.WriteLine(" 1: small and big allocations"); + Console.WriteLine(" 2: allocations per thread"); + Console.WriteLine(" 3: arrays of double (for x86)"); + Console.WriteLine(" 4: different types of objects"); + Console.WriteLine(" 5: ratio sized arrays"); + return; + } + ParseCommandLine(args, out Scenario scenario, out int allocationsCount, out int iterations); + + IAllocations allocationsRun = null; + string allocatedTypes = string.Empty; + + switch(scenario) + { + case Scenario.SmallAndBig: + allocationsRun = new AllocateSmallAndBig(); + allocatedTypes = "Object24;Object32;Object48;Object80;Object144"; + break; + case Scenario.PerThread: + allocationsRun = new ThreadedAllocations(); + allocatedTypes = "Object24;Object48;Object72;Object32;Object64;Object96"; + break; + case Scenario.ArrayOfDouble: + allocationsRun = new AllocateArraysOfDoubles(); + allocatedTypes = "System.Double[]"; + break; + case Scenario.FinalizerAndArraysAndStrings: + allocationsRun = new AllocateDifferentTypes(); + allocatedTypes = "System.String;Allocate.WithFinalizer;System.Byte[]"; + break; + case Scenario.RatioSizedArrays: + allocationsRun = new AllocateRatioSizedArrays(); + allocatedTypes = "System.Byte[]"; + break; + default: + Console.WriteLine($"Invalid scenario: '{scenario}'"); + return; + } + + Console.WriteLine($"pid = {Process.GetCurrentProcess().Id}"); + Console.ReadLine(); + + if (allocationsRun != null) + { + Stopwatch clock = new Stopwatch(); + clock.Start(); + + AllocationsRunEventSource.Log.StartRun(iterations, allocationsCount, allocatedTypes); + for (int i = 0; i < iterations; i++) + { + AllocationsRunEventSource.Log.StartIteration(i); + allocationsRun.Allocate(allocationsCount); + AllocationsRunEventSource.Log.StopIteration(i); + } + AllocationsRunEventSource.Log.StopRun(); + + clock.Stop(); + Console.WriteLine($"Duration = {clock.ElapsedMilliseconds} ms"); + } + } + + private static void ParseCommandLine(string[] args, out Scenario scenario, out int allocationsCount, out int iterations) + { + iterations = 100; + allocationsCount = 1_000_000; + scenario = Scenario.SmallAndBig; + + for (int i = 0; i < args.Length; i++) + { + string arg = args[i]; + + if ("--scenario".Equals(arg, StringComparison.OrdinalIgnoreCase)) + { + int valueOffset = i + 1; + if (valueOffset < args.Length && int.TryParse(args[valueOffset], out var number)) + { + scenario = (Scenario)number; + } + } + else + if ("--iterations".Equals(arg, StringComparison.OrdinalIgnoreCase)) + { + int valueOffset = i + 1; + if (valueOffset < args.Length && int.TryParse(args[valueOffset], out var number)) + { + if (number <= 0) + { + throw new ArgumentOutOfRangeException($"Invalid 
iterations count '{number}': must be > 0"); + } + + iterations = number; + } + } + else + if ("--allocations".Equals(arg, StringComparison.OrdinalIgnoreCase)) + { + int valueOffset = i + 1; + if (valueOffset < args.Length && int.TryParse(args[valueOffset], out var number)) + { + if (number <= 0) + { + throw new ArgumentOutOfRangeException($"Invalid numbers of allocations '{number}: must be > 0"); + } + + allocationsCount = number; + } + } + } + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/ThreadedAllocations.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/ThreadedAllocations.cs new file mode 100644 index 00000000000000..8172a19a9fa822 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/ThreadedAllocations.cs @@ -0,0 +1,176 @@ +#pragma warning disable CS0169 // Remove unused private members +#pragma warning disable IDE0049 // Simplify Names + +using System; +using System.Collections.Generic; +using System.Threading; + +namespace Allocate +{ + public class ThreadedAllocations : IAllocations + { + public void Allocate(int count) + { + List objects1 = new List(1024 * 1024); + List objects2 = new List(1024 * 1024); + + Thread[] threads = new Thread[2]; + threads[0] = new Thread(() => Allocate1(count, objects1)); + threads[1] = new Thread(() => Allocate2(count, objects2)); + + for (int i = 0; i < threads.Length; i++) { threads[i].Start(); } + for (int i = 0; i < threads.Length; i++) { threads[i].Join(); } + + Console.WriteLine($"Allocated {objects1.Count + objects2.Count} objects"); + } + + private void Allocate1(int count, List objects) + { + for (int i = 0; i < count; i++) + { + objects.Add(new Object24()); + objects.Add(new Object48()); + objects.Add(new Object72()); + } + } + + private void Allocate2(int count, List objects) + { + for (int i = 0; i < count; i++) + { + objects.Add(new Object32()); + objects.Add(new Object64()); + objects.Add(new Object96()); + } + } + + internal class Object0 + { + } + + internal class Object24 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + } + + internal class Object32 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + private readonly UInt16 _x4; + private readonly UInt16 _x5; + private readonly UInt16 _x6; + private readonly UInt16 _x7; + } + + internal class Object48 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + private readonly UInt16 _x4; + private readonly UInt16 _x5; + private readonly UInt16 _x6; + private readonly UInt16 _x7; + private readonly UInt16 _x8; + private readonly UInt16 _x9; + private readonly UInt16 _x10; + private readonly UInt16 _x11; + private readonly UInt16 _x12; + private readonly UInt16 _x13; + private readonly UInt16 _x14; + private readonly UInt16 _x15; + } + + internal class Object64 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + private readonly UInt16 _x4; + private readonly UInt16 _x5; + private readonly UInt16 _x6; + private readonly UInt16 _x7; + private readonly UInt16 _x8; + private readonly UInt16 _x9; + private readonly UInt16 _x10; + private readonly UInt16 _x11; + private readonly UInt16 _x12; + private readonly UInt16 _x13; + private readonly UInt16 _x14; + private readonly UInt16 _x15; + private readonly UInt16 _x16; + private readonly UInt16 _x17; + 
private readonly UInt16 _x18; + private readonly UInt16 _x19; + private readonly UInt16 _x20; + private readonly UInt16 _x21; + private readonly UInt16 _x22; + private readonly UInt16 _x23; + private readonly UInt16 _x24; + } + + internal class Object72 : Object0 + { + private readonly UInt16 _x1; + private readonly UInt16 _x2; + private readonly UInt16 _x3; + private readonly UInt16 _x4; + private readonly UInt16 _x5; + private readonly UInt16 _x6; + private readonly UInt16 _x7; + private readonly UInt16 _x8; + private readonly UInt16 _x9; + private readonly UInt16 _x10; + private readonly UInt16 _x11; + private readonly UInt16 _x12; + private readonly UInt16 _x13; + private readonly UInt16 _x14; + private readonly UInt16 _x15; + private readonly UInt16 _x16; + private readonly UInt16 _x17; + private readonly UInt16 _x18; + private readonly UInt16 _x19; + private readonly UInt16 _x20; + private readonly UInt16 _x21; + private readonly UInt16 _x22; + private readonly UInt16 _x23; + private readonly UInt16 _x24; + private readonly UInt16 _x25; + private readonly UInt16 _x26; + private readonly UInt16 _x27; + private readonly UInt16 _x28; + } + + internal class Object96 : Object0 + { + private readonly UInt32 _x1; + private readonly UInt32 _x2; + private readonly UInt32 _x3; + private readonly UInt32 _x4; + private readonly UInt32 _x5; + private readonly UInt32 _x6; + private readonly UInt32 _x7; + private readonly UInt32 _x8; + private readonly UInt32 _x9; + private readonly UInt32 _x10; + private readonly UInt32 _x11; + private readonly UInt32 _x12; + private readonly UInt32 _x13; + private readonly UInt32 _x14; + private readonly UInt32 _x15; + private readonly UInt32 _x16; + private readonly UInt32 _x17; + private readonly UInt32 _x18; + private readonly UInt32 _x19; + private readonly UInt32 _x20; + } + } +} + + +#pragma warning restore IDE0049 // Simplify Names +#pragma warning restore CS0169 // Remove unused private members \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.csproj b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.csproj new file mode 100644 index 00000000000000..4a1f3d25c23b34 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.csproj @@ -0,0 +1,14 @@ + + + + true + Exe + .NETCoreApp + + + + + + + + diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.sln b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.sln new file mode 100644 index 00000000000000..6e5beeaa3691f3 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/AllocationProfiler.sln @@ -0,0 +1,51 @@ + +Microsoft Visual Studio Solution File, Format Version 12.00 +# Visual Studio Version 17 +VisualStudioVersion = 17.9.34616.47 +MinimumVisualStudioVersion = 10.0.40219.1 +Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "AllocationProfiler", "AllocationProfiler.csproj", "{1530D7FB-8635-4267-A7B0-EB1280780CAA}" +EndProject +Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Allocate", "Allocate\Allocate.csproj", "{883FD439-6B92-421F-A68B-D22FFC21BF0A}" +EndProject +Global + GlobalSection(SolutionConfigurationPlatforms) = preSolution + Debug|Any CPU = Debug|Any CPU + Debug|x64 = Debug|x64 + Debug|x86 = Debug|x86 + Release|Any CPU = Release|Any CPU + Release|x64 = Release|x64 + Release|x86 = Release|x86 + EndGlobalSection + 
GlobalSection(ProjectConfigurationPlatforms) = postSolution + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|Any CPU.ActiveCfg = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|Any CPU.Build.0 = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|x64.ActiveCfg = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|x64.Build.0 = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|x86.ActiveCfg = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Debug|x86.Build.0 = Debug|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|Any CPU.ActiveCfg = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|Any CPU.Build.0 = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|x64.ActiveCfg = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|x64.Build.0 = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|x86.ActiveCfg = Release|Any CPU + {1530D7FB-8635-4267-A7B0-EB1280780CAA}.Release|x86.Build.0 = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|Any CPU.ActiveCfg = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|Any CPU.Build.0 = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|x64.ActiveCfg = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|x64.Build.0 = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|x86.ActiveCfg = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Debug|x86.Build.0 = Debug|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|Any CPU.ActiveCfg = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|Any CPU.Build.0 = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|x64.ActiveCfg = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|x64.Build.0 = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|x86.ActiveCfg = Release|Any CPU + {883FD439-6B92-421F-A68B-D22FFC21BF0A}.Release|x86.Build.0 = Release|Any CPU + EndGlobalSection + GlobalSection(SolutionProperties) = preSolution + HideSolutionNode = FALSE + EndGlobalSection + GlobalSection(ExtensibilityGlobals) = postSolution + SolutionGuid = {64F6D2D8-C43C-41D5-8CEA-8F45ADF2EC6C} + EndGlobalSection +EndGlobal diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Program.cs b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Program.cs new file mode 100644 index 00000000000000..72719d6bf97395 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/Program.cs @@ -0,0 +1,470 @@ +using Microsoft.Diagnostics.NETCore.Client; +using Microsoft.Diagnostics.Tracing.Parsers; +using Microsoft.Diagnostics.Tracing; +using System.Diagnostics.Tracing; +using Microsoft.Diagnostics.Tracing.Parsers.Clr; +using System.Text; +using System.Runtime.CompilerServices; + +namespace DynamicAllocationSampling +{ + internal class TypeInfo + { + public string TypeName = "?"; + public int Count; + public long Size; + public long TotalSize; + public long RemainderSize; + + public override int GetHashCode() + { + return (TypeName+Size).GetHashCode(); + } + + public override bool Equals(object obj) + { + if (obj == null) + { + return false; + } + + if (!(obj is TypeInfo)) + { + return false; + } + + return (TypeName+Size).Equals(((TypeInfo)obj).TypeName+Size); + } + } + + internal class Program + { + private static Dictionary _sampledTypes = new Dictionary(); + private static Dictionary _tickTypes = new Dictionary(); + private static List> _sampledTypesInRun = null; + private static List> 
_tickTypesInRun = null; + private static int _allocationsCount = 0; + private static List _allocatedTypes = new List(); + private static EventPipeEventSource _source; +; + + static void Main(string[] args) + { + if (args.Length == 0) + { + Console.WriteLine("No process ID specified"); + return; + } + + int pid = -1; + if (!int.TryParse(args[0], out pid)) + { + Console.WriteLine($"Invalid specified process ID '{args[0]}'"); + return; + } + + try + { + PrintEventsLive(pid); + } + catch (Exception x) + { + Console.WriteLine(x.Message); + } + } + + + public static void PrintEventsLive(int processId) + { + var providers = new List() + { + new EventPipeProvider( + "Microsoft-Windows-DotNETRuntime", + EventLevel.Verbose, // verbose is required for AllocationTick + (long)0x80000000001 // new AllocationSamplingKeyword + GCKeyword + ), + new EventPipeProvider( + "Allocations-Run", + EventLevel.Informational + ), + }; + var client = new DiagnosticsClient(processId); + + using (var session = client.StartEventPipeSession(providers, false)) + { + Console.WriteLine(); + + Task streamTask = Task.Run(() => + { + var source = new EventPipeEventSource(session.EventStream); + _source = source; + + ClrTraceEventParser clrParser = new ClrTraceEventParser(source); + clrParser.GCAllocationTick += OnAllocationTick; + source.Dynamic.All += OnEvents; + + try + { + source.Process(); + } + catch (Exception e) + { + Console.WriteLine($"Error encountered while processing events: {e.Message}"); + } + }); + + Task inputTask = Task.Run(() => + { + while (Console.ReadKey().Key != ConsoleKey.Enter) + { + Thread.Sleep(100); + } + session.Stop(); + }); + + Task.WaitAny(streamTask, inputTask); + } + + // not all cases are emitting allocations run events + if ((_sampledTypesInRun == null) && (_sampledTypes.Count > 0)) + { + ShowIterationResults(); + } + } + + private const long SAMPLING_MEAN = 100 * 1024; + private const double SAMPLING_RATIO = 0.999990234375 / 0.000009765625; + private static long UpscaleSize(long totalSize, int count, long mean, long sizeRemainder) + { + //// This is the Poisson process based scaling + //var averageSize = (double)totalSize / (double)count; + //var scale = 1 / (1 - Math.Exp(-averageSize / mean)); + //return (long)(totalSize * scale); + + // use the upscaling method detailed in the PR + // = sq/p + u + // s = # of samples for a type + // q = 1 - 1/102400 + // p = 1/102400 + // u = sum of object remainders = Sum(object_size - sampledByteOffset) for all samples + return (long)(SAMPLING_RATIO * count + sizeRemainder); + } + + private static void OnAllocationTick(GCAllocationTickTraceData payload) + { + // skip unexpected types + if (!_allocatedTypes.Contains(payload.TypeName)) return; + + if (!_tickTypes.TryGetValue(payload.TypeName + payload.ObjectSize, out TypeInfo typeInfo)) + { + typeInfo = new TypeInfo() { TypeName = payload.TypeName, Count = 0, Size = payload.ObjectSize, TotalSize = 0 }; + _tickTypes.Add(payload.TypeName + payload.ObjectSize, typeInfo); + } + typeInfo.Count++; + typeInfo.TotalSize += (int)payload.ObjectSize; + } + + private static void OnEvents(TraceEvent eventData) + { + if (eventData.ID == (TraceEventID)303) + { + AllocationSampledData payload = new AllocationSampledData(eventData, _source.PointerSize); + + // skip unexpected types + if (!_allocatedTypes.Contains(payload.TypeName)) return; + + if (!_sampledTypes.TryGetValue(payload.TypeName+payload.ObjectSize, out TypeInfo typeInfo)) + { + typeInfo = new TypeInfo() { TypeName = payload.TypeName, Count = 0, Size = 
(int)payload.ObjectSize, TotalSize = 0, RemainderSize = payload.ObjectSize - payload.SampledByteOffset }; + _sampledTypes.Add(payload.TypeName + payload.ObjectSize, typeInfo); + } + typeInfo.Count++; + typeInfo.TotalSize += (int)payload.ObjectSize; + typeInfo.RemainderSize += (payload.ObjectSize - payload.SampledByteOffset); + + return; + } + + if (eventData.ID == (TraceEventID)600) + { + AllocationsRunData payload = new AllocationsRunData(eventData); + Console.WriteLine($"> starts {payload.Iterations} iterations allocating {payload.Count} instances"); + + _sampledTypesInRun = new List>(payload.Iterations); + _tickTypesInRun = new List>(payload.Iterations); + _allocationsCount = payload.Count; + string allocatedTypes = payload.AllocatedTypes; + if (allocatedTypes.Length > 0) + { + _allocatedTypes = allocatedTypes.Split(';').ToList(); + } + + return; + } + + if (eventData.ID == (TraceEventID)601) + { + Console.WriteLine("\n< run stops\n"); + + ShowRunResults(); + return; + } + + if (eventData.ID == (TraceEventID)602) + { + AllocationsRunIterationData payload = new AllocationsRunIterationData(eventData); + Console.Write($"{payload.Iteration}"); + + _sampledTypes.Clear(); + _tickTypes.Clear(); + return; + } + + if (eventData.ID == (TraceEventID)603) + { + Console.WriteLine("|"); + ShowIterationResults(); + + _sampledTypesInRun.Add(_sampledTypes); + _sampledTypes = new Dictionary(); + _tickTypesInRun.Add(_tickTypes); + _tickTypes = new Dictionary(); + return; + } + } + + private static void ShowRunResults() + { + var iterations = _sampledTypesInRun.Count; + + // for each type, get the percent diff between upscaled count and expected _allocationsCount + Dictionary> typeDistribution = new Dictionary>(); + foreach (var iteration in _sampledTypesInRun) + { + foreach (var info in iteration.Values) + { + // ignore types outside of the allocations run + if (info.Count < 16) continue; + + if (!typeDistribution.TryGetValue(info, out List distribution)) + { + distribution = new List(iterations); + typeDistribution.Add(info, distribution); + } + + var upscaledCount = (long)info.Count * UpscaleSize(info.TotalSize, info.Count, SAMPLING_MEAN, info.RemainderSize) / info.TotalSize; + var percentDiff = (double)(upscaledCount - _allocationsCount) / (double)_allocationsCount; + distribution.Add(percentDiff); + } + } + + foreach (var type in typeDistribution.Keys.OrderBy(t => t.Size)) + { + var distribution = typeDistribution[type]; + + string typeName = type.TypeName; + if (typeName.Contains("[]")) + { + typeName += $" ({type.Size} bytes)"; + } + Console.WriteLine(typeName); + Console.WriteLine("-------------------------"); + int current = 1; + foreach (var diff in distribution.OrderBy(v => v)) + { + if (iterations > 20) + { + if ((current <= 5) || ((current >= 49) && (current < 52)) || (current >= 96)) + { + Console.WriteLine($"{current,4} {diff,8:0.0 %}"); + } + else + if ((current == 6) || (current == 95)) + { + Console.WriteLine(" ..."); + } + } + else + { + Console.WriteLine($"{current,4} {diff,8:0.0 %}"); + } + + current++; + } + Console.WriteLine(); + } + } + + private static void ShowIterationResults() + { + // NOTE: need to take the size into account for array types + // print the sampled types for both AllocationTick and AllocationSampled + Console.WriteLine("Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name"); + Console.WriteLine("--------------------------------------------------------------------------------------------------"); + foreach (var type in 
_sampledTypes.Values.OrderBy(v => v.Size)) + { + string tag = "S"; + if (_tickTypes.TryGetValue(type.TypeName + type.Size, out TypeInfo tickType)) + { + tag += "T"; + } + + Console.Write($"{tag,3} {type.Count,6}"); + if (tag == "S") + { + Console.Write($" {0,6}"); + } + else + { + Console.Write($" {tickType.Count,6}"); + } + + Console.Write($" {type.TotalSize,13}"); + if (tag == "S") + { + Console.Write($" {0,13}"); + } + else + { + Console.Write($" {tickType.TotalSize,13}"); + } + + string typeName = type.TypeName; + if (typeName.Contains("[]")) + { + typeName += $" ({type.Size} bytes)"; + } + + if (type.Count != 0) + { + Console.WriteLine($" {type.TotalSize / type.Count,9} {UpscaleSize(type.TotalSize, type.Count, SAMPLING_MEAN, type.RemainderSize),13} {(long)type.Count * UpscaleSize(type.TotalSize, type.Count, SAMPLING_MEAN, type.RemainderSize) / type.TotalSize,10} {typeName}"); + } + } + + foreach (var type in _tickTypes.Values) + { + string tag = "T"; + + if (!_sampledTypes.ContainsKey(type.TypeName + type.Size)) + { + string typeName = type.TypeName; + if (typeName.Contains("[]")) + { + typeName += $" ({type.Size} bytes)"; + } + + Console.WriteLine($"{tag,3} {"0",6} {type.Count,6} {"0",13} {type.TotalSize,13} {type.TotalSize / type.Count,9} {"0",13} {"0",10} {typeName}"); + } + } + } + } + + + // + // + // + // + // + // + // + // + class AllocationSampledData + { + const int EndOfStringCharLength = 2; + private TraceEvent _payload; + private int _pointerSize; + public AllocationSampledData(TraceEvent payload, int pointerSize) + { + _payload = payload; + _pointerSize = pointerSize; + TypeName = "?"; + + ComputeFields(); + } + + public GCAllocationKind AllocationKind; + public int ClrInstanceID; + public UInt64 TypeID; + public string TypeName; + public int HeapIndex; + public UInt64 Address; + public long ObjectSize; + public long SampledByteOffset; + + private void ComputeFields() + { + int offsetBeforeString = 4 + 2 + _pointerSize; + + Span data = _payload.EventData().AsSpan(); + AllocationKind = (GCAllocationKind)BitConverter.ToInt32(data.Slice(0, 4)); + ClrInstanceID = BitConverter.ToInt16(data.Slice(4, 2)); + if (_pointerSize == 4) + { + TypeID = BitConverter.ToUInt32(data.Slice(6, _pointerSize)); + } + else + { + TypeID = BitConverter.ToUInt64(data.Slice(6, _pointerSize)); + } + // \0 should not be included for GetString to work + TypeName = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, _payload.EventDataLength - offsetBeforeString - EndOfStringCharLength - 4 - _pointerSize - 8 - 8)); + HeapIndex = BitConverter.ToInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength, 4)); + if (_pointerSize == 4) + { + Address = BitConverter.ToUInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); + } + else + { + Address = BitConverter.ToUInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); + } + ObjectSize = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + _pointerSize, 8)); + SampledByteOffset = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + _pointerSize + 8, 8)); + } + } + + class AllocationsRunData + { + const int EndOfStringCharLength = 2; + private TraceEvent _payload; + + public AllocationsRunData(TraceEvent payload) + { + _payload = payload; + + ComputeFields(); + } + + public int Iterations; + public int Count; + public string 
AllocatedTypes; + + private void ComputeFields() + { + int offsetBeforeString = 4 + 4; + + Span data = _payload.EventData().AsSpan(); + Iterations = BitConverter.ToInt32(data.Slice(0, 4)); + Count = BitConverter.ToInt32(data.Slice(4, 4)); + AllocatedTypes = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, _payload.EventDataLength - offsetBeforeString - EndOfStringCharLength)); + } + } + + class AllocationsRunIterationData + { + private TraceEvent _payload; + public AllocationsRunIterationData(TraceEvent payload) + { + _payload = payload; + + ComputeFields(); + } + + public int Iteration; + + private void ComputeFields() + { + Span data = _payload.EventData().AsSpan(); + Iteration = BitConverter.ToInt32(data.Slice(0, 4)); + } + } +} diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/README.md b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/README.md new file mode 100644 index 00000000000000..7f0d274ba1530b --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/README.md @@ -0,0 +1,112 @@ +# Manual Testing for Randomized Allocation Sampling + +This folder has a test app (Allocate sub-folder) and a profiler (AllocationProfiler.csproj) that together can be used to experimentally +observe the distribution of sampling events that are generated for different allocation scenarios. To run it: + +1. Build both projects +2. Run the Allocate app with corerun and use the --scenario argument to select an allocation scenario you want to validate +3. The Allocate app will print its own PID to the console and wait. +4. Run the AllocationProfiler passing in the allocate app PID as an argument +5. Hit Enter in the Allocate app to begin the allocations. You will see output in the profiler app's console showing the measurements. For example: + + ``` + Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name + ------------------------------------------------------------------------------------------- + S 1 0 24 0 24 102412 4267 System.Int16 + ST 44 61 1056 1464 24 4506128 187755 Object8 + ST 1 1 32 32 32 102416 3200 System.Reflection.MetadataImport + ST 67 30 2144 960 32 6861872 214433 Object16 + ST 80 169 3840 8112 48 8193920 170706 Object32 + S 1 0 56 0 56 102428 1829 MemberInfoCache`1[System.Reflection.RuntimeMethodInfo] + ST 2 3 160 240 80 204880 2561 System.String + S 2 0 128 0 64 204864 3201 System.Reflection.RuntimeMethodBody + S 1 0 80 0 80 102440 1280 System.Signature + ST 143 86 11440 6880 80 14648920 183111 Object64 + S 2 0 222 0 111 204911 1846 System.Byte[] + S 1 0 96 0 96 102448 1067 System.Reflection.RuntimeParameterInfo + S 1 0 112 0 112 102456 914 System.Reflection.ParameterInfo[] + ST 280 272 40320 39168 144 28692164 199251 Object128 + S 2 0 58224 0 29112 235289 8 EventMetadata[] + ST 1 1 8388632 8388640 8388632 8388632 1 Object0[] + T 0 1 0 336 336 0 0 System.Reflection.RuntimeFieldInfo[] + T 0 1 0 48 48 0 0 System.Text.StringBuilder +``` + +- The **Tag** column shows if Allocation**T**ick and/or Allocation**S**ampled events where received for instances of a given type +- The **S**-prefixed colums refer to data from AllocationSampled events payload +- The **T**-prefixed colums refer to data from AllocationTick events payload +- The final **Upscaled**XXX columns are computed from AllocationSampled events payload + +In this special case, the same number of 200000 instances were created and should be checked in the **UpscaledCount** column. 
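+
+For reference, here is a minimal sketch of how the **UpscaledSize** and **UpscaledCount** columns can be derived from the AllocationSampled payloads. It assumes the same constants as the profiler (p = 1/102400 and q = 1 - p) and follows the `s*q/p + u` estimator used in the profiler code; the `Upscaling` class and method names below are only illustrative:
+
+```csharp
+static class Upscaling
+{
+    // Per-byte sampling probability p and its complement q.
+    const double P = 1.0 / 102400.0;
+    const double Q = 1.0 - P;
+
+    // s: number of AllocationSampled events observed for a given (type, instance size) bucket.
+    // u: sum over those samples of (ObjectSize - SampledByteOffset), i.e. the "remainder" bytes.
+    public static long UpscaledSize(int s, long u) => (long)(s * (Q / P) + u);
+
+    // unitSize: instance size of the type, used to turn the upscaled byte estimate into a count.
+    public static long UpscaledCount(int s, long u, long unitSize) => UpscaledSize(s, u) / unitSize;
+}
+```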
+ +In a second case, 2 threads allocate 200000 instances of objects with x1/x2/x3 size ratio to see how the relative size distribution is conserved: + +``` +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 47 67 1128 1608 24 4813364 200556 Object24 + ST 65 48 2080 1536 32 6657040 208032 Object32 + ST 108 94 5184 4512 48 11061792 230454 Object48 + ST 132 145 8448 9280 64 13521024 211266 Object64 + ST 155 87 11160 6264 72 15877580 220521 Object72 + ST 191 192 18336 18432 96 19567569 203828 Object96 + ST 2 2 16777264 16777280 8388632 16777264 2 Object0[] +``` + + +A dedicated `AllocationsRunEventSource` has been created to allow monitoring multiple allocation runs and compute percentiles: +``` +> starts 10 iterations allocating 1000000 instances +0| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 246 224 5904 5376 24 25193352 1049723 Allocate.WithFinalizer + ST 5 7 320 448 64 512160 8002 System.RuntimeFieldInfoStub + ST 702 719 50544 51768 72 71910074 998751 System.Int32[,] + ST 946 859 90816 82464 96 96915815 1009539 System.String + ST 1842 1887 362874 377400 197 188802295 958387 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] +1| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 283 224 6792 5376 24 28982596 1207608 Allocate.WithFinalizer + ST 675 711 48600 51192 72 69144302 960337 System.Int32[,] + ST 974 867 93504 83232 96 99784359 1039420 System.String + ST 1861 1888 366617 377600 197 190749767 968272 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] +2| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 215 236 5160 5664 24 22018580 917440 Allocate.WithFinalizer + ST 1 1 64 64 64 102432 1600 System.RuntimeFieldInfoStub + ST 697 650 50184 46800 72 71397894 991637 System.Int32[,] + ST 927 917 88992 88032 96 94969302 989263 System.String + ST 1895 1886 373315 377200 197 194234717 985963 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] + T 0 1 0 288 288 0 0 System.GCMemoryInfoData +3| +... +8| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 244 213 5856 5112 24 24988528 1041188 Allocate.WithFinalizer + ST 710 681 51120 49032 72 72729562 1010132 System.Int32[,] + ST 974 918 93504 88128 96 99784359 1039420 System.String + ST 1920 1875 378240 375000 197 196797180 998970 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] +9| +Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name +------------------------------------------------------------------------------------------- + ST 236 219 5664 5256 24 24169232 1007051 Allocate.WithFinalizer + ST 698 682 50256 49104 72 71500330 993060 System.Int32[,] + ST 940 913 90240 87648 96 96301127 1003136 System.String + ST 1982 1874 390454 374800 197 203152089 1031228 System.Byte[] + ST 3 3 56000072 56000096 18666690 56000072 3 System.Object[] + +< run stops +``` + +**TODO: I guess the Pxx should be computed on the ***UpscaledCount*** column. 
TO BE CONFIRMED.**
+
+
+Feel free to allocate the patterns you want in other methods of the **_Allocate_** project and use the _DynamicAllocationSampling_ events listener to get a summarized view of the different allocation events.
\ No newline at end of file
diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/README.md b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/README.md
new file mode 100644
index 00000000000000..e2c372e39fc3b9
--- /dev/null
+++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/README.md
@@ -0,0 +1,50 @@
+# Test Results
+
+This folder has the results of the manual testing done for this feature. It is here so reviewers can see it but is planned to be deleted before the PR is merged.
+
+## Statistical distribution measures
+The manual folder contains code to allocate and count objects in different runs.
+
+## Perf benchmarking
+The performance impact of the PR has been measured against a baseline.
+Each branch is built on Windows for x64 with:
+    .\build.cmd -s clr+libs -c release
+    src\tests\build.cmd generatelayoutonly Release
+
+## Baseline
+commit d1f0e2930f86e8771ccbefa96aead6f960ecc3f4 (HEAD)
+Author: Stephen Toub
+Date: Sat Feb 3 18:52:31 2024 -0500
+
+This is what is used for all "Baseline" measurements because the changes in this PR started from here.
+
+## PR
+Latest version of the modified CoreCLR
+
+## Tool
+The GCPerfSim module from the Performance repository has been run 10 times to allocate 500 GB of mixed-size objects on 4 threads with a 50 MB live object size:
+\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe C:\git\benchmarks\artifacts\bin\GCPerfSim\release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time
+
+Here is the command line used to measure the impact of computing and emitting the events:
+dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 --
+
+The goal is to emphasize the impact of allocations on performance and GC collection overhead.
+
+## Results
+The two implementations are very close in terms of impact.
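+
+Each GCPerfSimx10_*.txt file starts with the list of per-run durations in seconds. As a minimal sketch, the median and average figures can be recomputed from those values; the durations below are the ones from GCPerfSimx10_Baseline+AllocationTick.txt:
+
+```csharp
+using System;
+using System.Linq;
+
+class DurationStats
+{
+    static void Main()
+    {
+        // "Duration in seconds" values copied from GCPerfSimx10_Baseline+AllocationTick.txt
+        double[] runs = { 23.1662995, 22.2750725, 22.8078224, 23.3056539, 23.8455668,
+                          23.7292667, 22.508404, 22.4228874, 22.1675368, 21.5968574 };
+        double[] sorted = runs.OrderBy(d => d).ToArray();
+        double median = (sorted[4] + sorted[5]) / 2.0; // even count: mean of the two middle values
+        double average = runs.Average();
+        // Reproduces the 22.6581132 (median) and 22.78253674 (average) lines of that file.
+        Console.WriteLine($"{median} (median), {average} (average)");
+    }
+}
+```
+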
+- GCPerfSimx10_Baseline.txt: .NET version before the PR + 19.6675793 (median) + 19.82903766 (average) + +- GCPerfSimx10_PullRequest.txt: PR without provider enabled + 19.7984609 (median) + 19.7717041 (average) + +It is expected that AllocationTick is more expensive because of the required Verbosity level that emits much more events than just AllocationTick: +- GCPerfSimx10_PullRequest+Events.txt: same but with AllocationSampled emitted + 21.0216025 (median) + 21.03864168 (average) + +- GCPerfSimx10_Baseline+AllocationTick.txt: same but with AllocationTick emitted + 22.6581132 (median) + 22.78253674 (average) diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline+AllocationTick.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline+AllocationTick.txt new file mode 100644 index 00000000000000..9f0bfade448dce --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline+AllocationTick.txt @@ -0,0 +1,286 @@ +Duration in seconds +------------------- +23.1662995 +22.2750725 +22.8078224 +23.3056539 +23.8455668 +23.7292667 +22.508404 +22.4228874 +22.1675368 +21.5968574 +---------------------- +22.6581132 (median) +22.78253674 (average) + + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 48684 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 23.1662995 +collection_counts: [50621, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 63121952 +final_heap_size_bytes: 58251120 +final_fragmentation_bytes: 3586800 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 53784 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.2750725 +collection_counts: [50648, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 54469184 +final_heap_size_bytes: 56118032 +final_fragmentation_bytes: 1796880 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 41684 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.8078224 +collection_counts: [50696, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58087672 +final_heap_size_bytes: 56061112 +final_fragmentation_bytes: 1904512 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 62368 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 23.3056539 +collection_counts: [50746, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55920168 +final_heap_size_bytes: 55982344 +final_fragmentation_bytes: 1887904 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 53412 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 23.8455668 +collection_counts: [50616, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 61739896 +final_heap_size_bytes: 55838944 +final_fragmentation_bytes: 1884488 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 65728 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 23.7292667 +collection_counts: [50523, 323, 3] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 62671224 +final_heap_size_bytes: 55978144 +final_fragmentation_bytes: 1898280 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 64628 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.508404 +collection_counts: [50640, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 64216560 +final_heap_size_bytes: 56097368 +final_fragmentation_bytes: 1903136 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 57576 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.4228874 +collection_counts: [50659, 11, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 62785608 +final_heap_size_bytes: 56379328 +final_fragmentation_bytes: 1724816 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 27968 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.1675368 +collection_counts: [50609, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 59109808 +final_heap_size_bytes: 55923952 +final_fragmentation_bytes: 1858248 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x1:5 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 61168 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.5968574 +collection_counts: [50721, 14, 3] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66074328 +final_heap_size_bytes: 55833128 +final_fragmentation_bytes: 1745464 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline.txt new file mode 100644 index 00000000000000..10f62f1b3d5188 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_Baseline.txt @@ -0,0 +1,284 @@ +Duration in seconds +------------------- +19.8961064 +19.206025 +20.0542862 +19.9611777 +20.4653831 +19.8848856 +19.9312506 +20.0551618 +19.671435 +20.231068 +---------------------- +19.93567794 (average) + + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference 
-testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 24520 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.8961064 +collection_counts: [50926, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 61251632 +final_heap_size_bytes: 58086336 +final_fragmentation_bytes: 3945408 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 41660 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.206025 +collection_counts: [50960, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55791536 +final_heap_size_bytes: 58300392 +final_fragmentation_bytes: 3948056 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 31336 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.0542862 +collection_counts: [50914, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 60374136 +final_heap_size_bytes: 57533224 +final_fragmentation_bytes: 3975448 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 42712 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.9611777 +collection_counts: [50897, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55580960 +final_heap_size_bytes: 58383224 +final_fragmentation_bytes: 3953624 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 33568 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.4653831 +collection_counts: [50857, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55607328 +final_heap_size_bytes: 58215040 +final_fragmentation_bytes: 3931776 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 33896 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.8848856 +collection_counts: [50934, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55872480 +final_heap_size_bytes: 58382160 +final_fragmentation_bytes: 3940656 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 41796 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.9312506 +collection_counts: [50931, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 64109136 +final_heap_size_bytes: 58342584 +final_fragmentation_bytes: 3973856 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 21784 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.0551618 +collection_counts: [50922, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 57598392 +final_heap_size_bytes: 57619760 +final_fragmentation_bytes: 3944216 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 44508 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.671435 +collection_counts: [50929, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55670080 +final_heap_size_bytes: 58372248 +final_fragmentation_bytes: 3927544 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ +__________________ + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 35560 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.231068 +collection_counts: [50914, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58203344 +final_heap_size_bytes: 58321568 +final_fragmentation_bytes: 3975280 + +C:\github\chrisnas\runtime9_ref\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO __________________ \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest+Events.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest+Events.txt new file mode 100644 index 00000000000000..6a2ad9dbdb931c --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest+Events.txt @@ -0,0 +1,286 @@ +Duration in seconds +------------------- +21.4792368 +20.2993439 +21.1766376 +22.1099492 +20.6253209 +20.4882028 +20.8665674 +21.4560576 +21.3212808 +20.5638198 +---------------------- +21.0216025 (median) +21.03864168 (average) + + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 
0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 66312 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.4792368 +collection_counts: [50717, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66072144 +final_heap_size_bytes: 56478584 +final_fragmentation_bytes: 1818416 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 57168 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.2993439 +collection_counts: [50742, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 54695160 +final_heap_size_bytes: 55915936 +final_fragmentation_bytes: 1919816 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 30336 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.1766376 +collection_counts: [50704, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55273496 +final_heap_size_bytes: 56467736 +final_fragmentation_bytes: 1811040 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 37628 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 22.1099492 +collection_counts: [50531, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 57894200 +final_heap_size_bytes: 58227536 +final_fragmentation_bytes: 3606488 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 56096 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.6253209 +collection_counts: [50731, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58559456 +final_heap_size_bytes: 56497384 +final_fragmentation_bytes: 1844328 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 75284 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.4882028 +collection_counts: [50703, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 56799888 +final_heap_size_bytes: 56426552 +final_fragmentation_bytes: 1773128 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 63320 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.8665674 +collection_counts: [50694, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 57237456 +final_heap_size_bytes: 55808024 +final_fragmentation_bytes: 1835496 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 56612 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.4560576 +collection_counts: [50648, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58474568 +final_heap_size_bytes: 58284656 +final_fragmentation_bytes: 3691816 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 15564 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 21.3212808 +collection_counts: [50689, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 59064240 +final_heap_size_bytes: 57605248 +final_fragmentation_bytes: 3663600 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 3776 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.5638198 +collection_counts: [50761, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66419896 +final_heap_size_bytes: 56421288 +final_fragmentation_bytes: 1830984 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest.txt new file mode 100644 index 00000000000000..6a8b99caa3f9a1 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/perf/GCPerfSimx10_PullRequest.txt @@ -0,0 +1,311 @@ +Duration in seconds +------------------- +19.1773522 +20.0924279 +20.1548909 +20.2996304 +19.9037176 +19.9528813 +19.6312431 +19.2613891 +19.6083818 +18.8774711 +---------------------- +19.69593854 (average) + + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread 
+Running 64-bit? True +PID: 8 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.3572435 +collection_counts: [50920, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 65455968 +final_heap_size_bytes: 58357064 +final_fragmentation_bytes: 3915584 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 30128 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.1773522 +collection_counts: [50956, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 63548608 +final_heap_size_bytes: 58392416 +final_fragmentation_bytes: 3950576 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 34652 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.0924279 +collection_counts: [50912, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 57182464 +final_heap_size_bytes: 57644400 +final_fragmentation_bytes: 3965360 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 17960 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.1548909 +collection_counts: [50934, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55941992 +final_heap_size_bytes: 57975680 +final_fragmentation_bytes: 3982776 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 41712 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 20.2996304 +collection_counts: [50950, 10, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66670400 +final_heap_size_bytes: 58314976 +final_fragmentation_bytes: 3875168 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 28108 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.9037176 +collection_counts: [50942, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55632448 +final_heap_size_bytes: 58199336 +final_fragmentation_bytes: 3874504 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 40348 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.9528813 +collection_counts: [50921, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 55041704 +final_heap_size_bytes: 58356168 +final_fragmentation_bytes: 3914816 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 39776 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 0 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.6312431 +collection_counts: [50923, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 65766616 +final_heap_size_bytes: 58419512 +final_fragmentation_bytes: 3982760 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 22944 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.2613891 +collection_counts: [50951, 9, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66551888 +final_heap_size_bytes: 58370160 +final_fragmentation_bytes: 3928744 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 34784 +Running 4 threads. +time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 3 stopping phase after 128000MB +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 19.6083818 +collection_counts: [50938, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 66478920 +final_heap_size_bytes: 58405344 +final_fragmentation_bytes: 3980528 + +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>ECHO _____________ +_____________ +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root> +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root> +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root> +C:\github\chrisnas\runtime\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root>corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +corerun C:\github\chrisnas\runtime9_ref\performance\artifacts\bin\GCPerfSim\Release\net7.0\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time +allocating 134,217,728,000 per thread +Running 64-bit? True +PID: 44316 +Running 4 threads. 
+time, ReferenceItem, tlgb 0.01249999925494194, tagb 125, totalMins 0, buckets: + 100-4000; surv every 0; pin every 0; weight 1000; isPoh False +Thread 2 stopping phase after 128000MB +Thread 0 stopping phase after 128000MB +Thread 1 stopping phase after 128000MB +Thread 3 stopping phase after 128000MB +=== STATS === +sohAllocatedBytes: 536870916212 +lohAllocatedBytes: 0 +pohAllocatedBytes: 0 +seconds_taken: 18.8774711 +collection_counts: [50909, 8, 2] +num_created_with_finalizers: 0 +num_finalized: 0 +final_total_memory_bytes: 58476048 +final_heap_size_bytes: 58355880 +final_fragmentation_bytes: 3991832 \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_Finalizer+Array+String.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_Finalizer+Array+String.txt new file mode 100644 index 00000000000000..8cd8ece0f1ee40 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_Finalizer+Array+String.txt @@ -0,0 +1,57 @@ +> starts 100 iterations allocating 1000000 instances of different types with different sizes (24 for WithFinalizer, 96 for string and 200 for byte[]) +... +< run stops + +Allocate.WithFinalizer +------------------------- + 1 -15.9 % + 2 -12.5 % + 3 -12.5 % + 4 -12.5 % + 5 -12.1 % + ... + 49 -0.6 % + 50 -0.6 % + 51 -0.6 % + ... + 96 11.8 % + 97 12.7 % + 98 12.7 % + 99 14.8 % + 100 17.8 % + +System.String +------------------------- + 1 -10.6 % + 2 -7.1 % + 3 -6.4 % + 4 -6.3 % + 5 -6.1 % + ... + 49 0.3 % + 50 0.3 % + 51 0.4 % + ... + 96 6.4 % + 97 6.4 % + 98 6.6 % + 99 6.6 % + 100 7.2 % + +System.Byte[] (200 bytes) +------------------------- + 1 -6.8 % + 2 -6.7 % + 3 -5.4 % + 4 -5.4 % + 5 -5.0 % + ... + 49 -0.2 % + 50 -0.1 % + 51 -0.1 % + ... + 96 3.4 % + 97 3.6 % + 98 4.7 % + 99 5.1 % + 100 5.3 % \ No newline at end of file diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_RatioAllocations.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_RatioAllocations.txt new file mode 100644 index 00000000000000..091819cb25ea73 --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/100x1000000_RatioAllocations.txt @@ -0,0 +1,113 @@ + +> starts 100 iterations allocating 1000000 instances of class with proportional sizes (24 bytes, 32 bytes, 48 bytes, 64 bytes, 72 bytes and 96 bytes) +... +< run stops + +Object24 +------------------------- + 1 -17.6 % + 2 -15.9 % + 3 -15.5 % + 4 -13.0 % + 5 -12.9 % + ... + 49 -0.6 % + 50 -0.6 % + 51 -0.6 % + ... + 96 10.9 % + 97 11.4 % + 98 11.8 % + 99 13.9 % + 100 15.6 % + +Object32 +------------------------- + 1 -17.7 % + 2 -10.4 % + 3 -9.1 % + 4 -8.5 % + 5 -8.5 % + ... + 49 -0.5 % + 50 -0.5 % + 51 -0.5 % + ... + 96 10.7 % + 97 11.4 % + 98 12.0 % + 99 13.9 % + 100 15.2 % + +Object48 +------------------------- + 1 -13.6 % + 2 -10.0 % + 3 -10.0 % + 4 -9.5 % + 5 -9.5 % + ... + 49 -0.6 % + 50 -0.4 % + 51 -0.4 % + ... + 96 9.0 % + 97 10.3 % + 98 10.3 % + 99 11.0 % + 100 12.5 % + +Object64 +------------------------- + 1 -10.1 % + 2 -9.4 % + 3 -8.8 % + 4 -8.3 % + 5 -7.8 % + ... + 49 -0.3 % + 50 -0.1 % + 51 -0.1 % + ... 
+ 96 5.8 % + 97 6.1 % + 98 6.4 % + 99 8.8 % + 100 10.9 % + +Object72 +------------------------- + 1 -10.5 % + 2 -8.9 % + 3 -8.4 % + 4 -7.8 % + 5 -6.4 % + ... + 49 0.2 % + 50 0.2 % + 51 0.2 % + ... + 96 6.8 % + 97 6.8 % + 98 7.7 % + 99 8.6 % + 100 10.0 % + +Object96 +------------------------- + 1 -8.2 % + 2 -7.1 % + 3 -6.4 % + 4 -6.1 % + 5 -5.3 % + ... + 49 -0.2 % + 50 -0.2 % + 51 -0.2 % + ... + 96 4.9 % + 97 5.0 % + 98 5.5 % + 99 5.9 % + 100 7.6 % + diff --git a/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/10x100000_RatioArrayAllocations.txt b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/10x100000_RatioArrayAllocations.txt new file mode 100644 index 00000000000000..d1337607cde42b --- /dev/null +++ b/src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/statistical_distribution/10x100000_RatioArrayAllocations.txt @@ -0,0 +1,70 @@ + +> starts 10 iterations allocating 100000 instances of System.Byte[] (1K, 10K, 100K, 1M, 10M) +... +< run stops + +System.Byte[] (1048 bytes) +------------------------- + 1 -6.6 % + 2 -4.5 % + 3 -2.8 % + 4 -0.4 % + 5 1.1 % + 6 1.4 % + 7 2.6 % + 8 2.6 % + 9 2.7 % + 10 2.8 % + +System.Byte[] (10264 bytes) +------------------------- + 1 -3.6 % + 2 -0.5 % + 3 -0.4 % + 4 -0.4 % + 5 -0.3 % + 6 0.2 % + 7 0.5 % + 8 1.1 % + 9 1.2 % + 10 2.0 % + +System.Byte[] (102424 bytes) +------------------------- + 1 -0.5 % + 2 -0.2 % + 3 -0.2 % + 4 -0.0 % + 5 -0.0 % + 6 0.0 % + 7 0.1 % + 8 0.2 % + 9 0.2 % + 10 0.4 % + +System.Byte[] (1024024 bytes) +------------------------- + 1 -0.0 % + 2 -0.0 % + 3 -0.0 % + 4 -0.0 % + 5 -0.0 % + 6 0.0 % + 7 0.0 % + 8 0.0 % + 9 0.0 % + 10 0.0 % + +System.Byte[] (10240024 bytes) +------------------------- + 1 -0.0 % + 2 -0.0 % + 3 -0.0 % + 4 -0.0 % + 5 0.0 % + 6 0.0 % + 7 0.0 % + 8 0.0 % + 9 0.0 % + 10 0.0 % + diff --git a/src/tests/tracing/eventpipe/simpleruntimeeventvalidation/simpleruntimeeventvalidation.cs b/src/tests/tracing/eventpipe/simpleruntimeeventvalidation/simpleruntimeeventvalidation.cs index 6472a2f995cc9f..d6be259e8704ef 100644 --- a/src/tests/tracing/eventpipe/simpleruntimeeventvalidation/simpleruntimeeventvalidation.cs +++ b/src/tests/tracing/eventpipe/simpleruntimeeventvalidation/simpleruntimeeventvalidation.cs @@ -2,15 +2,17 @@ // The .NET Foundation licenses this file to you under the MIT license. 
using System; +using System.Collections.Generic; using System.Diagnostics.Tracing; using System.IO; using System.Linq; +using System.Text; using System.Threading; using System.Threading.Tasks; -using System.Collections.Generic; using Microsoft.Diagnostics.Tracing; -using Tracing.Tests.Common; +using Microsoft.Diagnostics.Tracing.Parsers.Clr; using Microsoft.Diagnostics.NETCore.Client; +using Tracing.Tests.Common; using Xunit; namespace Tracing.Tests.SimpleRuntimeEventValidation @@ -24,28 +26,30 @@ public static int TestEntryPoint() var ret = IpcTraceTest.RunAndValidateEventCounts( // Validation is done with _DoesTraceContainEvents new Dictionary<string, ExpectedEventCount>(){{ "Microsoft-Windows-DotNETRuntime", -1 }}, - _eventGeneratingActionForGC, - // GCKeyword (0x1): 0b1, GCAllocationTick requries Verbose level + _eventGeneratingActionForGC, + // GCKeyword (0x1): 0b1, GCAllocationTick requires Verbose level new List<EventPipeProvider>(){new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Verbose, 0b1)}, 1024, _DoesTraceContainGCEvents, enableRundownProvider:false); + if (ret != 100) + return ret; // Run the 2nd test scenario only if the first one passes - if(ret== 100) + if (ret == 100) { ret = IpcTraceTest.RunAndValidateEventCounts( - new Dictionary<string, ExpectedEventCount>(){{ "Microsoft-DotNETCore-EventPipe", 1 }}, - _eventGeneratingActionForExceptions, + new Dictionary<string, ExpectedEventCount>(){{ "Microsoft-DotNETCore-EventPipe", 1 }}, + _eventGeneratingActionForExceptions, // ExceptionKeyword (0x8000): 0b1000_0000_0000_0000 - new List<EventPipeProvider>(){new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Warning, 0b1000_0000_0000_0000)}, + new List<EventPipeProvider>(){new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Warning, 0b1000_0000_0000_0000)}, 1024, _DoesTraceContainExceptionEvents, enableRundownProvider:false); - if(ret == 100) + if (ret == 100) { - ret = IpcTraceTest.RunAndValidateEventCounts( - new Dictionary<string, ExpectedEventCount>(){{ "Microsoft-Windows-DotNETRuntime", -1}}, - _eventGeneratingActionForFinalizers, - new List<EventPipeProvider>(){new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Informational, 0b1)}, - 1024, _DoesTraceContainFinalizerEvents, enableRundownProvider:false); + ret = IpcTraceTest.RunAndValidateEventCounts( + new Dictionary<string, ExpectedEventCount>() { { "Microsoft-Windows-DotNETRuntime", -1 } }, + _eventGeneratingActionForFinalizers, + new List<EventPipeProvider>() { new EventPipeProvider("Microsoft-Windows-DotNETRuntime", EventLevel.Informational, 0b1) }, + 1024, _DoesTraceContainFinalizerEvents, enableRundownProvider: false); } } @@ -55,7 +59,7 @@ public static int TestEntryPoint() return 100; } - private static Action _eventGeneratingActionForGC = () => + private static Action _eventGeneratingActionForGC = () => { for (int i = 0; i < 50; i++) { @@ -70,7 +74,7 @@ public static int TestEntryPoint() } }; - private static Action _eventGeneratingActionForExceptions = () => + private static Action _eventGeneratingActionForExceptions = () => { for (int i = 0; i < 10; i++) { @@ -110,7 +114,7 @@ public static int TestEntryPoint() int GCRestartEEStartEvents = 0; int GCRestartEEStopEvents = 0; source.Clr.GCRestartEEStart += (eventData) => GCRestartEEStartEvents += 1; - source.Clr.GCRestartEEStop += (eventData) => GCRestartEEStopEvents += 1; + source.Clr.GCRestartEEStop += (eventData) => GCRestartEEStopEvents += 1; int GCSuspendEEEvents = 0; int GCSuspendEEEndEvents = 0; @@ -148,7 +152,7 @@ public static int TestEntryPoint() private static Func<EventPipeEventSource, Func<int>> _DoesTraceContainExceptionEvents = (source) => { int ExStartEvents = 0; - source.Clr.ExceptionStart += (eventData) => + source.Clr.ExceptionStart += 
(eventData) => { if(eventData.ToString().IndexOf("System.ArgumentNullException")>=0) ExStartEvents += 1;