
JIT: initial work on general count reconstruction #99992

Merged
merged 4 commits into dotnet:main
Mar 26, 2024

Conversation

AndyAyersMS
Member

Implements a Gauss-Seidel solver for cases where the method has irreducible loops.
@dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Mar 20, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member Author

Currently this kicks in whenever the initial profile is inconsistent... which turns out to be fairly often. Doing a "blend" may be too radical; we should perhaps do a "repair" instead.

```cpp
// Walk the blocks in RPO (reverse of the post order), computing each
// block's weight from its predecessors' (already computed) weights.
for (unsigned i = m_dfsTree->GetPostOrderCount(); i != 0; i--)
{
    BasicBlock* block = m_dfsTree->GetPostOrder(i - 1);
    ComputeBlockWeight(block);
}

// The reconstruction is approximate if any cyclic probability was capped
// or the flow graph has improper (irreducible) loop headers.
m_approximate = (m_cappedCyclicProbabilities) || (m_improperLoopHeaders > 0);
```
@jakobbotsch
Member
Mar 20, 2024


Note that there can be cycles in the flow graph even without any improper loop headers or natural loops (it's the reason why I originally removed m_improperLoopHeaders from FlowGraphNaturalLoops). #96153 has some more information. I'm guessing it doesn't affect whether or not the computation is an approximation here because those cases happen only due to exceptional flow?

@AndyAyersMS
Member Author

Right, we assume exceptional flow is sufficiently rare that any cycle it might form can be ignored. Also we don't model flow in and out of handlers, except for normally invoked finallys.

This might need some tweaks when we run this later after the importer rewires the flow.

@AndyAyersMS
Member Author

Looks like the failures above are all timeouts. I think they're unrelated; SPMI TP doesn't show much impact.

@AndyAyersMS
Member Author

```
Assert failure(PID 13812 [0x000035f4], Thread: 14128 [0x3730]): promoted_bytes (heap_number) == promoted

CORECLR! SVR::gc_heap::mark_phase + 0x167C (0x00007fffc6c7149c)
CORECLR! SVR::gc_heap::gc1 + 0x240 (0x00007fffc6c63220)
CORECLR! SVR::gc_heap::garbage_collect + 0x99B (0x00007fffc6c62f8b)
CORECLR! SVR::gc_heap::gc_thread_function + 0x504 (0x00007fffc6c64934)
CORECLR! SVR::gc_heap::gc_thread_stub + 0x7E (0x00007fffc6c64dde)
CORECLR! <lambda_69e1faed69ef1eb68c88ad291f228203>::<lambda_invoker_cdecl> + 0x74 (0x00007fffc692e204)
KERNEL32! BaseThreadInitThunk + 0x14 (0x00007ffff85484d4)
NTDLL! RtlUserThreadStart + 0x21 (0x00007ffff8a01791)

File: D:\a_work\1\s\src\coreclr\gc\gc.cpp:24234
```

@AndyAyersMS
Member Author
commented Mar 22, 2024

Numeric Solvers for Profile Count Reconstruction

It may not be readily apparent how count reconstruction works. Perhaps these notes will shed some light on things.

In our flowgraph model we assume that the edge likelihoods are trustworthy and well formed (meaning each edge's likelihood is in [0,1] and the sum of all likelihoods for a block's successor edges is 1).

The appeal of edge well-formedness is that it is easy to check and relatively easy to maintain during various optimizations. It is a local property.

We will use $p_{i,j}$ to denote the likelihood that block $i$ transfers control to block $j$. Thus local consistency means:

$$ 0 \le p_{i,j} \le 1 $$ and, for blocks with successors: $$ \sum_j p_{i,j} = 1 $$
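For example, a block with two successor edges carrying likelihoods $0.3$ and $0.7$ is locally consistent, while one whose successor edges carry $0.5$ and $0.6$ is not.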

By contrast, block weight consistency requires that the flow into a block be balanced by the flow out of a block. It is a global property and harder to maintain during optimizations. It may also not be true initially.

We will use $w_j$ for the weight of block $j$. We will also assume there is an external source and sink of weight for some blocks (method entry and exit points), $e_j$. Then block consistency means:

$$ e_j + \sum_i w_i p_{i,j} = \sum_k w_j p_{j,k} $$

where the LHS is flow in and the RHS is flow out of block $j$. But

$$ \sum_k w_j p_{j,k} = w_j \sum_k p_{j,k} = w_j $$

so we can restate this as saying the external flow plus the flow into the block must equal the block weight:

$$ e_j + \sum_i w_i p_{i,j} = w_j$$

The goal of this work is to explore methods for reconstructing a set of consistent block weights $w_j$ from the external weight sources and sinks $e_j$ and edge likelihoods $p_{i,j}$.

General Solution

The above can be summarized in matrix-vector form as

$$ \boldsymbol w = \boldsymbol e + \boldsymbol P \boldsymbol w $$

where to be able to express the sum of incoming flow as a standard matrix-vector product we have:

$$ \boldsymbol P_{i,j} = { p_{j,i} } $$

(that is, in $\boldsymbol P$, the flow from block $i$ is described by the entries in the $i\text{th}$ column, and the flow into block $i$ by the $i\text{th}$ row). A bit of rearranging puts this into the standard linear equation form

$$ (\boldsymbol I - \boldsymbol P) \boldsymbol w = \boldsymbol e$$

and this can be solved (in principle) for $\boldsymbol w$ by computing the inverse of $\boldsymbol I - \boldsymbol P$ (assuming this exists), giving

$$ \boldsymbol w = {(\boldsymbol I - \boldsymbol P)}^{-1} \boldsymbol e $$

For example, given the following graph with edge likelihoods as shown:

[figure: a four-block flow graph; A→B with likelihood 1.0, B→C and B→D with likelihood 0.5 each, C→B with likelihood 0.8, C→D with likelihood 0.2]

we have

$$\boldsymbol P = \begin{bmatrix} 0 & 0 & 0 & 0 \cr 1 & 0 & 0.8 & 0 \cr 0 & 0.5 & 0 & 0 \cr 0 & 0.5 & 0.2 & 0 \end{bmatrix}$$

Note each column save the last sums to 1.0, representing the fact that the outgoing likelihoods from each block must sum to 1.0, unless there are no successors.

Thus

$$(\boldsymbol I - \boldsymbol P) = \begin{bmatrix} 1 & 0 & 0 & 0 \\\\ -1 & 1 & -0.8 & 0 \\\\ 0 & -0.5 & 1 & 0 \\\\ 0 & -0.5 & -0.2 & 1 \end{bmatrix}$$

and so (details of computing the inverse are left as an exercise for the reader)

$${(\boldsymbol I - \boldsymbol P)}^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\\\ 1.67 & 1.67 & 1.33 & 0 \\\\ 0.83 & 0.83 & 1.67 & 0 \\\\ 1 & 1 & 1 & 1 \end{bmatrix}$$

Note the elements of ${(\boldsymbol I - \boldsymbol P)}^{-1}$ are all non-negative; intuitively, if we increase flow anywhere in the graph, it can only cause weights to increase or stay the same.

If we feed 6 units of flow into A, we have

$$\boldsymbol w = \begin{bmatrix} 6 \\\ 10 \\\ 5 \\\ 6 \end{bmatrix}$$

or graphically:

[figure: the same graph annotated with the block weights 6, 10, 5, and 6]

However, explicit computation of the inverse of a matrix is computationally expensive.

Also note (though it's not fully obvious from such a small example) that the matrix $(\boldsymbol I - \boldsymbol P)$ is sparse: a typical block has only 1 or 2 successors, so the number of nonzero entries in each column will generally be either 2 or 3, no matter how many nodes we have. The inverse of a sparse matrix is typically not sparse, so computing it is not only costly in time but also in space.

So solution techniques that can leverage sparseness are of particular interest.

A More Practical Solution

Note the matrix $\boldsymbol I - \boldsymbol P$ has non-negative diagonal elements and non-positive off-diagonal elements, since all entries of $\boldsymbol P$ are in the range [0,1].

If we further restrict ourselves to the case where $p_{i,i} \lt 1$ (meaning there are no infinite self-loops) then all the diagonal entries are positive and the matrix has an inverse with no negative elements.

Such matrices are known as M-matrices.

It is well known that for an M-matrix $(\boldsymbol I - \boldsymbol P)$ the inverse can be computed as the limit of an infinite series

$$ {(\boldsymbol I - \boldsymbol P)}^{-1} = \boldsymbol I + \boldsymbol P + \boldsymbol P^2 + \dots $$

This gives rise to a simple iterative procedure for computing an approximate value of $\boldsymbol w$ (here superscripts on $\boldsymbol w$ are successive iterates, not powers)

$$ \boldsymbol w^{(0)} = \boldsymbol e $$ $$ \boldsymbol w^{(1)} = (\boldsymbol I + \boldsymbol P) \boldsymbol e = \boldsymbol e + \boldsymbol P \boldsymbol w^{(0)} $$ $$ \boldsymbol w^{(2)} = (\boldsymbol I + \boldsymbol P + \boldsymbol P^2) \boldsymbol e = \boldsymbol e + \boldsymbol P \boldsymbol w^{(1)}$$ $$ \dots$$ $$ \boldsymbol w^{(k + 1)} = \boldsymbol e + \boldsymbol P \boldsymbol w^{(k)} $$

where we can achieve any desired precision for $\boldsymbol w$ by iterating until the successive $\boldsymbol w$ differ by a small amount.

Intuitively this should make sense: we are effectively pouring weight into the entry block(s) and letting the weights flow around in the graph until they reach a fixed point. If we do this for the example above, we get the following sequence of values for $\boldsymbol w^{(k)}$:

$$\boldsymbol w^{(0)} = \begin{bmatrix} 6 \\\ 0 \\\ 0 \\\ 0 \end{bmatrix}, \boldsymbol w^{(1)} = \begin{bmatrix} 6 \\\ 6 \\\ 0 \\\ 0 \end{bmatrix}, \boldsymbol w^{(2)} = \begin{bmatrix} 6 \\\ 6 \\\ 3 \\\ 3 \end{bmatrix}, \boldsymbol w^{(3)} = \begin{bmatrix} 6 \\\ 8.4 \\\ 3 \\\ 3.6 \end{bmatrix}, \boldsymbol w^{(4)} = \begin{bmatrix} 6 \\\ 8.4 \\\ 4.2 \\\ 3.6 \end{bmatrix}, \boldsymbol w^{(5)} = \begin{bmatrix} 6 \\\ 9.36 \\\ 4.2 \\\ 3.6 \end{bmatrix}, \dots, \boldsymbol w^{(20)} = \begin{bmatrix} 6 \\\ 9.9990 \\\ 4.9995 \\\ 5.9992 \end{bmatrix}, \dots$$

and the process converges to the weights found using the inverse. However convergence is fairly slow.
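To make the iteration concrete, here is a minimal self-contained C++ sketch of the Jacobi scheme on the example graph (the `Edge` struct, block numbering, and fixed iteration count are illustrative assumptions, not the JIT's implementation). Note the edge list stores only the nonzero entries of $\boldsymbol P$, exploiting its sparseness:

```cpp
#include <cstdio>
#include <vector>

// One flow edge: source block, destination block, and likelihood p[src][dst].
struct Edge
{
    int    src;
    int    dst;
    double p;
};

int main()
{
    // The example graph, with blocks numbered A=0, B=1, C=2, D=3.
    std::vector<Edge>   edges = {{0, 1, 1.0}, {1, 2, 0.5}, {1, 3, 0.5}, {2, 1, 0.8}, {2, 3, 0.2}};
    std::vector<double> e     = {6, 0, 0, 0}; // external flow: 6 units into A
    std::vector<double> w     = e;            // w^(0) = e

    for (int k = 1; k <= 20; k++)
    {
        // Jacobi step: w^(k+1) = e + P w^(k), reading only the old values.
        std::vector<double> wNew = e;
        for (const Edge& edge : edges)
        {
            wNew[edge.dst] += edge.p * w[edge.src];
        }
        w.swap(wNew);
    }

    // Converges (slowly) toward (6, 10, 5, 6).
    printf("w = %.4f %.4f %.4f %.4f\n", w[0], w[1], w[2], w[3]);
}
```

Running this reproduces the table above, ending near $\boldsymbol w^{(20)} = (6, 9.9990, 4.9995, 5.9992)$.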

Classically this approach is known as Jacobi's method. At each iterative step, the new values are based only on the old values.

Jacobi's Method

If you read the math literature on iterative solvers, Jacobi's method is often described as follows. Given a linear system $\boldsymbol A \boldsymbol x = \boldsymbol b$, a splitting of $\boldsymbol A$ is $\boldsymbol A = \boldsymbol M - \boldsymbol N$, where $\boldsymbol M^{-1}$ exists. Then the iteration matrix $\boldsymbol H$ is given by $\boldsymbol H = \boldsymbol M^{-1} \boldsymbol N$. Given some initial guess at an answer $\boldsymbol x^{(0)}$ the iteration scheme is:

$$ \boldsymbol x^{(k+1)} = \boldsymbol H \boldsymbol x^{(k)} + \boldsymbol M^{-1}\boldsymbol b$$

And provided that $\rho(\boldsymbol H) \lt 1$,

$$\lim_{k \to \infty} \boldsymbol x^{(k)}=\boldsymbol A^{-1} \boldsymbol b$$

In our case $\boldsymbol A = \boldsymbol I - \boldsymbol P$ and so the splitting is simply $\boldsymbol M = \boldsymbol I$ and $\boldsymbol N = \boldsymbol P$. Since $\boldsymbol M = \boldsymbol I$, $\boldsymbol M^{-1} = \boldsymbol I$ (the identity matrix is its own inverse), $\boldsymbol H = \boldsymbol P$, $\boldsymbol x = \boldsymbol w$ and $\boldsymbol b = \boldsymbol e$, we end up with

$$ \boldsymbol w^{(k+1)} = \boldsymbol P \boldsymbol w^{(k)} + \boldsymbol e$$

as we derived above.

As an alternative we could split $\boldsymbol A = (\boldsymbol I - \boldsymbol P)$ into a diagonal part $\boldsymbol M = \boldsymbol D$ and a remainder part $\boldsymbol N$. This only leads to differences from the splitting above when there are self-loops; otherwise the diagonal of $\boldsymbol P$ is all zeros.

With that splitting,

$$\boldsymbol D^{-1}_{i,i} = 1/a_{i,i} = 1/(1 - p_{i,i})$$

so as $p_{i,i}$ gets close to 1.0 the value can be quite large: these are the count amplifications caused by self-loops. If we write things out component-wise we get the classic formulation for Jacobi iteration:

$$x^{(k+1)}_i = \frac{1}{a_{i,i}} \left (b_i - \sum_{j \ne i} a_{i,j} x^{(k)}_j \right)$$

or in our block weight and edge likelihood notation

$$w^{(k+1)}_i = \frac{1}{(1 - p_{i,i})} \left (e_i + \sum_{j \ne i} p_{j,i} w^{(k)}_j \right)$$

Intuitively this reads: the new value of node $i$ is the sum of the external input (if any) plus the weights flowing in from (non-self) predecessors, with the sum scaled up by the self-loop factor.
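For instance, a block whose self-loop has likelihood $p_{i,i} = 0.9$ has its external input and non-self inflow amplified by $1/(1 - 0.9) = 10$: intuitively, each entry to the block repeats about ten times.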

On Convergence and Stability

While the iterative method above is guaranteed to converge when $\boldsymbol A$ is an M-matrix, its rate of convergence is potentially problematic. For an iterative scheme, the asymptotic rate of convergence can be shown to be $R \approx -\log_{10} \rho(\boldsymbol H)$ digits / iteration.

Here the spectral radius $\rho(\boldsymbol H)$ is the magnitude of the largest eigenvalue of $\boldsymbol H$. For the example above $\boldsymbol H = \boldsymbol P$ and $\rho(\boldsymbol P) \approx 0.63$, giving $R = 0.2$. So to converge to $4$ decimal places takes about $20$ iterations, as the table of data above indicates.

It is also worth noting that for synthesis the matrix $(\boldsymbol I - \boldsymbol P)$ is often ill-conditioned, meaning that small changes in the input vector $\boldsymbol e$ (or small inaccuracies in the likelihoods $p_{i,j}$) can lead to large changes in the solution vector $\boldsymbol w$. In some sense this is a feature; we know that blocks in flow graphs can have widely varying weights, with some blocks rarely executed and others executed millions of times per call to the method. So it must be possible for ${(\boldsymbol I - \boldsymbol P)}^{-1}$ to amplify the magnitude of a "small" input (say 1 call to the method) into large block counts.

Accelerating Convergence I: Gauss-Seidel and Reverse Postorder

It's also well known that Gauss-Seidel iteration often converges faster than Jacobi iteration. Here, instead of always using the old iteration values, we use the new iteration values wherever they are available, presuming each update happens in order of increasing $i$:

$$x^{(k+1)}_i = \frac{1}{a_{i,i}} \left(b_i - \sum_{j \lt i} a_{i,j} x^{(k+1)}_j - \sum_{j \gt i} a_{i,j} x^{(k)}_j \right)$$

or again in our notation

$$w^{(k+1)}_i = \frac{1}{(1 - p_{i,i})} \left(e_i + \sum_{j \lt i} p_{j,i} w^{(k + 1)}_j + \sum_{j \gt i} p_{j,i} w^{(k)}_j \right)$$

In the above scheme the order of visiting successive blocks is left unspecified, and (in principle) any order can be used. But by using a reverse postorder (RPO) to index the blocks, we can ensure a maximal amount of forward propagation per iteration. Note that if a block has an incoming edge from a block that appears later in the reverse postorder, that block is a loop header.

If we do this, the formula above nicely corresponds to our notion of forward and backward edges in the RPO:

$$w^{(k+1)}_i = \frac{1}{\underbrace{(1 - p_{i,i})}_{\text{self edge}}} \left(e_i + \underbrace{\sum_{j \lt i} p_{j,i} w^{(k + 1)}_j}_{\text{forward edges in RPO}} + \underbrace{\sum_{j \gt i} p_{j,i} w^{(k)}_j}_{\text{backward edges in RPO}} \right)$$

Note because of the order of reads and writes, $\boldsymbol w^{(k+1)}$ can share storage with $\boldsymbol w^{(k)}$.

On the example above this results in:

$$ \boldsymbol w^{(0)} = \begin{bmatrix} 6 \\ 6 \\ 3 \\ 3.6 \end{bmatrix}, \boldsymbol w^{(1)} = \begin{bmatrix} 6 \\ 8.4 \\ 4.2 \\ 5.04 \end{bmatrix}, \boldsymbol w^{(2)} = \begin{bmatrix} 6 \\ 9.36 \\ 4.68 \\ 5.62 \end{bmatrix}, \boldsymbol w^{(3)} = \begin{bmatrix} 6 \\ 9.74 \\ 4.87 \\ 5.85 \end{bmatrix}, \boldsymbol w^{(4)} = \begin{bmatrix} 6 \\ 9.90 \\ 4.95 \\ 5.94 \end{bmatrix}, \boldsymbol w^{(5)} = \begin{bmatrix} 6 \\ 9.96 \\ 4.98 \\ 5.98 \end{bmatrix}, \dots, \boldsymbol w^{(9)} = \begin{bmatrix} 6 \\ 9.9990 \\ 4.9995 \\ 5.9994 \end{bmatrix}, \dots $$

So it is converging about twice as fast. As with the Jacobi method one can re-express this as a splitting and determine an iteration matrix $\boldsymbol H$ and determine the dominant eigenvalue, and from this the rate of convergence, but we will not do so here.
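Here is the corresponding C++ sketch of the Gauss-Seidel scheme (again an illustrative toy, with a hypothetical `PredEdge` list rather than the JIT's data structures). Blocks are numbered in RPO and the weight vector is updated in place, so predecessors earlier in the RPO already hold this pass's values:

```cpp
#include <cstdio>
#include <vector>

// An incoming edge: predecessor block and edge likelihood.
struct PredEdge
{
    int    pred;
    double p;
};

int main()
{
    // The example graph with blocks in RPO: A=0, B=1, C=2, D=3.
    // Per-block predecessor edges; C->B (likelihood 0.8) is the lone back edge.
    std::vector<std::vector<PredEdge>> preds = {
        {},                   // A
        {{0, 1.0}, {2, 0.8}}, // B <- A (forward), B <- C (backward in RPO)
        {{1, 0.5}},           // C <- B
        {{1, 0.5}, {2, 0.2}}  // D <- B, D <- C
    };
    std::vector<double> e = {6, 0, 0, 0};
    std::vector<double> w = {0, 0, 0, 0};

    for (int k = 0; k <= 9; k++)
    {
        // One pass in RPO. Updating w in place means forward edges already
        // see this pass's new weights; back edges see the prior values.
        // (No self-loops in this example, so the 1/(1 - p_ii) factor is omitted.)
        for (size_t i = 0; i < preds.size(); i++)
        {
            double sum = e[i];
            for (const PredEdge& in : preds[i])
            {
                sum += in.p * w[in.pred];
            }
            w[i] = sum;
        }
        printf("w^(%d) = %.4f %.4f %.4f %.4f\n", k, w[0], w[1], w[2], w[3]);
    }
}
```

Its output matches the iterates shown above.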

Accelerating Convergence II: Cyclic Probabilities

A flow graph is reducible (or is said to have reducible loops) if every cycle in the graph has a block in the cycle that dominates the other blocks in the cycle. We will call such cycles natural loops, distinguished by their entry blocks.

For reducible loops we can compute the amount by which they amplify flow using a technique described by Wu and Larus: given a loop head $h$, we classify its predecessor edges into two sets: input edges that do not come from a block within the loop, and back edges that come from a block within the loop. We then inject one unit of flow into $h$ and propagate it through the loop, computing the sum of the weights on the back edges. This value will be some $p$ where $0 \le p \le 1$. Then the cyclic probability $C_p$ for $h$ is $C_p(h) = 1 / (1 - p)$. To avoid dividing by zero we artificially cap $p$ at some value less than $1$ (equivalently, cap $C_p$ at some large finite value).
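As a worked instance on the example graph above: the natural loop is $\{B, C\}$ with header $B$, input edge $A \to B$, and back edge $C \to B$. Injecting one unit of flow into $B$ sends $0.5$ units to $C$, and the back edge then returns $0.5 \times 0.8 = 0.4$ units to $B$, so $p = 0.4$ and

$$C_p(B) = \frac{1}{1 - 0.4} = \frac{5}{3} \approx 1.67$$

consistent with the entries of ${(\boldsymbol I - \boldsymbol P)}^{-1}$ computed earlier: the $6$ units flowing in from $A$ become $w_B = 6 \times 5/3 = 10$.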

If we add this refinement to our algorithm we end up with:

$$w^{(k+1)}_i = \begin{cases} C_p(i) \left(e_i + \sum_{j \lt i} p_{j,i} w^{(k + 1)}_j \right), \text{ block } i \text{ is a natural loop head} \\\ \frac{1}{(1 - p_{i,i})} \left(e_i + \sum_{j \lt i} p_{j,i} w^{(k + 1)}_j + \sum_{j \gt i} p_{j,i} w^{(k)}_j \right) \end{cases}$$

The second clause covers both blocks without any back edges and blocks with back edges that are not headers of natural loops.

On an example like the one above this converges in one pass. If any $C_p$ was capped then the solution will be approximate and we will have failed to achieve a global balance. But we will also (generally) have avoided creating infinite or very large counts.

One can imagine that if we cap some $C_p$ we could also try to alter some of the $p_{j,i}$ to bring things back into balance, but this seems tricky if there are multiple paths through the loop. And we're basically deciding at that point that consistency is more important than accuracy.

Since the remainder of the JIT is going to have to cope with lack of global balance anyway (recall it is hard to preserve), for now we are going to try to tolerate reconstruction inconsistencies.

The algorithm described above is implemented in the code as the GaussSeidel solver.

Cycles That Are Not Natural Loops, More Sophisticated Solvers, and Deep Nests

If the flow graph has cycles that are not natural loops (irreducible loops) the above computations will converge but again may converge very slowly. On a sample of about 500 graphs with irreducible loops the modified Gauss-Seidel approach above required more than 20 iterations in 120 cases and more than 50 iterations in 70 cases, with worst-case around 500 iterations.

Successive over-relaxation (SOR) is a classic convergence-altering technique, but unfortunately, for M-matrices, SOR can only safely be used to slow down convergence.

There does not seem to be a good analog of $C_p$ for such cases, though it's possible that "block diagonal" solvers may be tackling exactly that problem.

It's possible that more sophisticated solution techniques like BiCGstab or CGS might be worth consideration. Or perhaps a least-squares solution, if we're forced to be approximate, to try and minimize the overall approximation error.

In very deep loop nests even $C_p$ is not enough to prevent creation of large counts. We could try and adjust the cap level downwards as the loops get deeper, or distribute the $C_p$ "tax" across all the loops. This tends to only be a problem for stress test cases.
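As a rough illustration (the cap value here is made up for the example): in a nest of $d$ loops where each level's amplification is capped at $100$, the innermost block weight approaches $100^d$ times the method entry count, so counts still grow geometrically with nesting depth even under the cap.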

References

Carl D. Meyer. Matrix Analysis and Applied Linear Algebra, in particular section 7.10.

Nick Higham. What is an M-Matrix?

Youfeng Wu and James R. Larus. Static branch frequency and program profile analysis, MICRO-27 (1994).

@AndyAyersMS
Member Author

@jakobbotsch ptal
cc @dotnet/jit-contrib

Think repair works better than blend here.

Fair number of diffs, likely from cases where PGO counts were not consistent (and likely much of that from the approximate count probes).

Diffs

@AndyAyersMS merged commit ba84d1e into dotnet:main Mar 26, 2024
108 of 110 checks passed
@BruceForstall mentioned this pull request Mar 27, 2024
@BruceForstall
Member

Some notes:

  1. This makes no sense:

     $$ 0 \ge p_{i,j} \ge 1 $$

  2. Shouldn't the extensive comments in this issue (JIT: initial work on general count reconstruction #99992 (comment)) be encapsulated in the code as a comment, or adjacent as markdown?

  3. I'm worried about the math-y "rocket science" complexity here: will anyone be able to debug a failure?

@AndyAyersMS
Member Author

Yeah I will copy this writeup somewhere.

This is a standard iterative linear equation solver; the only real nuance is using RPO to ensure maximum forward propagation (hopefully a familiar concept to anyone working on the jit) and then using the cyclic probabilities.

There's also a fairly obvious intuitive explanation: we pour some counts in the top and iteratively flow them around using the edge likelihoods until we reach a fixed point or run out of iterations. Most of what's above is just an explanation for why this works, why it might take a long time, and why there are cases it can't solve exactly.

I have held off on implementing something more "rocket science" state of the art (like BiCGSTAB or CGS) which rely heavily on less familiar bits of linear algebra.

@AndyAyersMS
Member Author

Some issues that have come up:

  • Assertion failed 'change >= 0' #100350 (fixed by JIT: fix count reconstruction problem #100385)
  • if a natural loop contains an improper loop header, that loop's $C_p$ value and those of all enclosing natural loops may be inaccurate (too low), so we can't rely on them alone for convergence acceleration. It is probably OK to use them sparingly (perhaps initially), but there needs to be at least one trailing iteration that does not use them.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Apr 3, 2024
I left a long note on the algorithm as a comment on dotnet#99992.
Move it to the doc folder.
AndyAyersMS added a commit that referenced this pull request Apr 3, 2024
I left a long note on the algorithm as a comment on #99992.
Move it to the doc folder.
@github-actions github-actions bot locked and limited conversation to collaborators Apr 28, 2024