maintain external alias set for more efficiency
aviatesk committed Dec 25, 2021
1 parent 5ea6139 commit 8f1a6aa
Showing 2 changed files with 408 additions and 180 deletions.
122 changes: 76 additions & 46 deletions README.md
@@ -5,59 +5,89 @@

This analysis works on a lattice called `x::EscapeLattice`, which holds the following properties:
- `x.Analyzed::Bool`: not formally part of the lattice, indicates whether `x` has been analyzed at all
- `x.ReturnEscape::Bool`: indicates `x` may escape to the caller via return (possibly as a field),
where `x.ReturnEscape && 0 ∈ x.EscapeSites` has the special meaning that it's visible to
the caller simply because it's passed as call argument
- `x.ThrownEscape::Bool`: indicates `x` may escape to somewhere through an exception (possibly as a field)
- `x.EscapeSites::BitSet`: records program counters (SSA numbers) where `x` can escape
- `x.FieldSets::Union{Vector{IdSet{Any}},Bool}`: maintains the sets of possible values of fields of `x`:
* `x.FieldSets === false` indicates the fields of `x` aren't analyzed yet
* `x.FieldSets === true` indicates the fields of `x` can't be analyzed, e.g. the type of `x`
is not concrete and thus the number of its fields can't be known precisely
* otherwise `x.FieldSets::Vector{IdSet{Any}}` holds all the possible values of each field,
where `x.FieldSets[i]` keeps all possibilities that the `i`th field can be
- `x.ArgEscape::Int` (not implemented yet): indicates it will escape to the caller through `setfield!` on argument(s)
* `-1` : no escape
* `0` : unknown or multiple
* `n` : through argument N
- `x.ReturnEscape::Bool`: indicates `x` may escape to the caller via return
- `x.ThrownEscape::Bool`: indicates `x` may escape to somewhere through an exception
- `x.EscapeSites::BitSet`: records program counters (SSA numbers) where `x` can escape (via any of the escapes)
- `x.FieldSets::Union{Vector{IdSet{Any}},Bool}`: maintains all possible values that impose
escape information on fields of `x`
- `x.ArgEscape::Int` (not implemented yet): indicates it will escape to the caller through
`setfield!` on argument(s)

These attributes can be combined to create a partial lattice that has a finite height, given
that the input program has a finite number of statements, which is assured by Julia's semantics.
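
The finite-height claim can be illustrated with a minimal sketch. `ToyEscape`, `toy_noescape`, `toy_return` and `toy_thrown` are hypothetical stand-ins invented for this example (they are not the package's definitions), and `FieldSets` and `ArgEscape` are left out for brevity:

```julia
# Hypothetical, simplified stand-in for `EscapeLattice` (assumption for illustration only).
struct ToyEscape
    Analyzed::Bool
    ReturnEscape::Bool
    ThrownEscape::Bool
    EscapeSites::BitSet
end

# Partial order: `x ⊑ y` holds when `y` records at least as much escape as `x`.
⊑(x::ToyEscape, y::ToyEscape) =
    x.Analyzed <= y.Analyzed &&
    x.ReturnEscape <= y.ReturnEscape &&
    x.ThrownEscape <= y.ThrownEscape &&
    x.EscapeSites ⊆ y.EscapeSites

# Join (least upper bound): element-wise "or" on the flags plus union of the sites.
⊔(x::ToyEscape, y::ToyEscape) = ToyEscape(
    x.Analyzed | y.Analyzed,
    x.ReturnEscape | y.ReturnEscape,
    x.ThrownEscape | y.ThrownEscape,
    x.EscapeSites ∪ y.EscapeSites)

toy_noescape() = ToyEscape(true, false, false, BitSet())
toy_return(pc::Int) = ToyEscape(true, true, false, BitSet([pc]))
toy_thrown(pc::Int) = ToyEscape(true, false, true, BitSet([pc]))

# Joining a return escape at statement 2 with a thrown escape at statement 5.
joined = toy_return(2) ⊔ toy_thrown(5)
```

In this sketch each flag can flip from `false` to `true` at most once and `EscapeSites` can only grow up to the number of statements, so any ascending chain is finite.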

There are utility constructors to create common `EscapeLattice`s, e.g.,
- `NoEscape()`: the bottom element of this lattice, meaning it won't escape to anywhere
- `AllEscape()`: the topmost element of this lattice, meaning it will escape to everywhere
Escape analysis implementation is based on the data-flow algorithm described in the paper[^MM02].
The analysis works on the lattice of `EscapeLattice` and transitions lattice elements from the
bottom to the top until every lattice element converges to a fixed point, maintaining a (conceptual)
working set that contains program counters corresponding to remaining SSA statements to be analyzed.
The analysis only manages a single global state that tracks `EscapeLattice` of each argument
and SSA statement, but also note that some flow-sensitivity is being encoded as program
counters recorded in the `EscapeSites` property of each lattice element, which can be
combined with domination analysis to reason about flow-sensitivity if necessary.

The escape analysis will transition these elements from the bottom to the top,
in the same direction as Julia's native type inference routine.
An abstract state will be initialized with the bottom(-like) elements:
- the call arguments are initialized as `ArgumentReturnEscape()`, because they're visible from a caller immediately
- the other states are initialized as `NotAnalyzed()`, which is a special lattice element that
is slightly lower than `NoEscape`, but at the same time doesn't represent any meaning
other than it's not analyzed yet (thus it's not formally part of the lattice).
One distinctive design of this analysis is that escape information is propagated in a
_backward_ way, i.e. data flows _from usages to definitions_.
For example, in the code snippet below, EA first analyzes the statement `return obj` and
imposes `ReturnEscape` on `obj`, and then it analyzes `obj = Expr(:new, Obj, val)` and
propagates `ReturnEscape` imposed on `obj` to `val`:
```julia
obj = Expr(:new, Obj, val) # lowered from `Obj(val)`
return obj
```
The key observation here is that this backward analysis allows escape information to flow
naturally along the use-def chain rather than along control flow, which is otherwise better
handled by a forward analysis. As a result, this scheme enables a very simple implementation of
escape analysis, e.g. `PhiNode` can be handled relatively easily since we just
need to propagate escape information imposed on it to its predecessors.
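
As a rough illustration of this usage-to-definition flow, the sketch below runs a backward pass over a toy IR until nothing changes. `ToyStmt` and `toy_backward_escape` are hypothetical names made up for this example, the IR is not Julia's real IR, and only `ReturnEscape` is modeled:

```julia
# Hypothetical toy IR (assumption): each statement has a kind and the indices of its operands.
struct ToyStmt
    kind::Symbol           # :arg, :new, :phi or :return
    operands::Vector{Int}  # indices of the statements this one uses
end

function toy_backward_escape(stmts::Vector{ToyStmt})
    escapes = falses(length(stmts))   # `true` means "may escape via return"
    changed = true
    while changed                     # iterate until a fixed point is reached
        changed = false
        for pc in length(stmts):-1:1  # visit usages before definitions
            stmt = stmts[pc]
            # `return x` imposes the escape; other statements forward their own escape
            imposes = stmt.kind === :return || escapes[pc]
            imposes || continue
            for op in stmt.operands
                if !escapes[op]
                    escapes[op] = true
                    changed = true
                end
            end
        end
    end
    return escapes
end

stmts = [
    ToyStmt(:arg, Int[]),   # 1: val
    ToyStmt(:arg, Int[]),   # 2: a value unrelated to `obj`
    ToyStmt(:new, [1]),     # 3: obj = Expr(:new, Obj, val)
    ToyStmt(:return, [3]),  # 4: return obj
]
escapes = toy_backward_escape(stmts)
```

Here `obj` and `val` end up marked as escaping while the unrelated argument does not. A `:phi` statement would need no special handling in this sketch, since its operands are simply its predecessors.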

Escape analysis implementation is based on the data-flow algorithm described in the old paper [^MM02].
The analysis works on the lattice of [`EscapeLattice`](@ref) and transitions lattice elements
from the bottom to the top in a _backward_ way, i.e. data flows from usage sites to definitions,
until every lattice element converges to a fixed point by maintaining a (conceptual) working set
that contains program counters corresponding to remaining SSA statements to be analyzed.
The analysis only manages a single global state that tracks `EscapeLattice` of each argument
and SSA statement, but also note that some flow-sensitivity is encoded as program counters
recorded in the `EscapeSites` property of each lattice element.
It is also worth noting that the `FieldSets` property enables a backward field analysis.
It tracks all values that _can impose escape information on fields of an object_, which can be
found at "usage" sites, and the escape information imposed on those tracked values is propagated
to the actual field values later at "definition" sites. Specifically, the analysis records a
value that may impose escape information on a field of an object at a `getfield` call, and then it
propagates that escape information to the field when analyzing `Expr(:new)` or `setfield!`
expressions.
```julia
obj = Expr(:new, Obj, val)
v = getfield(obj, :val)
return v
```
In the example above, `ReturnEscape` imposed on `v` is _not_ directly propagated to `obj`.
Rather the identity of `v` is recorded in `obj`'s `FieldSets[1]` and then `v`'s escape
information is propagated to `val` when `obj = Expr(:new, Obj, val)` is analyzed.
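
The same example can be traced with a small sketch. The statement encoding (a vector of `NamedTuple`s) and the function `toy_field_analysis` are assumptions made up for illustration, not the package's data structures, and a single backward pass is used because the toy program is straight-line code:

```julia
# Hypothetical toy encoding (assumption): `args` holds statement indices, except that the
# second entry of a :getfield statement is the field index.
stmts = [
    (kind = :arg,      args = Int[]),   # 1: val
    (kind = :new,      args = [1]),     # 2: obj = Expr(:new, Obj, val)
    (kind = :getfield, args = [2, 1]),  # 3: v = getfield(obj, :val), i.e. field 1 of statement 2
    (kind = :return,   args = [3]),     # 4: return v
]

function toy_field_analysis(stmts)
    n = length(stmts)
    returns = falses(n)                              # ReturnEscape per statement
    fieldsets = [Dict{Int,Set{Int}}() for _ in 1:n]  # field index => statements reading that field
    for pc in n:-1:1                                 # backward: usages before definitions
        s = stmts[pc]
        if s.kind === :return
            returns[s.args[1]] = true
        elseif s.kind === :getfield
            obj, fidx = s.args
            # "usage" site: record that statement `pc` reads field `fidx` of `obj`
            push!(get!(fieldsets[obj], fidx, Set{Int}()), pc)
        elseif s.kind === :new
            # "definition" site: propagate escapes of the object and of recorded readers
            for (fidx, val) in enumerate(s.args)
                returns[val] |= returns[pc]
                for reader in get(fieldsets[pc], fidx, Set{Int}())
                    returns[val] |= returns[reader]
                end
            end
        end
    end
    return returns, fieldsets
end

returns, fieldsets = toy_field_analysis(stmts)
```

In this sketch `val` and `v` are marked as escaping while `obj` itself is not, which mirrors the indirection through `FieldSets[1]` described above.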

Finally, the analysis also needs to track which values can be aliased to each other. This is
needed because in Julia IR, the same object is sometimes represented by different IR elements.
Since the analysis maintains `EscapeLattice` per IR element, we need to make sure that those different
IR elements that actually represent the same object share the same escape information.
Those program constructs that return the same object as their operand(s) like `PiNode` and
`typeassert` are obvious examples that require this escape information aliasing.
But the escape information equalization between aliased values is needed for other cases as
well. Most notably, it is necessary for correctly reasoning about mutations on `PhiNode`.
Now let's consider the following example; `ϕ1` and `ϕ2` are aliased and thus `ReturnEscape`
imposed on `y = ϕ1[]` needs to be propagated to `ϕ2[] = x`. The escape information can be
propagated if the escape states of _predecessors_ of `ϕ1` and `ϕ2` (i.e. those two
`RefValue` objects) are shared ("equalized"):
```julia
if cond::Bool
ϕ2 = ϕ1 = Ref("foo")
else
ϕ2 = ϕ1 = Ref("bar")
end
ϕ2[] = x::String
y = ϕ1[]
return y
```

The analysis also collects alias information using an approach inspired by
the escape analysis algorithm explained in yet another old paper [^JVM05].
In addition to managing escape lattice elements, the analysis state also maintains an "alias set",
which is implemented as a disjoint set of aliased arguments and SSA statements.
When the fields of object `x` are known precisely (i.e. `x.FieldSets isa Vector{IdSet{Any}}` holds),
the alias set is updated each time `z = getfield(x, y)` is encountered in a way that `z` is
aliased to all values of `x.FieldSets[y]`, so that escape information imposed on `z` will be
propagated to all the aliased values and `z` can be replaced with an aliased value later.
Note that in a case when the fields of object `x` can't be known precisely (i.e. `x.FieldSets` is `true`),
when `z = getfield(x, y)` is analyzed, escape information of `z` is propagated to `x` rather
than any of `x`'s fields, which is the most conservative propagation since escape information
imposed on `x` will end up being propagated to all of its fields anyway at definitions of `x`
(i.e. `:new` expression or `setfield!` call).
However, one interesting property of such alias information is that it is not known at a "usage"
site but can be derived at a "definition" site (as aliasing is conceptually equivalent to assignment),
and thus it doesn't naturally flow in a backward way. This means it can be inefficient if we
update escape information whenever we see new aliasing. Rather, in order to handle aliasing
effectively, EscapeAnalysis.jl uses an approach inspired by the escape analysis algorithm
explained in this old JVM paper[^JVM05]. That is, in addition to managing escape lattice
elements, the analysis also maintains an "equi-alias set", a disjoint set of aliased
arguments and SSA statements, which allows escape information between newly aliased values
to be equalized efficiently.
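
One way to picture such a disjoint set is a small union-find over statement indices. The sketch below is only an assumption about the general technique: `ToyAliasSet`, `find_root!`, `alias!` and `equalize!` are hypothetical names, and a plain `Bool` per statement stands in for the escape lattice element:

```julia
# Hypothetical minimal union-find (assumption); not the package's actual data structure.
struct ToyAliasSet
    parents::Vector{Int}
end
ToyAliasSet(n::Int) = ToyAliasSet(collect(1:n))  # every value starts in its own class

function find_root!(s::ToyAliasSet, x::Int)
    while s.parents[x] != x
        s.parents[x] = s.parents[s.parents[x]]   # path halving keeps lookups cheap
        x = s.parents[x]
    end
    return x
end

# Record that `x` and `y` are aliased by merging their classes.
function alias!(s::ToyAliasSet, x::Int, y::Int)
    rx, ry = find_root!(s, x), find_root!(s, y)
    rx == ry || (s.parents[ry] = rx)
    return s
end

# Equalize: every member of an alias class receives the join ("or") of the whole class.
function equalize!(escapes::AbstractVector{Bool}, s::ToyAliasSet)
    joined = Dict{Int,Bool}()
    for i in eachindex(escapes)
        r = find_root!(s, i)
        joined[r] = get(joined, r, false) | escapes[i]
    end
    for i in eachindex(escapes)
        escapes[i] = joined[find_root!(s, i)]
    end
    return escapes
end

aliases = ToyAliasSet(5)
alias!(aliases, 3, 4)                        # statements 3 and 4 name the same object
escapes = [false, false, true, false, false]
equalize!(escapes, aliases)
```

After equalization, statement 4 shares the escape recorded for statement 3, while unrelated statements are left untouched.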

[^MM02]: _A Graph-Free approach to Data-Flow Analysis_.
Markus Mohnen, 2002, April.