You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current move/init analysis is elegant and illustrates the expressive power of datalog. However, this elegance comes at the cost of efficiency, as it requires instantiating at least one fact (maybe_uninitialized and/or maybe_initialized) for every move path at every point in the CFG. This becomes intractable for large bodies like the one in #181 which requires several billion initialization facts.
Using a dense relation store (like Souffle's binary radix trie) will alleviate some of the pain by reducing the memory required per fact, and I'm kind of doubtful that rustc will ever use a datalog engine to compute move errors. However, I'm still interested in expressing this part of the compiler efficiently in datalog, if only to see how far we can take the "everything is horn clauses" mantra.
The idea is to take advantage of the sparsity of the path_accessed_at relation and compute initializedness inside a basic block only when necessary. All intra-block CFG nodes have exactly one predecessor (the previous statement in the block), so their initializedness is purely a function of their immediate predecessor and any operations within that node. If we were to compute the cumulative effect of all nodes inside a basic block and do our fixpoint algorithm on blocks instead of individual nodes, we could use that information to compute the state for intrablock nodes on-demand.
This is not groundbreaking; in fact it's pretty dull. This is the first optimization discussed in any introduction to dataflow analysis. However, implementing it in datalog poses some new challenges that require (AFAIK) some non-standard extensions to overcome.
First, let's assume that we have some mapping from CFG nodes to their corresponding basic block and statement index (a Location), as well as the control-flow edges for each basic block and the typical move/init input data in terms of those locations:
From there, we can compute the effect of each basic block on the initializedness of each move path in the program by looking for an assignment to (or move from) a path that comes after any other moves from (or assignments to) that path. Unfortunately , doing this without computing the full initializedness at each statement or negating the path_assigned_at and path_accessed_at relations (which are very sparse) requires either universal quantification within rule bodies or unbound variables within negated clauses, both of which are outside the realm of traditional datalog. Luckily, Soufflé has an extension called aggregates, which allow more complex functions on certain relations.
Here, we use the count aggregate to count the number of path_assigned_at relations for this move path in the same basic block that come after a path_moved_from relation. If none exist (count = 0), we know that the path will be moved from at the exit of the basic block.
Now that we have our block transfer functions, it's time to use them to compute the entry state for each block. A path is initialized at block exit if it is initialized at block entry and not uninitialized inside the block, or if it is initialized inside the block. This is just your typical fixpoint algorithm.
Finally, we can use this data to compute the intrablock initialization data (using the same count trick) and move errors. Note the inline on the path_maybe_uninit_at_location relation. That means that we don't compute the relation at every point, only the ones deemed significant by move_error (the predecessors of path_accessed_at points). In fact, that relation must be inlined into a rule that constrains stmt, since it is not grounded otherwise.
Unfortunately, this doesn't compile in Soufflé today due to shortcomings around aggregates and how they interact with inlining and record types. However, it can be made to work by manually inlining path_maybe_uninit_at_location into move_error using a disjunction and splitting Location into its subfields in all the input relations.
Phew. With that, we've emulated a simple dataflow optimization in datalog, and avoided storing initializedness data at all points in the CFG. However, I've not tested to see whether this method is actually faster than the naive one; I don't know how optimized the count queries will be. It will be fast in datafrog, since each count is just a range query on a static Relation.
The text was updated successfully, but these errors were encountered:
There's one more issue I didn't notice. In the version I tested locally, only one of the in_block relations was computed correctly using a count aggregate. However, when you try to compile the version I produced above, which uses count for both relations, you get an error about mutually dependent aggregates, despite the fact that path_becomes_{un}init_in_block should be in a separate stratum from path_maybe_init_at_block_{entry, exit}. I think this is just another shortcoming in Soufflé, and not a fundamental problem.
The current move/init analysis is elegant and illustrates the expressive power of datalog. However, this elegance comes at the cost of efficiency, as it requires instantiating at least one fact (
maybe_uninitialized
and/ormaybe_initialized
) for every move path at every point in the CFG. This becomes intractable for large bodies like the one in #181 which requires several billion initialization facts.Using a dense relation store (like Souffle's binary radix trie) will alleviate some of the pain by reducing the memory required per fact, and I'm kind of doubtful that
rustc
will ever use a datalog engine to compute move errors. However, I'm still interested in expressing this part of the compiler efficiently in datalog, if only to see how far we can take the "everything is horn clauses" mantra.The idea is to take advantage of the sparsity of the
path_accessed_at
relation and compute initializedness inside a basic block only when necessary. All intra-block CFG nodes have exactly one predecessor (the previous statement in the block), so their initializedness is purely a function of their immediate predecessor and any operations within that node. If we were to compute the cumulative effect of all nodes inside a basic block and do our fixpoint algorithm on blocks instead of individual nodes, we could use that information to compute the state for intrablock nodes on-demand.This is not groundbreaking; in fact it's pretty dull. This is the first optimization discussed in any introduction to dataflow analysis. However, implementing it in datalog poses some new challenges that require (AFAIK) some non-standard extensions to overcome.
First, let's assume that we have some mapping from CFG nodes to their corresponding basic block and statement index (a
Location
), as well as the control-flow edges for each basic block and the typical move/init input data in terms of those locations:From there, we can compute the effect of each basic block on the initializedness of each move path in the program by looking for an assignment to (or move from) a path that comes after any other moves from (or assignments to) that path. Unfortunately , doing this without computing the full initializedness at each statement or negating the
path_assigned_at
andpath_accessed_at
relations (which are very sparse) requires either universal quantification within rule bodies or unbound variables within negated clauses, both of which are outside the realm of traditional datalog. Luckily, Soufflé has an extension called aggregates, which allow more complex functions on certain relations.Here, we use the
count
aggregate to count the number ofpath_assigned_at
relations for this move path in the same basic block that come after apath_moved_from
relation. If none exist (count = 0
), we know that the path will be moved from at the exit of the basic block.Now that we have our block transfer functions, it's time to use them to compute the entry state for each block. A path is initialized at block exit if it is initialized at block entry and not uninitialized inside the block, or if it is initialized inside the block. This is just your typical fixpoint algorithm.
Finally, we can use this data to compute the intrablock initialization data (using the same
count
trick) and move errors. Note theinline
on thepath_maybe_uninit_at_location
relation. That means that we don't compute the relation at every point, only the ones deemed significant bymove_error
(the predecessors ofpath_accessed_at
points). In fact, that relation must be inlined into a rule that constrainsstmt
, since it is not grounded otherwise.Unfortunately, this doesn't compile in Soufflé today due to shortcomings around aggregates and how they interact with inlining and record types. However, it can be made to work by manually inlining
path_maybe_uninit_at_location
intomove_error
using a disjunction and splittingLocation
into its subfields in all the input relations.Phew. With that, we've emulated a simple dataflow optimization in datalog, and avoided storing initializedness data at all points in the CFG. However, I've not tested to see whether this method is actually faster than the naive one; I don't know how optimized the
count
queries will be. It will be fast indatafrog
, since eachcount
is just a range query on a staticRelation
.The text was updated successfully, but these errors were encountered: