Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid materializing initialization data at all CFG nodes #182

Open
ecstatic-morse opened this issue Nov 5, 2021 · 1 comment
Open

Avoid materializing initialization data at all CFG nodes #182

ecstatic-morse opened this issue Nov 5, 2021 · 1 comment

Comments

@ecstatic-morse
Copy link

ecstatic-morse commented Nov 5, 2021

The current move/init analysis is elegant and illustrates the expressive power of datalog. However, this elegance comes at the cost of efficiency, as it requires instantiating at least one fact (maybe_uninitialized and/or maybe_initialized) for every move path at every point in the CFG. This becomes intractable for large bodies like the one in #181 which requires several billion initialization facts.

Using a dense relation store (like Souffle's binary radix trie) will alleviate some of the pain by reducing the memory required per fact, and I'm kind of doubtful that rustc will ever use a datalog engine to compute move errors. However, I'm still interested in expressing this part of the compiler efficiently in datalog, if only to see how far we can take the "everything is horn clauses" mantra.

The idea is to take advantage of the sparsity of the path_accessed_at relation and compute initializedness inside a basic block only when necessary. All intra-block CFG nodes have exactly one predecessor (the previous statement in the block), so their initializedness is purely a function of their immediate predecessor and any operations within that node. If we were to compute the cumulative effect of all nodes inside a basic block and do our fixpoint algorithm on blocks instead of individual nodes, we could use that information to compute the state for intrablock nodes on-demand.

This is not groundbreaking; in fact it's pretty dull. This is the first optimization discussed in any introduction to dataflow analysis. However, implementing it in datalog poses some new challenges that require (AFAIK) some non-standard extensions to overcome.

First, let's assume that we have some mapping from CFG nodes to their corresponding basic block and statement index (a Location), as well as the control-flow edges for each basic block and the typical move/init input data in terms of those locations:

.type Location = [ block: Block, stmt: unsigned ]

.decl node_loc(node: Node, loc: Location)
.input node_loc

.decl bb_edge(from: Block, to: Block)
.input bb_edge

.decl path_assigned_at(path: Path, loc: Location)
.decl path_moved_at(path: Path, loc: Location)
.decl path_accessed_at(path: Path, loc: Location)
.input path_accessed_at
.input path_assigned_at
.input path_moved_at

From there, we can compute the effect of each basic block on the initializedness of each move path in the program by looking for an assignment to (or move from) a path that comes after any other moves from (or assignments to) that path. Unfortunately , doing this without computing the full initializedness at each statement or negating the path_assigned_at and path_accessed_at relations (which are very sparse) requires either universal quantification within rule bodies or unbound variables within negated clauses, both of which are outside the realm of traditional datalog. Luckily, Soufflé has an extension called aggregates, which allow more complex functions on certain relations.

.decl path_becomes_uninit_in_block(path: Path, bb: Block)

path_becomes_uninit_in_block(path, bb) :-
    path_moved_at(path, [bb, uninit_at]),
    count : { path_assigned_at(path, [bb, init_at]), init_at > uninit_at } = 0. // ∀i s.t. i > uninit_at (!path_assigned_at(path, [bb, i]))

.decl path_becomes_init_in_block(path: Path, bb: Block)

path_becomes_init_in_block(path, bb) :-
    path_assigned_at(path, [bb, init_at]),
    count : { path_moved_at(path, [bb, uninit_at]), uninit_at > init_at } = 0.

Here, we use the count aggregate to count the number of path_assigned_at relations for this move path in the same basic block that come after a path_moved_from relation. If none exist (count = 0), we know that the path will be moved from at the exit of the basic block.

Now that we have our block transfer functions, it's time to use them to compute the entry state for each block. A path is initialized at block exit if it is initialized at block entry and not uninitialized inside the block, or if it is initialized inside the block. This is just your typical fixpoint algorithm.

.decl path_maybe_init_at_block_exit(path: Path, bb: Block) inline

path_maybe_init_at_block_exit(path, bb) :-
    path_becomes_init_in_block(path, bb).

path_maybe_init_at_block_exit(path, bb) :-
    path_maybe_init_at_block_entry(path, bb),
    !path_becomes_uninit_in_block(path, bb).


.decl path_maybe_uninit_at_block_exit(path: Path, bb: Block) inline

path_maybe_uninit_at_block_exit(path, bb) :-
    path_becomes_uninit_in_block(path, bb).

path_maybe_uninit_at_block_exit(path, bb) :-
    path_maybe_uninit_at_block_entry(path, bb),
    !path_becomes_uninit_in_block(path, bb).


.decl path_maybe_init_at_block_entry(path: Path, bb: Block)

path_maybe_init_at_block_entry(path, bb) :-
    path_maybe_init_at_block_exit(path, pred),
    bb_edge(pred, bb).

.decl path_maybe_uninit_at_block_entry(path: Path, bb: Block)

path_maybe_uninit_at_block_entry(path, bb) :-
    path_maybe_uninit_at_block_exit(path, pred),
    bb_edge(pred, bb).

Finally, we can use this data to compute the intrablock initialization data (using the same count trick) and move errors. Note the inline on the path_maybe_uninit_at_location relation. That means that we don't compute the relation at every point, only the ones deemed significant by move_error (the predecessors of path_accessed_at points). In fact, that relation must be inlined into a rule that constrains stmt, since it is not grounded otherwise.

.decl path_maybe_uninit_at_location(path: Path, loc: Location) inline

path_maybe_uninit_at_location(path, [bb, stmt]) :-
    path_maybe_uninit_at_block_entry(path, bb),
    count : { path_assigned_at(path, [bb, init_at]), init_at <= stmt } = 0.

path_maybe_uninit_at_location(path, [bb, stmt]) :-
    path_moved_at(path, [bb, uninit_at]),
    uninit_at <= stmt,
    count : { path_assigned_at(path, [bb, init_at]), init_at > uninit_at, init_at <= stmt} = 0.


.decl move_error(path: Path, node: Node)
.output move_error

move_error(path, node) :-
    path_accessed_at(path, node),
    cfg_edge(pred, node),
    node_loc(pred, pred_loc),
    path_maybe_uninit_at_location(path, pred_loc).

Unfortunately, this doesn't compile in Soufflé today due to shortcomings around aggregates and how they interact with inlining and record types. However, it can be made to work by manually inlining path_maybe_uninit_at_location into move_error using a disjunction and splitting Location into its subfields in all the input relations.

Phew. With that, we've emulated a simple dataflow optimization in datalog, and avoided storing initializedness data at all points in the CFG. However, I've not tested to see whether this method is actually faster than the naive one; I don't know how optimized the count queries will be. It will be fast in datafrog, since each count is just a range query on a static Relation.

@ecstatic-morse
Copy link
Author

ecstatic-morse commented Nov 5, 2021

There's one more issue I didn't notice. In the version I tested locally, only one of the in_block relations was computed correctly using a count aggregate. However, when you try to compile the version I produced above, which uses count for both relations, you get an error about mutually dependent aggregates, despite the fact that path_becomes_{un}init_in_block should be in a separate stratum from path_maybe_init_at_block_{entry, exit}. I think this is just another shortcoming in Soufflé, and not a fundamental problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant