You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current compiler implementation re-runs all stages on all modules each time the compiler is invoked. This is fine when policies are loaded in bulk, however, when policies are loaded incrementally this can be a problem.
After instrumenting the compiler we can see the time taken on various stages. The table below is for the incremental load of this policy after ~70 similar policies had been loaded and compiled.
Total time reported by these stages is ~186ms. To compile the first module it takes <10ms. If we assume that modules are the unit of scale we want to optimize for, it's relatively straightforward to imagine how the compiler could be extended to avoid recompiling modules that have not changed. To do so it's useful to categorize the stages into whether they are read-only and module-local. In either case, any changes made are not visible to other modules:
stage
read-only
module-local
latency (ns)
notes
check_safety_rule_bodies
no
yes
87467258
check_types
no
no
25406648
rewrite_local_vars
no
yes
12469995
check_rule_conflicts
yes
no
9728224
resolve_refs
no
no
6515598
check_undefined_funcs
yes
no
6297699
check_unsafe_builtins
yes
yes
5282864
rewrite_with_values
no
no
5244724
set_graph
no
no
4274580
rewrite_comprehension_terms
no
no
4163807
rebuild_indices
no
no
3526997
rewrite_equals
no
yes
3009334
rewrite_expr_terms
no
yes
2929602
check_safety_rule_heads
yes
yes
2583032
init_local_var_gen
no
potentially
2553327
requires maintaining one per package
rewrite_dynamic_terms
no
no
2043770
check_recursion
yes
no
1623961
rewrite_refs_in_head
no
no
622904
set_rule_tree
no
potentially
474280
requires add/remove interface on tree
set_module_tree
no
potentially
249538
requires add/remove interface on tree
Modifying the compiler to only execute module-local stages on changed modules would save ~100ms. The check_rule_conflicts and rebuild_indices stages could be refactored to operate on packages/rules that contribute to their packages (this would save another ~10ms.) A similar change could be made to resolve_refs which would pave the way for similar improvements to:
stage
read-only
module-local
latency (ns)
rewrite_with_values
no
no
5244724
rewrite_comprehension_terms
no
no
4163807
rewrite_dynamic_terms
no
no
2043770
rewrite_refs_in_head
no
no
622904
This leaves type checking (~25ms) and rule dependency analysis (~12ms):
stage
read-only
module-local
latency (ns)
check_types
no
no
25406648
check_undefined_funcs
yes
no
6297699
set_graph
no
no
4274580
check_recursion
yes
no
1623961
Rule dependency analysis would need to be extended as follows:
the rule graph needs to be extended to support add/remove operations
set_graph stage needs to record rules added and removed from the graph
check_undefined_funcs stage needs to run on new and modified rules as well as dependents
of new, modified, or removed rules
check_recursion stage needs to run on new and modified rules as well as dependents
of new or modified rules
Identifying rules in the graph would be tricky at the moment because they are keyed by pointers to the rule AST structures which are not reliable across runs.
Type checking would need to re-run on any of the new or modified modules as well as any dependents of those modules. This would rely on the add/remove record from the set_graph stage.
All said, it seems feasible to convert all of the current stages into incremental operations assuming modules are the unit of scale.
EDIT: 2020-07-13:
In any solution we will need to make sure the compiler can rollback changes to internal data structures in case of failure. This would require undo operations on all data structures used internally.
The API for this feature is TBD. One option would be to allow callers to supply a set of new/updated modules and a set of removed module names.
Incremental compiles would help improve performance when multiple bundles are in used (as only modules from the bundle that changed would have to be recompiled) and it would also help with performance if push-based/delta bundles are implemented (Delta Bundles #1055).
The text was updated successfully, but these errors were encountered:
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.
The current compiler implementation re-runs all stages on all modules each time the compiler is invoked. This is fine when policies are loaded in bulk, however, when policies are loaded incrementally this can be a problem.
After instrumenting the compiler we can see the time taken on various stages. The table below is for the incremental load of this policy after ~70 similar policies had been loaded and compiled.
Total time reported by these stages is ~186ms. To compile the first module it takes <10ms. If we assume that modules are the unit of scale we want to optimize for, it's relatively straightforward to imagine how the compiler could be extended to avoid recompiling modules that have not changed. To do so it's useful to categorize the stages into whether they are read-only and module-local. In either case, any changes made are not visible to other modules:
Modifying the compiler to only execute module-local stages on changed modules would save ~100ms. The
check_rule_conflicts
andrebuild_indices
stages could be refactored to operate on packages/rules that contribute to their packages (this would save another ~10ms.) A similar change could be made toresolve_refs
which would pave the way for similar improvements to:This leaves type checking (~25ms) and rule dependency analysis (~12ms):
Rule dependency analysis would need to be extended as follows:
set_graph
stage needs to record rules added and removed from the graphcheck_undefined_funcs
stage needs to run on new and modified rules as well as dependentsof new, modified, or removed rules
check_recursion
stage needs to run on new and modified rules as well as dependentsof new or modified rules
Identifying rules in the graph would be tricky at the moment because they are keyed by pointers to the rule AST structures which are not reliable across runs.
Type checking would need to re-run on any of the new or modified modules as well as any dependents of those modules. This would rely on the add/remove record from the
set_graph
stage.All said, it seems feasible to convert all of the current stages into incremental operations assuming modules are the unit of scale.
EDIT: 2020-07-13:
In any solution we will need to make sure the compiler can rollback changes to internal data structures in case of failure. This would require undo operations on all data structures used internally.
The API for this feature is TBD. One option would be to allow callers to supply a set of new/updated modules and a set of removed module names.
Incremental compiles would help improve performance when multiple bundles are in used (as only modules from the bundle that changed would have to be recompiled) and it would also help with performance if push-based/delta bundles are implemented (Delta Bundles #1055).
The text was updated successfully, but these errors were encountered: