Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve compiler performance for incremental load scenarios #2282

Open
tsandall opened this issue Apr 8, 2020 · 3 comments
Open

Improve compiler performance for incremental load scenarios #2282

tsandall opened this issue Apr 8, 2020 · 3 comments

Comments

@tsandall
Copy link
Member

tsandall commented Apr 8, 2020

The current compiler implementation re-runs all stages on all modules each time the compiler is invoked. This is fine when policies are loaded in bulk, however, when policies are loaded incrementally this can be a problem.

After instrumenting the compiler we can see the time taken on various stages. The table below is for the incremental load of this policy after ~70 similar policies had been loaded and compiled.

+----------+------------------------------------------------------+
|    ns    |                        timer                         |
+----------+------------------------------------------------------+
| 87467258 | "timer_compile_stage_check_safety_rule_bodies_ns"    |
| 25406648 | "timer_compile_stage_check_types_ns"                 |
| 12469995 | "timer_compile_stage_rewrite_local_vars_ns"          |
| 9728224  | "timer_compile_stage_check_rule_conflicts_ns"        |
| 6515598  | "timer_compile_stage_resolve_refs_ns"                |
| 6297699  | "timer_compile_stage_check_undefined_funcs_ns"       |
| 5282864  | "timer_compile_state_check_unsafe_builtins_ns"       |
| 5244724  | "timer_compile_stage_rewrite_with_values_ns"         |
| 4274580  | "timer_compile_stage_set_graph_ns"                   |
| 4163807  | "timer_compile_stage_rewrite_comprehension_terms_ns" |
| 3526997  | "timer_compile_stage_rebuild_indices_ns"             |
| 3009334  | "timer_compile_stage_rewrite_equals_ns"              |
| 2929602  | "timer_compile_stage_rewrite_expr_terms_ns"          |
| 2583032  | "timer_compile_stage_check_safety_rule_heads_ns"     |
| 2553327  | "timer_compile_stage_init_local_var_gen_ns"          |
| 2043770  | "timer_compile_stage_rewrite_dynamic_terms_ns"       |
| 1623961  | "timer_compile_stage_check_recursion_ns"             |
| 622904   | "timer_compile_stage_rewrite_refs_in_head_ns"        |
| 474280   | "timer_compile_stage_set_rule_tree_ns"               |
| 249538   | "timer_compile_stage_set_module_tree_ns"             |
+----------+------------------------------------------------------+

Total time reported by these stages is ~186ms. To compile the first module it takes <10ms. If we assume that modules are the unit of scale we want to optimize for, it's relatively straightforward to imagine how the compiler could be extended to avoid recompiling modules that have not changed. To do so it's useful to categorize the stages into whether they are read-only and module-local. In either case, any changes made are not visible to other modules:

stage read-only module-local latency (ns) notes
check_safety_rule_bodies no yes 87467258
check_types no no 25406648
rewrite_local_vars no yes 12469995
check_rule_conflicts yes no 9728224
resolve_refs no no 6515598
check_undefined_funcs yes no 6297699
check_unsafe_builtins yes yes 5282864
rewrite_with_values no no 5244724
set_graph no no 4274580
rewrite_comprehension_terms no no 4163807
rebuild_indices no no 3526997
rewrite_equals no yes 3009334
rewrite_expr_terms no yes 2929602
check_safety_rule_heads yes yes 2583032
init_local_var_gen no potentially 2553327 requires maintaining one per package
rewrite_dynamic_terms no no 2043770
check_recursion yes no 1623961
rewrite_refs_in_head no no 622904
set_rule_tree no potentially 474280 requires add/remove interface on tree
set_module_tree no potentially 249538 requires add/remove interface on tree

Modifying the compiler to only execute module-local stages on changed modules would save ~100ms. The check_rule_conflicts and rebuild_indices stages could be refactored to operate on packages/rules that contribute to their packages (this would save another ~10ms.) A similar change could be made to resolve_refs which would pave the way for similar improvements to:

stage read-only module-local latency (ns)
rewrite_with_values no no 5244724
rewrite_comprehension_terms no no 4163807
rewrite_dynamic_terms no no 2043770
rewrite_refs_in_head no no 622904

This leaves type checking (~25ms) and rule dependency analysis (~12ms):

stage read-only module-local latency (ns)
check_types no no 25406648
check_undefined_funcs yes no 6297699
set_graph no no 4274580
check_recursion yes no 1623961

Rule dependency analysis would need to be extended as follows:

  • the rule graph needs to be extended to support add/remove operations
  • set_graph stage needs to record rules added and removed from the graph
  • check_undefined_funcs stage needs to run on new and modified rules as well as dependents
    of new, modified, or removed rules
  • check_recursion stage needs to run on new and modified rules as well as dependents
    of new or modified rules

Identifying rules in the graph would be tricky at the moment because they are keyed by pointers to the rule AST structures which are not reliable across runs.

Type checking would need to re-run on any of the new or modified modules as well as any dependents of those modules. This would rely on the add/remove record from the set_graph stage.

All said, it seems feasible to convert all of the current stages into incremental operations assuming modules are the unit of scale.

EDIT: 2020-07-13:

  • In any solution we will need to make sure the compiler can rollback changes to internal data structures in case of failure. This would require undo operations on all data structures used internally.

  • The API for this feature is TBD. One option would be to allow callers to supply a set of new/updated modules and a set of removed module names.

  • Incremental compiles would help improve performance when multiple bundles are in used (as only modules from the bundle that changed would have to be recompiled) and it would also help with performance if push-based/delta bundles are implemented (Delta Bundles #1055).

@stale
Copy link

stale bot commented Apr 4, 2023

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@hpvd
Copy link

hpvd commented Aug 18, 2023

+1 on this since time saving is always good and in addition, as I have just learned, it also helps to make clouds greener :-)

@stale stale bot removed the inactive label Aug 18, 2023
@stale
Copy link

stale bot commented Sep 17, 2023

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

@stale stale bot added the inactive label Sep 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants