
Speed up connectedness by doing it on a smaller hashtable #603

Closed
michael-schwarz wants to merge 3 commits into master

Conversation

michael-schwarz
Member

For FFmpeg, the CFG computation phase previously did not terminate in a reasonable amount of time.
With these changes, a separate, smaller per-function hashtable is used to check connectedness, which seems to work (a rough sketch follows the TODO list below). This also drastically reduces RAM usage, which makes me slightly suspicious.

TODO:

  • Check this is actually correct
  • Cleanup
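
Roughly, the idea looks like this (a minimal, self-contained sketch, not the actual Goblint code: H stands for the polymorphic Hashtbl, the node/edge types are placeholders, and addEdge/finish_function are made-up helpers; only the table names come from the diff hunks discussed below):

    (* Sketch only: Goblint's real tables are keyed by Node.t via a functorial
       hashtable; plain ints and the polymorphic Hashtbl stand in here. *)
    module H = Hashtbl

    type node = int          (* placeholder for Node.t *)
    type edge = string       (* placeholder for edge labels *)

    (* global tables shared across all functions, as before *)
    let cfgF : (node, edge * node) Hashtbl.t = H.create 113
    let cfgB : (node, edge * node) Hashtbl.t = H.create 113
    (* small per-function tables, cleared after every function *)
    let fun_cfgF : (node, edge * node) Hashtbl.t = H.create 113
    let fun_cfgB : (node, edge * node) Hashtbl.t = H.create 113

    let addEdge fromNode edge toNode =
      (* mirror every edge into the per-function tables *)
      H.add cfgB toNode (edge, fromNode); H.add fun_cfgB toNode (edge, fromNode);
      H.add cfgF fromNode (edge, toNode); H.add fun_cfgF fromNode (edge, toNode)

    let finish_function () =
      (* the connectedness check walks fun_cfgF/fun_cfgB (via H.find_all)
         instead of the global cfgF/cfgB, and the small tables are then reset *)
      H.clear fun_cfgB;
      H.clear fun_cfgF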

@sim642
Member

sim642 commented Feb 18, 2022

The diff is impossible to follow; where's the hashtable that you made smaller?

@sim642 added the performance (Analysis time, memory usage) label on Feb 18, 2022
@michael-schwarz
Member Author

Most of what is marked as different is due to wrapping things in calls to Stats.time. I'll comment on the relevant changes.

Comment on lines 166 to 167
let fun_cfgF = H.create 113 in
let fun_cfgB = H.create 113 in
Member Author


Hashtables that are reset after every function, used for the connectedness check.

Comment on lines 177 to 178
H.add cfgB toNode (edges,fromNode); H.add fun_cfgB toNode (edges,fromNode);
H.add cfgF fromNode (edges,toNode); H.add fun_cfgF fromNode (edges,toNode);
Member Author


Also add edges to the per-function tables.

Comment on lines 252 to 253
H.clear fun_cfgB;
H.clear fun_cfgF;
Member Author


Reset the per-function tables.

Comment on lines 401 to 402
let next = H.find_all fun_cfgF
let prev = H.find_all fun_cfgB
Member Author


Use the per-function table here instead of the global table.

@sim642
Member

sim642 commented Feb 18, 2022

But the connectedness check is already done using only the current function's nodes via fd_nodes anyway, which follows the exact same clearing logic:

let (sccs, node_scc) = computeSCCs (module TmpCfg) (NH.keys fd_nodes |> BatList.of_enum) in

@michael-schwarz
Member Author

For FFmpeg, timing goes from aborting with:

    makeCFG                         1.047 s
    connect                        171.697 s

to finishing this phase with

    makeCFG                         7.298 s
    connect                        55.658 s

@michael-schwarz
Member Author

But the connectedness check is already done using only the current function's nodes via fd_nodes anyway

Yes, but it still uses the large CFG hash table of all functions. We could be seeing effects where the per-function hashtable fits into a cache, and the huge one for all functions does not...

@sim642
Member

sim642 commented Feb 18, 2022

Yes, but it still uses the large CFG hash table of all functions. We could be seeing effects where the per-function hashtable fits into a cache, and the huge one for all functions does not...

But that's far from specific to the connectedness check: Goblint has been using a single hashtable for all CFGs of all functions since the very beginning of time. And that affects every single transfer function evaluation as well. If that's really the entirety of the bottleneck, then we have far greater problems and might have to reengineer our whole CFG representation throughout Goblint and all the downstream tools.

@sim642
Member

sim642 commented Feb 18, 2022

I would really love it if there were a profilable reproduction of the issue (~10s runtime) that already reveals the bottleneck, because that would show whether the true problem is cache locality (which no part of Goblint ever considers), unsuitable hashing or something else.

@sim642
Member

sim642 commented Feb 18, 2022

This also drastically reduces RAM usage, which makes me slightly suspicious

Indeed, this doesn't fit with the rest of the story. The huge hashtable is still there, so no memory is saved through this change. Thus it would be especially useful to profile.

@sim642
Member

sim642 commented Feb 18, 2022

Maybe H.stats reveals something insightful about the hashtable?
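
For reference, a tiny sketch of what that would report (using the polymorphic Hashtbl here; the functorial H.stats returns the same Hashtbl.statistics record):

    let () =
      let tbl = Hashtbl.create 113 in
      List.iter (fun i -> Hashtbl.add tbl i (string_of_int i)) [1; 2; 3];
      let s = Hashtbl.stats tbl in
      (* num_bindings, num_buckets and max_bucket_length; bucket_histogram
         additionally gives the full chain-length distribution, which is what
         would expose pathological clustering *)
      Printf.printf "bindings=%d buckets=%d max bucket length=%d\n"
        s.Hashtbl.num_bindings s.Hashtbl.num_buckets s.Hashtbl.max_bucket_length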

@michael-schwarz
Member Author

I would really love it if there were a profilable reproduction of the issue (~10s runtime) that already reveals the bottleneck, because that would show whether the true problem is cache locality (which no part of Goblint ever considers), unsuitable hashing or something else.

I tried with zstd, but there is no observable difference between the two there.

@michael-schwarz
Member Author

Same for git, no difference there.

@sim642
Member

sim642 commented Feb 18, 2022

Here's a wild guess: the FFmpeg code by coincidence contains functions and statements whose IDs interact particularly poorly with the hash function and the size of the large hashtable, such that unreasonably many nodes fall into the same bucket. Although the hashes for nodes (i.e. statements and functions) are based on sequentially generated IDs, those IDs taken modulo the number of buckets might still collide particularly badly.

This in turn causes find_all on the large hashtable to be particularly inefficient, because it always has to traverse the entire bucket unconditionally. Moreover, that bucket traversal is recursive, but not tail-recursive (ocaml/ocaml#8676), so memory usage skyrockets as well.

Of course, this wouldn't then be limited to the large hashtable lookups in CFG construction; it would also affect transfer functions, but we haven't gotten far enough to investigate the bottleneck there. It would also be far less obvious, since transfer functions are already the most expensive part, so it would be hard to judge what is unreasonably slow for them.
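
To make the guess concrete, here's a small self-contained illustration (assumption: the node hash reduces to the raw sequential ID, which is a simplification of Goblint's actual hashing). IDs that are all congruent modulo the bucket count end up in a single chain, and every find_all then walks that entire chain:

    module IdHash = struct
      type t = int
      let equal = (=)
      let hash id = id                 (* the "hash" is just the sequential id *)
    end
    module IH = Hashtbl.Make (IdHash)

    let () =
      let tbl = IH.create 113 in
      (* 1000 ids that are all multiples of 4096: with the stdlib's
         power-of-two bucket counts (well below 4096 for this table),
         every one of them lands in the same bucket *)
      for i = 0 to 999 do IH.add tbl (i * 4096) () done;
      let s = IH.stats tbl in
      (* reports a max bucket length of 1000, i.e. one chain holds everything *)
      Printf.printf "bindings=%d buckets=%d max bucket length=%d\n"
        s.Hashtbl.num_bindings s.Hashtbl.num_buckets s.Hashtbl.max_bucket_length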

@michael-schwarz
Member Author

Yes, that seems to make sense, but still doesn't quite explain the memory usage...

@michael-schwarz
Member Author

michael-schwarz commented Feb 18, 2022

I ran it again on the server for the OpenSSL benchmark (on what CIL dynamically combined from the compiledb), as opposed to a single already-combined file (what I had done before), and there I observed the same gigantic speed difference again:

Without the optimization, I ran out of patience after a while:

    makeCFG                         2.576 s
    connect                        633.225 s
    
    max=61843.65MB

with it, it finished blazingly fast:

    makeCFG                         4.165 s
    connect                        14.605 s
    
    max=20216.80MB

Maybe this is related to the already-combined files having been built before we disabled merge_inline, and to us having more functions now 🤔

  • I also noticed some functions have names such as ossl_check_ERR_STRING_DATA_lh_doallfunc_type___NNN with NNN some number.
  • In total we have >290k functions, so that seems fishy too.

I might want to benchmark this suspicion on Monday.

Base automatically changed from cil_universial_character_names to master February 21, 2022 09:20
@sim642 mentioned this pull request on Feb 23, 2022
@sim642
Member

sim642 commented Feb 28, 2022

I've spent an extensive amount of time profiling the counterintuitive memory usage reduction and have finally come to a conclusion.
The reason isn't anything about OCaml's memory management, but a blatantly simple one: this change alters the amount of computation (and allocation) done by computeSCCs! If Cfg.next is based on cfgF, then it sometimes returns more elements than Cfg.next based on fun_cfgF when looking up the same node! How is this possible?

Function entry (and return) nodes are the culprit: their identity (equal and hash) is based only on their underlying variable (svar). So the difference appears in the form of a single function entry node having multiple successors with different sids but semantically identical content. These conflicts appeared after merging of inlines was disabled in goblint/cil#72. Turning that back on fixes both the time and the memory issue.
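
A minimal, self-contained illustration of that effect (the record type, module names and string values are made up for the example; in Goblint the identity comes from the entry node's svar):

    module Entry = struct
      type t = { svar : string; sid : int }
      let equal a b = String.equal a.svar b.svar   (* identity ignores sid *)
      let hash a = Hashtbl.hash a.svar
    end
    module EH = Hashtbl.Make (Entry)

    let () =
      let cfgF = EH.create 113 in
      (* two unmerged copies of the same function: same svar, different sids *)
      let entry_copy1 = { Entry.svar = "f"; sid = 1 } in
      let entry_copy2 = { Entry.svar = "f"; sid = 2 } in
      EH.add cfgF entry_copy1 "successor in copy 1";
      EH.add cfgF entry_copy2 "successor in copy 2";
      (* looking up either copy's entry node returns the successors of both
         copies, so an SCC computation on the global table sees extra edges *)
      assert (List.length (EH.find_all cfgF entry_copy1) = 2)

With the per-function table, only the current copy's binding exists at lookup time, which is why both the work and the allocations in computeSCCs shrink.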

@sim642
Member

sim642 commented Feb 28, 2022

Thus, I believe that this PR alone isn't what we need. Even if we use this optimization just during the CFG construction, it does not and cannot do anything to avoid the issue in transfer functions during solving. There the same conflict would probably cause a slowdown of a similar factor (the number of unmerged copies of a function, for each unmerged copy of that function), but to the analysis itself, not to a lightweight graph algorithm.

@sim642 closed this on Feb 28, 2022