Speedup connectedness by doing it on a smaller hashtable #603
Conversation
The diff is impossible to follow; where's the hashtable that you made smaller?
Most of the things marked as different are due to wrapping things into calls to
src/framework/cfgTools.ml
Outdated
let fun_cfgF = H.create 113 in
let fun_cfgB = H.create 113 in
Hashtables that are reset after every function, used for the connectedness check.
src/framework/cfgTools.ml
Outdated
H.add cfgB toNode (edges,fromNode); H.add fun_cfgB toNode (edges,fromNode);
H.add cfgF fromNode (edges,toNode); H.add fun_cfgF fromNode (edges,toNode);
Also add edges to the per-function table.
src/framework/cfgTools.ml
Outdated
H.clear fun_cfgB;
H.clear fun_cfgF;
Reset the per-function tables.
src/framework/cfgTools.ml
Outdated
let next = H.find_all fun_cfgF
let prev = H.find_all fun_cfgB
Use the per-function table here instead of the global table.
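Pieced together from the hunks above, here is a minimal, self-contained sketch of the idea, assuming `H` is the node-keyed hashtable module used in cfgTools.ml; the `node`/`edges` types and the function names (`add_edge`, `neighbours`, `reset_per_function_tables`) are placeholders for illustration, not Goblint's actual identifiers.

```ocaml
module H = Hashtbl         (* stand-in for the node-keyed hashtable module *)

type node = int            (* placeholder for Goblint's node type *)
type edges = string        (* placeholder for edge labels *)

(* Global tables holding the CFGs of all functions (pre-existing). *)
let cfgF : (node, edges * node) H.t = H.create 113
let cfgB : (node, edges * node) H.t = H.create 113

(* Small per-function tables, reset after every function. *)
let fun_cfgF : (node, edges * node) H.t = H.create 113
let fun_cfgB : (node, edges * node) H.t = H.create 113

(* Every edge is mirrored into the per-function tables. *)
let add_edge fromNode edges toNode =
  H.add cfgB toNode (edges, fromNode); H.add fun_cfgB toNode (edges, fromNode);
  H.add cfgF fromNode (edges, toNode); H.add fun_cfgF fromNode (edges, toNode)

(* The connectedness check consults only the small tables, so [find_all]
   never walks buckets filled with other functions' nodes. *)
let neighbours node =
  let next = H.find_all fun_cfgF node in
  let prev = H.find_all fun_cfgB node in
  (next, prev)

(* After each function, the per-function tables are emptied for reuse. *)
let reset_per_function_tables () =
  H.clear fun_cfgB;
  H.clear fun_cfgF
```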
But the connectedness check is already done using only the current function's nodes (see analyzer/src/framework/cfgTools.ml, line 397 at 4c51356).
For FFmpeg, timing goes from aborting with:
to finishing this phase with
Yes, but it still uses the large CFG hash table of all functions. We could be seeing effects where the per-function hashtable fits into a cache, and the huge one for all functions does not...
But that's far from specific to the connectedness check; Goblint has been using a single hashtable for all CFGs of all functions since the very beginning of time. And that affects every single transfer function evaluation as well. If that's really the entirety of the bottleneck, then we have far greater problems and might have to reengineer our whole CFG representation throughout Goblint and all the downstream tools.
I would really love it if there were a profilable reproduction of the issue (~10 s runtime) that already exhibits the bottleneck, because that would reveal whether the true problem is cache locality (which no part of Goblint ever considers), unsuitable hashing, or something else.
Indeed, this doesn't fit with the rest of the story. The huge hashtable is still there, so no memory is saved through this change. Thus it would be especially useful to profile.
Maybe
I tried with zstd, but there is no observable difference between the two there.
Same for git, no difference there.
Here's a wild guess: the ffmpeg code by coincidence contains functions and statements whose IDs interact particularly poorly with the hash function and the size of the large hashtable, such that unreasonably many nodes fall into the same bucket. Although the hashes for nodes (i.e. statements and functions) are based on sequentially generated IDs, taken modulo the number of buckets they might still collide particularly badly. This in turn causes lookups in those buckets to degrade into long linear scans.

Of course this wouldn't then be limited to the large hashtable lookups during CFG construction; it would also affect transfer functions, but we haven't gotten far enough to investigate the bottleneck there. Also, it would be far less obvious there, since transfer functions are already the most expensive part, so it'd be hard to judge what is unreasonably slow for them.
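One way to test this guess would be the standard library's bucket statistics; a small sketch follows, assuming the CFG tables expose the stdlib hashtable interface (both `Hashtbl` and `Hashtbl.Make` provide `stats`). The helper name and the usage identifiers are assumptions, not Goblint's actual code.

```ocaml
(* A healthy table has max_bucket_length close to num_bindings / num_buckets;
   a pathological one concentrates bindings in a few very long buckets, which
   turns every [find_all] into a long linear scan. *)
let report_bucket_stats name (s : Hashtbl.statistics) =
  Printf.printf "%s: %d bindings in %d buckets, longest bucket = %d\n"
    name s.num_bindings s.num_buckets s.max_bucket_length

(* Usage sketch (identifiers assumed):
     report_bucket_stats "cfgF (all functions)" (H.stats cfgF);
     report_bucket_stats "fun_cfgF (per function)" (H.stats fun_cfgF) *)
```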
Yes, that seems to make sense, but still doesn't quite explain the memory usage...
I ran it again on the server for the OpenSSL benchmark (what CIL dynamically combined from the compiledb), as opposed to a single already-combined file (what I had done before), and there I observe the same gigantic speed difference again. Without the optimization, I ran out of patience after a while:
with it, it finished blazingly fast:
Maybe this is related to the combined files being built before we disabled merge_inline and us having more functions afterwards 🤔
I might want to benchmark this suspicion on Monday.
I've spent an extensive amount of time profiling the counterintuitive memory usage reduction and have finally come to a conclusion. Function entry (and return) nodes are the culprit: their identity (…)
Thus, I believe that this PR alone isn't what we need. Even if we use this optimization just during CFG construction, it does not and cannot do anything to avoid the issue in transfer functions during solving. There, the same conflict would probably cause a slowdown of a similar factor (the number of unmerged copies of a function, for each unmerged copy of that function), but to the analysis itself, not to a lightweight graph algorithm.
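To put an assumed number on that factor: if disabling merge_inline leaves, say, 50 unmerged copies of a function and their entry/return nodes all land in the same bucket, then every lookup touching any one copy scans on the order of 50 entries instead of one, i.e. roughly a 50× slowdown for each of those copies.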
For FFmpeg, the CFG computation phase previously did not terminate in a reasonable amount of time.
With these changes (using a separate, smaller hashtable to check for connectedness), this seems to work. This also drastically reduces RAM usage, which makes me slightly suspicious.
TODO: