Reference Guide
Welcome to the reference guide for CFGConf. CFGConf is a JSON-based specification language that can be used to generate drawings of Control Flow Graphs (CFG). CFGConf does not require detailed knowledge of graph drawing algorithms or layout engines to produce domain-specific drawings of the CFG. The keywords are high-level and support many aspects of the graph visualization including specifying the graph and program constructs, filtering the graph, collapsing functions in the graph, and rendering and styling of the graph.
The reference guide lists all supported keywords and their usage in the JSON file. In addition, the JSON file can contain arbitrary keys for holding extra information and still be valid.
Table of Contents
- Control Flow Graph Data
- Graph Rendering Options
- Graph Filtering Options
Control Flow Graph Data
This section describes how to specify the data about the control flow graph (i.e. the nodes and edges) and the structures (i.e. the functions and loops) in the program. The data can be specified inside the current JSON file or loaded from separate files. Refer to the `data` keyword to load the data from separate files. Loading the graph from a dot file is also supported.
The `data` keyword provides a way to separate the graph and its structures from the rendering and filtering options. Instead of specifying the graph (i.e. nodes and edges) and its structures (i.e. functions and loops) inside the CFGConf JSON file, they can be loaded from separate JSON files. The graph (i.e. nodes and edges) can also be loaded directly from a dot file. The only required field is `graphFile`. NOTE: Files specified inside the `data` keyword must be in the same directory as the CFGConf JSON file.
The `graphFile` key provides the file name for the graph. It can be either a dot file or a JSON file. If it is a JSON file, it can contain the keys `nodes` and `edges` as defined in the `nodes` and `edges` sections. If it is a dot file, the nodes and edges from the dot file are imported into the program. The file must be in the same directory as the CFGConf JSON file.
The `graphFormat` key gives the format of the graph file. It can be either "dot" or "json". If `graphFormat` is not provided, the program detects the format based on the file extension.
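The format rule above (an explicit `graphFormat` wins, otherwise the file extension decides) can be sketched in Python. This is only an illustration: `detect_graph_format` is a hypothetical helper, not part of CFGConf, and the assumption that `.dot` and `.json` are the recognized extensions is ours.

```python
from pathlib import Path


def detect_graph_format(data: dict) -> str:
    """Hypothetical helper (not a CFGConf function): pick "dot" or "json"."""
    explicit = data.get("graphFormat")
    if explicit is not None:
        # An explicit graphFormat takes priority over the file extension.
        if explicit not in ("dot", "json"):
            raise ValueError(f"unsupported graphFormat: {explicit!r}")
        return explicit
    # Assumption of this sketch: only ".dot" and ".json" are recognized.
    suffix = Path(data["graphFile"]).suffix.lower()
    if suffix == ".dot":
        return "dot"
    if suffix == ".json":
        return "json"
    raise ValueError(f"cannot infer graph format from {data['graphFile']!r}")
```

For a `data` object such as `{"graphFile": "cfg.dot"}` (the file name is made up), the sketch falls back to the extension.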
The `structureFile` key gives the file name of the JSON file containing program structures, i.e. `functions` and `loops` as defined in the `functions` and `loops` sections. The file must be in the same directory as the CFGConf JSON file. The CFGConf JSON file can contain either the `structureFile` or the `analysisFile` key.
The `analysisFile` key gives the file name of the analysis file generated using Dyninst. Dyninst is a toolkit that provides many program analysis tools for binary code. The program extracts the functions and loops from this analysis file. The file must be in the same directory as the CFGConf JSON file. The CFGConf JSON file can contain either the `structureFile` or the `analysisFile` key.
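The constraints on the `data` object can be summarized as a small check. The following Python sketch is hypothetical (`validate_data_block` is not a CFGConf function) and rests on two interpretations that the text does not spell out: that "either `structureFile` or `analysisFile`" means at most one of the two, and that "same directory" means bare file names without path separators.

```python
def validate_data_block(data: dict) -> list[str]:
    """Hypothetical validator for the `data` object; CFGConf's checks may differ."""
    problems = []
    if "graphFile" not in data:
        problems.append("graphFile is required")
    # Assumption: structureFile and analysisFile are mutually exclusive.
    if "structureFile" in data and "analysisFile" in data:
        problems.append("use structureFile or analysisFile, not both")
    # Assumption: files next to the CFGConf JSON file are given as bare names.
    for key in ("graphFile", "structureFile", "analysisFile"):
        name = data.get(key)
        if name is not None and ("/" in name or "\\" in name):
            problems.append(f"{key} must be a file name in the same directory")
    return problems
```

An empty list means the block passed these checks.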
The `nodes` key holds an array of node objects as specified below.
A node object contains the information related to the node. The only required field is `id`, which contains the identifier for the node.
- `id`: The id of the node. NOTE: The id should be unique across all the nodes, loops, and functions inside the graph. (They are all stored as nodes in a compound graph.)
- `label`: The label for the node. This is typically what is displayed inside the node. If the label is not provided, `id` is used as the label.
The JSON nodes are converted into the dot Graphviz format. Any node attribute supported by dot can be included inside the JSON node object. Listed below are commonly used dot Graphviz attributes for styling/rendering the nodes. See the dot specification for more attributes.
- `shape`: Shape as defined in dot Graphviz. Default value is `box`.
- `style`: Style as defined in dot Graphviz. Supports a comma-separated list of values. Default value is `solid`.
- `color`: Drawing color except for text, as in dot Graphviz.
- `fillcolor`: Background color when `style` is `filled`, as in dot Graphviz.
- `fontcolor`: Color for text, as in dot Graphviz.
- `tooltip`: Tooltip for the node. If not provided, the `label` is used as the tooltip.
- `class`: Class names to attach to the node. Combine with styles in the CSS file to style the nodes. Multiple space-separated classes are supported.
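To make the defaults concrete, here is a rough Python sketch of how a JSON node object could map to a dot node statement. It is illustrative only: CFGConf performs its own conversion, and `node_to_dot` is a hypothetical name. The sketch applies the documented fallbacks (label from `id`, `shape` of `box`, `style` of `solid`, tooltip from the label) and passes any other dot attribute through unchanged.

```python
def node_to_dot(node: dict) -> str:
    """Illustrative conversion of one node object into a dot statement."""
    attrs = dict(node)
    node_id = attrs.pop("id")                    # the only required field
    attrs.setdefault("label", node_id)           # label falls back to id
    attrs.setdefault("shape", "box")             # documented default
    attrs.setdefault("style", "solid")           # documented default
    attrs.setdefault("tooltip", attrs["label"])  # tooltip falls back to label

    def quote(value) -> str:
        # Inside a dot double-quoted string only the quote itself is escaped.
        return '"' + str(value).replace('"', '\\"') + '"'

    body = ", ".join(f"{key}={quote(value)}" for key, value in attrs.items())
    return f"{quote(node_id)} [{body}];"


print(node_to_dot({"id": "B1"}))
```

Extra attributes such as `fillcolor` or `class` would simply appear in the bracketed list alongside the defaults.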
The `edges` key holds an array of edge objects as specified below.
An edge object contains the information related to the edge. The only required fields are `source` and `target`.
- `source`: The id of the source node in the edge.
- `target`: The id of the target node in the edge.
- `label`: The label for the edge.
The JSON edges are converted into the dot Graphviz format. Any edge attribute supported by dot can be included inside the JSON edge object. Listed below are commonly used dot Graphviz attributes for styling/rendering the edges. See the dot specification for additional attributes.
- `weight`: The weight of the edge. The heavier the weight, the shorter, straighter, and more vertical the edge is.
- `headport`: The port associated with the head of the edge when routing the edge, i.e. the edge terminates at this point in the target node. Can be one of 'n', 'ne', 'e', 'se', 's', 'sw', 'w', 'nw', 'c', corresponding to the eight compass directions or the center (c).
- `tailport`: The port associated with the tail of the edge when routing the edge, i.e. the edge starts at this point in the source node. Can be one of 'n', 'ne', 'e', 'se', 's', 'sw', 'w', 'nw', 'c', corresponding to the eight compass directions or the center (c).
- `style`: Style as defined in dot Graphviz. Supports a comma-separated list of values. Default value is `solid`.
- `color`: The color of the edge except for the text.
- `arrowhead`: Style of the arrowhead on the edge.
- `arrowsize`: Scale factor for arrowheads.
- `tooltip`: The tooltip for the edge.
- `class`: Class names to attach to the edge. Combine with styles in the CSS file to style the edges. Multiple space-separated values are supported.
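The edge requirements listed above (two required fields, a fixed set of port values) lend themselves to a quick sanity check. The Python sketch below is a hypothetical helper, not CFGConf's own validation; it covers only the port values named in this guide.

```python
# Port values named in this guide for headport and tailport.
PORTS = {"n", "ne", "e", "se", "s", "sw", "w", "nw", "c"}


def check_edge(edge: dict) -> list[str]:
    """Hypothetical checker: returns a list of problems found in one edge object."""
    problems = [
        f"missing required field: {key}"
        for key in ("source", "target")
        if key not in edge
    ]
    for key in ("headport", "tailport"):
        if key in edge and edge[key] not in PORTS:
            problems.append(f"{key} must be one of {sorted(PORTS)}")
    return problems
```

An edge such as `{"source": "B1", "target": "B2", "headport": "n", "weight": 2}` passes with no problems reported.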
The `loops` key holds an array of loop objects as specified below.
A loop object contains the information related to the loop. The only required fields are `id`, `nodes`, and `backedges`.
- `id`: The id of the loop. NOTE: The id should be unique across all the nodes, loops, and functions inside the graph since they are all stored as nodes inside a compound graph.
- `nodes`: The node ids comprising the loop.
- `backedges`: The array of back edges in the loop. Each back edge is specified as a two-element array `[source node id, target node id]`.
Note on back edges: A loop is defined by its back edge. Strictly, a loop can have only one back edge. In practice, several can be identified due to constructs such as `break` statements. However, we consolidate all the back edges specified, i.e., all the loops that contain a common set of nodes. Hence, we identify a loop as a set of nodes and combine all back edges contained by this set of nodes into a single loop object. This is similar to how Dyninst, a popular tool for analyzing binary code, refers to a loop.
The loops nested inside the current loop.
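The consolidation described in the note on back edges (one loop object per node set, carrying every back edge found for that set) can be sketched as follows. This is not the tool's code: `consolidate_loops`, the input shape (one entry per detected back edge with its node set), and the generated `loop_N` ids are all assumptions made for the example.

```python
def consolidate_loops(raw_loops: list[dict]) -> list[dict]:
    """Sketch: merge entries that share the same node set into one loop object."""
    merged = {}
    for loop in raw_loops:
        key = frozenset(loop["nodes"])  # a loop is identified by its node set
        entry = merged.setdefault(key, {"nodes": sorted(key), "backedges": []})
        for edge in loop["backedges"]:
            if edge not in entry["backedges"]:
                entry["backedges"].append(edge)
    # Ids are invented here; real ids must be unique across nodes, loops, functions.
    return [
        {"id": f"loop_{index}", **entry}
        for index, entry in enumerate(merged.values(), start=1)
    ]


raw = [
    {"nodes": ["B2", "B3", "B4"], "backedges": [["B4", "B2"]]},
    {"nodes": ["B4", "B3", "B2"], "backedges": [["B3", "B2"]]},
]
print(consolidate_loops(raw))
```

Both entries cover the same three nodes, so the sketch yields a single loop object with two back edges.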
The `functions` key holds an array of function objects as specified below.
A function object contains the information related to the function. The only required fields are `id` and `nodes`.
- `id`: The id of the function. NOTE: The id should be unique across all the nodes, loops, and functions inside the graph since they are all stored as nodes inside a compound graph.
- `label`: The label for the function. If `label` is not provided, `id` is used as the label.
- `nodes`: The array of node ids inside the function.
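Because node, loop, and function ids share one namespace, a duplicate is easy to introduce by accident (for example, a function and its entry node both called `main`). A minimal Python sketch of a check, using the hypothetical helper name `duplicate_ids` and looking only at the top-level `nodes`, `loops`, and `functions` arrays (nested loops are not traversed):

```python
from collections import Counter


def duplicate_ids(config: dict) -> list[str]:
    """Hypothetical check: ids that occur more than once across all three arrays."""
    ids = [
        item["id"]
        for key in ("nodes", "loops", "functions")
        for item in config.get(key, [])
    ]
    return sorted(name for name, count in Counter(ids).items() if count > 1)


config = {
    "nodes": [{"id": "main"}, {"id": "B1"}],
    "functions": [{"id": "main", "nodes": ["B1"]}],
}
print(duplicate_ids(config))
```

Here the sketch reports `main`, since it is used both as a node id and as a function id.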
Graph Rendering Options
This section describes the rendering options that can be used to change the rendering and layout of the graph.
The `rendering` object is a container for all the global properties for the graph rendering.
This contains the global properties for nodes as defined in dot Graphviz, e.g. `shape`, `style`, `class`, etc. CFGConf defines an additional keyword, `label`, that lets you choose between the node id or full label to be displayed inside the node in the drawing of the graph.
`label` lets you choose between the node id or full label to be displayed inside the node in the drawing of the graph. To display the node id in the drawing of the node, set the value to "id". The default value is "full", which displays the full label in the drawing of the node.
This contains the global properties for edges as defined in dot Graphviz, e.g. `style`, `color`, `class`, etc.
The program produces a graph in the dot Graphviz format after processing the CFGConf JSON file. Hence, any global graph layout option supported by dot can be included inside the `rendering` key. Listed below are commonly used dot Graphviz attributes for layout/rendering of the graph.
- `rankdir`: Direction of graph layout. The default value is "TB" for Top to Bottom direction. Can be one of "TB" (Top to Bottom), "BT" (Bottom to Top), "LR" (Left to Right), or "RL" (Right to Left).
- `nodesep`: The minimum separation between the nodes in the same layer. Nodes in the same layer have the same vertical coordinate.
- `ranksep`: The minimum separation between the nodes in adjacent layers.
A container for the global properties for loop layout.
This key determines whether the loop background is drawn. Default value is `true`. The loops in the graph are drawn with an orange background. Inner loops are drawn in a darker orange. The back edges are drawn wider and in magenta.
This contains the global properties for function layout.
This key determines whether functions are drawn with an enclosing rectangle. If `true`, the nodes in a function are grouped together and a blue rectangular boundary is drawn enclosing the function nodes. The function label is displayed in the top middle area inside the boundary. Default value is `true`.
The `collapsingRules` key contains the rules/conditions for collapsing and duplicating the functions. The purpose of collapsing and duplicating is to de-clutter the layout by de-emphasizing high-degree functions that are not the focus of the depiction, such as library and utility functions. These rules describe which functions to collapse as well as overrides to these rules.
Functions containing loops are never collapsed. When all the conditions for collapsing a function without a full loop are met, the function is collapsed. When a function is collapsed, the function boundary and the nodes inside the function are removed and the function is replaced with a single node representing the whole function. The node is drawn with a dotted boundary with the function name as the label of the node. Hovering over this node shows more information about the collapsed function. This function node is duplicated throughout the graph i.e. whenever there is an edge between the nodes in the collapsed function and an outside node, an edge is drawn from a duplicated version of the function node to the outside node.
Rationale: Generally, this feature is aimed at functions that do not play a part in the main logic of the portion of the graph being examined. This is often true of library/utility functions. Since these library/utility functions are called by many parts of the code, the drawing of the graph can get cluttered with many edges coming into these functions resulting in dense areas and a high number of edge crossings. Collapsing a function node and duplicating it throughout the graph reduces the density of the drawing and the number of edge crossings.
This key determines if function collapsing is enabled. Default value is `false`.
The minimum number of incoming edges to the function for the function to be eligible for collapsing. Default value is `10`.
Rationale: Typically library/utility functions are called frequently, thus they have a large number of incoming (call) edges.
The minimum number of outgoing edges from the function for the function to be eligible for collapsing. Default value is `0`.
Rationale: Typically library/utility functions are called frequently, thus they have a large number of outgoing (return) edges.
The minimum size of the function in the visible graph for the function to be eligible for collapsing. It can be specified as a number or as a percentage of the nodes in the visible graph. Append p at the end of the value to specify a percentage, e.g. a value of "5p" equals 5 percent of the nodes in the visible graph. Default value is `1`.
Rationale: If there are few nodes associated with the function in the focus graph, it may be unnecessary to collapse/duplicate.
The maximum size of the function in the visible graph for the function to be eligible for collapsing. It can be specified as a number or as a percentage of the nodes in the visible graph. Append p at the end of the value to specify a percentage, e.g. a value of "5p" equals 5 percent of the nodes in the visible graph. Default value is "10p".
Rationale: If there are a significant number of nodes associated with the function in the focus graph, it may be of more interest to examine.
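The number-or-percentage convention for the two size thresholds can be illustrated with a short Python sketch. `resolve_size` is a hypothetical helper; how CFGConf rounds fractional results is not stated in this guide, so the sketch returns the unrounded value.

```python
def resolve_size(value, visible_node_count: int) -> float:
    """Hypothetical helper: turn a size threshold into a node count.

    A trailing "p" marks a percentage of the visible graph's nodes;
    anything else is read as an absolute number of nodes.
    """
    if isinstance(value, str) and value.endswith("p"):
        return float(value[:-1]) * visible_node_count / 100
    return float(value)


print(resolve_size("5p", 200))   # 5 percent of 200 visible nodes
print(resolve_size(1, 200))      # absolute threshold, independent of graph size
```

With 200 visible nodes, "5p" corresponds to 10 nodes, while a plain `1` stays 1 regardless of graph size.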
The list of function ids that should always be collapsed. Generally, the rules specified above are used to collapse the functions. However, this option provides a manual override in case a function always needs to be collapsed.
The list of function ids that should never be collapsed. Generally, the rules specified above are used to collapse the functions. However, this option provides a manual override in case a function should never be collapsed.
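Putting the rules and overrides together, the collapse decision might look like the Python sketch below. Everything named here is a placeholder: the field names of `CollapseRules` are not CFGConf keywords, and the precedence order (disabled, then never-collapse, then the loop rule, then always-collapse, then thresholds) is an assumption, since this guide does not say how the overrides interact with the loop rule. Size thresholds are expected as node counts, so a percentage such as "10p" must already have been resolved against the visible graph.

```python
from dataclasses import dataclass, field


@dataclass
class CollapseRules:
    # Placeholder field names for this sketch, not CFGConf keywords.
    max_size: float                 # node count; resolve percentages beforehand
    enabled: bool = False           # documented default
    min_incoming: int = 10          # documented default
    min_outgoing: int = 0           # documented default
    min_size: float = 1             # documented default
    always: set = field(default_factory=set)
    never: set = field(default_factory=set)


def should_collapse(func_id, in_edges, out_edges, size, has_loop, rules) -> bool:
    """Sketch of the decision; the precedence order is an assumption."""
    if not rules.enabled or func_id in rules.never or has_loop:
        return False
    if func_id in rules.always:
        return True
    return (
        in_edges >= rules.min_incoming
        and out_edges >= rules.min_outgoing
        and rules.min_size <= size <= rules.max_size
    )
```

For instance, with collapsing enabled and `max_size=20`, a loop-free function with 12 incoming edges, 3 outgoing edges, and 4 visible nodes would be collapsed under these assumptions, while the same function with only 2 incoming edges would not.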
Graph Filtering Options
This section describes the filtering options to obtain a filtered subgraph of the input graph.
The filtering options provide a way to subset the graph into a subgraph of interest.
This feature is intended for large graphs, i.e. graphs with more than a thousand nodes. The dot Graphviz layout algorithm can take a long time to produce a drawing, or even fail, when the graph is very large. To complete the drawing process in a timely manner, it is a good idea to filter the graph to a subgraph of interest.
The filtering options provide a way to filter the graph based on a set of `selectedNodes`. These nodes act as the seeds, or starting points, for the filtering process. The filtering is hop-based, i.e. any node reachable within a specified number of hops from the starting nodes is included in the filtered graph and rendered. By default, the filtering also selects all the nodes of any loop that has at least one node reachable in the filter set, so as to preserve structures. This can be switched off using the `isLoopFilterOn` key.
We refer to the nodes that are immediate neighbors of nodes in the filtered graph but are not part of the filtered graph as boundary nodes. The boundary nodes are drawn as small ellipses hanging off the filtered graph to provide context. When multiple boundary nodes are attached to a node in the filtered graph, the boundary nodes are shown as stacked ellipses and the aggregate count of boundary nodes is displayed on the side of the stacked ellipse.
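The definition of boundary nodes can be made concrete with a small Python sketch. `boundary_nodes` is a hypothetical helper, and treating both predecessors and successors as "immediate neighbors" is an assumption of the sketch. It returns, for each node kept in the filtered graph, the set of outside neighbors attached to it; the size of each set corresponds to the aggregate count shown next to a stacked ellipse.

```python
def boundary_nodes(edges, kept):
    """Sketch: map each kept node to its neighbors that were filtered out."""
    boundary = {}
    for source, target in edges:
        if source in kept and target not in kept:
            boundary.setdefault(source, set()).add(target)
        elif target in kept and source not in kept:
            boundary.setdefault(target, set()).add(source)
    return boundary


edges = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "A")]
print(boundary_nodes(edges, kept={"A", "B"}))
```

In this toy graph, node B has two boundary nodes (C and D) and node A has one (E).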
The node ids of the starting nodes used for filtering the graph. If `selectedNodes` is empty but filtering is enabled (see `isHopFilterOn`), we use the first node of the graph as the starting node. All nodes in `selectedNodes` are also highlighted with a teal border in the graph.
The `isHopFilterOn` key determines if the filtering is enabled. Default value is `false`.
The maximum number of hops performed during hop-based graph filtering. A value of 0 returns only the `selectedNodes`, with no hops performed. NOTE: The number of nodes in the filtered graph grows exponentially as the number of hops increases, so it is recommended that this value be small. Default value is `3`.
The minimum number of nodes to retrieve when performing hop-based filtering. The hop-based filtering starts with the first hop, i.e. the immediate neighbors of the starting nodes. With each further hop, the immediate neighbors of the current filtered graph are added to the result. Once the minimum number of nodes is reached, the remaining hops are skipped. Default value is `20`.
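The interaction of the two limits (a maximum hop count and a minimum node count that stops further hops) is sketched below in Python. This is not CFGConf's implementation: `hop_filter` and its parameter names are placeholders, edges are treated as undirected neighbor links, and the node-count check is applied before every hop, including the first. All three are assumptions of the sketch.

```python
def hop_filter(edges, seeds, max_hops=3, min_nodes=20):
    """Sketch of hop-based filtering; defaults mirror the documented ones."""
    # Assumption: a hop follows an edge in either direction.
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    kept = set(seeds)
    frontier = set(seeds)
    for _ in range(max_hops):
        # Skip the remaining hops once enough nodes have been collected.
        if len(kept) >= min_nodes or not frontier:
            break
        frontier = {n for node in frontier for n in neighbors.get(node, ())} - kept
        kept |= frontier
    return kept


chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(sorted(hop_filter(chain, {"A"}, max_hops=2)))
```

On the five-node chain, two hops from A reach A, B, and C; with `max_hops=0` only the seed itself is returned.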
The `isLoopFilterOn` key determines if loop-based filtering is enabled. Default value is `true`. Loop filtering is performed before hop-based filtering. If loop-based filtering is enabled, the nodes of all loops that contain any of the `selectedNodes` are included as starting nodes for hop-based filtering.
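As a rough illustration of the loop-based step, the Python sketch below grows the seed set with the nodes of every loop that contains a selected node. `expand_seeds_with_loops` is a hypothetical name, and the loop objects are assumed to follow the `id`, `nodes`, `backedges` layout from the loops section.

```python
def expand_seeds_with_loops(selected_nodes, loops, is_loop_filter_on=True):
    """Sketch: seeds for hop-based filtering after the loop-based step."""
    selected = set(selected_nodes)
    seeds = set(selected)
    if not is_loop_filter_on:
        return seeds
    for loop in loops:
        # A loop joins the seeds as soon as it contains one selected node.
        if selected & set(loop["nodes"]):
            seeds |= set(loop["nodes"])
    return seeds


loops = [
    {"id": "L1", "nodes": ["B2", "B3"], "backedges": [["B3", "B2"]]},
    {"id": "L2", "nodes": ["B5", "B6"], "backedges": [["B6", "B5"]]},
]
print(sorted(expand_seeds_with_loops(["B1", "B2"], loops)))
```

Selecting B1 and B2 pulls in B3 through loop L1, while loop L2 is left out because none of its nodes were selected.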