Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STAL-2792] Add CLI option to export data flow graphs #535

Merged
merged 10 commits into from
Oct 25, 2024
Merged

Conversation

jasonforal
Copy link
Collaborator

What problem are you trying to solve?

We'd like to be able to export the taint analysis graphs for each file as a debugging tool.

What is your solution?

This PR is necessarily large because:

  1. The original v8 -> DOT graph functionality was specifically implemented for the test environment only -- it assumed trusted input (e.g. a graph with valid node ids), made liberal use of unwrap, and used internal apis on the JsRuntime that should not be exposed crate-wide.
  2. The original v8 -> DOT graph functionality was only implemented for each individual method -- being able to construct the graph of a class requires use of subgraphs, which had not previously been implemented.
  3. In a DOT graph, nodes need to have globally unique names regardless of subgraph. This requires us to rename phi nodes because their name is only unique within each method ("phi0", "phi1", etc).

Implementation Highlights

A. v8 -> DOT is now a deno op
Because the canonical DOT representation requires access to internal data structures (e.g. the TsNode and Context bridge in order to get the node text/position), rather than leak the innards of the JsRuntime, the functionality is exposed as a deno op.

  • Although this is intended as a private/internal function, a rule could invoke this op due to it being in the public scope. Thus, it needs to handle invalid data gracefully.

B. Hack: use Rule abstractions to pass data
The logic to generate the graph dots is modeled as a Rule, which allows us to use execute_rule, avoiding the need to duplicate/re-implement the Rust -> v8 bridging (here and here).

This is very much a hack: I pass the DOT strings from the JsRuntime by repurposing Violation/Fix. This allows me to easily associate the DOT graphs for multiple methods to multiples classes. (This is ultimately more simple than trying to pass this data via console.log)

C. .dot files are written adjacent to scanned files
This will necessarily dirty the git working directory, however it is gated behind an opt-in cli flag. Because this is intended as a debug tool, errors are silently swallowed rather than handled.
Example:
Pre-analysis:

├── some_folder
│   ├── file1.java
│   └── file2.java
└── file3.java
❯ datadog-static-analyzer <...snip> --debug-export-java-dfa

Post-analysis:

├── some_folder
│   ├── file1.java
│   ├── file1.dot
│   ├── file2.java
│   └── file2.dot
├── file3.java
└── file3.dot

D. Generic abstraction
By not coupling to Java or Java's semantics, the DigraphCollection can be used for other languages or use cases--for example, generating interfile Java graphs would only involve adding an extra collection.

Sample output

File:

public class ClassA {
    String format(String a) {
        String query;
        if (someCondition) {
            query = "prefix_" + a;
        } else {
            query = a + "_suffix";
        }
        return query;
    }
    void log(String a) {
        String query;
        if (someCondition) {
            query = "[INFO] " + a;
        } else {
            query = "[WARN] " + a;
        }
        System.out.println(query);
    }
}

Output:

strict digraph "some_folder/file1.java" {
  label="some_folder/file1.java"
  subgraph "cluster: ClassA" {
    label=ClassA
    subgraph "cluster: String format(String a)" {
      label="String format(String a)"
      "a:5:33"[text=a,line=5,col=33,cstkind=identifier,vkind=cst]
      "\"prefix_\" + a:5:21"[text="\"prefix_\" + a",line=5,col=21,cstkind=binary_expression,vkind=cst]
      "query:5:13"[text=query,line=5,col=13,cstkind=identifier,vkind=cst]
      "a:7:21"[text=a,line=7,col=21,cstkind=identifier,vkind=cst]
      "a + \"_suffix\":7:21"[text="a + \"_suffix\"",line=7,col=21,cstkind=binary_expression,vkind=cst]
      "query:7:13"[text=query,line=7,col=13,cstkind=identifier,vkind=cst]
      phi0_0[shape=diamond,vkind=phi]
      "a:2:26"[text=a,line=2,col=26,cstkind=identifier,vkind=cst]
      "a:5:33" -> "a:2:26" [kind=dependence]
      "\"prefix_\" + a:5:21" -> "a:5:33" [kind=dependence]
      "query:5:13" -> "\"prefix_\" + a:5:21" [kind=assignment]
      "a:7:21" -> "a:2:26" [kind=dependence]
      "a + \"_suffix\":7:21" -> "a:7:21" [kind=dependence]
      "query:7:13" -> "a + \"_suffix\":7:21" [kind=assignment]
      phi0_0 -> "query:5:13" [kind=dependence]
      phi0_0 -> "query:7:13" [kind=dependence]
    }
    subgraph "cluster: void log(String a)" {
      label="void log(String a)"
      "a:14:33"[text=a,line=14,col=33,cstkind=identifier,vkind=cst]
      "\"[INFO] \" + a:14:21"[text="\"[INFO] \" + a",line=14,col=21,cstkind=binary_expression,vkind=cst]
      "query:14:13"[text=query,line=14,col=13,cstkind=identifier,vkind=cst]
      "a:16:33"[text=a,line=16,col=33,cstkind=identifier,vkind=cst]
      "\"[WARN] \" + a:16:21"[text="\"[WARN] \" + a",line=16,col=21,cstkind=binary_expression,vkind=cst]
      "query:16:13"[text=query,line=16,col=13,cstkind=identifier,vkind=cst]
      phi0_1[shape=diamond,vkind=phi]
      "query:18:28"[text=query,line=18,col=28,cstkind=identifier,vkind=cst]
      "(query):18:27"[text="(query)",line=18,col=27,cstkind=argument_list,vkind=cst]
      "System.out.println(query):18:9"[text="System.out.println(query)",line=18,col=9,cstkind=method_invocation,vkind=cst]
      "a:11:21"[text=a,line=11,col=21,cstkind=identifier,vkind=cst]
      "a:14:33" -> "a:11:21" [kind=dependence]
      "\"[INFO] \" + a:14:21" -> "a:14:33" [kind=dependence]
      "query:14:13" -> "\"[INFO] \" + a:14:21" [kind=assignment]
      "a:16:33" -> "a:11:21" [kind=dependence]
      "\"[WARN] \" + a:16:21" -> "a:16:33" [kind=dependence]
      "query:16:13" -> "\"[WARN] \" + a:16:21" [kind=assignment]
      phi0_1 -> "query:14:13" [kind=dependence]
      phi0_1 -> "query:16:13" [kind=dependence]
      "query:18:28" -> phi0_1 [kind=dependence]
      "(query):18:27" -> "query:18:28" [kind=dependence]
      "System.out.println(query):18:9" -> "(query):18:27" [kind=dependence]
    }
  }
}

Visualized:
image

Alternatives considered

  • Keep the current v8 -> DOT pattern (i.e. don't implement as a deno op)

What the reviewer should know

@jasonforal jasonforal requested a review from juli1 October 25, 2024 18:57
@jasonforal jasonforal requested a review from a team as a code owner October 25, 2024 18:57
@jasonforal jasonforal merged commit 3228c65 into main Oct 25, 2024
70 checks passed
@jasonforal jasonforal deleted the jf/STAL-2792 branch October 25, 2024 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants