Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10531: [Rust][DataFusion]: Add schema and graphviz formatting for LogicalPlans and a PlanVisitor #8619

Closed
wants to merge 4 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 9, 2020

Rationale:

I have been tracking down potential issues DataFusion for my work project, and I have found myself wanting to print out the state of the logical_plan several times. The existing debug formatting is ok, but it was missing a few key items:

  1. Schema information (as in when did columns appear / disappear in the plan)
  2. A visual representation (graphviz)

Open questions:

  1. Would it be better to split the visitor into visitor.rs and display code into display.rs? I am torn -- this is all logically part of logical_plan, but the module is getting kind of big.

Changes:

This PR adds several additional formatting options to logical plans in addition to the existing indent. Examples are included below

To do so it also provides a generalized "Visitor" pattern for walking logical plan nodes, as well as a general pattern to display logical plan nodes with multiple potential formats.

Note it should be straight forward to get this wired up into EXPALIN as well: https://issues.apache.org/jira/browse/ARROW-9746

Existing Formatting

Here is what master currently allows:

Projection: #id
   Filter: #state Eq Utf8(\"CO\")\
       CsvScan: employee.csv projection=Some([0, 3])

With Schema Information.

This PR adds a dump with schema information:

 Projection: #id [id:Int32]\
    Filter: #state Eq Utf8(\"CO\") [id:Int32, state:Utf8]\
      TableScan: employee.csv projection=Some([0, 3]) [id:Int32, state:Utf8]";

As Graphviz

This PR adds the ability to display plans using Graphviz

Here is an example GraphViz plan that comes out:

// Begin DataFusion GraphViz Plan (see https://graphviz.org)
digraph {
  subgraph cluster_1
  {
    graph[label="LogicalPlan"]
    2[label="Projection: #id"]
    3[label="Filter: #state Eq Utf8(_CO_)"]
    2 -> 3 [arrowhead=none, arrowtail=normal, dir=back]
    4[label="TableScan: employee.csv projection=Some([0, 3])"]
    3 -> 4 [arrowhead=none, arrowtail=normal, dir=back]
  }
  subgraph cluster_5
  {
    graph[label="Detailed LogicalPlan"]
    6[label="Projection: #id\nSchema: [id:Int32]"]
    7[label="Filter: #state Eq Utf8(_CO_)\nSchema: [id:Int32, state:Utf8]"]
    6 -> 7 [arrowhead=none, arrowtail=normal, dir=back]
    8[label="TableScan: employee.csv projection=Some([0, 3])\nSchema: [id:Int32, state:Utf8]"]
    7 -> 8 [arrowhead=none, arrowtail=normal, dir=back]
  }
}
// End DataFusion GraphViz Plan

Here is what that looks like rendered:
Screen Shot 2020-11-09 at 2 30 07 PM

@alamb
Copy link
Contributor Author

alamb commented Nov 9, 2020

I am sorry for the PR size -- but it is mostly comments and tests

/// ```
///
/// Example use: TODO
pub trait PlanVisitor {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this introduction is the most important thing about this PR -- I would like to move most of the recursive walks of LogicalPlans to be in terms of such a structure.

/// format!("{}", display_schema(&schema))
/// );
/// ```
pub fn display_schema<'a>(schema: &'a Schema) -> impl fmt::Display + 'a {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this belongs in arrow rather than DataFusion? I could go either way

@github-actions
Copy link

github-actions bot commented Nov 9, 2020


writeln!(
self.f,
" {}[label={}]",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the plans would look better and use less space if we added shape=box here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- updated. The plans now look like this:

Screen Shot 2020-11-10 at 6 01 10 PM

@andygrove
Copy link
Member

Thanks @alamb. I am also a fan of using GraphViz to render query plans so this gets a 👍 from me. I will try and find time to review fully later today.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super cool!

I went through this and it looks good to me 👍

What do you think of adding instructions on how to create the graphs in the README, or as an example? Atm it feels that this feature may not be utilized due to its unknown to people.

A minor concern is the file size: I would be supportive of moving the visitor to a different file, to avoid exploding this one even further. A later PR is also fine, though :P

@alamb
Copy link
Contributor Author

alamb commented Nov 10, 2020

@jorgecarleitao

What do you think of adding instructions on how to create the graphs in the README, or as an example?

Great idea. I have added some instructions in b1965fa

A minor concern is the file size: I would be supportive of moving the visitor to a different file, to avoid exploding this one even further. A later PR is also fine, though :P

I will do so as a follow on PR after this is merged

@andygrove -- I will wait for your review / feedback before doing anything more with this PR.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @alamb

@alamb
Copy link
Contributor Author

alamb commented Nov 11, 2020

Thanks @andygrove and @jorgecarleitao -- I plan to merge this PR and then make a new one to break the code into smaller modules

@alamb alamb force-pushed the alamb/improved-display branch from b1965fa to 44b9f8f Compare November 11, 2020 13:49
@alamb
Copy link
Contributor Author

alamb commented Nov 11, 2020

Rebased and will merge when it passes CI

@alamb
Copy link
Contributor Author

alamb commented Nov 11, 2020

#8639 is the PR for breaking up the logical_plan module

jorgecarleitao pushed a commit that referenced this pull request Nov 12, 2020
… modules

The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge #8619 (review)

This PR just moves code around -- it is not intended to change any semantics

Reviewing it commit-by-commit might be helpful to see how each piece went

I can also break it up into a sequence of smaller PRs if that would help review

Closes #8639 from alamb/alamb/ARROW-10559-split-up

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
yordan-pavlov pushed a commit to yordan-pavlov/arrow that referenced this pull request Nov 14, 2020
…or LogicalPlans and a PlanVisitor

# Rationale:
I have been tracking down potential issues DataFusion for my work project, and I have found myself wanting to print out the state of the logical_plan several times. The existing debug formatting is ok, but it was missing a few key items:

1. Schema information (as in when did columns appear / disappear in the plan)
2. A visual representation (graphviz)

# Open questions:
1. Would it be better to split the visitor into `visitor.rs` and display code into `display.rs`? I am torn -- this is all logically part of logical_plan, but the module is getting kind of big.

# Changes:

This PR adds several additional formatting options to logical plans in addition to the existing indent. Examples are included below

To do so it also provides a generalized "Visitor" pattern for walking logical plan nodes, as well as a general pattern to display logical plan nodes with multiple potential formats.

Note it should be straight forward to get this wired up into EXPALIN as well: https://issues.apache.org/jira/browse/ARROW-9746

## Existing Formatting
Here is what master currently allows:

```
Projection: #id
   Filter: #state Eq Utf8(\"CO\")\
       CsvScan: employee.csv projection=Some([0, 3])
```

## With Schema Information.
This PR adds a dump with schema information:

```
 Projection: #id [id:Int32]\
    Filter: #state Eq Utf8(\"CO\") [id:Int32, state:Utf8]\
      TableScan: employee.csv projection=Some([0, 3]) [id:Int32, state:Utf8]";
```

## As Graphviz

This PR adds the ability to display plans using [Graphviz](http://www.graphviz.org)

Here is an example GraphViz plan that comes out:
```
// Begin DataFusion GraphViz Plan (see https://graphviz.org)
digraph {
  subgraph cluster_1
  {
    graph[label="LogicalPlan"]
    2[label="Projection: #id"]
    3[label="Filter: #state Eq Utf8(_CO_)"]
    2 -> 3 [arrowhead=none, arrowtail=normal, dir=back]
    4[label="TableScan: employee.csv projection=Some([0, 3])"]
    3 -> 4 [arrowhead=none, arrowtail=normal, dir=back]
  }
  subgraph cluster_5
  {
    graph[label="Detailed LogicalPlan"]
    6[label="Projection: #id\nSchema: [id:Int32]"]
    7[label="Filter: #state Eq Utf8(_CO_)\nSchema: [id:Int32, state:Utf8]"]
    6 -> 7 [arrowhead=none, arrowtail=normal, dir=back]
    8[label="TableScan: employee.csv projection=Some([0, 3])\nSchema: [id:Int32, state:Utf8]"]
    7 -> 8 [arrowhead=none, arrowtail=normal, dir=back]
  }
}
// End DataFusion GraphViz Plan
```

Here is what that looks like rendered:
<img width="1679" alt="Screen Shot 2020-11-09 at 2 30 07 PM" src="https://user-images.githubusercontent.com/490673/98606322-0f891880-22b5-11eb-8e1c-669ce85f0f52.png">

Closes apache#8619 from alamb/alamb/improved-display

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: alamb <andrew@nerdnetworks.org>
yordan-pavlov pushed a commit to yordan-pavlov/arrow that referenced this pull request Nov 14, 2020
… modules

The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge apache#8619 (review)

This PR just moves code around -- it is not intended to change any semantics

Reviewing it commit-by-commit might be helpful to see how each piece went

I can also break it up into a sequence of smaller PRs if that would help review

Closes apache#8639 from alamb/alamb/ARROW-10559-split-up

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
@alamb alamb deleted the alamb/improved-display branch December 7, 2020 19:30
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…or LogicalPlans and a PlanVisitor

# Rationale:
I have been tracking down potential issues DataFusion for my work project, and I have found myself wanting to print out the state of the logical_plan several times. The existing debug formatting is ok, but it was missing a few key items:

1. Schema information (as in when did columns appear / disappear in the plan)
2. A visual representation (graphviz)

# Open questions:
1. Would it be better to split the visitor into `visitor.rs` and display code into `display.rs`? I am torn -- this is all logically part of logical_plan, but the module is getting kind of big.

# Changes:

This PR adds several additional formatting options to logical plans in addition to the existing indent. Examples are included below

To do so it also provides a generalized "Visitor" pattern for walking logical plan nodes, as well as a general pattern to display logical plan nodes with multiple potential formats.

Note it should be straight forward to get this wired up into EXPALIN as well: https://issues.apache.org/jira/browse/ARROW-9746

## Existing Formatting
Here is what master currently allows:

```
Projection: #id
   Filter: #state Eq Utf8(\"CO\")\
       CsvScan: employee.csv projection=Some([0, 3])
```

## With Schema Information.
This PR adds a dump with schema information:

```
 Projection: #id [id:Int32]\
    Filter: #state Eq Utf8(\"CO\") [id:Int32, state:Utf8]\
      TableScan: employee.csv projection=Some([0, 3]) [id:Int32, state:Utf8]";
```

## As Graphviz

This PR adds the ability to display plans using [Graphviz](http://www.graphviz.org)

Here is an example GraphViz plan that comes out:
```
// Begin DataFusion GraphViz Plan (see https://graphviz.org)
digraph {
  subgraph cluster_1
  {
    graph[label="LogicalPlan"]
    2[label="Projection: #id"]
    3[label="Filter: #state Eq Utf8(_CO_)"]
    2 -> 3 [arrowhead=none, arrowtail=normal, dir=back]
    4[label="TableScan: employee.csv projection=Some([0, 3])"]
    3 -> 4 [arrowhead=none, arrowtail=normal, dir=back]
  }
  subgraph cluster_5
  {
    graph[label="Detailed LogicalPlan"]
    6[label="Projection: #id\nSchema: [id:Int32]"]
    7[label="Filter: #state Eq Utf8(_CO_)\nSchema: [id:Int32, state:Utf8]"]
    6 -> 7 [arrowhead=none, arrowtail=normal, dir=back]
    8[label="TableScan: employee.csv projection=Some([0, 3])\nSchema: [id:Int32, state:Utf8]"]
    7 -> 8 [arrowhead=none, arrowtail=normal, dir=back]
  }
}
// End DataFusion GraphViz Plan
```

Here is what that looks like rendered:
<img width="1679" alt="Screen Shot 2020-11-09 at 2 30 07 PM" src="https://user-images.githubusercontent.com/490673/98606322-0f891880-22b5-11eb-8e1c-669ce85f0f52.png">

Closes apache#8619 from alamb/alamb/improved-display

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: alamb <andrew@nerdnetworks.org>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
… modules

The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge apache#8619 (review)

This PR just moves code around -- it is not intended to change any semantics

Reviewing it commit-by-commit might be helpful to see how each piece went

I can also break it up into a sequence of smaller PRs if that would help review

Closes apache#8639 from alamb/alamb/ARROW-10559-split-up

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants