-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10531: [Rust][DataFusion]: Add schema and graphviz formatting for LogicalPlans and a PlanVisitor #8619
Conversation
I am sorry for the PR size -- but it is mostly comments and tests |
/// ``` | ||
/// | ||
/// Example use: TODO | ||
pub trait PlanVisitor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this introduction is the most important thing about this PR -- I would like to move most of the recursive walks of LogicalPlans
to be in terms of such a structure.
/// format!("{}", display_schema(&schema)) | ||
/// ); | ||
/// ``` | ||
pub fn display_schema<'a>(schema: &'a Schema) -> impl fmt::Display + 'a { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this belongs in arrow rather than DataFusion? I could go either way
|
||
writeln!( | ||
self.f, | ||
" {}[label={}]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the plans would look better and use less space if we added shape=box
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb. I am also a fan of using GraphViz to render query plans so this gets a 👍 from me. I will try and find time to review fully later today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super cool!
I went through this and it looks good to me 👍
What do you think of adding instructions on how to create the graphs in the README, or as an example? Atm it feels that this feature may not be utilized due to its unknown to people.
A minor concern is the file size: I would be supportive of moving the visitor to a different file, to avoid exploding this one even further. A later PR is also fine, though :P
Great idea. I have added some instructions in b1965fa
I will do so as a follow on PR after this is merged @andygrove -- I will wait for your review / feedback before doing anything more with this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @alamb
Thanks @andygrove and @jorgecarleitao -- I plan to merge this PR and then make a new one to break the code into smaller modules |
…ith schema and graphviz, and a PlanVisitor
b1965fa
to
44b9f8f
Compare
Rebased and will merge when it passes CI |
#8639 is the PR for breaking up the logical_plan module |
… modules The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge #8619 (review) This PR just moves code around -- it is not intended to change any semantics Reviewing it commit-by-commit might be helpful to see how each piece went I can also break it up into a sequence of smaller PRs if that would help review Closes #8639 from alamb/alamb/ARROW-10559-split-up Authored-by: alamb <andrew@nerdnetworks.org> Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
…or LogicalPlans and a PlanVisitor # Rationale: I have been tracking down potential issues DataFusion for my work project, and I have found myself wanting to print out the state of the logical_plan several times. The existing debug formatting is ok, but it was missing a few key items: 1. Schema information (as in when did columns appear / disappear in the plan) 2. A visual representation (graphviz) # Open questions: 1. Would it be better to split the visitor into `visitor.rs` and display code into `display.rs`? I am torn -- this is all logically part of logical_plan, but the module is getting kind of big. # Changes: This PR adds several additional formatting options to logical plans in addition to the existing indent. Examples are included below To do so it also provides a generalized "Visitor" pattern for walking logical plan nodes, as well as a general pattern to display logical plan nodes with multiple potential formats. Note it should be straight forward to get this wired up into EXPALIN as well: https://issues.apache.org/jira/browse/ARROW-9746 ## Existing Formatting Here is what master currently allows: ``` Projection: #id Filter: #state Eq Utf8(\"CO\")\ CsvScan: employee.csv projection=Some([0, 3]) ``` ## With Schema Information. This PR adds a dump with schema information: ``` Projection: #id [id:Int32]\ Filter: #state Eq Utf8(\"CO\") [id:Int32, state:Utf8]\ TableScan: employee.csv projection=Some([0, 3]) [id:Int32, state:Utf8]"; ``` ## As Graphviz This PR adds the ability to display plans using [Graphviz](http://www.graphviz.org) Here is an example GraphViz plan that comes out: ``` // Begin DataFusion GraphViz Plan (see https://graphviz.org) digraph { subgraph cluster_1 { graph[label="LogicalPlan"] 2[label="Projection: #id"] 3[label="Filter: #state Eq Utf8(_CO_)"] 2 -> 3 [arrowhead=none, arrowtail=normal, dir=back] 4[label="TableScan: employee.csv projection=Some([0, 3])"] 3 -> 4 [arrowhead=none, arrowtail=normal, dir=back] } subgraph cluster_5 { graph[label="Detailed LogicalPlan"] 6[label="Projection: #id\nSchema: [id:Int32]"] 7[label="Filter: #state Eq Utf8(_CO_)\nSchema: [id:Int32, state:Utf8]"] 6 -> 7 [arrowhead=none, arrowtail=normal, dir=back] 8[label="TableScan: employee.csv projection=Some([0, 3])\nSchema: [id:Int32, state:Utf8]"] 7 -> 8 [arrowhead=none, arrowtail=normal, dir=back] } } // End DataFusion GraphViz Plan ``` Here is what that looks like rendered: <img width="1679" alt="Screen Shot 2020-11-09 at 2 30 07 PM" src="https://user-images.githubusercontent.com/490673/98606322-0f891880-22b5-11eb-8e1c-669ce85f0f52.png"> Closes apache#8619 from alamb/alamb/improved-display Authored-by: alamb <andrew@nerdnetworks.org> Signed-off-by: alamb <andrew@nerdnetworks.org>
… modules The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge apache#8619 (review) This PR just moves code around -- it is not intended to change any semantics Reviewing it commit-by-commit might be helpful to see how each piece went I can also break it up into a sequence of smaller PRs if that would help review Closes apache#8639 from alamb/alamb/ARROW-10559-split-up Authored-by: alamb <andrew@nerdnetworks.org> Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
…or LogicalPlans and a PlanVisitor # Rationale: I have been tracking down potential issues DataFusion for my work project, and I have found myself wanting to print out the state of the logical_plan several times. The existing debug formatting is ok, but it was missing a few key items: 1. Schema information (as in when did columns appear / disappear in the plan) 2. A visual representation (graphviz) # Open questions: 1. Would it be better to split the visitor into `visitor.rs` and display code into `display.rs`? I am torn -- this is all logically part of logical_plan, but the module is getting kind of big. # Changes: This PR adds several additional formatting options to logical plans in addition to the existing indent. Examples are included below To do so it also provides a generalized "Visitor" pattern for walking logical plan nodes, as well as a general pattern to display logical plan nodes with multiple potential formats. Note it should be straight forward to get this wired up into EXPALIN as well: https://issues.apache.org/jira/browse/ARROW-9746 ## Existing Formatting Here is what master currently allows: ``` Projection: #id Filter: #state Eq Utf8(\"CO\")\ CsvScan: employee.csv projection=Some([0, 3]) ``` ## With Schema Information. This PR adds a dump with schema information: ``` Projection: #id [id:Int32]\ Filter: #state Eq Utf8(\"CO\") [id:Int32, state:Utf8]\ TableScan: employee.csv projection=Some([0, 3]) [id:Int32, state:Utf8]"; ``` ## As Graphviz This PR adds the ability to display plans using [Graphviz](http://www.graphviz.org) Here is an example GraphViz plan that comes out: ``` // Begin DataFusion GraphViz Plan (see https://graphviz.org) digraph { subgraph cluster_1 { graph[label="LogicalPlan"] 2[label="Projection: #id"] 3[label="Filter: #state Eq Utf8(_CO_)"] 2 -> 3 [arrowhead=none, arrowtail=normal, dir=back] 4[label="TableScan: employee.csv projection=Some([0, 3])"] 3 -> 4 [arrowhead=none, arrowtail=normal, dir=back] } subgraph cluster_5 { graph[label="Detailed LogicalPlan"] 6[label="Projection: #id\nSchema: [id:Int32]"] 7[label="Filter: #state Eq Utf8(_CO_)\nSchema: [id:Int32, state:Utf8]"] 6 -> 7 [arrowhead=none, arrowtail=normal, dir=back] 8[label="TableScan: employee.csv projection=Some([0, 3])\nSchema: [id:Int32, state:Utf8]"] 7 -> 8 [arrowhead=none, arrowtail=normal, dir=back] } } // End DataFusion GraphViz Plan ``` Here is what that looks like rendered: <img width="1679" alt="Screen Shot 2020-11-09 at 2 30 07 PM" src="https://user-images.githubusercontent.com/490673/98606322-0f891880-22b5-11eb-8e1c-669ce85f0f52.png"> Closes apache#8619 from alamb/alamb/improved-display Authored-by: alamb <andrew@nerdnetworks.org> Signed-off-by: alamb <andrew@nerdnetworks.org>
… modules The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge apache#8619 (review) This PR just moves code around -- it is not intended to change any semantics Reviewing it commit-by-commit might be helpful to see how each piece went I can also break it up into a sequence of smaller PRs if that would help review Closes apache#8639 from alamb/alamb/ARROW-10559-split-up Authored-by: alamb <andrew@nerdnetworks.org> Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Rationale:
I have been tracking down potential issues DataFusion for my work project, and I have found myself wanting to print out the state of the logical_plan several times. The existing debug formatting is ok, but it was missing a few key items:
Open questions:
visitor.rs
and display code intodisplay.rs
? I am torn -- this is all logically part of logical_plan, but the module is getting kind of big.Changes:
This PR adds several additional formatting options to logical plans in addition to the existing indent. Examples are included below
To do so it also provides a generalized "Visitor" pattern for walking logical plan nodes, as well as a general pattern to display logical plan nodes with multiple potential formats.
Note it should be straight forward to get this wired up into EXPALIN as well: https://issues.apache.org/jira/browse/ARROW-9746
Existing Formatting
Here is what master currently allows:
With Schema Information.
This PR adds a dump with schema information:
As Graphviz
This PR adds the ability to display plans using Graphviz
Here is an example GraphViz plan that comes out:
Here is what that looks like rendered: