Implement EXPLAIN ANALYZE #779

Closed
alamb opened this issue Jul 26, 2021 · 1 comment · Fixed by #858
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jul 26, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Now that we have EXPLAIN <query>, we know what plan DataFusion will execute. However, there is no easy way to see what actually happened during execution (e.g., how many rows each operator actually read or filtered).

Describe the solution you'd like
I would like to extend DataFusion's EXPLAIN functionality so that it can also run the plan, capture metrics, and display them.

I imagine something like the following (adding the executed_plan rows):

> EXPLAIN ANALYZE SELECT * from foo;
+---------------+--------------------------------------------------------------------------+
| plan_type     | plan                                                                     |
+---------------+--------------------------------------------------------------------------+
| logical_plan  | Projection: #foo.x                                                       |
|               |   TableScan: foo projection=Some([0])                                    |
| physical_plan | ProjectionExec: expr=[x@0 as x]                                          |
|               |   RepartitionExec: partitioning=RoundRobinBatch(16)                      |
|               |     CsvExec: source=Path(/tmp/foo.csv: [/tmp/foo.csv]), has_header=false |
| executed_plan | ProjectionExec: num_rows=2 exec_ms=6                                     |
|               |   RepartitionExec: num_rows=2 exec_ms=4                                  |
|               |     CsvExec: num_rows=2 exec_ms=300                                      |
+---------------+--------------------------------------------------------------------------+

2 rows in set. Query took 0.002 seconds.
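
To make the idea concrete, here is a minimal, self-contained Rust sketch of "run the plan, capture metrics, display them". It is not DataFusion code: `Metrics`, `Operator`, `executed_plan`, and `demo` are hypothetical names invented for illustration, the operators only pass rows through, and each node's time includes the time spent in its child.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-operator metrics (illustrative names, not DataFusion's API).
#[derive(Debug, Default, Clone)]
struct Metrics {
    num_rows: usize,
    elapsed: Duration,
}

/// A toy operator node: a name, an optional child, and the metrics
/// recorded the last time it was executed.
struct Operator {
    name: &'static str,
    child: Option<Box<Operator>>,
    metrics: Metrics,
}

impl Operator {
    fn new(name: &'static str, child: Option<Operator>) -> Self {
        Operator {
            name,
            child: child.map(Box::new),
            metrics: Metrics::default(),
        }
    }

    /// Executes the child (or reads the source at a leaf), passes the rows
    /// through unchanged, and records the row count and wall-clock time.
    /// The recorded time is inclusive of the child's time.
    fn execute(&mut self, source_rows: &[i64]) -> Vec<i64> {
        let start = Instant::now();
        let rows = match self.child.as_mut() {
            Some(child) => child.execute(source_rows),
            None => source_rows.to_vec(),
        };
        self.metrics.num_rows = rows.len();
        self.metrics.elapsed = start.elapsed();
        rows
    }

    /// Appends one indented line per operator, annotated with its metrics.
    fn fmt_executed(&self, indent: usize, out: &mut String) {
        out.push_str(&format!(
            "{}{}: num_rows={} exec_ms={}\n",
            " ".repeat(indent),
            self.name,
            self.metrics.num_rows,
            self.metrics.elapsed.as_millis()
        ));
        if let Some(child) = &self.child {
            child.fmt_executed(indent + 2, out);
        }
    }
}

/// Renders the tree in the same indented style as the executed_plan rows.
fn executed_plan(root: &Operator) -> String {
    let mut out = String::new();
    root.fmt_executed(0, &mut out);
    out
}

/// Builds the three-operator plan from the example, runs it over two rows,
/// and returns the annotated plan text.
fn demo() -> String {
    let scan = Operator::new("CsvExec", None);
    let repartition = Operator::new("RepartitionExec", Some(scan));
    let mut projection = Operator::new("ProjectionExec", Some(repartition));
    projection.execute(&[1, 2]);
    executed_plan(&projection)
}
```

Printing the result of `demo()` gives one line per operator in the shape shown in the mock-up; the exec_ms values depend on the machine. In a real implementation the metrics would presumably come from the physical operators themselves rather than from a wrapper like this.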

Additional context
We probably need something like #679 completed prior to doing this?

cc @Dandandan and @andygrove

alamb added the enhancement label Jul 26, 2021
@alamb
Contributor Author

alamb commented Jul 26, 2021

There is some prior work by @andygrove in #662, which added with_metrics for displayable physical plans.
