Display all partitions and their files in EXPLAIN VERBOSE #6383

Closed
NGA-TRAN opened this issue May 18, 2023 · 4 comments · Fixed by #6711
Labels
enhancement New feature or request good first issue Good for newcomers

@NGA-TRAN
Contributor

Is your feature request related to a problem or challenge?

Following the discussion here and here: even though we want to display fewer partitions in EXPLAIN, we want to display all files and partitions/groups in EXPLAIN VERBOSE.

Describe the solution you'd like

Fully display all partitions and their files in EXPLAIN VERBOSE. For example:

EXPLAIN includes only some of the groups:

ParquetExec: file_groups={10 groups: [[1/1/1/00000000-0000-0000-0000-00000000000a.parquet], [1/1/1/00000000-0000-0000-0000-00000000000b.parquet], [1/1/1/00000000-0000-0000-0000-00000000000c.parquet], [1/1/1/00000000-0000-0000-0000-00000000000d.parquet], [1/1/1/00000000-0000-0000-0000-00000000000e.parquet], ...]}, projection=[__chunk_order, f, tag, time], output_ordering=[tag@2 ASC, time@3 ASC, __chunk_order@0 ASC]

EXPLAIN VERBOSE will include all groups and files:

ParquetExec: file_groups={10 groups: [[1/1/1/00000000-0000-0000-0000-00000000000a.parquet], [1/1/1/00000000-0000-0000-0000-00000000000b.parquet], [1/1/1/00000000-0000-0000-0000-00000000000c.parquet], [1/1/1/00000000-0000-0000-0000-00000000000d.parquet], [1/1/1/00000000-0000-0000-0000-00000000000e.parquet], [1/1/1/00000000-0000-0000-0000-00000000000f.parquet], [1/1/1/00000000-0000-0000-0000-000000000010.parquet], [1/1/1/00000000-0000-0000-0000-000000000011.parquet], [1/1/1/00000000-0000-0000-0000-000000000012.parquet], [1/1/1/00000000-0000-0000-0000-000000000013.parquet]]}, projection=[__chunk_order, f, tag, time], output_ordering=[tag@2 ASC, time@3 ASC, __chunk_order@0 ASC]
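The difference between the two outputs can be sketched as a small standalone Rust function: print every group in verbose mode and stop after a fixed number of groups, followed by `...`, otherwise. The function name `fmt_file_groups` and the limit of 5 (inferred from the truncated EXPLAIN example) are assumptions for illustration. This is not DataFusion's actual formatting code.

```rust
/// Hypothetical cap on groups shown in non-verbose output; the truncated
/// EXPLAIN example in this issue shows 5 groups before "...", so 5 is assumed.
const MAX_GROUPS_DEFAULT: usize = 5;

/// Formats file groups as `{N groups: [[f1], [f2, f3], ...]}`.
/// Non-verbose: only the first MAX_GROUPS_DEFAULT groups, then "...".
/// Verbose: every group is printed.
fn fmt_file_groups(groups: &[Vec<String>], verbose: bool) -> String {
    let limit = if verbose {
        groups.len()
    } else {
        MAX_GROUPS_DEFAULT.min(groups.len())
    };
    // Render each shown group as `[file, file, ...]`.
    let mut parts: Vec<String> = groups[..limit]
        .iter()
        .map(|files| format!("[{}]", files.join(", ")))
        .collect();
    // Mark that some groups were omitted.
    if limit < groups.len() {
        parts.push("...".to_string());
    }
    format!("{{{} groups: [{}]}}", groups.len(), parts.join(", "))
}

fn main() {
    // Ten single-file groups, similar to the /tmp/foo reproduction in this thread.
    let groups: Vec<Vec<String>> = (1..=10)
        .map(|i| vec![format!("data{i}.csv")])
        .collect();
    println!("{}", fmt_file_groups(&groups, false)); // truncated after 5 groups
    println!("{}", fmt_file_groups(&groups, true)); // all 10 groups
}
```

The group count is always printed, so even the truncated form tells the reader how many groups were elided.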

Describe alternatives you've considered

No response

Additional context

No response

NGA-TRAN added the enhancement (New feature or request) label on May 18, 2023
@alamb
Contributor

alamb commented May 19, 2023

cc @yahoNanJing and @crepererum

I think this is fairly well explained and would also be a good first issue for someone.

To reproduce locally, you could use something like:

$ mkdir /tmp/foo
$ for i in `seq 1 10`;  do  echo "1" > "/tmp/foo/data$i.csv"; done
$ ls /tmp/foo/
data1.csv   data10.csv  data2.csv   data3.csv   data4.csv   data5.csv   data6.csv   data7.csv   data8.csv   data9.csv

$ datafusion-cli
DataFusion CLI v24.0.0
❯ explain select * from '/tmp/foo';
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                   |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | TableScan: /tmp/foo projection=[1]                                                                                                                                                                                                                                                                                                                                     |
| physical_plan | CsvExec: file_groups={10 groups: [[private/tmp/foo/data3.csv], [private/tmp/foo/data2.csv], [private/tmp/foo/data1.csv], [private/tmp/foo/data5.csv], [private/tmp/foo/data4.csv], [private/tmp/foo/data6.csv], [private/tmp/foo/data7.csv], [private/tmp/foo/data9.csv], [private/tmp/foo/data8.csv], [private/tmp/foo/data10.csv]]}, projection=[1], has_header=true |
|               |                                                                                                                                                                                                                                                                                                                                                                        |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.033 seconds.

On master, the above display is truncated. The goal is that ❯ explain verbose select * from '/tmp/foo'; shows all the files.

alamb added the good first issue (Good for newcomers) label on May 19, 2023
@qrilka
Contributor

qrilka commented May 25, 2023

I'll try looking into this one

@qrilka
Contributor

qrilka commented May 29, 2023

@alamb it looks like there is currently no simple way to pass the verbose flag down to where the file list formatting needs to change.
My current idea is to add DisplayFormatType::Verbose into https://github.com/apache/arrow-datafusion/blob/305625ac6d5e72e3069d9ee2c64418ded69de098/datafusion/core/src/physical_plan/display.rs#L32 so it could be used in CsvExec::fmt_as. But this means that every other implementation of ExecutionPlan::fmt_as will need to accept DisplayFormatType::Default | DisplayFormatType::Verbose instead of just DisplayFormatType::Default, even though their output will not need to change.
Does this sound OK?

@alamb
Contributor

alamb commented May 30, 2023

> My current idea is to add DisplayFormatType::Verbose into

I think that is a good idea

> But this means that every other implementation of ExecutionPlan::fmt_as will need to accept DisplayFormatType::Default | DisplayFormatType::Verbose instead of just DisplayFormatType::Default, even though their output will not need to change.
> Does this sound OK?

I do think it sounds OK -- the point of DisplayFormatType is exactly what you are trying to use it for, and its existence should hint to people that other variants may be added.

I think it would be fine for all the existing ExecutionPlan::fmt_as implementations to display the same content for DisplayFormatType::Default | DisplayFormatType::Verbose initially, and we can add more details to the Verbose version over time.

Thank you again for looking into this!
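The approach agreed on in this exchange can be sketched in standalone Rust: an enum with Default and Verbose variants, a trait whose fmt_as takes that enum, and a small adapter implementing std::fmt::Display so a node can be printed in either mode. The trait name DisplayAs is taken from the commit message below; everything else here (Displayed, LimitNode, ScanNode, and the exact definitions) is hypothetical and defined locally for illustration. It mirrors the names used in this thread but is not DataFusion's actual source.

```rust
use std::fmt;

/// Sketch of the two variants discussed here (not DataFusion's definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DisplayFormatType {
    Default,
    Verbose,
}

/// Hypothetical trait: a plan node renders itself for a given format type.
trait DisplayAs {
    fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter<'_>) -> fmt::Result;
}

/// Hypothetical adapter so any DisplayAs value works with format!/println!.
struct Displayed<'a, T: DisplayAs>(&'a T, DisplayFormatType);

impl<T: DisplayAs> fmt::Display for Displayed<'_, T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_as(self.1, f)
    }
}

/// Hypothetical existing node: same output for both variants, as suggested above.
struct LimitNode {
    fetch: usize,
}

impl DisplayAs for LimitNode {
    fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match t {
            DisplayFormatType::Default | DisplayFormatType::Verbose => {
                write!(f, "LimitNode: fetch={}", self.fetch)
            }
        }
    }
}

/// Hypothetical scan node: only a file count by default, every file when verbose.
struct ScanNode {
    files: Vec<String>,
}

impl DisplayAs for ScanNode {
    fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match t {
            DisplayFormatType::Default => write!(f, "ScanNode: {} files", self.files.len()),
            DisplayFormatType::Verbose => {
                write!(f, "ScanNode: files=[{}]", self.files.join(", "))
            }
        }
    }
}

fn main() {
    let limit = LimitNode { fetch: 10 };
    let scan = ScanNode { files: vec!["a.csv".into(), "b.csv".into()] };
    for t in [DisplayFormatType::Default, DisplayFormatType::Verbose] {
        println!("{}", Displayed(&limit, t));
        println!("{}", Displayed(&scan, t));
    }
}
```

The or-pattern arm keeps an existing node's output identical in both modes, which matches the proposed starting point. Because the match has no wildcard, adding another variant later would still make the compiler flag every implementation for review.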

qrilka added a commit to qrilka/arrow-datafusion that referenced this issue Jun 17, 2023
Adds DisplayAs trait for structs which could show more details when
formatted in the verbose mode
Resolves apache#6383
alamb added a commit that referenced this issue Jun 20, 2023
Adds DisplayAs trait for structs which could show more details when
formatted in the verbose mode
Resolves #6383

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>