-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve parquet partition_file output display #4467
Conversation
} | ||
|
||
#[test] | ||
fn file_groups_display() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to the changes in this PR, this was displayed as "[foo, bar, baz, ]"
I think this PR is much clearer
@@ -183,17 +183,27 @@ struct FileGroupsDisplay<'a>(&'a [Vec<PartitionedFile>]); | |||
|
|||
impl<'a> Display for FileGroupsDisplay<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation also avoids allocating strings and Vecs, which I don't think is a particularly important feature, but it is a nice benefit.
88e42a7
to
efab485
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @rdettai who I believe contributed the original formatting code
@@ -696,7 +696,7 @@ async fn test_physical_plan_display_indent() { | |||
" CoalesceBatchesExec: target_batch_size=4096", | |||
" FilterExec: c12@1 < 10", | |||
" RepartitionExec: partitioning=RoundRobinBatch(9000)", | |||
" CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]", | |||
" CsvExec: files=1:[[ARROW_TEST_DATA/csv/aggregate_test_100.csv]], has_header=true, limit=None, projection=[c1, c12]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried several variations on this formatting to show the file names as well as the number of partitions
Other thoughts?
maybe adding a brace and a key?
" CsvExec: files=1:[[ARROW_TEST_DATA/csv/aggregate_test_100.csv]], has_header=true, limit=None, projection=[c1, c12]", | |
" CsvExec: files={groups:1, [[ARROW_TEST_DATA/csv/aggregate_test_100.csv]]}, has_header=true, limit=None, projection=[c1, c12]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you only spell out groups
once, I think it could be very helpful to spell this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this very much. Would prefer the slightly more verbose version, but that's not a blocker.
@@ -696,7 +696,7 @@ async fn test_physical_plan_display_indent() { | |||
" CoalesceBatchesExec: target_batch_size=4096", | |||
" FilterExec: c12@1 < 10", | |||
" RepartitionExec: partitioning=RoundRobinBatch(9000)", | |||
" CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]", | |||
" CsvExec: files=1:[[ARROW_TEST_DATA/csv/aggregate_test_100.csv]], has_header=true, limit=None, projection=[c1, c12]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you only spell out groups
once, I think it could be very helpful to spell this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of explicitly identifying what the integer represents, the partition count, but equally this is an improvement already so approving 👍
I have changed it to be
Which I think is better |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge! A nice improvement
Benchmark runs are scheduled for baseline = 156a594 and contender = df41267. df41267 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4466
Rationale for this change
See #4466
What changes are included in this PR?
Improve the formatting for
Vec<Vec<PartitionFile>>
Are these changes tested?
Yes
Are there any user-facing changes?
better explain output