PARQUET-386: Printing out the statistics of metadata in parquet-tools #279
onursoyer wants to merge 1 commit into apache:master
|
Doesn't this test failure also look like it is caused by #269? |
|
This looks good to me. |
|
+1 @julienledem Should we merge this? |
|
ping @julienledem @rdblue |
|
We already merged a commit that added the stats: 7419443. That should be out in the 1.9.0 release. |
|
Sorry, I could not find the duplicate. Closing this. |
|
Thanks @onursoyer! |
|
The commit 7419443 only prints statistics in |
|
Oh, sorry, I thought this was resolved in the other pull request. Reopening this. @rdblue @liancheng |
|
|
if (!meta.getStatistics().isEmpty()) {
  out.format(" MIN:%s", meta.getStatistics().genericGetMin().toString());
  out.format(" MAX:%s", meta.getStatistics().genericGetMax().toString());
Can we format the statistics the same way as the values? https://github.com/apache/parquet-mr/blob/e54ca615f213f5db6d34d9163c97eec98920d7a7/parquet-tools/src/main/java/org/apache/parquet/tools/command/DumpCommand.java#L308
toString() is inconsistent because it converts every value to a UTF8 string. Values such as timestamps (INT96) and decimals can come out as unrecognizable characters, for example:
ST:min:
�����
}%, max: �����
}%, num_nulls: 0
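To illustrate the point, here is a minimal, self-contained sketch of type-aware formatting. It is not the parquet-tools code: the class `StatFormatSketch`, the enum `Kind`, and the methods `format` and `toHex` are hypothetical names made up for this example, and it works on plain byte arrays rather than on the Parquet statistics classes. It assumes the decimal bytes hold the unscaled value as a big-endian two's complement integer, and it falls back to hex for anything it cannot interpret.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of parquet-tools: renders raw min/max bytes
// according to a caller-supplied kind instead of always decoding them as UTF-8.
public class StatFormatSketch {

    // Illustrative categories only; real code would derive this from the column type.
    enum Kind { UTF8, DECIMAL, RAW }

    // Formats raw statistic bytes. `scale` is only used for DECIMAL.
    static String format(byte[] bytes, Kind kind, int scale) {
        switch (kind) {
            case UTF8:
                return new String(bytes, StandardCharsets.UTF_8);
            case DECIMAL:
                // Assumption: unscaled value stored as big-endian two's complement.
                return new BigDecimal(new BigInteger(bytes), scale).toString();
            default:
                // Unknown or opaque binary (for example INT96): print hex, not text.
                return toHex(bytes);
        }
    }

    // Renders bytes as an uppercase hex string with a 0x prefix.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] decimalBytes = {0x04, (byte) 0xD2}; // unscaled value 1234
        System.out.println("min: " + format(decimalBytes, Kind.DECIMAL, 2));

        byte[] opaque = {0x00, (byte) 0xFF, 0x10};
        System.out.println("max: " + format(opaque, Kind.RAW, 0));
    }
}
```

With something along these lines, a decimal statistic prints as a number and an opaque binary value prints as hex, so the output stays readable regardless of the column type.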
@majetideepak You are right. I have updated the code. It should now work as you suggested.
bc457aa to 71cff7c
|
+1 LGTM |
|
Is there a reason why this PR wasn't committed? |
|
Closing this PR as this functionality has already been implemented. See #442 for details. |
https://issues.apache.org/jira/browse/PARQUET-386
parquet-tools does not print the statistics stored in column metadata.
This PR prints the statistics whenever the column metadata contains them.