PARQUET-386: Printing out the statistics of metadata in parquet-tools #279

Closed
onursoyer wants to merge 1 commit into apache:master from onursoyer:master

@onursoyer

https://issues.apache.org/jira/browse/PARQUET-386

parquet-tools does not print statistics in column metadata.

With this PR, it prints the statistics whenever the column metadata contains them.

@HyukjinKwon
Member

Doesn't this test failure also look like it is caused by #269?

@julienledem
Member

This looks good to me.

@liancheng
Contributor

+1

@julienledem Should we merge this?

@HyukjinKwon
Member

ping @julienledem @rdblue

@rdblue
Contributor

rdblue commented May 23, 2016

We already merged a commit that added the stats: 7419443. That should be out in the 1.9.0 release.

@onursoyer
Author

Sorry, I could not find the duplicate. Closing this.

@onursoyer onursoyer closed this May 24, 2016
@rdblue
Contributor

rdblue commented May 24, 2016

Thanks @onursoyer!

@majetideepak

Commit 7419443 only prints the statistics in DataPageHeader.
I think this patch prints the statistics in ColumnMetaData, so it should be committed too.

@onursoyer
Author

Oh, sorry, I thought this was resolved in the other pull request. Reopening this. @rdblue @liancheng

@onursoyer onursoyer reopened this Nov 2, 2016

if (!meta.getStatistics().isEmpty()) {
out.format(" MIN:%s", meta.getStatistics().genericGetMin().toString());
out.format(" MAX:%s", meta.getStatistics().genericGetMax().toString());


Can we format the statistics the same way the values are formatted? https://github.com/apache/parquet-mr/blob/e54ca615f213f5db6d34d9163c97eec98920d7a7/parquet-tools/src/main/java/org/apache/parquet/tools/command/DumpCommand.java#L308
toString() gives inconsistent output because every value is converted to a UTF-8 string. Binary-encoded values such as timestamps (INT96) and decimals can come out as unrecognizable characters, like:
ST:min:
�����
}%, max: �����
}%, num_nulls: 0
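
To illustrate the problem described above, here is a minimal, self-contained sketch of why decoding raw min/max bytes as UTF-8 produces unreadable output, and two ways the same bytes could be rendered readably. The class and method names (StatFormatSketch, toHex, decimalToString) are hypothetical and are not part of parquet-tools or this patch. The sketch assumes a decimal whose unscaled value is stored as big-endian two's complement bytes, and it does not reproduce the formatting used in DumpCommand.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of parquet-tools.
final class StatFormatSketch {

    // Hex is a safe fallback for any binary statistic (for example INT96).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Assumption: the bytes hold the unscaled decimal value as
    // big-endian two's complement, and the scale is known from the schema.
    static String decimalToString(byte[] unscaled, int scale) {
        return new BigDecimal(new BigInteger(unscaled), scale).toPlainString();
    }

    public static void main(String[] args) {
        // Unscaled value 12345678 encoded in four bytes.
        byte[] raw = {0x00, (byte) 0xBC, 0x61, 0x4E};

        // Naive rendering: the bytes are not valid text, so this is unreadable.
        System.out.println("naive:   " + new String(raw, StandardCharsets.UTF_8));
        // Type-agnostic rendering.
        System.out.println("hex:     " + toHex(raw));
        // Type-aware rendering with scale 2.
        System.out.println("decimal: " + decimalToString(raw, 2));
    }
}
```

The hex form works without knowing the logical type, while the decimal form needs the scale from the schema, which is why formatting statistics through the same path as the values is attractive.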

Author

@onursoyer onursoyer Nov 11, 2016


@majetideepak You are right. I have updated the code; it should now work as you suggested.

@onursoyer
Author

@majetideepak

+1 LGTM

@kkhatua

kkhatua commented Jan 4, 2019

Is there a reason why this PR wasn't committed?

@gszadovszky
Contributor

Closing this PR as this functionality has already been implemented. See #442 for details.


8 participants