PARQUET-548: Add EncodingStats. #332

Closed
rdblue wants to merge 2 commits into apache:master from rdblue:PARQUET-548-add-encoding-stats

rdblue (Contributor) commented Feb 26, 2016

This adds EncodingStats, which tracks the number of pages for each encoding, separated into dictionary and data pages. It also adds convenience functions that are useful for dictionary filtering, like hasDictionaryEncodedPages and hasNonDictionaryEncodedPages.

EncodingStats has a unit test in parquet-column and an integration test in parquet-hadoop that writes a file and verifies that the stats are present and correct when the file is read.

This includes commits from #330 because it updates the dictionary filter. I'll rebase and remove them once it is merged.
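
For readers unfamiliar with the idea, here is a minimal, self-contained sketch of the kind of bookkeeping described above. It is not the code in this PR: the class name EncodingStatsSketch, the reduced Encoding enum, and the methods addDictPage, addDataPage, and canFilterByDictionary are hypothetical names chosen for illustration. Only hasDictionaryEncodedPages and hasNonDictionaryEncodedPages are taken from the description, and their behavior here is an assumption based on their names.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; not the parquet-mr EncodingStats implementation.
final class EncodingStatsSketch {
  // Stand-in for org.apache.parquet.column.Encoding, reduced to a few values.
  enum Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY, DELTA_BINARY_PACKED }

  // Page counts per encoding, kept separately for dictionary and data pages.
  private final Map<Encoding, Integer> dictPageCounts = new EnumMap<>(Encoding.class);
  private final Map<Encoding, Integer> dataPageCounts = new EnumMap<>(Encoding.class);

  void addDictPage(Encoding e) { dictPageCounts.merge(e, 1, Integer::sum); }
  void addDataPage(Encoding e) { dataPageCounts.merge(e, 1, Integer::sum); }

  int dictPageCount(Encoding e) { return dictPageCounts.getOrDefault(e, 0); }
  int dataPageCount(Encoding e) { return dataPageCounts.getOrDefault(e, 0); }

  private static boolean isDictionary(Encoding e) {
    return e == Encoding.PLAIN_DICTIONARY || e == Encoding.RLE_DICTIONARY;
  }

  // True if at least one data page uses a dictionary encoding.
  boolean hasDictionaryEncodedPages() {
    for (Encoding e : dataPageCounts.keySet()) {
      if (isDictionary(e)) return true;
    }
    return false;
  }

  // True if at least one data page uses a non-dictionary encoding.
  boolean hasNonDictionaryEncodedPages() {
    for (Encoding e : dataPageCounts.keySet()) {
      if (!isDictionary(e)) return true;
    }
    return false;
  }

  // Assumed use in a dictionary filter: the dictionary only describes every
  // value in the column chunk when all data pages are dictionary-encoded.
  boolean canFilterByDictionary() {
    return hasDictionaryEncodedPages() && !hasNonDictionaryEncodedPages();
  }

  public static void main(String[] args) {
    EncodingStatsSketch stats = new EncodingStatsSketch();
    stats.addDictPage(Encoding.PLAIN_DICTIONARY);
    for (int i = 0; i < 3; i++) {
      stats.addDataPage(Encoding.PLAIN_DICTIONARY);
    }
    System.out.println(stats.canFilterByDictionary()); // prints true

    // A writer that falls back to PLAIN mid-chunk makes the dictionary incomplete.
    stats.addDataPage(Encoding.PLAIN);
    System.out.println(stats.canFilterByDictionary()); // prints false
  }
}
```

The point of keeping per-page counts rather than a plain set of encodings is that a reader can tell whether any data page fell back to a non-dictionary encoding, in which case the dictionary alone cannot be used to rule out a value.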

rdblue force-pushed the PARQUET-548-add-encoding-stats branch 3 times, most recently from 56c2486 to c5f424f, on March 14, 2016 18:57
rdblue force-pushed the PARQUET-548-add-encoding-stats branch from c5f424f to e5c28a2 on March 16, 2016 23:18
rdblue force-pushed the PARQUET-548-add-encoding-stats branch from e5c28a2 to dc332d3 on March 17, 2016 20:59
rdblue (Contributor, Author) commented Apr 21, 2016

@danielcweeks and @julienledem, this would be a good one to get into 1.9.0 if we can.

import static org.apache.parquet.column.Encoding.PLAIN_DICTIONARY;
import static org.apache.parquet.column.Encoding.RLE_DICTIONARY;

public class EncodingStats {
Member

please add a short description in javadoc

julienledem (Member)

+1 LGTM

asfgit closed this in 3dd2210 on Apr 23, 2016
piyushnarang pushed a commit to piyushnarang/parquet-mr that referenced this pull request Jun 15, 2016
This adds `EncodingStats`, which tracks the number of pages for each encoding, separated into dictionary and data pages. It also adds convenience functions that are useful for dictionary filtering, like `hasDictionaryEncodedPages` and `hasNonDictionaryEncodedPages`.

`EncodingStats` has a unit test in parquet-column and an integration test in parquet-hadoop that writes a file and verifies that the stats are present and correct when the file is read.

This includes commits from apache#330 because it updates the dictionary filter. I'll rebase and remove them once it is merged.

Author: Ryan Blue <blue@apache.org>

Closes apache#332 from rdblue/PARQUET-548-add-encoding-stats and squashes the following commits:

5f148e6 [Ryan Blue] PARQUET-548: Fixes for review comments.
dc332d3 [Ryan Blue] PARQUET-548: Add EncodingStats.
rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jul 13, 2016

Conflicts:
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java
Resolution:
    Minor formatting changes conflicted with wrapping encodings in a HashSet.
rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017

Conflicts:
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java
Resolution:
    Minor formatting changes conflicted with wrapping encodings in a HashSet.
rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jan 10, 2017

Conflicts:
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java
Resolution:
    Minor formatting changes conflicted with wrapping encodings in a HashSet.