-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
PARQUET-384 needs to determine whether an entire column chunk is dictionary-encoded, but it is difficult to detect that case based on the set of encodings for a column. For 1.0, this can be done by checking for a PLAIN page because both dictionary pages and dictionary-encoded pages use PLAIN_DICTIONARY and RLE/BIT_PACKING is only used for repetition and definition levels. But for 2.0, dictionary pages might be using PLAIN and there is no way to tell if a column has fallen back.
PageEncodingStats were added to the format to solve this problem, so we just need to implement them.
Reporter: Ryan Blue / @rdblue
Assignee: Ryan Blue / @rdblue
Related issues:
- Add Dictionary Based Filtering to Filter2 API (is related to)
PRs and other links:
Note: This issue was originally created as PARQUET-548. Please see the migration documentation for further details.