@xuanyuanking
Member

What changes were proposed in this pull request?

Implement Parquet delta encoding for the vectorized interface, which is needed for V2 pages. The implementation simply delegates the decoding to the Parquet implementation.
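The delegation described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `ValueAtATimeReader`, `DelegatingBatchReader`, and `DelegatingReaderSketch` are hypothetical names, a plain `int[]` stands in for Spark's column vector, and the toy delegate replays already-decoded values instead of decoding a real page.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a value-at-a-time decoder (the role played by
// Parquet's DeltaBinaryPackingValuesReader); only one method is modeled.
interface ValueAtATimeReader {
  int readInteger();
}

// Hypothetical batch-oriented reader that owns no decoding logic of its own.
final class DelegatingBatchReader {
  private final ValueAtATimeReader delegate;

  DelegatingBatchReader(ValueAtATimeReader delegate) {
    this.delegate = delegate;
  }

  // Fills out[offset .. offset + total) by pulling one value at a time
  // from the delegate.
  void readIntegers(int total, int[] out, int offset) {
    for (int i = 0; i < total; i++) {
      out[offset + i] = delegate.readInteger();
    }
  }
}

final class DelegatingReaderSketch {
  // Toy delegate (assumption): replays a fixed list of decoded values.
  static ValueAtATimeReader replaying(List<Integer> values) {
    Iterator<Integer> it = values.iterator();
    return it::next;
  }

  public static void main(String[] args) {
    DelegatingBatchReader reader =
        new DelegatingBatchReader(replaying(Arrays.asList(7, 8, 10, 13)));
    int[] batch = new int[4];
    reader.readIntegers(4, batch, 0);
    System.out.println(Arrays.toString(batch));
  }
}
```

The trade-off of this shape is simplicity over speed: the batch method is a loop over single-value reads, so it reuses the existing decoder unchanged but does not decode in bulk.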

Reference:

Parquet encodings
DefaultV2ValuesWriterFactory

Why are the changes needed?

To support Parquet DataPageV2.

Does this PR introduce any user-facing change?

How was this patch tested?

New unit tests.

Credit to @nandorKollar, Closes #23988.

@SparkQA

SparkQA commented Jan 22, 2020

Test build #117210 has finished for PR 27316 at commit 947c6f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

cc @rdblue

*/
public class VectorizedDeltaBinaryPackedReader extends ValuesReader
    implements VectorizedValuesReader {
  private final DeltaBinaryPackingValuesReader valuesReader = new DeltaBinaryPackingValuesReader();
Member

@kiszk Feb 2, 2020

I am not familiar with Parquet, so a question: can a single DeltaBinaryPackingValuesReader instance be reused multiple times?

When DeltaBinaryPackingValuesReader.initFromPage is called, valuesBuffer is allocated every time here. On the other hand, valuesBuffer is not initialized in DeltaBinaryPackingValuesReader.initFromPage.

I am curious about what happens if initFromPage is called multiple times.
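To make the concern concrete, here is a toy model of a reader that is reused across pages. It is only a sketch under stated assumptions: `ReusableToyReader` and its `position` field are hypothetical and do not reproduce Parquet's code. The point is that reuse is safe only if `initFromPage` resets every piece of per-page state, which is the property in question for the real reader.

```java
// Toy model only (assumption): names echo the discussion, not Parquet's code.
final class ReusableToyReader {
  private int[] valuesBuffer;
  private int position;

  // Re-initializes the reader for a new page. All per-page state is reset here.
  void initFromPage(int[] pageValues) {
    valuesBuffer = pageValues.clone(); // fresh buffer for every page
    position = 0; // if this reset were missing, the next page would start mid-buffer
  }

  // Returns the next value of the current page.
  int readInteger() {
    if (valuesBuffer == null || position >= valuesBuffer.length) {
      throw new IllegalStateException("no more values in the current page");
    }
    return valuesBuffer[position++];
  }
}

final class ReuseSketch {
  public static void main(String[] args) {
    ReusableToyReader reader = new ReusableToyReader();

    reader.initFromPage(new int[] {1, 2, 3});
    System.out.println(reader.readInteger());
    System.out.println(reader.readInteger());

    // Second page on the same instance: reading starts from its first value
    // because initFromPage reset the position.
    reader.initFromPage(new int[] {9, 8});
    System.out.println(reader.readInteger());
  }
}
```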

Member Author

Thanks for the comment.
The current usage follows the same pattern used inside Parquet itself: DeltaBinaryPackingValuesReader is also reused by the DeltaByteArrayReader and DeltaLengthByteArrayValuesReader readers.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label May 15, 2020
@github-actions github-actions bot closed this May 16, 2020