PARQUET-2187: Update rle_boolean_encoding file with more rows #27

Merged (1 commit) on Sep 29, 2022

sfc-gh-nthimmegowda (Contributor) commented Sep 22, 2022

Jira link: https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-2187

The previous RLE-encoded boolean file had only 7 rows, so its encoded data never used the rle-run part of the encoding.

The RLE encoding is a combination of <bit-packed-run> and <rle-run>. In the RLE encoding grammar:

run := <bit-packed-run> | <rle-run>

This PR replaces that file with a new one containing more data (68 rows), produced by the following statement. This gives us coverage for both bit-packed decoding and rle-run decoding.

Created with Athena:

insert into boolean_datatype_athena VALUES
(true), (false), (null), (true), (true), (false), (false), (true),
(true), (true), (false), (false), (true),
(true), (false), (null), (true), (true), (false), (false), (true),
(true), (false), (null), (true), (true), (false), (false), (true),
(true), (true), (false), (false), (false), (false), (true),
(true), (false), (null), (true), (true), (false), (false), (true),
(true), (true), (false), (false), (null), (true), (true), (false), (false), (true),
(true), (true), (false), (true),
(true), (false), (null), (true), (true), (false), (false), (true),
(true), (true);
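
For reviewers less familiar with the two run kinds named in the grammar above, here is a minimal Python sketch of a decoder for the RLE/bit-packed hybrid, following the header rules in the Parquet encoding spec (an odd header marks a bit-packed run of groups of 8 values, an even header marks an rle-run). The helper names `read_uleb128` and `decode_hybrid` are hypothetical and not taken from any Parquet library, and the sketch assumes any length prefix has already been stripped from the buffer. It also records which run kind each header selected, which is the coverage this file is meant to provide.

```python
def read_uleb128(buf, pos):
    """Read an unsigned LEB128 varint from buf starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_hybrid(buf, bit_width, num_values):
    """Decode RLE/bit-packed hybrid data (hypothetical helper, not a library API).

    Returns the decoded values and the kind of each run encountered.
    """
    values = []
    run_kinds = []
    pos = 0
    value_bytes = (bit_width + 7) // 8  # rle-run value is padded to whole bytes
    mask = (1 << bit_width) - 1
    while len(values) < num_values and pos < len(buf):
        header, pos = read_uleb128(buf, pos)
        if header & 1:
            # bit-packed-run: header >> 1 is the number of 8-value groups
            groups = header >> 1
            nbytes = groups * bit_width
            chunk = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
            # values are packed starting at the least significant bit
            for i in range(groups * 8):
                values.append((chunk >> (i * bit_width)) & mask)
            run_kinds.append("bit-packed")
        else:
            # rle-run: header >> 1 is the repeat count, followed by one value
            run_len = header >> 1
            value = int.from_bytes(buf[pos:pos + value_bytes], "little")
            pos += value_bytes
            values.extend([value] * run_len)
            run_kinds.append("rle")
    return values[:num_values], run_kinds


# Hand-built input: one bit-packed group (0x03, 0x0B), then an rle-run of 8 ones (0x10, 0x01).
values, kinds = decode_hybrid(bytes([0x03, 0x0B, 0x10, 0x01]), bit_width=1, num_values=16)
print(kinds)   # ['bit-packed', 'rle']
print(values)  # [1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
```

A reader implementation that only ever sees odd headers in its test data would never execute the `else` branch, which is the gap this PR aims to close.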

sfc-gh-nthimmegowda (Contributor Author) commented

@sunchao @nevi-me @pitrou @wesm: sorry for the repeated ping. Could any of you please review?

This was raised by @kou during the review of apache/arrow#14147 for Parquet-cpp.

sfc-gh-nthimmegowda changed the title from "Update rle_boolean_encoding file with more rows" to "PARQUET-2187 Update rle_boolean_encoding file with more rows" on Sep 22, 2022

pitrou (Member) commented Sep 27, 2022

@achille-roussel could you take a look at this file?

achille-roussel commented

The file seems compatible with https://github.com/segmentio/parquet-go 👍

achille-roussel pushed a commit to segmentio/parquet-go that referenced this pull request on Sep 28, 2022

pitrou changed the title from "PARQUET-2187 Update rle_boolean_encoding file with more rows" to "PARQUET-2187: Update rle_boolean_encoding file with more rows" on Sep 29, 2022

pitrou (Member) commented Sep 29, 2022

Thank you @sfc-gh-nthimmegowda @achille-roussel
