Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor parquet::arrow module #1827

Merged
merged 3 commits into from
Jun 11, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Jun 9, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

As the reader logic in particular has gotten more sophisticated, the module has gotten rather large. This moves some files around to make the structure easier to understand

What changes are included in this PR?

  • Creates a new buffer module where the arrow compat logic can live
  • Renames files to module_name/mod.rs instead of having module_name.rs and module_name/

Are there any user-facing changes?

No


#[cfg(feature = "async")]
pub mod async_reader;

experimental_mod!(converter);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module is slated for eventual removal as part of #1661

It is marked experimental so hiding it from the public API is not a breaking change

Comment on lines 21 to 23
pub mod offset_buffer;
pub mod dictionary_buffer;
pub mod converter;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to eventually upstream many of these into arrow-rs proper

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 9, 2022
@tustvold tustvold marked this pull request as draft June 9, 2022 14:02
@codecov-commenter
Copy link

Codecov Report

Merging #1827 (a7b76fd) into master (8f1fd12) will decrease coverage by 0.00%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master    #1827      +/-   ##
==========================================
- Coverage   83.45%   83.45%   -0.01%     
==========================================
  Files         200      200              
  Lines       56697    56696       -1     
==========================================
- Hits        47315    47313       -2     
- Misses       9382     9383       +1     
Impacted Files Coverage Δ
parquet/src/arrow/array_reader/builder.rs 93.50% <ø> (ø)
parquet/src/arrow/array_reader/byte_array.rs 85.71% <ø> (ø)
parquet/src/arrow/arrow_reader.rs 96.87% <ø> (ø)
parquet/src/arrow/arrow_writer/levels.rs 96.92% <ø> (ø)
parquet/src/arrow/arrow_writer/mod.rs 97.53% <ø> (ø)
parquet/src/arrow/buffer/bit_util.rs 100.00% <ø> (ø)
parquet/src/arrow/buffer/converter.rs 83.33% <ø> (ø)
parquet/src/arrow/buffer/dictionary_buffer.rs 94.44% <ø> (ø)
parquet/src/arrow/buffer/offset_buffer.rs 94.79% <ø> (ø)
parquet/src/arrow/mod.rs 44.44% <ø> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f1fd12...a7b76fd. Read the comment docs.

@@ -355,27 +355,6 @@ fn create_string_byte_array_dictionary_reader(
.unwrap()
}

fn create_complex_object_byte_array_dictionary_reader(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComplexObjectArrayReader is no longer used and is slated for removal (#1661 )

@tustvold tustvold marked this pull request as ready for review June 9, 2022 18:16
Copy link
Member

@Ted-Jiang Ted-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the reader logic in particular has gotten more sophisticated

Great !👍 I took me a lot of time to figure out how this work 😂
Is there any plan for simplify the reader API ?

@tustvold
Copy link
Contributor Author

Is there any plan for simplify the reader API

Yes, the high level ticket is #1163. I'm hoping to pick it up soon, but large amounts of this API are public so needs some careful thought

@Ted-Jiang
Copy link
Member

Yes, the high level ticket is #1163. I'm hoping to pick it up soon, but large amounts of this API are public so needs some careful thought

Great! 👍 After implement send row_selection in arrow_reader::get_record_reader_by_columns
I would glad to help! 😊

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way that you have broken down the modules makes sense. Yeah the files had gotten large over time :)

@tustvold tustvold merged commit 029203e into apache:master Jun 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants