This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
Added support to write nested parquet #1007
Merged
Codecov Report
@@            Coverage Diff             @@
##             main    #1007      +/-   ##
==========================================
+ Coverage   71.42%   71.65%   +0.23%
==========================================
  Files         356      359       +3
  Lines       19784    20037     +253
==========================================
+ Hits        14131    14358     +227
- Misses       5653     5679      +26
jorgecarleitao force-pushed the write_nest branch from e773c10 to cddbbad (May 26, 2022 04:28)
jorgecarleitao force-pushed the write_nest branch from cddbbad to 1014a52 (May 26, 2022 11:51)
jorgecarleitao changed the title from "[WIP] Write nested parquet" to "Added support to write nested parquet" (May 26, 2022)
jorgecarleitao force-pushed the write_nest branch 3 times, most recently from dcfb9d8 to 22a1c84 (May 26, 2022 17:30)
jorgecarleitao force-pushed the write_nest branch from 22a1c84 to 339668e (May 26, 2022 17:55)
This is now ready: more tests, generalization to arbitrary nesting, and tests demonstrating that we can write …
Awesome work @jorgecarleitao!
Many thanks for the quick fix @jorgecarleitao. Fantastic work.
Tested a different two-level nested Parquet file against this branch. Unfortunately, reading it back fails. I submitted a separate issue: #1014.
This PR adds support to write `StructArray` (and likely arbitrary nesting, but more testing is required).

`ListArray` does not preserve all values #1008

I expect the operation to be (per array) O(N * D * C) in the worst case, where:
- D is the maximum depth
- N is the number of items
- C is the number of leaf Parquet columns

The main idea in this PR: given a (potentially nested) Parquet field (`ParquetType`) and an `Array`,
1. build a `Vec<Nested>` containing the validities and lengths of each nesting level;
2. convert the `Vec<Nested>` to iterators of repetition and definition levels.
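To make the second step concrete, here is a minimal standalone sketch of turning per-level validities and list offsets into repetition and definition levels for one leaf column. It is an illustration under stated assumptions, not the code in this PR: the `Nested` enum, `LevelWriter` and `levels` are hypothetical names (arrow2's actual `Nested` type and its level iterators differ), every level is assumed nullable, and lists are assumed to use Parquet's three-level list layout, so a list contributes two definition levels (defined, non-empty) and the leaf one more. The sketch also collects levels into `Vec`s, whereas the PR describes producing iterators.

```rust
/// One nesting level, outermost first.
/// Hypothetical illustration only; this is not arrow2's actual `Nested`.
#[derive(Debug, Clone)]
pub enum Nested {
    /// Optional struct: `validity[i]` is false when slot `i` is null.
    Struct { validity: Vec<bool> },
    /// Optional list with Arrow-style offsets (`offsets.len() == slots + 1`).
    List { offsets: Vec<usize>, validity: Vec<bool> },
}

/// Accumulates the repetition/definition levels of a single leaf column.
struct LevelWriter<'a> {
    nested: &'a [Nested],
    leaf_validity: &'a [bool],
    rep: Vec<u32>,
    def: Vec<u32>,
}

impl<'a> LevelWriter<'a> {
    /// Visits slot `idx` at nesting `depth`. `def` is the definition level
    /// reached so far, `rep` the repetition level to emit for the next value,
    /// and `lists` the number of enclosing list levels.
    fn walk(&mut self, depth: usize, idx: usize, def: u32, rep: u32, lists: u32) {
        let nested: &'a [Nested] = self.nested;
        match nested.get(depth) {
            // Leaf: an optional value adds one definition level when present.
            None => {
                let d = if self.leaf_validity[idx] { def + 1 } else { def };
                self.push(rep, d);
            }
            Some(Nested::Struct { validity }) => {
                if validity[idx] {
                    // Struct children share the parent's slot index.
                    self.walk(depth + 1, idx, def + 1, rep, lists);
                } else {
                    self.push(rep, def); // null struct
                }
            }
            Some(Nested::List { offsets, validity }) => {
                let (start, end) = (offsets[idx], offsets[idx + 1]);
                if !validity[idx] {
                    self.push(rep, def); // null list
                } else if start == end {
                    self.push(rep, def + 1); // defined but empty list
                } else {
                    for child in start..end {
                        // Only the first element inherits the caller's repetition
                        // level; later elements repeat at this list's level.
                        let r = if child == start { rep } else { lists + 1 };
                        // +2: the list is defined and has at least one element.
                        self.walk(depth + 1, child, def + 2, r, lists + 1);
                    }
                }
            }
        }
    }

    fn push(&mut self, rep: u32, def: u32) {
        self.rep.push(rep);
        self.def.push(def);
    }
}

/// Returns (repetition levels, definition levels) for `len` top-level slots.
pub fn levels(nested: &[Nested], leaf_validity: &[bool], len: usize) -> (Vec<u32>, Vec<u32>) {
    let mut w = LevelWriter { nested, leaf_validity, rep: Vec::new(), def: Vec::new() };
    for i in 0..len {
        w.walk(0, i, 0, 0, 0);
    }
    (w.rep, w.def)
}
```

For example, the optional list of optional integers `[[1, null], null, [], [3]]` is described by a single `Nested::List` with offsets `[0, 2, 2, 2, 3]` and validity `[true, false, true, true]`, plus leaf validity `[true, false, true]`; under the assumptions above it yields repetition levels `[0, 1, 0, 0, 0]` and definition levels `[3, 2, 0, 1, 3]`. Each call descends at most D levels per value, so one leaf column costs roughly O(N * D), and repeating it for each of the C leaf columns matches the O(N * D * C) worst case stated in the description.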