-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support skip_page missing OffsetIndex Fallback in SerializedPageReader #2460
Conversation
@@ -480,13 +480,17 @@ pub(crate) fn decode_page( | |||
Ok(result) | |||
} | |||
|
|||
#[allow(clippy::large_enum_variant)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without this clippy get
error: large size difference between variants
--> parquet/src/file/serialized_reader.rs:484:5
|
484 | / Values {
485 | | /// The current byte offset in the reader
486 | | offset: usize,
487 | |
... |
492 | | next_page_header: Option<PageHeader>,
493 | | },
| |_____^ this variant is 336 bytes
|
= note: `-D clippy::large-enum-variant` implied by `-D warnings`
note: and the second-largest variant is 72 bytes:
--> parquet/src/file/serialized_reader.rs:494:5
but i think the 336 is wrong because in PageHeader
, most of the large arg are option
, clippy can not know when running only one is some
, so i think remove this check here is reasonable (or wrap 'Box' i think this will downgrade)🤔
pub data_page_header: Option<DataPageHeader>,
pub index_page_header: Option<IndexPageHeader>,
pub dictionary_page_header: Option<DictionaryPageHeader>,
pub data_page_header_v2: Option<DataPageHeaderV2>,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tustvold PTAL😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An option takes up the same space regardless of it is set, they're like C++ union types in that sense. I think you should box the page header probably...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right! tested in ut
#[test]
fn test() {
enum Test {
Large {
page: Option<PageHeader>,
}
}
let test1 = Test::Large { page: None };
println!("{}", std::mem::size_of_val(&test1));
()
}
got 320 😂 thanks for your info!
I should try before ask😔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just some minor nits
if *remaining_bytes == 0 { | ||
return Ok(None); | ||
} | ||
return if let Some(header) = next_page_header.take() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this probably shouldn't take but just be as_ref?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the only way set it none. Is after skip_next_page.
I'm going to get this in as the fix for parquet_derive is in #2494 and I want this to make the release 😄 |
Benchmark runs are scheduled for baseline = 9f77e4e and contender = 12cc067. 12cc067 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2459
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?