-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Parquet] Add test to verify heap size calculation #8925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| // Calculated heap size doesn't match exactly, possibly due to extra overhead not accounted | ||
| // for in the HeapSize implementation for parquet::data_type::ByteArray. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't managed to track down exactly where the difference comes from so this is a bit of a guess. The confusing part is the file with encryption has stats for a ByteArray column too so I'm not sure why it does give the exact same heap size. Maybe there's something different about how the stats from encrypted metadata work or maybe the difference is from somewhere else.
But the computed size is still very close to the actual heap allocation size so I think this is good enough.
cacf740 to
aae2c0d
Compare
|
Interestingly, this PR fails on my mac (OSX 26.1) Darwin Andrews-MacBook-Pro-3.local 25.1.0 Darwin Kernel Version 25.1.0: Mon Oct 20 19:30:01 PDT 2025; root:xnu-12377.41.6~2/RELEASE_ARM64_T6031 arm64---- test_metadata_heap_memory stdout ----
thread 'test_metadata_heap_memory' (15130761) panicked at parquet/tests/metadata_memory.rs:138:9:
assertion `left == right` failed: Calculated heap size 10534 doesn't match the allocated size 10526 for file /Users/andrewlamb/Software/arrow-rs/arrow/../parquet-testing/data/encrypt_columns_plaintext_footer.parquet.encrypted
left: 10534
right: 10526
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
test_metadata_heap_memory
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
error: test failed, to rerun pass `-p parquet --test metadata_memory` |
|
Huh, interesting. I think trying to get this to always match exactly and track down all the causes of differences is not worth the effort, and I should just add a small tolerance for all the test cases. |
| use std::sync::Arc; | ||
| use std::sync::atomic::{AtomicUsize, Ordering}; | ||
|
|
||
| pub struct TrackingAllocator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very cool idea
Which issue does this PR close?
What changes are included in this PR?
Adds a new test program that overrides the global allocator in order to track allocations, and compare the measured allocation size with the computed heap size of the Parquet metadata.
Are these changes tested?
Yes, this only adds a test
Are there any user-facing changes?
No