[Parquet] Add test to verify heap size calculation #8925

adamreeve · 2025-11-26T02:49:30Z

Which issue does this PR close?

Closes [Parquet] Add metadata heap size test that tracks allocations #8924.

What changes are included in this PR?

Adds a new integration test (parquet/tests/metadata_memory.rs) that overrides the global allocator to track allocations, then compares the measured allocation size with the computed heap size of the Parquet metadata.
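
The allocation-tracking approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `measure` helper, the field names, and the use of `Ordering::SeqCst` are assumptions. The counter is process-wide, so allocations from concurrently running tests would skew it. Keeping a single test in the file, or running with `--test-threads=1`, avoids that.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and keeps a running total of live heap bytes.
pub struct TrackingAllocator {
    allocated: AtomicUsize,
}

impl TrackingAllocator {
    pub const fn new() -> Self {
        Self {
            allocated: AtomicUsize::new(0),
        }
    }

    /// Bytes currently allocated (and not yet freed) through this allocator.
    pub fn allocated(&self) -> usize {
        self.allocated.load(Ordering::SeqCst)
    }
}

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            self.allocated.fetch_add(layout.size(), Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.allocated.fetch_sub(layout.size(), Ordering::SeqCst);
    }
    // `alloc_zeroed` and `realloc` keep the trait's default implementations,
    // which call `alloc`/`dealloc` above, so they are tracked as well.
}

#[global_allocator]
static ALLOC: TrackingAllocator = TrackingAllocator::new();

/// Hypothetical helper: runs `f` and returns its result together with the
/// net number of heap bytes allocated while it ran that are still live.
pub fn measure<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOC.allocated();
    let value = f();
    let after = ALLOC.allocated();
    (value, after.saturating_sub(before))
}
```

In a test, the closure passed to `measure` would decode the metadata, and the returned byte count would be compared against the metadata's computed heap size.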

Are these changes tested?

Yes, this only adds a test.

Are there any user-facing changes?

No

adamreeve · 2025-12-04T03:21:05Z

parquet/tests/metadata_memory.rs

+        // Calculated heap size doesn't match exactly, possibly due to extra overhead not accounted
+        // for in the HeapSize implementation for parquet::data_type::ByteArray.


I haven't managed to track down exactly where the difference comes from, so this is a bit of a guess. The confusing part is that the encrypted file also has stats for a ByteArray column, so I'm not sure why its calculated heap size matches the allocated size exactly. Maybe the stats from encrypted metadata are handled differently, or maybe the difference comes from somewhere else.

But the computed size is still very close to the actual heap allocation size so I think this is good enough.
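
One generic way a hand-written heap-size calculation can drift from what the allocator actually hands out is counting a buffer's length rather than its capacity. The sketch below is purely illustrative: `HeapSize`, `ByteBuf`, and `len_based_estimate` are hypothetical local names, and it does not claim this is what happens in `parquet::data_type::ByteArray`.

```rust
/// Local stand-in for a heap-size trait (hypothetical; not the parquet crate's API).
trait HeapSize {
    /// Bytes owned on the heap, not counting the value's own inline size.
    fn heap_size(&self) -> usize;
}

impl HeapSize for Vec<u8> {
    fn heap_size(&self) -> usize {
        // A Vec<u8>'s heap allocation is `capacity` bytes, which can exceed `len`.
        self.capacity()
    }
}

/// Hypothetical nullable byte buffer.
struct ByteBuf {
    data: Option<Vec<u8>>,
}

impl HeapSize for ByteBuf {
    fn heap_size(&self) -> usize {
        self.data.as_ref().map_or(0, |d| d.heap_size())
    }
}

/// A len-based estimate: it undercounts whenever capacity exceeds len.
fn len_based_estimate(b: &ByteBuf) -> usize {
    b.data.as_ref().map_or(0, |d| d.len())
}
```

Comparing either estimate against an allocation-tracking measurement is what exposes this kind of gap.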

alamb · 2025-12-10T17:43:38Z

Interestingly, this PR fails on my Mac (macOS 26.1):

Darwin Andrews-MacBook-Pro-3.local 25.1.0 Darwin Kernel Version 25.1.0: Mon Oct 20 19:30:01 PDT 2025; root:xnu-12377.41.6~2/RELEASE_ARM64_T6031 arm64

---- test_metadata_heap_memory stdout ----

thread 'test_metadata_heap_memory' (15130761) panicked at parquet/tests/metadata_memory.rs:138:9:
assertion `left == right` failed: Calculated heap size 10534 doesn't match the allocated size 10526 for file /Users/andrewlamb/Software/arrow-rs/arrow/../parquet-testing/data/encrypt_columns_plaintext_footer.parquet.encrypted
  left: 10534
 right: 10526
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test_metadata_heap_memory

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

error: test failed, to rerun pass `-p parquet --test metadata_memory`

adamreeve · 2025-12-10T20:48:31Z

Huh, interesting. I think trying to get this to always match exactly and track down all the causes of differences is not worth the effort, and I should just add a small tolerance for all the test cases.
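
A uniform relative tolerance could look something like the following. This is a hedged sketch: `within_relative_tolerance`, `assert_heap_size_close`, and the 1% figure used with them are hypothetical, and the tolerance actually chosen in the PR is not stated in this thread. For the macOS failure reported above, the two sizes differ by 8 bytes (10534 vs 10526), roughly 0.08% of the allocated size.

```rust
/// Returns true when `calculated` is within `rel_tol` (a fraction, e.g. 0.01
/// for 1%) of `allocated`. Hypothetical helper.
fn within_relative_tolerance(calculated: usize, allocated: usize, rel_tol: f64) -> bool {
    let diff = calculated.abs_diff(allocated) as f64;
    diff <= rel_tol * allocated as f64
}

/// Panics with a message in the style of the existing assertion when the
/// calculated size falls outside the tolerance.
fn assert_heap_size_close(calculated: usize, allocated: usize, rel_tol: f64, file: &str) {
    assert!(
        within_relative_tolerance(calculated, allocated, rel_tol),
        "Calculated heap size {calculated} is not within {:.2}% of the allocated size {allocated} for file {file}",
        rel_tol * 100.0
    );
}
```

A relative bound lets a single tolerance apply across test files with different metadata sizes, instead of tuning an absolute byte slack per file.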

alamb · 2025-12-10T17:45:36Z

parquet/tests/metadata_memory.rs

+use std::sync::Arc;
+use std::sync::atomic::{AtomicUsize, Ordering};
+
+pub struct TrackingAllocator {


this is a very cool idea

github-actions bot added the parquet label (Changes to the parquet crate) Nov 26, 2025

adamreeve mentioned this pull request Nov 26, 2025 in Fix Parquet metadata heap size accounting #8898 (closed)

adamreeve commented Dec 4, 2025

adamreeve added 4 commits December 4, 2025 16:27

Add Parquet test to verify heap size calculation (4511d76)
Add license header (99490f5)
Add test with an encrypted file (e4662ee)
Test more files and add tolerance (aae2c0d)

adamreeve force-pushed the heap-tracking-test branch from cacf740 to aae2c0d December 4, 2025 03:27

adamreeve marked this pull request as ready for review December 4, 2025 03:28

adamreeve requested a review from rok December 4, 2025 03:31

Merge branch 'main' into heap-tracking-test (2b8dd95)

alamb reviewed Dec 10, 2025

Apply a uniform relative tolerance to the metadata size (c2af742)
