Describe the bug
Parquet page-level skipping (page index pruning) panics with evolved schemas.
To Reproduce
Enable the page index setting, for example by changing its default to true:
pub enable_page_index: bool, default = true
Then run the test:
cargo test -p datafusion --lib -- evolved_schema_disjoint_schema_filter
The test fails with:
test physical_plan::file_format::parquet::tests::evolved_schema_disjoint_schema_filter ... FAILED
failures:
---- physical_plan::file_format::parquet::tests::evolved_schema_disjoint_schema_filter stdout ----
thread 'physical_plan::file_format::parquet::tests::evolved_schema_disjoint_schema_filter' panicked at 'index out of bounds: the len is 1 but the index is 1', /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-31.0.0/src/file/metadata.rs:251:10
Expected behavior
The test should pass.
Additional context
I found this while working to enable this feature by default (#5099).
I am pretty sure the issue is that page pruning uses a column index from the merged schema to look up the relevant statistics, but each file's schema can be slightly different, so that index may point at the wrong column or past the end of the file's column list. I think the fix will be to look up statistics by column name rather than by index.
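To illustrate the suspected cause and the proposed fix, here is a minimal sketch using plain Rust types only. It does not use the real DataFusion or parquet APIs: `FileColumns`, `file_column_index`, and `min_for_merged_column` are hypothetical names made up for this example, and the `i64` minimum stands in for whatever page or column statistics the pruning code actually reads.

```rust
/// Hypothetical stand-in for the per-column statistics of one Parquet file.
/// `names[i]` and `min_values[i]` describe the same file column.
struct FileColumns<'a> {
    names: Vec<&'a str>,
    min_values: Vec<i64>,
}

/// Resolve a column name to its position in this particular file's schema.
/// Returns `None` when the file does not contain the column at all.
fn file_column_index(file_column_names: &[&str], column_name: &str) -> Option<usize> {
    file_column_names
        .iter()
        .position(|name| *name == column_name)
}

/// Look up a statistic for a column identified by its index in the merged schema.
/// The merged index is first translated to a name, and the name is then resolved
/// against the file's own schema, so a file that lacks the column yields `None`
/// instead of an out-of-bounds panic.
fn min_for_merged_column(
    merged_schema: &[&str],
    merged_idx: usize,
    file: &FileColumns,
) -> Option<i64> {
    let name = merged_schema.get(merged_idx).copied()?;
    let file_idx = file_column_index(&file.names, name)?;
    file.min_values.get(file_idx).copied()
}
```

With a merged schema of two columns and a file that contains only the second one, indexing the file's statistics directly with the merged index 1 would be out of bounds for a length of 1, which matches the shape of the panic message above. The name-based lookup instead resolves to file index 0, and returns `None` for the column the file does not have, which the pruning code could treat as "no statistics available".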