Query execution fails with index out of bounds err #2161

@kamilkonior

Describe the bug
Query execution panics with an index out of bounds error when Parquet pruning is enabled.

file: metadata.rs:212:10
struct: RowGroupMetaData,
accessed field: columns
error: thread 'tokio-runtime-worker' panicked at 'index out of bounds: the len is 1 but the index is 1'
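
The panic message is consistent with a row group's columns being indexed by a column's position in the merged table schema rather than its position in that file's own schema. This is an assumption drawn from the error text, not a confirmed diagnosis. A minimal, dependency-free sketch of that failure mode follows; `ColumnChunk`, `RowGroup`, `stats_by_table_index`, and `stats_by_name` are hypothetical stand-ins, not DataFusion or parquet APIs.

```rust
/// Hypothetical stand-in for one column's statistics inside a row group.
#[derive(Debug, Clone, PartialEq)]
struct ColumnChunk {
    name: String,
    min: i32,
    max: i32,
}

/// Hypothetical stand-in for a row group: it holds only the columns
/// that physically exist in its file.
struct RowGroup {
    columns: Vec<ColumnChunk>,
}

/// Positional lookup using an index taken from the merged table schema.
/// Panics when the file has fewer columns than the table schema.
fn stats_by_table_index(rg: &RowGroup, table_index: usize) -> &ColumnChunk {
    &rg.columns[table_index]
}

/// Lookup by column name; returns None when the file lacks the column.
fn stats_by_name<'a>(rg: &'a RowGroup, name: &str) -> Option<&'a ColumnChunk> {
    rg.columns.iter().find(|c| c.name == name)
}

fn main() {
    // Assumed merged table schema: [a, b]. This file contains only `a`.
    let table_schema = ["a", "b"];
    let rg = RowGroup {
        columns: vec![ColumnChunk { name: "a".to_string(), min: 1, max: 4 }],
    };

    // Index 0 happens to be valid for this file.
    println!("{:?}", stats_by_table_index(&rg, 0));

    for &name in table_schema.iter() {
        match stats_by_name(&rg, name) {
            Some(c) => println!("{}: min={} max={}", name, c.min, c.max),
            None => println!("{}: no statistics in this file", name),
        }
    }

    // Calling stats_by_table_index(&rg, 1) here would panic, because the
    // file has one column but `b` sits at index 1 in the table schema.
}
```

Resolving the column by name and treating a missing column as "no statistics available" avoids the out-of-range access in this sketch; whether the real fix takes this shape is not stated in this report.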

To Reproduce
Create two Parquet files whose schemas contain different fields. I put 4 numbers into each file.

file: sample1.parquet
message schema {
    REQUIRED INT32 a;
}

file: sample2.parquet
message schema {
    REQUIRED INT32 b;
}

code:

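// Import paths assumed for the DataFusion version in use when this issue was filed.
use std::sync::Arc;

use datafusion::datasource::file_format::parquet::{ParquetFormat, DEFAULT_PARQUET_EXTENSION};
use datafusion::datasource::listing::ListingOptions;
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]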
async fn main() -> Result<()> {
    // create local execution context
    let mut ctx = ExecutionContext::new();

    // Configure listing options
    let file_format = ParquetFormat::default().with_enable_pruning(true);
    let listing_options = ListingOptions {
        file_extension: DEFAULT_PARQUET_EXTENSION.to_owned(),
        format: Arc::new(file_format),
        table_partition_cols: vec![],
        collect_stat: false,
        target_partitions: 1,
    };

    ctx.register_listing_table(
        "FANCY_TABLE",
        "file:///absolute-path/table/",
        listing_options,
        None,
    ).await.unwrap();

    let df = ctx
        .sql("SELECT * FROM FANCY_TABLE where a > 2 or b > 2")
        .await?;

    df.show().await?;

    Ok(())
}

Expected behavior
Query executes without any issues.

When pruning is disabled, the query runs fine and returns the following result:

+---+---+
| a | b |
+---+---+
|   | 3 |
|   | 4 |
| 3 |   |
| 4 |   |
+---+---+
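
The expected result above can be reproduced by hand with a small dependency-free sketch: each file's rows are widened to the merged schema `(a, b)`, the missing column becomes NULL, and the filter `a > 2 or b > 2` is applied. The values 1 through 4 in each file are an assumption (the report only says 4 numbers were written), and `Row`, `rows_from_a`, `rows_from_b`, `matches`, and `filtered_rows` are hypothetical helper names.

```rust
/// One row of the merged table; a column absent from the source file is None (NULL).
type Row = (Option<i32>, Option<i32>);

/// Widen rows from a file that only has column `a`.
fn rows_from_a(values: &[i32]) -> Vec<Row> {
    values.iter().map(|&v| (Some(v), None)).collect()
}

/// Widen rows from a file that only has column `b`.
fn rows_from_b(values: &[i32]) -> Vec<Row> {
    values.iter().map(|&v| (None, Some(v))).collect()
}

/// `a > 2 OR b > 2` as a row filter. A comparison against NULL is never
/// true, so a None side contributes `false` here.
fn matches(row: &Row) -> bool {
    row.0.map_or(false, |a| a > 2) || row.1.map_or(false, |b| b > 2)
}

/// Concatenate both files' rows and keep the ones that pass the filter.
fn filtered_rows(a_values: &[i32], b_values: &[i32]) -> Vec<Row> {
    rows_from_b(b_values)
        .into_iter()
        .chain(rows_from_a(a_values))
        .filter(matches)
        .collect()
}

fn main() {
    // Assumed file contents: 1, 2, 3, 4 in each file.
    for (a, b) in filtered_rows(&[1, 2, 3, 4], &[1, 2, 3, 4]) {
        let show = |v: Option<i32>| v.map_or(" ".to_string(), |n| n.to_string());
        println!("| {} | {} |", show(a), show(b));
    }
}
```

Row order is not significant here; the sketch lists the `b` rows first only to match the table above.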
