All-null columns get wrong parquet null-counts #306

Closed
crepererum opened this issue May 17, 2021 · 0 comments · Fixed by #307
Labels
bug parquet Changes to the parquet crate

Comments

@crepererum
Contributor

Describe the bug
When serializing an all-null arrow array to parquet, the null-count in the stats is always 0.

To Reproduce
Steps to reproduce the behavior:

#[test]
fn statistics_null_counts_only_nulls() {
    // check that null-count statistics for "only NULL"-columns are correct
    let values = Arc::new(UInt64Array::from(vec![
        None,
        None,
    ]));
    let file = one_column_roundtrip("null_counts", values, true);

    // check statistics are valid
    let reader = SerializedFileReader::new(file).unwrap();
    let metadata = reader.metadata();
    assert_eq!(metadata.num_row_groups(), 1);
    let row_group = metadata.row_group(0);
    assert_eq!(row_group.num_columns(), 1);
    let column = row_group.column(0);
    let stats = column.statistics().unwrap();
    assert_eq!(stats.null_count(), 2);  // <<< this fails, null count is 0
}

Expected behavior
For all-null columns the null-count should be the same as the number of rows.
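
The expected invariant can be sketched with the standard library alone. This is a simplified model, not the parquet crate's implementation: for a flat nullable column the maximum definition level is 1, a slot with a lower level is NULL, and the null count is the number of such slots. The helper names `definition_levels` and `null_count` are hypothetical and exist only for this illustration.

```rust
/// Definition levels for a flat (non-nested) nullable column:
/// 0 means the slot is NULL, 1 (the maximum level) means a value is present.
/// Hypothetical helper for illustration; not part of the parquet crate.
fn definition_levels(values: &[Option<u64>]) -> Vec<i16> {
    values
        .iter()
        .map(|v| if v.is_some() { 1 } else { 0 })
        .collect()
}

/// Count NULLs as the number of slots whose definition level is below
/// the maximum definition level of the column.
/// Hypothetical helper for illustration; not part of the parquet crate.
fn null_count(def_levels: &[i16], max_def_level: i16) -> u64 {
    def_levels
        .iter()
        .filter(|&&level| level < max_def_level)
        .count() as u64
}
```

For the two-row input from the reproduction, `definition_levels(&[None, None])` yields two levels of 0, so `null_count(&levels, 1)` equals the number of rows. That is the value the statistics should report.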

Additional context
Tested on commit c863a2c44bffa5c092a49e07910d5e9225483193.

I am claiming this issue since I have a fix ready.

@crepererum crepererum added the bug label May 17, 2021
crepererum added a commit to crepererum/arrow-rs that referenced this issue May 17, 2021
@alamb alamb added the parquet Changes to the parquet crate label May 18, 2021
alamb added a commit that referenced this issue Jun 5, 2021
Fixes #306.

Co-authored-by: Marco Neumann <marco@crepererum.net>