Aggregate func Approx_median not work with Parquet format #2221

Closed
Ted-Jiang opened this issue Apr 13, 2022 · 2 comments · Fixed by #2262
Labels
bug Something isn't working

Comments

@Ted-Jiang
Member

Describe the bug

    Finished dev [unoptimized + debuginfo] target(s) in 0.58s
     Running `target/debug/datafusion-cli`
DataFusion CLI v7.0.0
❯ create external table test STORED AS PARQUET LOCATION '/Users/yangjiang/CLionProjects/github/arrow-datafusion/parquet-testing/data/alltypes_plain.parquet';
0 rows in set. Query took 0.004 seconds.
❯ select approx_median(tinyint_col) from test;
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', /Users/yangjiang/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-11.1.0/src/datatypes/schema.rs:193:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


@jychen7
Contributor

jychen7 commented Apr 15, 2022

More context: it only fails for approx_median; approx_percentile_cont works fine.

❯ select approx_percentile_cont(tinyint_col, 0.5) from test;
+-----------------------------------------------------+
| APPROXPERCENTILECONT(test.tinyint_col,Float64(0.5)) |
+-----------------------------------------------------+
| 0                                                   |
+-----------------------------------------------------+

https://github.com/apache/arrow-datafusion/blob/9f2ed423dc63f9f5d0a5e586925d2c31e3b9f5b8/datafusion/core/src/optimizer/to_approx_perc.rs#L97-L100


More interestingly, I added the following test case to datafusion/core/tests/sql/aggregates.rs locally, and it passes without panicking:

#[tokio::test]
async fn parquet_query_median_1() -> Result<()> {
    let ctx = SessionContext::new();
    register_alltypes_parquet(&ctx).await;
    let sql = "SELECT approx_median(tinyint_col) FROM alltypes_plain";
    let actual = execute(&ctx, sql).await;
    let expected = vec![vec!["0"]];
    assert_float_eq(&expected, &actual);
    Ok(())
}

@andygrove
Member

I just ran into this, and in my case it was caused by optimizing the query twice (which really should be safe but apparently is not).
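
To illustrate what "safe to optimize twice" means, here is a minimal, self-contained sketch of an idempotence check for a rewrite rule. It is a toy model, not DataFusion's code: the `Expr` enum, `rewrite_approx_median`, and `is_idempotent` are hypothetical names invented for this example. The rewrite from approx_median(x) to approx_percentile_cont(x, 0.5) is an assumption about what the linked to_approx_perc.rs rule does, and the sketch does not claim to reproduce the actual cause of the panic.

```rust
// Toy expression tree for illustration only; NOT DataFusion's real `Expr` type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(f64),
    ApproxMedian(Box<Expr>),
    ApproxPercentileCont(Box<Expr>, Box<Expr>),
}

/// Hypothetical rewrite rule (assumed behavior):
/// approx_median(x) becomes approx_percentile_cont(x, 0.5).
fn rewrite_approx_median(expr: &Expr) -> Expr {
    match expr {
        Expr::ApproxMedian(arg) => Expr::ApproxPercentileCont(
            Box::new(rewrite_approx_median(arg)),
            Box::new(Expr::Literal(0.5)),
        ),
        // Recurse into children so nested approx_median calls are rewritten too.
        Expr::ApproxPercentileCont(arg, percentile) => Expr::ApproxPercentileCont(
            Box::new(rewrite_approx_median(arg)),
            Box::new(rewrite_approx_median(percentile)),
        ),
        // Columns and literals have nothing to rewrite.
        other => other.clone(),
    }
}

/// Returns true when applying `rule` a second time leaves the result unchanged.
fn is_idempotent(rule: fn(&Expr) -> Expr, input: &Expr) -> bool {
    let once = rule(input);
    let twice = rule(&once);
    once == twice
}
```

A rule that passes this kind of check for representative inputs can be run by the optimizer more than once without changing the plan further. A regression test along these lines (optimize, optimize again, compare) would be one way to catch the situation described above.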
