Optimizer regressions in unwrap_cast_in_comparison #3690

Closed
Tracked by #3671
andygrove opened this issue Oct 3, 2022 · 3 comments · Fixed by #3699
Labels: bug

@andygrove
Member

andygrove commented Oct 3, 2022

Describe the bug
I am testing the latest code from master with some Dask SQL queries and seeing some new regressions that were caused by #3662.

To Reproduce

Error casting to unsigned types

#[test]
fn unsigned_target_type() -> Result<()> {
    let sql = "SELECT * FROM test WHERE col_uint32 > 0";
    let plan = test_sql(sql)?;
    let expected = "TBD";
    assert_eq!(expected, format!("{:?}", plan));
    Ok(())
}
Error: Internal("Optimizer rule 'unwrap_cast_in_comparison' failed due to unexpected error: Internal error: Error target data type UInt32. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker")
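The error text suggests the rule rejects `UInt32` as a target type when it tries to cast the literal `0` to the column's type. A minimal sketch of a range-checked literal cast that also covers unsigned targets could look like the following. The names `IntType` and `try_cast_literal` are hypothetical and this is a self-contained toy, not DataFusion's actual code:

```rust
/// Integer types in this toy model (hypothetical; not DataFusion's `DataType`).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum IntType {
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
}

impl IntType {
    /// Inclusive value range of the type, widened to i128 so every bound fits.
    fn range(self) -> (i128, i128) {
        match self {
            IntType::Int8 => (i8::MIN as i128, i8::MAX as i128),
            IntType::Int16 => (i16::MIN as i128, i16::MAX as i128),
            IntType::Int32 => (i32::MIN as i128, i32::MAX as i128),
            IntType::Int64 => (i64::MIN as i128, i64::MAX as i128),
            IntType::UInt8 => (0, u8::MAX as i128),
            IntType::UInt16 => (0, u16::MAX as i128),
            IntType::UInt32 => (0, u32::MAX as i128),
            IntType::UInt64 => (0, u64::MAX as i128),
        }
    }
}

/// Returns `Some(value)` when the literal fits in `target`, otherwise `None`.
/// A `None` result means the cast cannot be unwrapped safely and the
/// comparison should be left unchanged instead of raising an error.
fn try_cast_literal(value: i128, target: IntType) -> Option<i128> {
    let (min, max) = target.range();
    if (min..=max).contains(&value) {
        Some(value)
    } else {
        None
    }
}
```

Under this sketch, `try_cast_literal(0, IntType::UInt32)` succeeds, while a negative literal against an unsigned column returns `None` rather than failing the whole optimizer pass.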

CASE WHEN regression

#[test]
fn case_when() -> Result<()> {
    let sql = "SELECT CASE WHEN col_int32 > 0 THEN 1 ELSE 0 END FROM test";
    let plan = test_sql(sql)?;
    let expected = "TBD";
    assert_eq!(expected, format!("{:?}", plan));
    Ok(())
}
Error: Internal("Optimizer rule 'unwrap_cast_in_comparison' failed due to unexpected error: Schema error: No field named 'test.col_int32'. Valid fields are 'CASE WHEN test.col_int32 > Int64(0) THEN Int64(1) ELSE Int64(0) END'.")

@liukun4515
Contributor

@andygrove
Is the CASE WHEN failure also caused by this issue?

I will check them.

@liukun4515
Contributor

For the CASE WHEN case:

#[test]
fn case_when() -> Result<()> {
    let sql = "SELECT CASE WHEN col_int32 > 0 THEN 1 ELSE 0 END FROM test";
    let plan = test_sql(sql)?;
    let expected = "Projection: CASE WHEN CAST(#test.col_int32 AS Int64) > Int64(0) THEN Int64(1) ELSE Int64(0) END\
    \n  TableScan: test projection=[col_int32]";
    assert_eq!(expected, format!("{:?}", plan));
    Ok(())
}

In the unwrap_cast_in_comparison rule, we can't find the field type of 'test.col_int32' for the projection plan.

Because the schema for the projection is:

DFSchema { fields: [DFField { qualifier: None, field: Field { name: "CASE WHEN test.col_int32 > Int64(0) THEN Int64(1) ELSE Int64(0) END", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None } }], metadata: {} }

and the schema of table scan is:

DFSchema { fields: [DFField { qualifier: Some("test"), field: Field { name: "col_int32", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("test"), field: Field { name: "col_uint32", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("test"), field: Field { name: "col_utf8", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("test"), field: Field { name: "col_date32", data_type: Date32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: Some("test"), field: Field { name: "col_date64", data_type: Date64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }], metadata: {} }

I am confused about the difference between these two schemas.

Why does the schema of the projection plan contain the plan's output expression?

@alamb @andygrove

I think the schema should only contain the fields from the table, but instead I find the projection's output expression in it.
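The two schemas quoted above suggest why the lookup fails: the column inside the CASE expression is resolved against the projection's own output schema, which holds only the output expression, while the column itself lives in the input (table scan) schema. The following is a toy model of that lookup, assuming a simple name-to-type map. `Schema`, `Projection`, `column_type`, and `example_projection` are hypothetical names, not DataFusion APIs:

```rust
use std::collections::HashMap;

/// Toy data types (hypothetical; not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

/// Toy schema: qualified field name -> data type.
type Schema = HashMap<String, DataType>;

/// Toy projection node that keeps both its output schema and its input's schema.
struct Projection {
    output_schema: Schema,
    input_schema: Schema,
}

/// Look up a column's type, mimicking the "No field named ..." schema error.
fn column_type(schema: &Schema, name: &str) -> Result<DataType, String> {
    schema
        .get(name)
        .copied()
        .ok_or_else(|| format!("No field named '{}'", name))
}

/// Build a projection shaped like the CASE WHEN example from this issue.
fn example_projection() -> Projection {
    let mut input_schema = Schema::new();
    input_schema.insert("test.col_int32".to_string(), DataType::Int32);

    let mut output_schema = Schema::new();
    output_schema.insert(
        "CASE WHEN test.col_int32 > Int64(0) THEN Int64(1) ELSE Int64(0) END".to_string(),
        DataType::Int64,
    );

    Projection {
        output_schema,
        input_schema,
    }
}
```

In this sketch, looking up `test.col_int32` in `output_schema` returns an error, while the same lookup in `input_schema` returns `Int32`. This hints that a rule rewriting expressions inside a projection needs access to the input's schema, not only the node's own output schema.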

@liukun4515
Contributor

I think the schema for the logical plan is very important, and we can extract and generate any information from the basic schema.

For example:

table:
c1 int32,
c2 int64,
c3 utf8

sql:

select c1,c2+1 from table where c1>12

The plan should be:

projection: c1, c2+1; the schema is [c1, c2]
  filter: c1 > 12; the schema is [c1, c2]
    table scan: [c1, c2] with the projection; the schema is [c1, c2]
