coalesce function returns incorrect result #167

Closed
viirya opened this issue Mar 5, 2024 · 0 comments · Fixed by #168

viirya commented Mar 5, 2024

Describe the bug

While working on SortMergeJoin, I hit several TPCDS query failures with errors like:

- q38 *** FAILED ***
  java.lang.Exception: Expected "struct<[count(1):bigint]>", but got "struct<[]>" Schema did not match
SELECT count(*)
FROM (
       SELECT DISTINCT
         c_last_name,
         c_first_name,
         d_date
       FROM store_sales, date_dim, customer
       WHERE store_sales.ss_sold_date_sk = date_dim.d_date_sk
         AND store_sales.ss_customer_sk = customer.c_customer_sk
         AND d_month_seq BETWEEN 1200 AND 1200 + 11
       INTERSECT
       SELECT DISTINCT
         c_last_name,
         c_first_name,
         d_date
       FROM catalog_sales, date_dim, customer
       WHERE catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
         AND catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
         AND d_month_seq BETWEEN 1200 AND 1200 + 11
       INTERSECT
       SELECT DISTINCT
         c_last_name,
         c_first_name,
         d_date
       FROM web_sales, date_dim, customer
       WHERE web_sales.ws_sold_date_sk = date_dim.d_date_sk
         AND web_sales.ws_bill_customer_sk = customer.c_customer_sk
         AND d_month_seq BETWEEN 1200 AND 1200 + 11
     ) hot_cust
LIMIT 100
...
org.apache.comet.CometNativeException
Arrow error: Invalid argument error: RowConverter column schema mismatch, expected Utf8 got Date32

This happens because DataFusion's coalesce function returns a Date32 array for Date32 inputs (which is correct), but its declared return type is Utf8. The details are in apache/datafusion#9458, and the fix is in apache/datafusion#9459.
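
To illustrate the mismatch, here is a minimal toy model in Scala. It is not DataFusion or Comet code: `Column`, `CoalesceSketch`, `declaredTypeBuggy`, `declaredTypeFixed`, and `checkSchema` are all hypothetical names, and values are stored as plain integers for simplicity. It only sketches how a correct result array can still fail a downstream schema check when the declared return type disagrees with it.

```scala
// Toy model, not DataFusion or Comet code: all names here are hypothetical.
sealed trait DataType
case object Utf8 extends DataType
case object Date32 extends DataType

// A column carries the type of the data it actually holds.
final case class Column(dataType: DataType, values: Vector[Option[Int]])

object CoalesceSketch {
  // Row-wise coalesce: take the first non-null value across the inputs.
  // Assumes all inputs share one type and one length.
  def coalesce(inputs: Seq[Column]): Column = {
    require(inputs.nonEmpty, "coalesce needs at least one input")
    require(inputs.map(_.dataType).distinct.size == 1, "inputs must share a type")
    require(inputs.map(_.values.length).distinct.size == 1, "inputs must share a length")
    val rows = inputs.head.values.indices.map { i =>
      inputs.flatMap(_.values(i)).headOption
    }.toVector
    Column(inputs.head.dataType, rows)
  }

  // Stand-in for the faulty declaration: reports Utf8 regardless of inputs.
  def declaredTypeBuggy(argTypes: Seq[DataType]): DataType = Utf8

  // Stand-in for the corrected declaration: reports the shared input type.
  def declaredTypeFixed(argTypes: Seq[DataType]): DataType = argTypes.head

  // Downstream check, similar in spirit to the RowConverter schema check.
  def checkSchema(declared: DataType, column: Column): Either[String, Column] =
    if (declared == column.dataType) Right(column)
    else Left(s"column schema mismatch, expected $declared got ${column.dataType}")
}
```

In this sketch, coalescing two `Date32` columns yields a `Date32` column either way. Checking it against `declaredTypeBuggy` produces a `Left` with an "expected Utf8 got Date32" message, while checking it against `declaredTypeFixed` passes.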

Steps to reproduce

test("coalesce should return correct datatype") {
  Seq(true, false).foreach { dictionaryEnabled =>
    withTempDir { dir =>
      val path = new Path(dir.toURI.toString, "test.parquet")
      makeParquetFileAllTypes(path, dictionaryEnabled = dictionaryEnabled, 10000)
      withParquetTable(path.toString, "tbl") {
        checkSparkAnswerAndOperator(
          "SELECT coalesce(cast(_18 as date), cast(_19 as date), _20) FROM tbl")
      }
    }
  }
}

Expected behavior

No response

Additional context

No response

@viirya viirya added the bug Something isn't working label Mar 5, 2024
@viirya viirya self-assigned this Mar 5, 2024