Bug: fail to compile and run some SQL statements with OpenMLDB-batch when enable batch_window_parallelization #437

Closed
jingchen2222 opened this issue Sep 22, 2021 · 1 comment · Fixed by #453
jingchen2222 commented Sep 22, 2021


OpenMLDB-batch fails to compile and run the following SQL when batch_window_parallelization is enabled:

-- t1: ["col0 string", "col1 int", "col2 int"]
-- t2: ["str0 string", "str1 string", "col0 int", "col1 int"]
SELECT sum(t1.col1) over w1 as sum_t1_col1, t2.str1 as t2_str1 FROM t1
               last join t2 order by t2.col1
               on t1.col1 = t2.col1 and t1.col2 = t2.col0
               WINDOW w1 AS (PARTITION BY t1.col2 ORDER BY t1.col1
               ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW);

Expected Behavior

The SQL compiles and runs successfully.

Current Behavior

The SQL fails to compile:

Fail to find column id #48 in current schema context
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/vm/sql_compiler.cc:260)
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/vm/sql_compiler.cc:166)
    (Caused by) Fail to generate physical plan batch mode
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/vm/transform.cc:1489)
    (Caused by) Fail to generate functions for physical plan: LIMIT(limit=10)
  SIMPLE_PROJECT(sources=(sum_t1_col1, t2_str1))
    JOIN(type=kJoinTypeConcat)
      SIMPLE_PROJECT(sources=(t2.str1 -> t2_str1))
        JOIN(type=LastJoin, right_sort=(t2.col1 ASC), condition=, left_keys=(t1.col1,t1.col2), right_keys=(t2.col1,t2.col0), index_keys=)
          DATA_PROVIDER(table=t1)
          DATA_PROVIDER(table=t2)
      PROJECT(type=WindowAggregation)
        +-WINDOW(partition_keys=(t1.col2), orders=(t1.col1 ASC), range=(t1.col1, -3, 0))
        JOIN(type=LastJoin, right_sort=(t2.col1 ASC), condition=, left_keys=(t1.col1,t1.col2), right_keys=(t2.col1,t2.col0), index_keys=)
          DATA_PROVIDER(table=t1)
          DATA_PROVIDER(table=t2)
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/vm/transform.cc:197)
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/vm/transform.cc:197)
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/vm/transform.cc:197)
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/vm/transform.cc:290)
    (Caused by) Instantiate 0th native function "__internal_sql_codegen_12" failed at node:
SIMPLE_PROJECT(sources=(t2.str1 -> t2_str1))
  JOIN(type=LastJoin, right_sort=(t2.col1 ASC), condition=, left_keys=(t1.col1,t1.col2), right_keys=(t2.col1,t2.col0), index_keys=)
    DATA_PROVIDER(table=t1)
    DATA_PROVIDER(table=t2)
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/codegen/fn_let_ir_builder.cc:130)
    (Caused by) Build expr failed at 0:
+-expr[get field]
  +-input:
    +-expr[id]
      +-var: %44(row)
  +-column_id: 48
  +-column_name: str1
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/codegen/fn_let_ir_builder.cc:215)
    (Caused by) Fail to codegen project expression: #48:str1
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/codegen/expr_ir_builder.cc:162)
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/codegen/expr_ir_builder.cc:835)
    (Caused by) Fail to resolve column #48:str1 from row
    (At /Users/chenjing/work/chenjing/OpenMLDB/hybridse/src/vm/schemas_context.cc:275)
    (Caused by) Fail to find column id #48 in current schema context(#1: 0, 0),(#2: 0, 1),(#3: 0, 2),(#51: 1, 0),(#52: 1, 1),(#53: 1, 2),(#54: 1, 3),
column_name_map_:col0, col1, col2, str0, str1, 

Possible Solution

I think the problem is caused by incorrect column resolution and an inappropriate schema update.
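To make the failed lookup concrete, here is a minimal sketch of the column-id index printed at the end of the error message. The dictionary and the helper names (`resolve`, `ids_for`) are hypothetical stand-ins, not the actual schema context API; the ids and positions are copied from the log, and the column names per slot are taken from the table comments in the SQL, assuming slot 0 is t1 and slot 1 is t2.

```python
# Column-id index copied from the error message: id -> (schema slot, column position).
# This dict is a hypothetical stand-in for the real schema context.
context = {
    1: (0, 0), 2: (0, 1), 3: (0, 2),
    51: (1, 0), 52: (1, 1), 53: (1, 2), 54: (1, 3),
}

# Column names per schema slot (assumption: slot 0 is t1, slot 1 is t2).
schemas = [
    ["col0", "col1", "col2"],
    ["str0", "str1", "col0", "col1"],
]


def resolve(column_id):
    """Return the column name for an id, or None when the id is unknown."""
    position = context.get(column_id)
    if position is None:
        return None
    slot, index = position
    return schemas[slot][index]


def ids_for(name):
    """Return every id in the current context that maps to the given column name."""
    return sorted(
        cid for cid, (slot, index) in context.items()
        if schemas[slot][index] == name
    )


print(resolve(48))      # the id the project expression still carries
print(ids_for("str1"))  # the id(s) under which str1 is currently registered
```

Under these assumptions, the expression still refers to str1 by id #48, while the current context only knows str1 under a different id. That is consistent with the project node holding ids from an earlier version of the JOIN schema.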

The plan changes after the optimization passes are applied to the physical plan, so the fault most likely lies in those passes.
We also observe that SIMPLE_PROJECT(sources=(t2.str1 -> t2_str1)) and PROJECT(type=WindowAggregation) share the same JOIN node. If the passes run in post-order DFS, the JOIN node may be updated first, then SimpleProject (against the new JOIN schema), then JOIN a second time, and finally the WindowAggregation Project.
Based on this observation, I strongly suspect that the optimization is applied to the JOIN twice, which leaves SIMPLE_PROJECT out of date. That would explain why we fail to resolve column str1 (#48, projected as t2_str1) on SimpleProject.
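The suspected sequence can be reproduced with a toy model. This is only a sketch under stated assumptions: the `Node` class, the `rewrite` function, and the rule that rewriting a JOIN assigns fresh column ids are all hypothetical and do not mirror the hybridse classes.

```python
import itertools


class Node:
    """Hypothetical plan node; not the real PhysicalOpNode."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.column_ids = []    # ids this node currently exposes
        self.resolved_ids = []  # ids this node captured from its first input


_next_id = itertools.count(1)


def rewrite(node):
    """Assumed pass behavior: rewriting a JOIN hands out fresh output column ids."""
    if node.name == "JOIN":
        node.column_ids = [next(_next_id) for _ in range(2)]
    else:
        # A parent binds to whatever ids its first input exposes right now.
        node.resolved_ids = list(node.children[0].column_ids)


def post_order(node, visits):
    """Plain post-order DFS with no visited tracking."""
    for child in node.children:
        post_order(child, visits)
    visits.append(node.name)
    rewrite(node)


# Both projects point at the same JOIN object, as in the failing plan.
join = Node("JOIN")
simple_project = Node("SIMPLE_PROJECT", [join])
window_project = Node("WINDOW_PROJECT", [join])
concat = Node("CONCAT_JOIN", [simple_project, window_project])

visits = []
post_order(concat, visits)
print(visits)
print(simple_project.resolved_ids, join.column_ids)
```

In this model the shared JOIN is rewritten twice, so SIMPLE_PROJECT keeps the ids from the first rewrite while the JOIN ends up exposing the ids from the second one.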

Solution

  1. Keep track of whether each PhysicalNode has been visited.
  2. Skip optimization for a node that has already been visited.
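The two steps above can be sketched as a post-order traversal with a visited set. This is an illustrative sketch only: `PhysicalNode` here is a hypothetical stand-in class, `apply_pass` is an invented name, and the actual fix in #453 may be structured differently.

```python
class PhysicalNode:
    """Hypothetical stand-in for a physical plan node."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def apply_pass(node, transform, visited=None):
    """Post-order traversal that applies `transform` at most once per node."""
    if visited is None:
        visited = set()
    # Step 2: stop if this node object was already processed.
    if id(node) in visited:
        return
    # Step 1: record the node by object identity, since shared nodes
    # are the same object reachable through several parents.
    visited.add(id(node))
    for child in node.children:
        apply_pass(child, transform, visited)
    transform(node)


# The two projects share one JOIN object, as in the failing plan.
join = PhysicalNode("JOIN")
plan = PhysicalNode("CONCAT_JOIN", [
    PhysicalNode("SIMPLE_PROJECT", [join]),
    PhysicalNode("WINDOW_PROJECT", [join]),
])

order = []
apply_pass(plan, lambda n: order.append(n.name))
print(order)  # ['JOIN', 'SIMPLE_PROJECT', 'WINDOW_PROJECT', 'CONCAT_JOIN']
```

With the visited set, the shared JOIN is transformed once, before either parent, so both parents see the same JOIN schema.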


jingchen2222 added the bug (Something isn't working) label Sep 22, 2021
jingchen2222 added this to the v0.4 milestone Sep 22, 2021
jingchen2222 self-assigned this Sep 22, 2021
jingchen2222 linked a pull request Sep 23, 2021 that will close this issue

jingchen2222 commented Sep 24, 2021

This has been fixed by #453 (feb2d55). @kanekanekane

jingchen2222 changed the title from "Fail query window+lastjoin when turn on enable_batch_window_parallelization" to "Bug: fail to compile and run some SQL statements with OpenMLDB-batch when enable batch_window_parallelization" Sep 28, 2021