Tracking Issue: DynamicFilter operator #3419

Closed
9 of 13 tasks
jon-chuang opened this issue Jun 23, 2022 · 4 comments
Labels: component/frontend (Protocol, parsing, binder.), component/streaming (Stream processing related issue.), difficulty/medium (Issues that need some knowledge of the whole system), type/feature

Comments

@jon-chuang
Contributor

jon-chuang commented Jun 23, 2022

Design Doc: https://singularity-data.quip.com/AE06Ao1kAIaZ/RFC-Dynamic-Filter-A-New-Streaming-Operator

Logical:

  • StreamDynamicFilter (feat(frontend): StreamDynamicFilter #3515)
    • Not sure if we need BatchDynamicFilter?
    • Hash distribution on left_col
    • Optimize scalar subqueries into LogicalDynamicFilter. Enable tpch q11, q22 planner tests
  • proto for StreamDynamicFilter
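
As a rough illustration of what such an operator has to maintain, here is a minimal sketch, not the design from the RFC. It assumes the predicate is `left_col > rhs`, that the RHS is a single integer that changes over time, and that NULLs, distribution, and persistent state are ignored. The class and method names (`DynamicFilterSketch`, `on_left_insert`, `on_rhs_update`) are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Delta = Tuple[str, int]  # ("+", value) for an insert, ("-", value) for a delete


class DynamicFilterSketch:
    """Emits changes for rows satisfying `left_col > rhs` (hypothetical sketch)."""

    def __init__(self) -> None:
        self.left: List[int] = []        # every left value seen so far, kept sorted
        self.rhs: Optional[int] = None   # current RHS value; None until the first one arrives

    def on_left_insert(self, value: int) -> List[Delta]:
        bisect.insort(self.left, value)
        # A new left row is emitted only if it passes the current predicate.
        if self.rhs is not None and value > self.rhs:
            return [("+", value)]
        return []

    def on_rhs_update(self, new_rhs: int) -> List[Delta]:
        old, self.rhs = self.rhs, new_rhs
        if old is None:
            # First RHS value: every stored row above it becomes visible.
            start = bisect.bisect_right(self.left, new_rhs)
            return [("+", v) for v in self.left[start:]]
        lo, hi = min(old, new_rhs), max(old, new_rhs)
        # Only rows in the half-open range (lo, hi] change status.
        i = bisect.bisect_right(self.left, lo)
        j = bisect.bisect_right(self.left, hi)
        sign = "+" if new_rhs < old else "-"
        return [(sign, v) for v in self.left[i:j]]
```

The point of the sketch is that an RHS change only touches the rows between the old and new values, which is why a sorted (range-scannable) state on `left_col` is useful.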

Executor:


Questions:

  1. Do we support the dynamic filter for more than numeric data types? (We can support I64, F64, NaiveDate/Time, String?, Bytes?)
  2. Will the frontend ensure that the left and right sides have the same datatype? (We could cast the RHS scalar subquery result.)
  3. Can we support only ==, <> and not Cmp operators for some datatypes? I guess this can be determined in the frontend.
  4. Shouldn't we use the anti/semi join for ==, <>? (We don't need broadcast for these.)

References:

  1. Range joins in DuckDB

jon-chuang added the type/feature, component/streaming, component/frontend, and difficulty/medium labels on Jun 23, 2022
@st1page
Contributor

st1page commented Jun 23, 2022

There is some discussion of the detailed behavior of scalar subqueries in #2279, and some questions remain for the DynamicFilter:

  1. What should we do if the DynamicFilter gets more than one row? Should we throw an error and stop all the streaming jobs? Or should we not generate the DynamicFilter if the side "could" get more than one row?
  2. Can we introduce "LogicalCardinality", "max1row", "min1row", "exact1row", or other similar properties, and then treat the DynamicFilter rewriting as a general optimization for LogicalJoin?
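
One way to picture the cardinality properties mentioned above is as row-count bounds derived bottom-up over the plan. The sketch below is illustrative only: the `Cardinality` class and the helper functions (`scan`, `simple_agg`, `filter_`, `limit`) are hypothetical and are not the optimizer's actual API. It relies on the SQL rule that an aggregate without GROUP BY always returns exactly one row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    min_rows: int
    max_rows: Optional[int]  # None means unbounded

    @property
    def max1row(self) -> bool:
        return self.max_rows is not None and self.max_rows <= 1

    @property
    def min1row(self) -> bool:
        return self.min_rows >= 1

    @property
    def exact1row(self) -> bool:
        return self.min1row and self.max1row


def scan() -> Cardinality:
    # A base table can hold any number of rows.
    return Cardinality(0, None)


def simple_agg(child: Cardinality) -> Cardinality:
    # An aggregate without GROUP BY yields exactly one row, even on empty input.
    return Cardinality(1, 1)


def filter_(child: Cardinality) -> Cardinality:
    # A filter can drop every row but never adds any.
    return Cardinality(0, child.max_rows)


def limit(child: Cardinality, n: int) -> Cardinality:
    max_rows = n if child.max_rows is None else min(child.max_rows, n)
    return Cardinality(min(child.min_rows, n), max_rows)
```

Under these assumptions, `simple_agg(scan()).exact1row` is true, while `filter_(simple_agg(scan()))` is only `max1row`, so a rewrite rule could require `exact1row` or `max1row` depending on how an empty RHS should be handled.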

@jon-chuang
Contributor Author

jon-chuang commented Jun 23, 2022

> What should we do if the DynamicFilter gets more than one row? Should we throw an error and stop all the streaming jobs? Or should we not generate the DynamicFilter if the side "could" get more than one row?

This is a good question. SQL does not restrict a non-equi join to being evaluated against a single row. So scalar subqueries need to be explicitly identified as those which must return exactly one row (i.e. those that contain a simple agg). Using the exact1row and max1row properties can indeed facilitate this query rewriting.

However, perhaps as a first approximation, we could simply pattern match on simpleagg with a single column.
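
A first-approximation check of this kind could look like the sketch below. The plan node classes (`Scan`, `Agg`) and the function `is_single_column_simple_agg` are hypothetical stand-ins, not the frontend's real types; "simple agg" is taken to mean an aggregate with no group keys.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scan:
    table: str


@dataclass
class Agg:
    input: object
    agg_calls: List[str]
    group_keys: List[str] = field(default_factory=list)


def is_single_column_simple_agg(plan: object) -> bool:
    """True if `plan` is an aggregate with no GROUP BY and exactly one output column."""
    return (
        isinstance(plan, Agg)
        and not plan.group_keys
        and len(plan.agg_calls) == 1
    )
```

For example, a subquery planned as `Agg(Scan("t"), ["max(x)"])` would match, while `Agg(Scan("t"), ["max(x)"], ["k"])` or a bare `Scan("t")` would not.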

@st1page
Contributor

st1page commented Jun 24, 2022

> Will the frontend ensure that the left and right sides have the same datatype? (We could cast the RHS scalar subquery result.)

Yes.

> Shouldn't we use the anti/semi join for ==, <>? (We don't need broadcast for these.)

I think so, and the DynamicFilter can focus on the comparison operators: <, <=, >, >=
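
The split suggested here can be sketched as a small routing function. This is an illustration under stated assumptions, not the planner's actual logic: the function name `choose_operator` and the returned labels are hypothetical, and NULL semantics are ignored.

```python
def choose_operator(cmp_op: str) -> str:
    """Pick a physical strategy for `left_col <cmp_op> (scalar subquery)`."""
    if cmp_op in ("=", "=="):
        # Equality can be planned as a semi join on left_col = rhs.
        return "semi_join"
    if cmp_op in ("<>", "!="):
        # Inequality can be planned as an anti join on left_col = rhs.
        return "anti_join"
    if cmp_op in ("<", "<=", ">", ">="):
        # Range comparisons are the cases left for the DynamicFilter.
        return "dynamic_filter"
    raise ValueError(f"unsupported comparison operator: {cmp_op!r}")
```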

@fuyufjh
Member

fuyufjh commented Feb 6, 2023

Just brought it back to the project to keep tracking it. Ping @jon-chuang: any plans for it?


3 participants