Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support prepare statement #4490

Merged
merged 20 commits into from
Dec 8, 2022
Merged

Conversation

NGA-TRAN
Copy link
Contributor

@NGA-TRAN NGA-TRAN commented Dec 2, 2022

Which issue does this PR close?

Closes #4426

Re: #4539

  • This PR generates a LogicalPlan for PREPARE SELECT ...
  • EXECUTE statement will be done in a follow-on PR

Which prepare statements supported in this PR:

  • Params in WHERE clause including both column > $1 and column in [$1, $2, ...]
  • Params in HAVING clause including both AGG(column) > $1 and AGG(column) in [$1, $2, ...]
  • Params in VALUES table in SELECT. Eg: SELECT * FROM (VALUES(1, $1), (2, $2)) AS t (num, letter);

Those are all I can think of for SELECT. Let me know if you think of other ones and I will add them in the tests. This PR does NOT support params in UPDATE and INSERT

What changes are included in this PR?

  • Add LogicalPlan::Prepare
  • Support Expr::Placeholder as it is a part of Prepare stmt
  • Generate LogicalPlan::Prepare for PREPARE stmt
  • Generate corresponding protobuf messages for LogicalPlan::Prepare
  • Map given data types of the PREPARE stmt to their corresponding Placeholders in the stmt

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, they can use PREPARE stmt now

@NGA-TRAN NGA-TRAN marked this pull request as draft December 2, 2022 22:03
@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner labels Dec 2, 2022
@NGA-TRAN
Copy link
Contributor Author

NGA-TRAN commented Dec 2, 2022

@alamb, @thinkharderdev @metesynnada I will add more tests but this PR has implemented the PREPARE stmt to create a logical plan from a sql string (see the test)

Can you have a quick look to see if it is what you expect while I add more tests?

I will implement the EXECUTE stmt either in the following PR (preferred) or in this PR if you think better.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking exactly along the right track @NGA-TRAN -- thank you !

@NGA-TRAN NGA-TRAN marked this pull request as ready for review December 6, 2022 20:48
@NGA-TRAN
Copy link
Contributor Author

NGA-TRAN commented Dec 6, 2022

@alamb : the PR is ready for review. The supported queries are described in the description

@alamb
Copy link
Contributor

alamb commented Dec 6, 2022

I plan to review this carefully tomorrow morning

@metesynnada
Copy link
Contributor

LGTM. As far as I can see, we do not store the plans, just use them immediately after we prepare. I hope I am not missing anything.

@alamb alamb mentioned this pull request Dec 7, 2022
5 tasks
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @NGA-TRAN this is looking quite good

My biggest concern about the code in this PR is the introduction of the _with_params variants. I worry that both the variants means that some codepaths will support parameters and some will not and we'll be chasing down gaps / inconsistencies.

What do you think?

I tried making running a PREPARE statement in datafusion-cli and it didn't seem to work:

cd datafusion-cli
cargo run 

And I got an error

❯ PREPARE my_plan(INT) AS SELECT 1 + $1;
Internal("Placehoder $1 does not exist in the parameter list: []")

I would recommend adding some basic end to end tests (perhaps make a prepared.slt file, following the model of https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/test_files/aggregate.slt, which can be run like cargo run -p datafusion --test sqllogictest)

@mingmwang
Copy link
Contributor

I will take a look at this PR carefully tomorrow.
Generally, Prepared Statement and Plan Cache support is quite complex in RDBMS.

@NGA-TRAN
Copy link
Contributor Author

NGA-TRAN commented Dec 7, 2022

I am working on adding more tests which I think the last piece of this PR

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 7, 2022
@NGA-TRAN
Copy link
Contributor Author

NGA-TRAN commented Dec 7, 2022

@alamb : I have addressed all your comments. Can you re-review it?

Notes: If you run queries in DataFusion-cli, you will get error Internal("Unsupported logical plan: Prepare"). I think this means we are successfully creating a LogicalPlan::Prepare (the purpose of this PR) but since we have not stored it anywhere and have not implemented physical plan for it, we get this error naturally.

Example:

cd datafusion-cli 
cargo run
DataFusion CLI v15.0.0
❯ PREPARE my_plan(INT) AS SELECT $1
;
Internal("Unsupported logical plan: Prepare")
❯ 

# PREPARE my_plan(STRING, STRING) AS SELECT * FROM (VALUES(1, $1), (2, $2)) AS t (num, letter);

# And then we may want to add test_prepare_statement* here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we do not store the prepare logical plan anywhere and still throw error when we try to generate physical plan for it, this .stl tests are not that useful yet but they will be valuable after those are done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I can add a subtask to implement it under #4539 -- then in DataFusion we normally leave a link to the ticket as a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All corresponding prepare tests are added here. They will be available for testing (mean many statement error will become statement ok) after we store the prepare logical plan and not throw error

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thanks @NGA-TRAN

I plan to wait for @mingmwang 's review prior to merging this PR but I think it could go in now

# PREPARE my_plan(STRING, STRING) AS SELECT * FROM (VALUES(1, $1), (2, $2)) AS t (num, letter);

# And then we may want to add test_prepare_statement* here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I can add a subtask to implement it under #4539 -- then in DataFusion we normally leave a link to the ticket as a comment

/// (e.g. `$foo` or `$1`)
Placeholder {
/// The identifier of the parameter (e.g, $1 or $foo)
id: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -16,7 +16,7 @@
// under the License.

//! SQL Query Planner (produces logical plan from SQL AST)

use log::debug;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I have wanted some debug logging in the planner as well

prepare_stmt_quick_test(sql, expected_plan, expected_dt);
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 very nice tests


let expected_plan = "Prepare: \"my_plan\" [Int32, Float64] \
\n Projection: Int64(1) + $1 + $2\n EmptyRelation";
let expected_dt = "[Int32, Float64]";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +1380 to +1384
pub data_types: Vec<DataType>,
/// The logical plan of the statements
pub input: Arc<LogicalPlan>,
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the data types Vec size is the same with the place holders in the input plan, but is there any check for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data types of the placeholders are from the data types of the Vec here so they match. We do check if the Vec contains enough params, too. However, there are flexibility:

  1. The length of the Vec can be longer than the number of the placeholders which will be fine and we have tests for this.
  2. The data type of the Vec can be anything and we will use them for the placeholders, we do not check if the data types are compatible with the variables in the expression because: (i) we allow data type casting, and (2) when we reach the placeholders, we no longer have the context which variable/column/expression the place holder is used for. We do not want to add more context to backtrack which will cost compile time as well as complicated implementation.

However, I am working on #4550 that convert Prepare Logical Plan to a logical plan with all placeholders replaced with actual values. There, I will throw error if the data types provided do not work. We follow the same behavior of Postgres

Comment on lines +586 to +589
name: name.clone(),
data_types: data_types.clone(),
input: Arc::new(inputs[0].clone()),
})),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible and allowed here that the method passed in a totally different input plan ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the code we implement, the answer is no unless there are bugs

Comment on lines +1870 to +1872
SQLExpr::Value(Value::Placeholder(param)) => {
Self::create_placeholder_expr(param, param_data_types)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For SQLValues with Placeholder, should we expect that the param_data_types is the same data type ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not get the question but I think my answer for this https://github.com/apache/arrow-datafusion/pull/4490/files#r1043122807 probably answer this question, too

@mingmwang
Copy link
Contributor

@NGA-TRAN
Overall, this PR LGTM, thanks!

@mingmwang
Copy link
Contributor

BTW, do we support this case ?

Plain SQL:

SELECT * FROM tab WHERE col_a is null;

Prepared Stmt:

REPARE my_plan_1(INT) AS 
   SELECT * FROM tab WHERE col_a is $1;

@NGA-TRAN
Copy link
Contributor Author

NGA-TRAN commented Dec 8, 2022

@mingmwang
I have tried and will add this test in my next commit. We will get error. Thanks

REPARE my_plan_1(INT) AS 
   SELECT * FROM tab WHERE col_a is $1;

value: SQL(ParserError("Expected [NOT] NULL or TRUE|FALSE or [NOT] DISTINCT FROM after IS, found: $1"))

Copy link
Contributor Author

@NGA-TRAN NGA-TRAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb The PR is ready to master merge. I will resolve conflicts shortly but more conflicts may come if we do not merge soon 😄

@alamb
Copy link
Contributor

alamb commented Dec 8, 2022

I plan to merge this as soon as the CI checks complete

@alamb alamb merged commit 4538912 into apache:master Dec 8, 2022
SSebo pushed a commit to SSebo/arrow-datafusion that referenced this pull request Feb 22, 2023
* feat: support prepare statement

* fix: typo

* chore: address preliminary review comments

* fix: put Placeholder last to have the expression comparison to work as expected

* test: add more tests and starting to pass param_data_types to expression to get data types of the params , , ...

* test: one more test and a bit of refactor while waiting for the CTEs/PlannerContext PR

* feat: use prepare stmt's param data types in the planner context

* chore: cleanup

* refactor: address review comments

* chore: cleanup

* test: more prepare statement tests

* chore: cleanup

* chore: fix typos and add tests into the sqllogicaltests

* docs: add docstring

* chore: update test panic message due to recent change to have clearer message per review comment

* chore: add a test and a doc string per review comments

* fix: output of a test after master merge
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Initial support for prepared statement
4 participants