-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support prepare statement #4490
Conversation
@alamb, @thinkharderdev @metesynnada I will add more tests but this PR has implemented the Can you have a quick look to see if it is what you expect while I add more tests? I will implement the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking exactly along the right track @NGA-TRAN -- thank you !
…ion to get data types of the params , , ...
@alamb : the PR is ready for review. The supported queries are described in the description |
I plan to review this carefully tomorrow morning |
LGTM. As far as I can see, we do not store the plans, just use them immediately after we prepare. I hope I am not missing anything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @NGA-TRAN this is looking quite good
My biggest concern about the code in this PR is the introduction of the _with_params
variants. I worry that both the variants means that some codepaths will support parameters and some will not and we'll be chasing down gaps / inconsistencies.
What do you think?
I tried making running a PREPARE statement in datafusion-cli and it didn't seem to work:
cd datafusion-cli
cargo run
And I got an error
❯ PREPARE my_plan(INT) AS SELECT 1 + $1;
Internal("Placehoder $1 does not exist in the parameter list: []")
I would recommend adding some basic end to end tests (perhaps make a prepared.slt
file, following the model of https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/test_files/aggregate.slt, which can be run like cargo run -p datafusion --test sqllogictest
)
I will take a look at this PR carefully tomorrow. |
I am working on adding more tests which I think the last piece of this PR |
@alamb : I have addressed all your comments. Can you re-review it? Notes: If you run queries in DataFusion-cli, you will get error Example: cd datafusion-cli
cargo run
DataFusion CLI v15.0.0
❯ PREPARE my_plan(INT) AS SELECT $1
;
Internal("Unsupported logical plan: Prepare")
❯ |
# PREPARE my_plan(STRING, STRING) AS SELECT * FROM (VALUES(1, $1), (2, $2)) AS t (num, letter); | ||
|
||
# And then we may want to add test_prepare_statement* here | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we do not store the prepare logical plan anywhere and still throw error when we try to generate physical plan for it, this .stl tests are not that useful yet but they will be valuable after those are done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I can add a subtask to implement it under #4539 -- then in DataFusion we normally leave a link to the ticket as a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All corresponding prepare tests are added here. They will be available for testing (mean many statement error
will become statement ok
) after we store the prepare logical plan and not throw error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thanks @NGA-TRAN
I plan to wait for @mingmwang 's review prior to merging this PR but I think it could go in now
# PREPARE my_plan(STRING, STRING) AS SELECT * FROM (VALUES(1, $1), (2, $2)) AS t (num, letter); | ||
|
||
# And then we may want to add test_prepare_statement* here | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I can add a subtask to implement it under #4539 -- then in DataFusion we normally leave a link to the ticket as a comment
/// (e.g. `$foo` or `$1`) | ||
Placeholder { | ||
/// The identifier of the parameter (e.g, $1 or $foo) | ||
id: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -16,7 +16,7 @@ | |||
// under the License. | |||
|
|||
//! SQL Query Planner (produces logical plan from SQL AST) | |||
|
|||
use log::debug; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I have wanted some debug logging in the planner as well
prepare_stmt_quick_test(sql, expected_plan, expected_dt); | ||
} | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 very nice tests
|
||
let expected_plan = "Prepare: \"my_plan\" [Int32, Float64] \ | ||
\n Projection: Int64(1) + $1 + $2\n EmptyRelation"; | ||
let expected_dt = "[Int32, Float64]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
pub data_types: Vec<DataType>, | ||
/// The logical plan of the statements | ||
pub input: Arc<LogicalPlan>, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the data types Vec size is the same with the place holders in the input plan, but is there any check for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data types of the placeholders are from the data types of the Vec here so they match. We do check if the Vec contains enough params, too. However, there are flexibility:
- The length of the Vec can be longer than the number of the placeholders which will be fine and we have tests for this.
- The data type of the Vec can be anything and we will use them for the placeholders, we do not check if the data types are compatible with the variables in the expression because: (i) we allow data type casting, and (2) when we reach the placeholders, we no longer have the context which variable/column/expression the place holder is used for. We do not want to add more context to backtrack which will cost compile time as well as complicated implementation.
However, I am working on #4550 that convert Prepare Logical Plan to a logical plan with all placeholders replaced with actual values. There, I will throw error if the data types provided do not work. We follow the same behavior of Postgres
name: name.clone(), | ||
data_types: data_types.clone(), | ||
input: Arc::new(inputs[0].clone()), | ||
})), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible and allowed here that the method passed in a totally different input plan ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the code we implement, the answer is no unless there are bugs
SQLExpr::Value(Value::Placeholder(param)) => { | ||
Self::create_placeholder_expr(param, param_data_types) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For SQLValues
with Placeholder
, should we expect that the param_data_types is the same data type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not get the question but I think my answer for this https://github.com/apache/arrow-datafusion/pull/4490/files#r1043122807 probably answer this question, too
@NGA-TRAN |
BTW, do we support this case ? Plain SQL:
Prepared Stmt:
|
@mingmwang REPARE my_plan_1(INT) AS
SELECT * FROM tab WHERE col_a is $1;
value: SQL(ParserError("Expected [NOT] NULL or TRUE|FALSE or [NOT] DISTINCT FROM after IS, found: $1")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb The PR is ready to master merge. I will resolve conflicts shortly but more conflicts may come if we do not merge soon 😄
I plan to merge this as soon as the CI checks complete |
* feat: support prepare statement * fix: typo * chore: address preliminary review comments * fix: put Placeholder last to have the expression comparison to work as expected * test: add more tests and starting to pass param_data_types to expression to get data types of the params , , ... * test: one more test and a bit of refactor while waiting for the CTEs/PlannerContext PR * feat: use prepare stmt's param data types in the planner context * chore: cleanup * refactor: address review comments * chore: cleanup * test: more prepare statement tests * chore: cleanup * chore: fix typos and add tests into the sqllogicaltests * docs: add docstring * chore: update test panic message due to recent change to have clearer message per review comment * chore: add a test and a doc string per review comments * fix: output of a test after master merge
Which issue does this PR close?
Closes #4426
Re: #4539
PREPARE SELECT ...
EXECUTE statement
will be done in a follow-on PRWhich prepare statements supported in this PR:
column > $1
andcolumn in [$1, $2, ...]
AGG(column) > $1
andAGG(column) in [$1, $2, ...]
SELECT * FROM (VALUES(1, $1), (2, $2)) AS t (num, letter);
Those are all I can think of for
SELECT
. Let me know if you think of other ones and I will add them in the tests. This PR does NOT support params in UPDATE and INSERTWhat changes are included in this PR?
PREPARE stmt
Are these changes tested?
Yes
Are there any user-facing changes?
Yes, they can use PREPARE stmt now