Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for prepared statements #4338

Closed
rschlussel-zz opened this issue Jan 13, 2016 · 11 comments
Closed

Support for prepared statements #4338

rschlussel-zz opened this issue Jan 13, 2016 · 11 comments

Comments

@rschlussel-zz
Copy link
Member

Overview

Prepared statements are a way to save query plans to reuse for other queries in the session. They are a part of the sql standard and are used both through ODBC/JDBC drivers and in embedded sql for convenience and to save time in parsing and planning when a user wishes to run the same or similar queries many times in a row.

We'd like to add support for prepared statements in Presto. See the plan below and let me know if you have any comments or suggestions. Thanks!

PREPARE

Prepares a plan for a query and saves it for use later in the same session. The query can have parameters in it to replace literals that the user will fill in when executing the query

Syntax

PREPARE FROM
where parameters are represented by question marks
e.g.
PREPARE my-select FROM select n_name from nation where n_nationkey < ? AND n_regionkey = ?

Implementation

  • Add syntax to Parser.
  • Add Parameter Expression Type
    • a parameter should have a monotonically increasing id per query (parameters for each query labeled 0,1,2) to ensure that on execution you know which parameters go where, even if the query has been rewritten
    • parameter should also have a datatype.
  • Add a map within the session that maps statement names to plans. Potentially the map should also include input parameters and output column types for the prepared statement.
  • In the plan, add a prepare PlanNode that sits at the top of the query plan.
    Plan and optimize as usual.
    Then add the plan (nodes below prepare node) to the prepared statement map

DEALLOCATE

Removes a plan from your prepared plans.

Syntax

DEALLOCATE PREPARE
e.g.
DEALLOCATE PREPARE my-select;

Implementation

Add syntax to parser.
Remove the thing with name from your prepared plans map.

EXECUTE

Executes the already prepared statement of name replacing any parameters with the included args.

Syntax

EXECUTE [USING [, ,…]]
e.g.
EXECUTE my-select USING 12, 3;
to execute the query
SELECT n_name from nation where n_nationkey < 12 and n_regionkey = 3;

Implementation

  • Add syntax to parser.
  • The parameters should be included in the analysis information.
  • Grab the plan with name from the prepared plans map. If we have information about the types of parameters expected, can do that type checking here. Otherwise do it during the expression rewrite.
  • Add an expression rewrite rule to replace all instances of Parameter with literals corresponding to the appropriate parameter.
  • If there were any other optimizations that couldn't be run during prepare because they needed actual values, run those now.
  • Add verifier that you’ve gotten rid of all the Parameter expressions in your plan.

DESCRIBE OUTPUT

Get the output metadata for a prepared statement. BI tools such as tableau use prepared statements as a way to get output metadata for a query without having to run them.

Syntax

There is sql standard syntax for getting metadata for the output of a prepared statement, which is as follows:
DESCRIBE [OUTPUT] using

But Presto doesn’t have sql descriptors and we anyway just want to return the data in the usual way. Another issue with this syntax is that DESCRIBE without OUTPUT conflicts with DESCRIBE for Presto.

Therefore, I propose we leave out that portion and use the following slightly modified syntax:
DESCRIBE OUTPUT

e.g.
DESCRIBE OUTPUT my-select
would give you something like
n_name: varchar

Implementation

Add syntax to Parser
Get the plan from the prepared statement map. Return output columns/types

DESCRIBE INPUT

Get the input parameters required for a prepared statement. This is not as important, but could potentially be useful.

Syntax

ANSI compliant syntax would again use sql descriptors, but we would leave that out and write
DESCRIBE INPUT
e.g.
DESCRIBE INPUT my-select
would give you
bigint, bigint

Implementation

The easiest way to implement this would be to store the information in the prepared statements map and get it from there.

@rschlussel-zz
Copy link
Member Author

FYI: @petroav @mattsfuller

@electrum
Copy link
Contributor

This proposal is great. Using SQL for prepared statements is much elegant than anything I had considered and the proposal itself is well written. Nice work.

I'm not sure that caching prepared plans is necessary or even feasible. Today, the time and resources used by planning are usually orders of magnitude smaller than execution. We also need to re-run all the optimizers after the bound values are available, since they could result in a vastly different plan (historically a common problem in databases that naively reuse plans for prepared statements).

@rschlussel-zz
Copy link
Member Author

That makes sense. So I'll save the AST of the prepared query instead.

@martint
Copy link
Contributor

martint commented Jan 16, 2016

I see some differences from what the SQL standard specifies. For instance, the syntax for DEALLOCATE is supposed to be:

<deallocate prepared statement> ::= DEALLOCATE PREPARE <SQL statement name>

What section of the spec defines PREPARE? In Part 2, the syntax for it is described as:

<prepare statement> ::=
  PREPARE <SQL statement name> [ <attributes specification> ]
      FROM <SQL statement variable>

but that doesn't make much sense to me, so I imagine it's extended in another part of the spec.

@electrum
Copy link
Contributor

The spec has this chain (with other rules elided):

<SQL procedure statement> ::=
  <SQL executable statement>

<SQL executable statement> ::=
  <SQL dynamic statement>

<SQL dynamic statement> ::=
  <prepare statement>

These seem to be intended for use in routines or SQL embedded into other languages. For example, this chain:

<SQL-invoked procedure> ::=
  PROCEDURE ...
  <routine body>

<routine body> ::=
  <SQL routine spec>

<SQL routine body> ::=
  <SQL procedure statement>

They are also used as the executed action of triggers

So the question seems to be if they are intended to be used as "normal" SQL statements, or what the spec calls "Direct invocation" in 4.33.1 Classes of SQL-statements, and the answer is no. However, I see no reason not to allow it.

MySQL seems to support the standard syntax for this: http://dev.mysql.com/doc/refman/5.7/en/sql-syntax-prepared-statements.html

@martint
Copy link
Contributor

martint commented Jan 16, 2016

But where does it say that the syntax is PREPARE <name> FROM <query>? If you look at the rule I copied above, you'll see it references <SQL statement variable>, which resolves to

<SQL statement variable>
  : <simple value specification>
  ;

<simple value specification>
  : <literal>
  | <host parameter name>
  | <SQL parameter reference>
  | <embedded variable name>
  ;

@rschlussel-zz
Copy link
Member Author

In my reading prepare <name> from <query> is a subset of the possibilities in <SQL statement variable>, where <name> is a literal used to identify the query. In any case, for things that aren't embedded sql engines, none of the other possibilities make sense.

This syntax seems to make the most sense and aligns closely with standard syntax. As noted by @electrum, MySQL uses the same syntax for prepared statements created via direct sql calls. Postgresql also allows creating prepared statements via sql, but they use a slightly different syntax http://www.postgresql.org/docs/current/static/sql-prepare.html.

Do you have a different suggestion?

You are right about DEALLOCATE. I'll fix the syntax. to be DEALLOCATE PREPARE

@rschlussel-zz
Copy link
Member Author

To clarify my reasoning for choosing this syntax- It's true that the SQL syntax for prepared statements in the standard was developed to be used in embedded sql, and so it doesn't translate 100% cleanly, but mostly it does, and so I don't see a good reason to make up our own syntax beyond necessary adaptations to do the same thing.

@martint
Copy link
Contributor

martint commented Jan 18, 2016

Oh, nothing wrong with using SQL for this, especially if it's contemplated in the standard (in some form). I was just curious about how what's in the standard works given my comment about the rule for prepare statement not having a reference to a subquery. I've seen similar stuff with sql invokables, where the rules in Part 2 are non-sensical without the extensions in other sections (e.g., according to Part 2, a sql function or procedure can contain just one statement. This is modified in another part of the spec to allow multiple statements)

@martint
Copy link
Contributor

martint commented Jan 23, 2016

Ok, I was able to decipher what the spec is trying to say about PREPARE. The "SQL variable name" is supposed to be a variable of string type in the host language. The intended usage is something like:

char* query = "SELECT * FROM orders WHERE orderkey = ?";
EXEC SQL PREPARE foo FROM query;      << embedded SQL 

Here is how some databases use/abuse it:
Informix: http://www-304.ibm.com/support/knowledgecenter/SSGU8G_11.50.0/com.ibm.esqlc.doc/sii-14-96943.htm
Sybase: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00801.1540/doc/html/san1281564947508.html
DB2: https://www-01.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sql_prepare.html
MySQL: http://dev.mysql.com/doc/refman/5.7/en/prepare.html

All of them allow literal strings in addition to variable names (a departure from the spec). An example straight from the mysql docs:

mysql> PREPARE stmt1 FROM 'SELECT SQRT(POW(?,2) + POW(?,2)) AS hypotenuse';
mysql> SET @a = 3;
mysql> SET @b = 4;
mysql> EXECUTE stmt1 USING @a, @b;
+------------+
| hypotenuse |
+------------+
|          5 |
+------------+
mysql> DEALLOCATE PREPARE stmt1;

@rschlussel-zz
Copy link
Member Author

Support has been added to the presto server and cli. There remains work to add prepared statement support to the jdbc driver, which is tracked by #6158.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants