Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: prepared statements cache parsed query #15639

Merged
merged 4 commits into from
May 9, 2017

Conversation

jordanlewis
Copy link
Member

Previously, prepared statements contained only the text of their query, causing them to be re-parsed at execution time.

Now, they contain a parsed query.

This improves throughput of kv -read-percent=100 -concurrency=1 by about 15% on a local single-node cluster.

Updates #14927.

@jordanlewis jordanlewis requested review from andreimatei and knz May 3, 2017 14:17
@knz knz added the in progress label May 3, 2017
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis petermattis added this to the 1.1 milestone May 3, 2017
@petermattis
Copy link
Collaborator

Nice! Tagging with 1.1 as it is too late for this to go into 1.0. I didn't look at the change closely, but wanted to point out #14932 which also tackles this (though unsuccessfully).

@tamird
Copy link
Contributor

tamird commented May 3, 2017

Wow, that's an impressive improvement!

Can you help me understand why the first commit is necessary? Wouldn't it be valid to cache the parsed result with the placeholders substituted out?


Reviewed 3 of 3 files at r1.
Review status: 3 of 6 files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented May 3, 2017

Nice work indeed.

The first commit is necessary in that Eval should not panic on placeholders. But the code can be simplified, see my comments below.


Reviewed 3 of 3 files at r1, 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


pkg/sql/executor.go, line 507 at r2 (raw file):

// ExecuteStatementsParsed executes the given statement and returns a response.
func (e *Executor) ExecuteStatementParsed(
  1. ExecutePreparedStatement is closer to the truth
  2. since this is only executing prepared statements, what about giving the prepared statement key as argument and move the prepared statement lookup to here.

pkg/sql/executor.go, line 513 at r2 (raw file):

	session.phaseTimes[sessionStartBatch] = timeutil.Now()

	defer func() {

Make this deferred function a common helper. Possibly take inspiration from #15358 (still planning to merge that one eventually)


pkg/sql/parser/eval.go, line 2818 at r1 (raw file):

// Eval implements the TypedExpr interface.
func (node *Placeholder) Eval(ctx *EvalContext) (Datum, error) {
	return node.Value.Eval(ctx)

return ctx.Placeholders.Value(node.Name)


pkg/sql/parser/expr.go, line 616 at r1 (raw file):

type Placeholder struct {
	Name  string
	Value TypedExpr

remove this once Eval uses ctx to retrieve the value.


pkg/sql/parser/expr.go, line 633 at r1 (raw file):

	buf.WriteByte('$')
	buf.WriteString(node.Name)
	if node.Value != nil {

Not a fan of this. I don't see any benefits.


pkg/sql/parser/type_check.go, line 738 at r1 (raw file):

	// If there is a value known already, immediately substitute with it.
	if v, ok := ctx.Placeholders.Value(expr.Name); ok {
		expr.Value = v

See above.


pkg/sql/pgwire/v3.go, line 596 at r2 (raw file):

// type should return NoData.
func canSendNoData(stmt parser.Statement) bool {
	return stmt == nil || stmt.StatementType() != parser.Rows

Please revise this and above -- unfortunately Go also has empty but non-nil arrays. So you could have a parser.Statement reference that is non-nill but still should return canSendNoData true.

I think we should really have a StatementList reference in PreparedStatement, not Statement, and use that here too with len(stmt) == 0 instead of stmt == nil.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented May 3, 2017

@tamird in SQL you can execute a prepared statement multiple times with different placeholder values.


Review status: all files reviewed at latest revision, 7 unresolved discussions.


Comments from Reviewable

@jordanlewis
Copy link
Member Author

Whoops, thanks for the pointer @petermattis as I totally missed your attempt.

@tamird without that patch, the following sequence of commands would return 3 both times, since executing a statement with placeholder values available actually modifies the parsed query:

PREPARE q AS SELECT $1;
EXECUTE q(3);
EXECUTE q(4);

Review status: all files reviewed at latest revision, 7 unresolved discussions.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented May 3, 2017

Ah, of course. Thanks @knz @jordanlewis!


Reviewed 1 of 3 files at r2.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


Comments from Reviewable

@jordanlewis
Copy link
Member Author

TFTR @knz!


Review status: all files reviewed at latest revision, 7 unresolved discussions.


pkg/sql/executor.go, line 507 at r2 (raw file):

Previously, knz (kena) wrote…
  1. ExecutePreparedStatement is closer to the truth
  2. since this is only executing prepared statements, what about giving the prepared statement key as argument and move the prepared statement lookup to here.

Renamed. I think giving the key isn't quite right, because each of the calling methods do different things with the prepared statement.


pkg/sql/executor.go, line 513 at r2 (raw file):

Previously, knz (kena) wrote…

Make this deferred function a common helper. Possibly take inspiration from #15358 (still planning to merge that one eventually)

Done.


pkg/sql/parser/eval.go, line 2818 at r1 (raw file):

Previously, knz (kena) wrote…

return ctx.Placeholders.Value(node.Name)

I think the Eval is necessary because placeholders can contain Expressions now and not only Datums.

Unfortunately, this is also the wrong kind of ctx... the SemaContext has the placeholders, not the EvalContext. I don't think there's a way to get from one to another right now... thoughts?

I agree that your solution is cleaner - there's no point in mutating the expression tree to include the placeholder value.


pkg/sql/parser/expr.go, line 633 at r1 (raw file):

Previously, knz (kena) wrote…

Not a fan of this. I don't see any benefits.

Removed.


pkg/sql/pgwire/v3.go, line 596 at r2 (raw file):

Previously, knz (kena) wrote…

Please revise this and above -- unfortunately Go also has empty but non-nil arrays. So you could have a parser.Statement reference that is non-nill but still should return canSendNoData true.

I think we should really have a StatementList reference in PreparedStatement, not Statement, and use that here too with len(stmt) == 0 instead of stmt == nil.

I think it's misleading to have a StatementList within PreparedStatement, though, because it should only have size 1 maximum. And I can't find a case where stmt here is non-nil but also should return no data - when we create a PreparedStatement in the first place, if there's no statements to process, we return a PreparedStatement with a nil Statement inside... please show me what I'm missing?


Comments from Reviewable

@knz
Copy link
Contributor

knz commented May 3, 2017

Reviewed 1 of 4 files at r3, 3 of 3 files at r4.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sql/parser/eval.go, line 2818 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I think the Eval is necessary because placeholders can contain Expressions now and not only Datums.

Unfortunately, this is also the wrong kind of ctx... the SemaContext has the placeholders, not the EvalContext. I don't think there's a way to get from one to another right now... thoughts?

I agree that your solution is cleaner - there's no point in mutating the expression tree to include the placeholder value.

so indeed return ctx.xxx.Get(node.Name).Eval(ctx)
The placeholder values need to be plumbed through the EvalContext, indeed.

Meanwhile, whether you make this change or not, you will also need to check that placeholder values are properly propagated to distsql processors. I suspect they are not, until you extend the distsql protobufs.


pkg/sql/pgwire/v3.go, line 596 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I think it's misleading to have a StatementList within PreparedStatement, though, because it should only have size 1 maximum. And I can't find a case where stmt here is non-nil but also should return no data - when we create a PreparedStatement in the first place, if there's no statements to process, we return a PreparedStatement with a nil Statement inside... please show me what I'm missing?

Ok, I can see how the current code limits this. I'll let it sleep until we start supporting stored procedures.


Comments from Reviewable

@jordanlewis
Copy link
Member Author

Tests should pass now, and DistSQL should work correctly.

I discussed the best approach for marshalling placeholders over DistSQL with @RaduBerinde offline, and we came to the conclusion that it's easiest to replace placeholder values at serialization time rather than trying to send the placeholders to all relevant processors. That's what I've done here.


Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/sql/parser/eval.go, line 2818 at r1 (raw file):

Previously, knz (kena) wrote…

so indeed return ctx.xxx.Get(node.Name).Eval(ctx)
The placeholder values need to be plumbed through the EvalContext, indeed.

Meanwhile, whether you make this change or not, you will also need to check that placeholder values are properly propagated to distsql processors. I suspect they are not, until you extend the distsql protobufs.

I'm afraid that doing this correctly will be quite onerous, as many more values will need to be plumbed deep within DistSQL in all sorts of unexpected places (every time MakeExpression is called), so for the time being I've left the value expression within the Placeholder node.

I agree that it's a little messy to store the value on the Placeholder, not to mention dangerous since it is not cleared after execution, which exposes us to seeing old placeholder values if there are bugs in our code.


Comments from Reviewable

@jordanlewis
Copy link
Member Author

Whoops, I spoke too soon about the tests.

@andreimatei
Copy link
Contributor

Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/sql/parser/eval.go, line 2818 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'm afraid that doing this correctly will be quite onerous, as many more values will need to be plumbed deep within DistSQL in all sorts of unexpected places (every time MakeExpression is called), so for the time being I've left the value expression within the Placeholder node.

I agree that it's a little messy to store the value on the Placeholder, not to mention dangerous since it is not cleared after execution, which exposes us to seeing old placeholder values if there are bugs in our code.

given what you said in the top-level discussion about how DistSQL will not need to deal with placeholders, can we now get rid of Placeholder.Value, and do things along the lines Rafa's suggesting?

Better yet, I think it'd be nice if we had a pass (a Visitor) that does the substitution like the TypeChecking used to. Since supposedly we already have something like that for the distsql rendering? Then we wouldn't need to complicate any EvalContexts.
Does that make sense?


Comments from Reviewable

@jordanlewis
Copy link
Member Author

Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/sql/parser/eval.go, line 2818 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

given what you said in the top-level discussion about how DistSQL will not need to deal with placeholders, can we now get rid of Placeholder.Value, and do things along the lines Rafa's suggesting?

Better yet, I think it'd be nice if we had a pass (a Visitor) that does the substitution like the TypeChecking used to. Since supposedly we already have something like that for the distsql rendering? Then we wouldn't need to complicate any EvalContexts.
Does that make sense?

We can't naively get rid of Placeholder.Value without plumbing the placeholders map into everywhere MakeExpression is called, which is what I mentioned above

Your suggestion to use a Visitor is another possible solution but does introduce some questions. If we were to use a Visitor, which is basically what the old code did to replace Placeholders with their values, we'd again break the assumption that the parse tree stays unmodified after its initial creation. Alternatively, we could make a deep copy of the parse tree right before executing a prepared statement. Perhaps that is the correct solution here?


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/sql/parser/eval.go, line 2818 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

We can't naively get rid of Placeholder.Value without plumbing the placeholders map into everywhere MakeExpression is called, which is what I mentioned above

Your suggestion to use a Visitor is another possible solution but does introduce some questions. If we were to use a Visitor, which is basically what the old code did to replace Placeholders with their values, we'd again break the assumption that the parse tree stays unmodified after its initial creation. Alternatively, we could make a deep copy of the parse tree right before executing a prepared statement. Perhaps that is the correct solution here?

FYI, I experimented with making a deep copy of the parse tree (using reflection to make the copy). That was a non-starter from a performance perspective. Might want to verify that result, and not using reflection could change the numbers.


Comments from Reviewable

@RaduBerinde
Copy link
Member

Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/sql/parser/eval.go, line 2818 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

FYI, I experimented with making a deep copy of the parse tree (using reflection to make the copy). That was a non-starter from a performance perspective. Might want to verify that result, and not using reflection could change the numbers.

@jordanlewis Visitors don't normally change the expression in-place. When you return a new expression fromVisitPre/VisitPost, the parent nodes are copied as necessary so that the old tree remains valid. See WalkExpr.

I implemented this "immutable nodes model" as an alternative to implementing deep copy (which was even more tedious to do). There are various things that do mutate nodes in-place so we're not strictly adhering to the model, but those changes are (so far) harmless.


Comments from Reviewable

@jordanlewis
Copy link
Member Author

Ah I see. Thanks. I was going to do a comparison between the mutable method I'm using currently and a new WalkExpr implementation, when I noticed that this patch now tanks the performance vs master.

I think the main change between now and when I first checked the performance is that DistSQL is always on, correct? I'll need to figure out why this is making things worse now.

@RaduBerinde
Copy link
Member

The simple query done by kv shouldn't be running through DistSQL in auto mode. We even have a test for that (explain_distsql:25).

@jordanlewis jordanlewis force-pushed the prepare-parsed branch 2 times, most recently from 2867d96 to 0c65a85 Compare May 7, 2017 02:55
@jordanlewis
Copy link
Member Author

I see. The trouble was that index selection expects placeholders to have been completely replaced by Datum values. That test is unfortunately insufficient because it doesn't try to use prepared queries, and we can't write an EXPLAIN test against prepared queries yet. I filed #15743 to add support for that.

I've redone this patch to use a visitor to replace the placeholders right before type checking. This is considerably simpler and doesn't require messing with the expression serialization code. PTAL.

Here are the new performance numbers. I don't see the 15% speedup anymore, now it's more like 5%. This is at least partially caused by the extra copies introduced by the visitor, but there's also considerable variance when testing this stuff locally.

Before:

[21:56]% ./kv -read-percent=100 -concurrency=1 -duration=10s
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)
      1s        0         2790.2         2790.1      0.5      1.4      3.3
      2s        0         3121.9         2955.6      0.4      0.7      4.5
      3s        0         3105.3         3005.5      0.4      0.7      4.1
      4s        0         3030.1         3011.6      0.5      0.7      2.8
      5s        0         3117.8         3032.9      0.4      0.7      3.1
      6s        0         3122.4         3047.8      0.4      0.7      3.5
      7s        0         3124.7         3058.8      0.4      0.7      4.2
      8s        0         3035.6         3055.9      0.4      0.7      3.7
      9s        0         3049.5         3055.2      0.4      0.6      5.8
     10s        0         3101.3         3059.8      0.4      0.6      5.0

_elapsed___errors____________ops___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)_____seq(begin/end)
   10.0s        0          30608         3059.8      0.4      0.7      5.8              0 / 0

BenchmarkBlocks	   30608	    326824.5 ns/op

After:

[21:43:130]% ./kv -read-percent=100 -concurrency=1 -duration=10s
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)
      1s        0         3016.9         3016.8      0.5      0.7      3.4
      2s        0         3333.4         3174.8      0.4      0.6      2.4
      3s        0         3298.2         3215.9      0.4      0.6      3.1
      4s        0         3278.9         3231.7      0.4      0.6      2.6
      5s        0         3276.6         3240.6      0.4      0.6      3.5
      6s        0         3259.5         3243.8      0.4      0.7      3.3
      7s        0         3297.6         3251.5      0.4      0.7      2.9
      8s        0         3230.3         3248.8      0.4      0.6      3.9
      9s        0         3113.6         3233.8      0.5      0.7      5.5
     10s        0         3300.2         3240.4      0.4      0.6      3.8

_elapsed___errors____________ops___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)_____seq(begin/end)
   10.0s        0          32419         3240.5      0.4      0.7      5.5              0 / 0

BenchmarkBlocks	   32419	    308592.1 ns/op

@knz
Copy link
Contributor

knz commented May 7, 2017

Your solution albeit functionally correct is ... potentially multiplying the number of allocations by the size of the ASTs.

I still think you can do better. Like this:

  • do like you did in one of your earlier attempts: have TypeCheck keep the placeholders, and have Eval look the value up from the context
  • use a callback in FmtParsable, used by distsql to "serialize" expressions, to replace the placeholders by their value datums during the AST to string conversion. This way the distsql processors don't see any placeholder any more.

This saves you all the allocations and many tree traversals.


Reviewed 6 of 10 files at r5, 5 of 5 files at r6, 1 of 1 files at r7, 3 of 3 files at r8.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/sql/prepare.go, line 101 at r8 (raw file):

	}

	sz := int64(uintptr(len(name)) + unsafe.Sizeof(*stmt))

This change is incorrect and removing the comment is misleading.
The correct code here should measure the memory usage of the AST that's being retained. I understand you can't or don't want to do this here, but an outrageously large query should show up in the memory monitoring still. At least you could keep the length of the query string as an estimator of the memory usage -- the reasoning being that the ASCII encoding in SQL is not a very large constant factor away from the AST structural representation.


Comments from Reviewable

Previously, placeholder nodes were completely removed during
type-checking when a value was available for them and replaced by their
value. This mutated the expression tree and made it unsuitable for
re-use.

Now, the expression tree is "modified" using a visitor, which ensures
that the original copy of the expression tree is left unchanged.

This new behavior preserves the original parsed expression tree even
after evaluation, which sets the stage for caching parse results.
@jordanlewis
Copy link
Member Author

As discussed offline, the approach you suggest above is what I tried in the previous iteration of this PR (mod keeping the placeholder values somewhere besides the Placeholder nodes), and I abandoned it due to the additional complexity of properly teaching the index selection and analyze code about the possible existence of Placeholders.

I am planning to do that work in a separate PR, since it's a bit beyond the scope of this one. I've opened #15792 to track the remaining work.


Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sql/prepare.go, line 101 at r8 (raw file):

Previously, knz (kena) wrote…

This change is incorrect and removing the comment is misleading.
The correct code here should measure the memory usage of the AST that's being retained. I understand you can't or don't want to do this here, but an outrageously large query should show up in the memory monitoring still. At least you could keep the length of the query string as an estimator of the memory usage -- the reasoning being that the ASCII encoding in SQL is not a very large constant factor away from the AST structural representation.

Good point. I've restored the old behavior.


Comments from Reviewable

Previously, internal users that wanted to use a parser to parse a sql
string containing more than one statement needed to declare a
`parser.Parser` and call its `Parse` method.

Now, `parser.Parse` is exported which does the exact same thing.
Previously, prepared statements contained only the text of their query,
causing them to be re-parsed at execution time.

Now, they contain a parsed query.

This improves throughput of kv -read-percent=100 -concurrency=1 by about
15% on a local single-node cluster.
@knz
Copy link
Contributor

knz commented May 9, 2017

:lgtm:


Reviewed 4 of 8 files at r9, 1 of 1 files at r10, 3 of 4 files at r11, 3 of 3 files at r12.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


Comments from Reviewable

@jordanlewis jordanlewis merged commit 84dcbe0 into cockroachdb:master May 9, 2017
@jordanlewis jordanlewis deleted the prepare-parsed branch May 9, 2017 17:46
jordanlewis added a commit to jordanlewis/cockroach that referenced this pull request May 9, 2017
Some merge skew with cockroachdb#15639 caused the original fix to not really work.
Array types are now permitted in `CastTargetToDatumType`, so we need to
validate the result of its output in `CREATE VIEW` before passing it
onward. This validation code was already there, but unfortunately
occurred slightly too late to prevent issues. The validation now occurs
soon enough to prevent trouble.
jordanlewis added a commit to jordanlewis/cockroach that referenced this pull request May 9, 2017
Some merge skew with cockroachdb#15639 caused the original fix to not really work.
Array types are now permitted in `CastTargetToDatumType`, so we need to
validate the result of its output in `CREATE VIEW` before passing it
onward. This validation code was already there, but unfortunately
occurred slightly too late to prevent issues. The validation now occurs
soon enough to prevent trouble.
@andreimatei
Copy link
Contributor

Some post-merge comments. Sorry for the late review. Nothing major though.

One high-level thing I'd like to understand: does this change make us keep more state in a session than we were before for statements that are not necessarily going to be repeated? I know some drivers prepare all the statements before executing them. Are we going to keep state in that case (and if yes, are we now gonna keep more state)? Or do we somehow get an indication (maybe through the name) about what's really going to be reused and what's prepared for other reasons?


Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.


pkg/sql/executor.go, line 404 at r12 (raw file):

// nil if there are no results).
func (e *Executor) Prepare(
	stmts parser.StatementList, session *Session, pinfo parser.PlaceholderTypes,

I spent a while scratching my head about whether this is supposed to be a single statement or it can also be multiple ones until I saw the switch len(stmts) below.
Mind documenting that it needs to be a single statement?
Better yet, I think it'd be better if this took parser.Statement and we'd move error checking to the callers.


pkg/sql/executor.go, line 482 at r12 (raw file):

}

func logIfPanicking(ctx context.Context, sql string) {

nit: comment that this function needs to be directly deferred. Like, it can't be called from an anonymous deferred function.


pkg/sql/executor.go, line 508 at r12 (raw file):

	session *Session, stmt *PreparedStatement, pinfo *parser.PlaceholderInfo,
) StatementResults {
	session.resetForBatch(e)

are you sure this resetForBatch() here is correct? I don't know exactly what the consequences of it are, but the name certainly indicates that it should be done per batch of statements.
It's also called from Executor.Prepare, which I'm guessing is also wrong now that we support PREPARE as a sql statement, and not just as a wire command that always runs outside of batches.


pkg/sql/executor.go, line 509 at r12 (raw file):

) StatementResults {
	session.resetForBatch(e)
	session.phaseTimes[sessionStartBatch] = timeutil.Now()

same for this phaseTimes - is the "startBatch" correct?


pkg/sql/prepare.go, line 31 at r12 (raw file):

// of arguments and results have been determined.
type PreparedStatement struct {
	// Statement is the parsed, prepared SQL statement. It may be nil if the

Mind explaining (or just telling me) what an "empty" prepared statement represents? What does it return?


pkg/sql/prepare.go, line 73 at r12 (raw file):

// in inferring placeholder types.
//
// ps.session.Ctx() is used as the logging context for the prepare operation.

I don't think "logging context" is a phrase we should be using. Contexts help with more than just logging.

Feel free to ignore this, but I don't think this comment is necessary. I suspect there's no real need for the PreparedStatements type to hold a reference to the session; it could probably be passed in everywhere. And then you probably wouldn't have felt the need to write it, as it's more common knowledge that sessions have a context (or at least something more magic that provides a context) for all their operations. Or if we keep the session ptr, I think a comment on the PreparedStatements type saying that an instance is a part of a Session would be better,


pkg/sql/prepare.go, line 74 at r12 (raw file):

//
// ps.session.Ctx() is used as the logging context for the prepare operation.
func (ps PreparedStatements) NewFromString(

Is this new method (which also comes with a duplicate comment) really necessary? Can't the caller do the parsing and call the other one?


pkg/sql/prepare.go, line 95 at r12 (raw file):

	name string,
	stmts parser.StatementList,
	queryLen int,

the queryLen argument seems very odd to me. We moved to a nice structured type for the statement (really, StatementList, but as I was saying above I think that's a mistake), but we pass this length separately. And also the query string length is pretty silly IMHO for the memory mgmt purposes for which it's used. I suggest adding a Size() method to a parser.Statement which takes into account more than just the query length, and getting rid of this argument.


pkg/sql/parser/placeholders.go, line 168 at r9 (raw file):

	switch t := expr.(type) {
	case *Placeholder:
		if e, ok := v.placeholders.Value(t.Name); ok {

what if a value is not found? Should that be an error?
This probably deserves to be documented on the placeholderVisitors type definition.


pkg/sql/parser/placeholders.go, line 169 at r9 (raw file):

	case *Placeholder:
		if e, ok := v.placeholders.Value(t.Name); ok {
			return true, e

do you need to recurse (true) here?


pkg/sql/parser/placeholders.go, line 178 at r9 (raw file):

// replacePlaceholders replaces all placeholders in the input expression with
// their supplied values in the planner's Placeholders map.

The comment says something about a planner but I don't see one here.


pkg/sql/parser/placeholders.go, line 180 at r9 (raw file):

// their supplied values in the planner's Placeholders map.
func replacePlaceholders(expr Expr, ctx *SemaContext) Expr {
	if ctx == nil {

if we really need to accept nil ctx, please document what that means.


pkg/sql/parser/placeholders.go, line 183 at r9 (raw file):

		return expr
	}
	v := &placeholdersVisitor{}

why do you need two lines to initialize a placeholderVisitor?


pkg/sql/parser/type_check.go, line 736 at r9 (raw file):

// TypeCheck implements the Expr interface.
func (expr *Placeholder) TypeCheck(ctx *SemaContext, desired Type) (TypedExpr, error) {
	// Perform placeholder typing. This only happens during Prepare, when there

this comment is unclear to me. What only happens during prepare? Is this function only called during prepare?

Perhaps say that there shouldn't be any Placeholders after prepare. Could we also assert that through the SemaContext somehow?


pkg/sql/pgwire/v3.go, line 826 at r12 (raw file):

}

func (c *v3Conn) executeStatementParsed(

the name says "parsed" but this seems to be about "prepared" statements


pkg/sql/pgwire/v3.go, line 834 at r12 (raw file):

) error {
	tracing.AnnotateTrace()
	// Note: sql.Executor gets its Context from c.session.context, which

sql.Executor is a singleton; it doesn't "get" a context.

I'd remove this comment. Knowledge about how a session's ctx is used should live elsewhere I think.


Comments from Reviewable

@jordanlewis
Copy link
Member Author

Thanks for the comments! I'll follow up on these.

At a high level, no, we don't keep more state in a session than before unless someone is using named prepared statements. The pgwire "extended protocol" does indeed prepare every statement before executing it, but it's prepared as the "default statement", which is the (single) unnamed prepared statement that is present in every session.

I think that the memory overhead here is pretty much equivalent to what it was before. One exception I can think of is if you found a way to create a statement that is much larger in its parsed form than in its text form. Then, if you were to prepare and execute this statement via the unnamed prepared statement and then leave your session open but idle, the server would keep this structure in memory pointlessly.


Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.


pkg/sql/executor.go, line 404 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I spent a while scratching my head about whether this is supposed to be a single statement or it can also be multiple ones until I saw the switch len(stmts) below.
Mind documenting that it needs to be a single statement?
Better yet, I think it'd be better if this took parser.Statement and we'd move error checking to the callers.

Added a comment.


pkg/sql/executor.go, line 482 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: comment that this function needs to be directly deferred. Like, it can't be called from an anonymous deferred function.

Done.


pkg/sql/executor.go, line 508 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

are you sure this resetForBatch() here is correct? I don't know exactly what the consequences of it are, but the name certainly indicates that it should be done per batch of statements.
It's also called from Executor.Prepare, which I'm guessing is also wrong now that we support PREPARE as a sql statement, and not just as a wire command that always runs outside of batches.

Yeah this is incorrect. I think it's harmless during PREPARE though.


pkg/sql/executor.go, line 509 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

same for this phaseTimes - is the "startBatch" correct?

Done.


pkg/sql/prepare.go, line 31 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Mind explaining (or just telling me) what an "empty" prepared statement represents? What does it return?

It returns no data. This is valid behavior and happens in the wild, and we in fact have tests for it in pgwire_test.go.


pkg/sql/prepare.go, line 73 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't think "logging context" is a phrase we should be using. Contexts help with more than just logging.

Feel free to ignore this, but I don't think this comment is necessary. I suspect there's no real need for the PreparedStatements type to hold a reference to the session; it could probably be passed in everywhere. And then you probably wouldn't have felt the need to write it, as it's more common knowledge that sessions have a context (or at least something more magic that provides a context) for all their operations. Or if we keep the session ptr, I think a comment on the PreparedStatements type saying that an instance is a part of a Session would be better,

This was just copy pasta from PreparedStatements.New.


pkg/sql/prepare.go, line 74 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Is this new method (which also comes with a duplicate comment) really necessary? Can't the caller do the parsing and call the other one?

I think this is cleaner.


pkg/sql/prepare.go, line 95 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

the queryLen argument seems very odd to me. We moved to a nice structured type for the statement (really, StatementList, but as I was saying above I think that's a mistake), but we pass this length separately. And also the query string length is pretty silly IMHO for the memory mgmt purposes for which it's used. I suggest adding a Size() method to a parser.Statement which takes into account more than just the query length, and getting rid of this argument.

I agree that it's odd. I agree that Statement should have a Size - in the mean time @knz suggested using query string length as a proxy.


pkg/sql/parser/placeholders.go, line 168 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what if a value is not found? Should that be an error?
This probably deserves to be documented on the placeholderVisitors type definition.

No - if we're in the prepare phase, we expect to see placeholders without values.


pkg/sql/parser/placeholders.go, line 169 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

do you need to recurse (true) here?

Placeholders can be expressions, but they can't contain other placeholders. You're right.


pkg/sql/parser/placeholders.go, line 178 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

The comment says something about a planner but I don't see one here.

Yep, changed to say SemaContext.


pkg/sql/parser/placeholders.go, line 180 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

if we really need to accept nil ctx, please document what that means.

People pass nil SemaContexts to TypeCheck in tests a lot.


pkg/sql/parser/placeholders.go, line 183 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

why do you need two lines to initialize a placeholderVisitor?

I was copying how it's done in select_name_resolution.go without thinking about it:

	v := &p.nameResolutionVisitor
	*v = nameResolutionVisitor{
		err:                nil,
		sources:            sources,
		colOffsets:         make([]int, len(sources)),
		iVarHelper:         ivarHelper,
		searchPath:         p.session.SearchPath,
		foundDependentVars: false,
	}

Is there a reason it's done that way there?


pkg/sql/parser/type_check.go, line 736 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this comment is unclear to me. What only happens during prepare? Is this function only called during prepare?

Perhaps say that there shouldn't be any Placeholders after prepare. Could we also assert that through the SemaContext somehow?

Done.


pkg/sql/pgwire/v3.go, line 826 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

the name says "parsed" but this seems to be about "prepared" statements

I inlined this function as it's just called in one place.


pkg/sql/pgwire/v3.go, line 834 at r12 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

sql.Executor is a singleton; it doesn't "get" a context.

I'd remove this comment. Knowledge about how a session's ctx is used should live elsewhere I think.

Done.


Comments from Reviewable

@andreimatei
Copy link
Contributor

I have trouble understanding from the code where the named vs unnamed distinction comes into play. Does unnamed mean name == ""? If so, maybe you can find a place to document what this means and how these unnamed statements are used by the protocol. I know this is not new code, but you seem to understand it :)

If that's true and a session has one unnamed prepared statement, consider storing the unnamed one separately from the named ones to make the code path that uses it more distinct from the paths that use cached ones, and prevent questions from the likes of me. The caching behaviour could also change I guess (you don't need to pre-parse the unnamed one if it's only executed once), but that I don't care about. I don't know if this makes sense.

More questions, sorry :)

  • is the next step to move to a cache that's not per-session? I guess many apps will prepare the same statements in all the sessions. Not to mention that we should also be caching frequent non-prepared queries.
  • should we / can we store the plan that Executor.Prepare() creates, instead of simply the parsed statement? I expect that'd yield more performance improvements. I guess we need some invalidation story w.r.t. schema changes, but I'm already unclear how the current prepared statements are supposed to interact with schema changes that change placeholder types?

Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.


pkg/sql/prepare.go, line 73 at r12 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

This was just copy pasta from PreparedStatements.New.

I realize that, I think it's bad everywhere. And a duplicate comment is also bad for other reasons; you should have one of the comments redirect to the other.
Or I would get rid of the method :)

What bothers me about this method also is the name New. I had to read it three times to understand what type it's defined on and what it returns. And now there's two of them.


pkg/sql/prepare.go, line 95 at r12 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I agree that it's odd. I agree that Statement should have a Size - in the mean time @knz suggested using query string length as a proxy.

using whatever as a proxy is fine, but doing so by separately plumbing the length is what's bothering me. Consider making it a member even if you still only count the original size. It'd give you a place to hang a TODO from, and it'd hide the dirty laundry.


pkg/sql/parser/placeholders.go, line 168 at r9 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

No - if we're in the prepare phase, we expect to see placeholders without values.

but why do we use this visitor in the prepare phase if it's not expected to do anything?


pkg/sql/parser/placeholders.go, line 183 at r9 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I was copying how it's done in select_name_resolution.go without thinking about it:

	v := &p.nameResolutionVisitor
	*v = nameResolutionVisitor{
		err:                nil,
		sources:            sources,
		colOffsets:         make([]int, len(sources)),
		iVarHelper:         ivarHelper,
		searchPath:         p.session.SearchPath,
		foundDependentVars: false,
	}

Is there a reason it's done that way there?

that one uses &p.nameResolutionVisitor to avoid an allocation (as documented at that field's definition). You may want to preallocate your visitor in a similar way


Comments from Reviewable

@jordanlewis
Copy link
Member Author

Yes, unnamed means name == "". It's already thinly documented. I'll improve it a little, but the behavior of the unnamed prepared query and named prepared queries is exactly identical except for the fact that the unnamed prepared query can be overwritten, so I don't know if it would be a good idea to separate the code paths. In other words, clients can in fact reuse the unnamed prepared statement just as they can named prepared statements.

Caching plans is something that I'm planning to work on for 1.1. Like you mention, we do need an invalidation mechanism - naively, the plans could store a schema version and always get invalidated if the current version is different from the stored version. Presently, prepared statements get re-type-checked upon execution, so schema changes that invalidate them will fail at execute time, which is what Postgres does. I think it's desirable behavior. Executing a prepared statement should be equivalent to running the analogous non-prepared statement.

I haven't thought too much about expanding the scope of statement caching, or frequent statement caching. I'm interested in working on that though at some point and I think it could certainly work.


Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.


Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants