sql: prepared statements cache parsed query #15639
Nice! Tagging with 1.1 as it is too late for this to go into 1.0. I didn't look at the change closely, but wanted to point out #14932 which also tackles this (though unsuccessfully). |
Wow, that's an impressive improvement! Can you help me understand why the first commit is necessary? Wouldn't it be valid to cache the parsed result with the placeholders substituted out? Reviewed 3 of 3 files at r1. Comments from Reviewable |
Nice work indeed. The first commit is necessary in that Eval should not panic on placeholders. But the code can be simplified, see my comments below. Reviewed 3 of 3 files at r1, 3 of 3 files at r2. pkg/sql/executor.go, line 507 at r2 (raw file):
pkg/sql/executor.go, line 513 at r2 (raw file):
Make this deferred function a common helper. Possibly take inspiration from #15358 (still planning to merge that one eventually) pkg/sql/parser/eval.go, line 2818 at r1 (raw file):
pkg/sql/parser/expr.go, line 616 at r1 (raw file):
remove this once pkg/sql/parser/expr.go, line 633 at r1 (raw file):
Not a fan of this. I don't see any benefits. pkg/sql/parser/type_check.go, line 738 at r1 (raw file):
See above. pkg/sql/pgwire/v3.go, line 596 at r2 (raw file):
Please revise this and above -- unfortunately Go also has empty but non-nil arrays. So you could have a parser.Statement reference that is non-nil but still should return I think we should really have a Comments from Reviewable |
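To illustrate the point about empty but non-nil values, here is a minimal sketch. `StatementList` and `isEmpty` are hypothetical names chosen for illustration, not the actual types or helpers in `pkg/sql/parser`. It only shows that a length check treats a nil slice and an empty non-nil slice the same way, while a comparison against nil does not.

```go
package main

import "fmt"

// StatementList is a hypothetical stand-in for a list of parsed statements.
type StatementList []string

// isEmpty reports whether the list holds no statements. len() is 0 for both
// a nil slice and an empty non-nil slice, so one check covers both cases.
func isEmpty(stmts StatementList) bool {
	return len(stmts) == 0
}

func main() {
	var nilList StatementList    // nil slice
	emptyList := StatementList{} // empty, but not nil

	// The nil comparison differs between the two; the length check does not.
	fmt.Println(nilList == nil, isEmpty(nilList))
	fmt.Println(emptyList == nil, isEmpty(emptyList))
}
```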
@tamird in SQL you can execute a prepared statement multiple times with different placeholder values. Review status: all files reviewed at latest revision, 7 unresolved discussions. Comments from Reviewable |
Whoops, thanks for the pointer @petermattis as I totally missed your attempt. @tamird without that patch, the following sequence of commands would return
Review status: all files reviewed at latest revision, 7 unresolved discussions. Comments from Reviewable |
Ah, of course. Thanks @knz @jordanlewis! Reviewed 1 of 3 files at r2. Comments from Reviewable |
Force-pushed from 1cdffd8 to 90d7b21.
TFTR @knz! Review status: all files reviewed at latest revision, 7 unresolved discussions. pkg/sql/executor.go, line 507 at r2 (raw file): Previously, knz (kena) wrote…
Renamed. I think giving the key isn't quite right, because each of the calling methods does different things with the prepared statement. pkg/sql/executor.go, line 513 at r2 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/parser/eval.go, line 2818 at r1 (raw file): Previously, knz (kena) wrote…
I think the Eval is necessary because placeholders can contain Expressions now and not only Datums. Unfortunately, this is also the wrong kind of I agree that your solution is cleaner - there's no point in mutating the expression tree to include the placeholder value. pkg/sql/parser/expr.go, line 633 at r1 (raw file): Previously, knz (kena) wrote…
Removed. pkg/sql/pgwire/v3.go, line 596 at r2 (raw file): Previously, knz (kena) wrote…
I think it's misleading to have a Comments from Reviewable |
Reviewed 1 of 4 files at r3, 3 of 3 files at r4. pkg/sql/parser/eval.go, line 2818 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
so indeed Meanwhile, whether you make this change or not, you will also need to check that placeholder values are properly propagated to distsql processors. I suspect they are not, until you extend the distsql protobufs. pkg/sql/pgwire/v3.go, line 596 at r2 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Ok, I can see how the current code limits this. I'll let it sleep until we start supporting stored procedures. Comments from Reviewable |
Force-pushed from 90d7b21 to f1cc387.
Tests should pass now, and DistSQL should work correctly. I discussed the best approach for marshalling placeholders over DistSQL with @RaduBerinde offline, and we came to the conclusion that it's easiest to replace placeholder values at serialization time rather than trying to send the placeholders to all relevant processors. That's what I've done here. Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending. pkg/sql/parser/eval.go, line 2818 at r1 (raw file): Previously, knz (kena) wrote…
I'm afraid that doing this correctly will be quite onerous, as many more values will need to be plumbed deep within DistSQL in all sorts of unexpected places (every time I agree that it's a little messy to store the value on the Comments from Reviewable |
Whoops, I spoke too soon about the tests. |
Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/sql/parser/eval.go, line 2818 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
given what you said in the top-level discussion about how DistSQL will not need to deal with placeholders, can we now get rid of Better yet, I think it'd be nice if we had a pass (a Visitor) that does the substitution like the TypeChecking used to. Since supposedly we already have something like that for the distsql rendering? Then we wouldn't need to complicate any EvalContexts. Comments from Reviewable |
Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/sql/parser/eval.go, line 2818 at r1 (raw file): Previously, andreimatei (Andrei Matei) wrote…
We can't naively get rid of Your suggestion to use a Visitor is another possible solution but does introduce some questions. If we were to use a Visitor, which is basically what the old code did to replace Comments from Reviewable |
Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/sql/parser/eval.go, line 2818 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
FYI, I experimented with making a deep copy of the parse tree (using reflection to make the copy). That was a non-starter from a performance perspective. Might want to verify that result, and not using reflection could change the numbers. Comments from Reviewable |
Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/sql/parser/eval.go, line 2818 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
@jordanlewis Visitors don't normally change the expression in-place. When you return a new expression from I implemented this "immutable nodes model" as an alternative to implementing deep copy (which was even more tedious to do). There are various things that do mutate nodes in-place so we're not strictly adhering to the model, but those changes are (so far) harmless. Comments from Reviewable |
Force-pushed from f1cc387 to 139f60f.
Ah I see. Thanks. I was going to do a comparison between the mutable method I'm using currently and a new I think the main change between now and when I first checked the performance is that DistSQL is always on, correct? I'll need to figure out why this is making things worse now. |
The simple query done by |
Force-pushed from 2867d96 to 0c65a85.
I see. The trouble was that index selection expects placeholders to have been completely replaced by I've redone this patch to use a visitor to replace the placeholders right before type checking. This is considerably simpler and doesn't require messing with the expression serialization code. PTAL. Here are the new performance numbers. I don't see the 15% speedup anymore, now it's more like 5%. This is at least partially caused by the extra copies introduced by the visitor, but there's also considerable variance when testing this stuff locally. Before:
After:
|
Force-pushed from 0c65a85 to 35af9e2.
Your solution, albeit functionally correct, is ... potentially multiplying the number of allocations by the size of the ASTs. I still think you can do better. Like this:
This saves you all the allocations and many tree traversals. Reviewed 6 of 10 files at r5, 5 of 5 files at r6, 1 of 1 files at r7, 3 of 3 files at r8. pkg/sql/prepare.go, line 101 at r8 (raw file):
This change is incorrect and removing the comment is misleading. Comments from Reviewable |
Previously, placeholder nodes were completely removed during type-checking when a value was available for them and replaced by their value. This mutated the expression tree and made it unsuitable for re-use. Now, the expression tree is "modified" using a visitor, which ensures that the original copy of the expression tree is left unchanged. This new behavior preserves the original parsed expression tree even after evaluation, which sets the stage for caching parse results.
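A minimal sketch of the visitor-style substitution described in this commit message, under loud assumptions: `Expr`, `Placeholder`, `Const`, `BinExpr`, and `substitute` are hypothetical names for illustration and are not the types in `pkg/sql/parser`. The idea shown is that a node is never modified in place; a parent is shallow-copied only when one of its children changed, so the original parse tree can be reused for the next execution.

```go
package main

import "fmt"

// Expr is a hypothetical, minimal expression node.
type Expr interface{ String() string }

type Placeholder struct{ Name string }
type Const struct{ Val int }
type BinExpr struct {
	Op          string
	Left, Right Expr
}

func (p *Placeholder) String() string { return "$" + p.Name }
func (c *Const) String() string       { return fmt.Sprint(c.Val) }
func (b *BinExpr) String() string {
	return "(" + b.Left.String() + " " + b.Op + " " + b.Right.String() + ")"
}

// substitute returns an expression in which placeholders with a known value
// are replaced by that value. Existing nodes are never mutated: a parent is
// copied only if one of its children was replaced.
func substitute(e Expr, values map[string]Expr) Expr {
	switch t := e.(type) {
	case *Placeholder:
		if v, ok := values[t.Name]; ok {
			return v
		}
		return t // no value available (e.g. during prepare): leave as-is
	case *BinExpr:
		l := substitute(t.Left, values)
		r := substitute(t.Right, values)
		if l == t.Left && r == t.Right {
			return t // nothing changed below: reuse the original node
		}
		cp := *t // shallow copy of the parent only
		cp.Left, cp.Right = l, r
		return &cp
	default:
		return e
	}
}

func main() {
	// Stands in for a tree that was parsed once and is executed twice.
	parsed := &BinExpr{Op: "=", Left: &Const{Val: 7}, Right: &Placeholder{Name: "1"}}

	first := substitute(parsed, map[string]Expr{"1": &Const{Val: 10}})
	second := substitute(parsed, map[string]Expr{"1": &Const{Val: 20}})

	// The original tree still contains the placeholder after both calls.
	fmt.Println(first, second, parsed)
}
```

The copy-on-change rule is what keeps the allocation cost bounded to the path from the root to each replaced placeholder, rather than a deep copy of the whole tree.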
As discussed offline, the approach you suggest above is what I tried in the previous iteration of this PR (mod keeping the placeholder values somewhere besides the I am planning to do that work in a separate PR, since it's a bit beyond the scope of this one. I've opened #15792 to track the remaining work. Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. pkg/sql/prepare.go, line 101 at r8 (raw file): Previously, knz (kena) wrote…
Good point. I've restored the old behavior. Comments from Reviewable |
Force-pushed from 35af9e2 to 3969286.
Previously, internal users that wanted to parse a SQL string containing more than one statement needed to declare a `parser.Parser` and call its `Parse` method. Now, `parser.Parse` is exported and does the exact same thing.
Previously, prepared statements contained only the text of their query, causing them to be re-parsed at execution time. Now, they contain a parsed query. This improves throughput of kv -read-percent=100 -concurrency=1 by about 15% on a local single-node cluster.
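The change described above can be sketched as follows. This is an illustration only: `Session`, `PreparedStatement`, and the toy `parse` function are hypothetical stand-ins, not the executor or parser code touched by this PR. The sketch shows the statement being parsed once at prepare time and the stored result being reused on every execution.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCalls counts how often the stand-in parser runs.
var parseCalls int

// parse is a hypothetical stand-in for a real SQL parser; it only tokenizes.
func parse(sql string) []string {
	parseCalls++
	return strings.Fields(sql)
}

// PreparedStatement keeps the parsed form next to the original text.
type PreparedStatement struct {
	SQL    string
	Parsed []string // produced once, at prepare time
}

// Session holds prepared statements by name.
type Session struct {
	prepared map[string]*PreparedStatement
}

// Prepare parses the query once and stores the result under the given name.
func (s *Session) Prepare(name, sql string) {
	s.prepared[name] = &PreparedStatement{SQL: sql, Parsed: parse(sql)}
}

// Execute reuses the stored parse result instead of re-parsing the text.
func (s *Session) Execute(name string) ([]string, error) {
	ps, ok := s.prepared[name]
	if !ok {
		return nil, fmt.Errorf("prepared statement %q does not exist", name)
	}
	return ps.Parsed, nil
}

func main() {
	s := &Session{prepared: map[string]*PreparedStatement{}}
	s.Prepare("q", "SELECT v FROM kv WHERE k = $1")
	for i := 0; i < 3; i++ {
		if _, err := s.Execute("q"); err != nil {
			fmt.Println(err)
		}
	}
	fmt.Println("parse calls:", parseCalls)
}
```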
Force-pushed from 3969286 to c3023d4.
Reviewed 4 of 8 files at r9, 1 of 1 files at r10, 3 of 4 files at r11, 3 of 3 files at r12. Comments from Reviewable |
Some merge skew with cockroachdb#15639 caused the original fix to not really work. Array types are now permitted in `CastTargetToDatumType`, so we need to validate the result of its output in `CREATE VIEW` before passing it onward. This validation code was already there, but unfortunately occurred slightly too late to prevent issues. The validation now occurs soon enough to prevent trouble.
Some post-merge comments. Sorry for the late review. Nothing major though. One high-level thing I'd like to understand: does this change make us keep more state in a session than we were before for statements that are not necessarily going to be repeated? I know some drivers prepare all the statements before executing them. Are we going to keep state in that case (and if yes, are we now gonna keep more state)? Or do we somehow get an indication (maybe through the name) about what's really going to be reused and what's prepared for other reasons? Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful. pkg/sql/executor.go, line 404 at r12 (raw file):
I spent a while scratching my head about whether this is supposed to be a single statement or it can also be multiple ones until I saw the pkg/sql/executor.go, line 482 at r12 (raw file):
nit: comment that this function needs to be directly pkg/sql/executor.go, line 508 at r12 (raw file):
are you sure this pkg/sql/executor.go, line 509 at r12 (raw file):
same for this pkg/sql/prepare.go, line 31 at r12 (raw file):
Mind explaining (or just telling me) what an "empty" prepared statement represents? What does it return? pkg/sql/prepare.go, line 73 at r12 (raw file):
I don't think "logging context" is a phrase we should be using. Contexts help with more than just logging. Feel free to ignore this, but I don't think this comment is necessary. I suspect there's no real need for the pkg/sql/prepare.go, line 74 at r12 (raw file):
Is this new method (which also comes with a duplicate comment) really necessary? Can't the caller do the parsing and call the other one? pkg/sql/prepare.go, line 95 at r12 (raw file):
the pkg/sql/parser/placeholders.go, line 168 at r9 (raw file):
what if a value is not found? Should that be an error? pkg/sql/parser/placeholders.go, line 169 at r9 (raw file):
do you need to recurse ( pkg/sql/parser/placeholders.go, line 178 at r9 (raw file):
The comment says something about a planner but I don't see one here. pkg/sql/parser/placeholders.go, line 180 at r9 (raw file):
if we really need to accept pkg/sql/parser/placeholders.go, line 183 at r9 (raw file):
why do you need two lines to initialize a pkg/sql/parser/type_check.go, line 736 at r9 (raw file):
this comment is unclear to me. What only happens during prepare? Is this function only called during prepare? Perhaps say that there shouldn't be any pkg/sql/pgwire/v3.go, line 826 at r12 (raw file):
the name says "parsed" but this seems to be about "prepared" statements pkg/sql/pgwire/v3.go, line 834 at r12 (raw file):
I'd remove this comment. Knowledge about how a session's ctx is used should live elsewhere I think. Comments from Reviewable |
Thanks for the comments! I'll follow up on these. At a high level, no, we don't keep more state in a session than before unless someone is using named prepared statements. The pgwire "extended protocol" does indeed prepare every statement before executing it, but it's prepared as the "default statement", which is the (single) unnamed prepared statement that is present in every session. I think that the memory overhead here is pretty much equivalent to what it was before. One exception I can think of is if you found a way to create a statement that is much larger in its parsed form than in its text form. Then, if you were to prepare and execute this statement via the unnamed prepared statement and then leave your session open but idle, the server would keep this structure in memory pointlessly. Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful. pkg/sql/executor.go, line 404 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Added a comment. pkg/sql/executor.go, line 482 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/sql/executor.go, line 508 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Yeah this is incorrect. I think it's harmless during PREPARE though. pkg/sql/executor.go, line 509 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/sql/prepare.go, line 31 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
It returns no data. This is valid behavior and happens in the wild, and we in fact have tests for it in pkg/sql/prepare.go, line 73 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
This was just copy pasta from pkg/sql/prepare.go, line 74 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I think this is cleaner. pkg/sql/prepare.go, line 95 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I agree that it's odd. I agree that pkg/sql/parser/placeholders.go, line 168 at r9 (raw file): Previously, andreimatei (Andrei Matei) wrote…
No - if we're in the prepare phase, we expect to see placeholders without values. pkg/sql/parser/placeholders.go, line 169 at r9 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Placeholders can be expressions, but they can't contain other placeholders. You're right. pkg/sql/parser/placeholders.go, line 178 at r9 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Yep, changed to say SemaContext. pkg/sql/parser/placeholders.go, line 180 at r9 (raw file): Previously, andreimatei (Andrei Matei) wrote…
People pass pkg/sql/parser/placeholders.go, line 183 at r9 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I was copying how it's done in
Is there a reason it's done that way there? pkg/sql/parser/type_check.go, line 736 at r9 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/sql/pgwire/v3.go, line 826 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I inlined this function as it's just called in one place. pkg/sql/pgwire/v3.go, line 834 at r12 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. Comments from Reviewable |
I have trouble understanding from the code where the named vs unnamed distinction comes into play. Does unnamed mean If that's true and a session has one unnamed prepared statement, consider storing the unnamed one separately from the named ones to make the code path that uses it more distinct from the paths that use cached ones, and prevent questions from the likes of me. The caching behaviour could also change I guess (you don't need to pre-parse the unnamed one if it's only executed once), but that I don't care about. I don't know if this makes sense. More questions, sorry :)
Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful. pkg/sql/prepare.go, line 73 at r12 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
I realize that, I think it's bad everywhere. And a duplicate comment is also bad for other reasons; you should have one of the comments redirect to the other. What bothers me about this method also is the name pkg/sql/prepare.go, line 95 at r12 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
using whatever as a proxy is fine, but doing so by separately plumbing the length is what's bothering me. Consider making it a member even if you still only count the original size. It'd give you a place to hang a TODO from, and it'd hide the dirty laundry. pkg/sql/parser/placeholders.go, line 168 at r9 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
but why do we use this visitor in the prepare phase if it's not expected to do anything? pkg/sql/parser/placeholders.go, line 183 at r9 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
that one uses Comments from Reviewable |
Yes, unnamed means Caching plans is something that I'm planning to work on for 1.1. Like you mention, we do need an invalidation mechanism - naively, the plans could store a schema version and always get invalidated if the current version is different from the stored version. Presently, prepared statements get re-type-checked upon execution, so schema changes that invalidate them will fail at execute time, which is what Postgres does. I think it's desirable behavior. Executing a prepared statement should be equivalent to running the analogous non-prepared statement. I haven't thought too much about expanding the scope of statement caching, or frequent statement caching. I'm interested in working on that though at some point and I think it could certainly work. Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful. Comments from Reviewable |
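The naive invalidation scheme mentioned above (store a schema version with each plan and discard the plan when the current version differs) could look roughly like this. All names here (`planCache`, `cachedPlan`, `get`) are hypothetical, and the "plan" is just a string; this is a sketch of the idea, not a proposal for the actual design.

```go
package main

import "fmt"

// cachedPlan pairs a plan with the schema version it was built against.
type cachedPlan struct {
	plan          string
	schemaVersion int
}

// planCache is a hypothetical per-session plan cache keyed by statement text.
type planCache struct {
	plans  map[string]cachedPlan
	builds int // how many times a plan had to be (re)built
}

// get returns the cached plan if it was built against the current schema
// version; otherwise it rebuilds the plan and replaces the stale entry.
func (c *planCache) get(stmt string, currentVersion int) string {
	if p, ok := c.plans[stmt]; ok && p.schemaVersion == currentVersion {
		return p.plan
	}
	c.builds++
	p := cachedPlan{
		plan:          fmt.Sprintf("plan(%s)@v%d", stmt, currentVersion),
		schemaVersion: currentVersion,
	}
	c.plans[stmt] = p
	return p.plan
}

func main() {
	c := &planCache{plans: map[string]cachedPlan{}}
	fmt.Println(c.get("SELECT v FROM kv", 1)) // builds the plan
	fmt.Println(c.get("SELECT v FROM kv", 1)) // served from the cache
	fmt.Println(c.get("SELECT v FROM kv", 2)) // schema changed: rebuilt
	fmt.Println("builds:", c.builds)
}
```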
Previously, prepared statements contained only the text of their query, causing them to be re-parsed at execution time. Now, they contain a parsed query. This improves throughput of `kv -read-percent=100 -concurrency=1` by about 15% on a local single-node cluster.

Updates #14927.