Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add unimplemented errors and telemetry for plpgsql #99517

Merged
merged 7 commits into from
Apr 27, 2023

Conversation

e-mbrown
Copy link
Contributor

Part of: #98147
Closes: #91569

This PR expands grammar rules and adds unimplemented support for pl/pgsql statements.

sql: Add telemetry for parsable plpgsql statements Is the first commit unique to this PR.

Release Notes: None

@e-mbrown e-mbrown requested review from chengxiong-ruan, ZhouXing19 and a team March 24, 2023 19:59
@e-mbrown e-mbrown requested a review from a team as a code owner March 24, 2023 19:59
@e-mbrown e-mbrown requested a review from a team March 24, 2023 19:59
@e-mbrown e-mbrown requested a review from a team as a code owner March 24, 2023 19:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test suite is very good.

I wonder though - is there a way to do this without copy-pasting all the lexing code into a new directory? The lexing rules for plpgsql are the same as the regular SQL. And given their complexity, we certainly do not wish to have two copies to maintain side-by-side.

Error handling using panic is also unacceptable. There should be more uses of errors.AssertionFailedf in there.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @e-mbrown, and @ZhouXing19)


pkg/sql/plpgsql/parser/plpgsql.y line 135 at r18 (raw file):

 */
// TODO (Chengxiong) figure out why these 3 tokens are needed
%token <plWord>		T_WORD		/* unrecognized simple identifier */

throughout: spaces, not tabs

@chengxiong-ruan
Copy link
Contributor

I wonder though - is there a way to do this without copy-pasting all the lexing code into a new directory? The lexing rules for plpgsql are the same as the regular SQL. And given their complexity, we certainly do not wish to have two copies to maintain side-by-side.

Thanks for the callout! The lexer code for plpgsql actually has a bunch of customized logic which relies on ReadSqlConstruct which scans through remaining parts of a statement until a ; after seeing some expected plpgsql keywords. This is actually the most important part the plpgsql lexer working with non-plpgsql syntaxes. I think that merging it with the sql lexer would be very tricky. But this brought up a good question on the scanner which definitely contains a lot of complex logic which should be merged with the sql scanner, for example how it recognize numbers, string constants and etc... @e-mbrown could you compare the tow scanners? I think most part of them are the same.

Copy link
Contributor Author

@e-mbrown e-mbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan and @ZhouXing19)

@knz
Copy link
Contributor

knz commented Mar 27, 2023

The lexer code for plpgsql actually has a bunch of customized logic which relies on ReadSqlConstruct which scans through remaining parts of a statement until a ; after seeing some expected plpgsql keywords. This is actually the most important part the plpgsql lexer working with non-plpgsql syntaxes.

Yes I agree.

I think that merging it with the sql lexer would be very tricky

Here I disagree. There's a thing we can do with dependency injection. I'd really like us to look at this seriously because the organizational overhead of a code split like you propose here is really going to hurt us in the long run.

@knz
Copy link
Contributor

knz commented Mar 27, 2023

(I'll volunteer to sit down and discuss to get something workable)

@chengxiong-ruan
Copy link
Contributor

(I'll volunteer to sit down and discuss to get something workable)

Thanks! Love to discuss about this.

@knz
Copy link
Contributor

knz commented Mar 28, 2023

In preparation of our chat, I went to the pg source code and saw a few things:

  • the plpgsql scanner is indeed shared with the main sql scanner in pg. This is done by making the list of keywords a parameter of the scanner (if my reading is correct). The remainder - identifier vs literal scans etc is just the same code.
  • the plpgsql grammar uses separate API functions read_sql_stmt(), read_sql_construct(), read_sql_expression(), read_datatype() which makes the code much more succinct and readable. The work here should probably benefit from that refinement too.

@ZhouXing19
Copy link
Collaborator

ZhouXing19 commented Mar 28, 2023

I agree that we can reuse the sql parser's scanner and lexer part. We can use the shared part in a cleaner way, and for parts that need to be customized for plpgsql, just override them or add new methods.

Just wanted to add some details to what Raphael has mentioned, postgres's plpg scanner share with their sql scanner. So I think we may want to have something similar, maybe like this:

type PLScanner struct {
	scanner  sqlscanner.Scanner
}

and override methods if needed.

My original concern was that importing sql/scanner or sql/parser into the plpgsql package would bring dependency loop, as we may want to hook the plpgsql parser in sql.y. But seems that is avoidable if we delay the function body parsing till the execution layer, like this.

@knz
Copy link
Contributor

knz commented Mar 28, 2023

Importing sql/scanner or sql/parser into the plpgsql package would bring dependency loop, as we may want to hook the plpgsql parser in sql.y.

let's fix this using dependency injection.

@chengxiong-ruan
Copy link
Contributor

chengxiong-ruan commented Mar 28, 2023

Importing sql/scanner or sql/parser into the plpgsql package would bring dependency loop, as we may want to hook the plpgsql parser in sql.y.

Maybe we don't really need worry about hooking plpgsql parser in sql.y. I think when you call CREATE FUNCTION, function body is treated as pure string, statements are then parsed in optbuilder to analyze dependencies. One minor concern is that there are different hard coded scanning logic in the scanners, for example. But I think we cant sort it out somehow.

My biggest concern is indeed how we merge the two lexer when one of them has these ReadSqlConstruct thing (we actually copied the idea from postgres :) ). Another concern would be the lookahead logics which I think can be handled with dep injection.

Look forward to the chat~

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of code sharing, in decreasing order of priority:

  1. the code generators (e.g. the .awk file, the main.go in 'allkeywords', maybe the .sh file with an extra argument) can be defined in just 1 place and then referred to in the BUILD files from multiple go packages.
  2. make a base impl of the code in scanner.go, with the Scan method provided via the constructor, then specialize 2 different Scan functions in the 2 different packages. This way we effectively reuse the ident/str /etc parsing code.
  3. try and make all the complex error handling (construction of error obj + populating err details + telemetry) inside helper functions used in both places.
  4. (stretch goal) explore more code sharing from lexer.go but perhaps at a lower priority.

Separately another thing that would make my day slightly brighter is to polish the .y file to make it look more like the other wrt whitespace, alignment, comments etc. Your call.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @e-mbrown, and @ZhouXing19)

@e-mbrown e-mbrown requested a review from a team as a code owner March 30, 2023 18:42
@e-mbrown e-mbrown force-pushed the eb/unimp branch 6 times, most recently from 9ac3500 to 9a1cb2a Compare April 20, 2023 17:27
Copy link
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. But I have a few major comments.

@@ -835,6 +836,16 @@ func (ex *connExecutor) execStmtInOpenState(
return makeErrEvent(err)
}
return nil, nil, nil

case *tree.CreateFunction:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to make this into the conn executor. Do you want to give it a shot in optbuilder/create_function.go?

if err != nil {
return nil, err
}
for _, s := range stmt.AST.Body {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this quite works because statement blocks can be nested. Do you want to give it a shot writing a visitor pattern for it so that we'd be able to walk through the whole AST. For example:

func walkStmt(v Visitor, stmt Statement) (newStmt Statement, changed bool) {

$$
----
error: pq: unimplemented: plpgsql not supported for udf
errorcodes.0A000 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should revisit this list of output to show metrics that are interested. Namely, only show those plpgsql related? Let me know if you think why the other metrics are needed here.

Unimplemented errors and parser test were added for
`stmt_perform`.

Release note: None
Unimplemented errors and parser test were added for
`stmt_raise`.

Release note: None
@e-mbrown e-mbrown force-pushed the eb/unimp branch 3 times, most recently from 4d2f992 to f323d8f Compare April 26, 2023 18:03
@@ -99,6 +100,12 @@ func (b *Builder) buildCreateFunction(cf *tree.CreateFunction, inScope *scope) (
if _, err := funcinfo.FunctionLangToProto(opt); err != nil {
panic(err)
}

if opt == tree.FunctionLangPLpgSQL {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment not added yet?

return nil, err
}

for _, s := range stmt.AST.Body {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, PLpgSQLStmtBlock should be also a statement, No? Namely, looks like you don't need to do the loop here, instead do it in the walkStmt of PLpgSQLStmtBlock?

func Walk(v plpgsqlVisitor, stmt PLpgSQLStatement) error {
wStmt, isWalkable := stmt.(walkableStmt)

if !isWalkable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'd need the isWalkable concept anymore? It means that everything statement should do wStmt.walkStmt(v) and do v.visit in it.

var _ walkableStmt = &PLpgSQLStmtIfElseIfArm{}

// Walk walks plpgsql statements to ensure that telemetry is captured for nested statements.
func Walk(v plpgsqlVisitor, stmt PLpgSQLStatement) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the visitor interface type in function signature.


for _, thenStmt := range s.ThenBody {

visitor.Err = Walk(visitor, thenStmt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably don't need to call Walk() here, just thenStmt.walkStmt(visitor). Same comment for others.

@@ -115,6 +127,30 @@ func (s *PLpgSQLStmtIf) Format(ctx *tree.FmtCtx) {
ctx.WriteString("<NOT DONE YET>")
}

func (s *PLpgSQLStmtIf) walkStmt(visitor plpgsqlVisitor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, we should use interface type in the signature. same comment on others as well.

Copy link
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @e-mbrown, @knz, @rharding6373, and @ZhouXing19)


pkg/sql/plpgsql/parser/testdata/excep_sect line 10 at r44 (raw file):

Previously, e-mbrown (Ebony Brown) wrote…

I believe this is expected. The feature name creates a tag for telemetry

feature-usage
ALTER DOMAIN foo
----
error: pq: at or near "foo": syntax error: unimplemented: this syntax
errorcodes.0A000
unimplemented.alter domain
unimplemented.syntax.alter domain

oh ok.


pkg/sql/testdata/telemetry/function line 12 at r44 (raw file):

Previously, e-mbrown (Ebony Brown) wrote…

I figured it out, turns out the error I got was unrelated lol. Done.

nice

Copy link
Contributor Author

@e-mbrown e-mbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @knz, @rharding6373, and @ZhouXing19)


pkg/sql/plpgsql/parser/parse.go line 140 at r52 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Heh, PLpgSQLStmtBlock should be also a statement, No? Namely, looks like you don't need to do the loop here, instead do it in the walkStmt of PLpgSQLStmtBlock?

Makes sense. Done!


pkg/sql/sem/plpgsqltree/plpg_visitor.go line 72 at r44 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Even though I think it should work, I don't think this is quite right... It feels weird that you do the counting explicitly here while also have the visitor do the counting. But in general I think you're moving toward the right direction.

Ideally, there should be a StmtVisitor interface which has Visit(stmt walkableStmt) method which will be implemented to have the counting logic. Then, in this WalkStmt function, you just call stmt.walkStmt(visitor), and in stmt.walkStmt(visitor you do visitor.Visit(stmt). Then call WalkStmt on each sub statements if there is any.

I know that in the SQL world the visitor actually only visit expressions, while the statement walking walks on the statements and sub-clauses. For telemetry counting purpose, our visitor can be simple and just visit the statement without considering the expressions.

Done.


pkg/sql/sem/plpgsqltree/plpg_visitor.go line 69 at r52 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

We should use the visitor interface type in function signature.

Done.


pkg/sql/sem/plpgsqltree/plpg_visitor.go line 72 at r52 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

I don't think we'd need the isWalkable concept anymore? It means that everything statement should do wStmt.walkStmt(v) and do v.visit in it.

Done.


pkg/sql/sem/plpgsqltree/statements.go line 130 at r52 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Same, we should use interface type in the signature. same comment on others as well.

Done.


pkg/sql/sem/plpgsqltree/statements.go line 135 at r52 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

we probably don't need to call Walk() here, just thenStmt.walkStmt(visitor). Same comment for others.

Done.

Copy link
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice! thanks for your patience!

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few nits and this is good to go.

Reviewed 2 of 22 files at r11, 2 of 19 files at r12, 29 of 52 files at r22, 1 of 1 files at r29, 1 of 8 files at r37, 3 of 7 files at r47, 2 of 3 files at r48, 1 of 2 files at r49, 1 of 2 files at r50, 1 of 3 files at r51, 19 of 29 files at r52, 5 of 8 files at r53, 6 of 6 files at r54, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @e-mbrown, @rharding6373, and @ZhouXing19)


pkg/sql/plpgsql/parser/parse.go line 151 at r54 (raw file):

}

// DealWithPLpgSQLFunc takes a plpgsql function and parses and collects

lol, the verb "deal with" is not a thing we should expect in source code.
Can you find a better verb?


pkg/sql/plpgsql/parser/parse.go line 152 at r54 (raw file):

// DealWithPLpgSQLFunc takes a plpgsql function and parses and collects
// telemetry on the parsable statements

nit: here and below, comments should be full sentences with a capital letter at the beginning and a period at the end.


pkg/sql/plpgsql/parser/parse.go line 164 at r54 (raw file):

	if _, err := CountPLpgSQLStmt(funcBodyStr); err != nil {
		return errors.Wrap(err, "plpgsql not supported for udf")

nit: "UDF" is an acronym. Probably "not supported in user-defined functions"


pkg/sql/plpgsql/parser/plpgsql.y line 1 at r54 (raw file):

%{

nit: through this file: replace tabs with spaces.


pkg/sql/plpgsql/parser/plpgsql.y line 604 at r54 (raw file):

;

stmt_perform	: PERFORM expr_until_semi ';'

nit: remove space between rule name and :.


pkg/sql/plpgsql/parser/plpgsql.y line 895 at r54 (raw file):

  //    we can just read until a ';', then do the sql expression validation during compile time.

stmt_return : RETURN return_variable ';'

nit: space before :

Copy link
Contributor Author

@e-mbrown e-mbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @knz, @rharding6373, and @ZhouXing19)

a discussion (no related file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

comment not added yet?

Done.



pkg/sql/plpgsql/parser/parse.go line 151 at r54 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

lol, the verb "deal with" is not a thing we should expect in source code.
Can you find a better verb?

Done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r55.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @e-mbrown, @knz, @rharding6373, and @ZhouXing19)

a discussion (no related file):

Previously, e-mbrown (Ebony Brown) wrote…

Done.

nice


Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @e-mbrown, @rharding6373, and @ZhouXing19)

@e-mbrown e-mbrown force-pushed the eb/unimp branch 2 times, most recently from 57bfe45 to fbd957e Compare April 27, 2023 18:14
Copy link
Contributor Author

@e-mbrown e-mbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @chengxiong-ruan, @knz, @rharding6373, and @ZhouXing19)


pkg/sql/sem/plpgsqltree/utils/plpg_visitor.go line 87 at r56 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Walk function should stay in plpgsqltree package.

Done.

Unimplemented errors and parser test were added for
`stmt_move` and `stmt_from`.

Release note: None
@e-mbrown
Copy link
Contributor Author

Thanks for the reviews!

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 27, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: create plpgsql grammar and unimplemented errors
6 participants