PLSQL/Golang:need a demo how to build a parser for plsql using Golang #2452

Open
wangzhanbing opened this issue Dec 29, 2021 · 13 comments
Labels
plsql target:go Grammars for Go target, https://github.com/antlr/antlr4/blob/master/doc/go-target.md

Comments

@wangzhanbing

wangzhanbing commented Dec 29, 2021

I want to parse Oracle SQL and extract information such as the database, tables, columns, indexes, and so on.
I followed some introductions from the web and wrote the code below. I expected some output to be printed, but it doesn't work.
Could someone provide a working demo?

type TreeShapeListener struct {
   parser.BasePlSqlParserListener2
}

func NewTreeShapeListener() *TreeShapeListener {
   return new(TreeShapeListener)
}

// EnterDefault_selectivity_clause is called when production default_selectivity_clause is entered.
func (s *TreeShapeListener) EnterDefault_selectivity_clause(ctx *parser.Default_selectivity_clauseContext) {
   fmt.Println("Enter EnterDefault_selectivity_clause")
}

func (tsl *TreeShapeListener) EnterSelect_only_statement(ctx *parser.Select_statementContext) {
   fmt.Println("Enter EnterSelect_only_statement")
}


func main() {
   sql := "select colA from tblname where id in (1,2,3);"
   input := antlr.NewInputStream(strings.ToUpper(sql))

   lexer := parser.NewPlSqlLexer(input)

   stream := antlr.NewCommonTokenStream(lexer, 0)

   p := parser.NewPlSqlParser(stream)
   p.AddErrorListener(antlr.NewDiagnosticErrorListener(true))
   p.AddParseListener(NewTreeShapeListener())
   p.BuildParseTrees = true

   parent := antlr.NewBaseParserRuleContext(nil, -1)
   p.SetParserRuleContext(parent)

   p.Consume()

   return
}
@wangzhanbing
Author

I expected the functions EnterSelect_only_statement, EnterFrom_clause, and EnterWhere_clause to be called,
but nothing happened.
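
One possible contributor, not confirmed anywhere in this thread: in ANTLR, rule callbacks fire while a rule method runs, and the snippet above only calls p.Consume(), which advances past a single token. The toy model below is a sketch of that distinction only. ToyParser, Listener, SelectStatement, and recorder are hypothetical names and not the ANTLR runtime API.

```go
package main

import "fmt"

// Listener is a hypothetical, stripped-down stand-in for a generated
// listener interface.
type Listener interface {
	EnterRule(name string)
}

// ToyParser is a toy model of a parser, not the ANTLR runtime.
type ToyParser struct {
	tokens    []string
	pos       int
	listeners []Listener
}

// Consume only advances past one token; it never enters a rule,
// so no rule callback is triggered.
func (p *ToyParser) Consume() {
	if p.pos < len(p.tokens) {
		p.pos++
	}
}

// SelectStatement models a start-rule method: entering the rule
// notifies every listener, then the remaining tokens are consumed.
func (p *ToyParser) SelectStatement() {
	for _, l := range p.listeners {
		l.EnterRule("select_statement")
	}
	for p.pos < len(p.tokens) {
		p.Consume()
	}
}

// recorder collects the names of the rules it was notified about.
type recorder struct{ entered []string }

func (r *recorder) EnterRule(name string) { r.entered = append(r.entered, name) }

func main() {
	r := &recorder{}
	p := &ToyParser{tokens: []string{"SELECT", "1"}, listeners: []Listener{r}}

	p.Consume()
	fmt.Println("after Consume:", len(r.entered)) // after Consume: 0

	p.SelectStatement()
	fmt.Println("after start rule:", len(r.entered)) // after start rule: 1
}
```

In a generated parser the equivalent step would be calling the method for the grammar's start rule rather than Consume().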

@kaby76
Contributor

kaby76 commented Dec 29, 2021

The Go/ target files are more aspirational than functional. The problem is that the Go target generates actions that pass "l" and "p" pointers, not "self", so the use of "self" in the grammar action rules cannot work. There was some discussion where I tried to convince the folks who wrote the Go target that "l" and "p" are really bad names, because they make "target agnostic" grammars impossible.

The first thing to fix is to make a functioning set of grammar files. You'll need to change the use of "self" in the grammars. In the lexer grammar, change "self" to "l". In the parser grammar change "self" to "p". Then, you will need to write all the base class code, which currently doesn't even compile. Hence, all this discussion for a "preprocessor" for Antlr. (There are other reasons for that as well. I've been toying with code to re-introduce the tree construction operators into Antlr4 by generating an Antlr4 grammar with supplemental actions and declarations.)

Note, the only functioning Go target grammar that has a base class is the golang/ grammar. There you see the Go/ directory with the Go-target-specific grammar.

@kaby76
Contributor

kaby76 commented Dec 29, 2021

It looks like 4.9.3 has changed the name of the parameter that generated code passes to actions. It's now all "p". So something may have clicked with the Go target authors.

So, change "self." to "p." in all the grammar actions.
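
A minimal sketch of that rename as a plain text substitution, assuming the action text really does spell the receiver as "self.". The function name retargetActions and the sample rule are illustrative only. The substitution does not parse the grammar, so it would also rewrite matches inside comments or string literals.

```go
package main

import (
	"fmt"
	"regexp"
)

// selfRef matches the identifier "self" followed by a dot. The word
// boundary keeps identifiers that merely end in "self" (for example
// "myself.") untouched.
var selfRef = regexp.MustCompile(`\bself\.`)

// retargetActions replaces every "self." in the grammar text with
// receiver + ".". This is a textual rewrite, not a grammar-aware one.
func retargetActions(grammar, receiver string) string {
	return selfRef.ReplaceAllLiteralString(grammar, receiver+".")
}

func main() {
	// Illustrative rule text; not copied from the real grammar file.
	rule := `id: {self.isVersion12()}? REGULAR_ID ;`
	fmt.Println(retargetActions(rule, "p"))
}
```

Passing the receiver as a parameter leaves room for the older convention mentioned earlier, where lexer actions used "l" and parser actions used "p".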

@kaby76
Copy link
Contributor

kaby76 commented Dec 29, 2021

I have a functioning sql/plsql parser for Go. There is lots to fix in the repo, but this seems to work on the first few tests, and it is reasonably fast.
Generated.zip

@kaby76
Contributor

kaby76 commented Dec 29, 2021

It's now all "p".

I'll update the translator.

@studentmain I think we might want to change all this "self." back to "this." in the sql/plsql grammar so that your translator can no-op that for the Java and C# targets, but fix it up for Go. Or maybe the conversions should be parameterized via a JSON spec file, so that the spec notes that "this." is converted to "p."? Hardwiring the translation in the Translator or passing it via args doesn't seem right. But I'm not sure. BTW, I'm toying around with Antlr4 StringTemplates for C# to include DynamicXml field/attribute referencing, so that we can write just templates against a parse tree input. Sort of like XSLT.
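
To make the JSON spec idea concrete, here is a rough sketch. The spec layout (target name mapped to a table of textual replacements) and the names specJSON, loadSpec, and translate are assumptions made up for illustration; they are not the translator's actual format or API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// specJSON is a hypothetical spec: for each target, the text that
// replaces each source fragment found in grammar actions.
const specJSON = `{
  "Java":   {"this.": "this."},
  "CSharp": {"this.": "this."},
  "Go":     {"this.": "p."}
}`

// loadSpec decodes the JSON spec into target -> (from -> to).
func loadSpec(data string) (map[string]map[string]string, error) {
	var spec map[string]map[string]string
	err := json.Unmarshal([]byte(data), &spec)
	return spec, err
}

// translate applies the replacements listed for target to the action
// text. It is a plain substring rewrite, not a grammar-aware one.
func translate(spec map[string]map[string]string, target, action string) (string, error) {
	rules, ok := spec[target]
	if !ok {
		return "", fmt.Errorf("no rules for target %q", target)
	}
	// Sort the keys so the replacer is built in a deterministic order.
	keys := make([]string, 0, len(rules))
	for from := range rules {
		keys = append(keys, from)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, 2*len(keys))
	for _, from := range keys {
		pairs = append(pairs, from, rules[from])
	}
	return strings.NewReplacer(pairs...).Replace(action), nil
}

func main() {
	spec, err := loadSpec(specJSON)
	if err != nil {
		panic(err)
	}
	for _, target := range []string{"Java", "Go"} {
		out, err := translate(spec, target, "{this.isVersion12()}?")
		if err != nil {
			panic(err)
		}
		fmt.Println(target, out)
	}
}
```

With this layout the Java and C# entries are effectively no-ops, and only the Go entry rewrites the receiver.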

@wangzhanbing
Author

Of course, that's OK, and maybe the best solution.
I am trying the code you provided in Generated.zip; is the output as expected?

> go run Test.go -input 'select * from tbl_name' -tree -tokens
input: select * from tbl_name token: true
0  select
1
2  *
3
4  from
5
6  tbl_name
7  <EOF>
start doTime: 0.001 s
Parse succeeded.
start show tree
(sql_script <EOF>)

@kaby76
Contributor

kaby76 commented Dec 29, 2021

[snip] the output is as expected?

Yes, the parser is working. But the parse tree is not looking great because the Go target doesn't have everything implemented, like reset(), which is used with -tokens. So, instead, try "go run Test.go -input 'select * from tbl_name' -tree". I don't know whether they have implemented reset() yet, and I don't know what they implemented to print out tokens either. I need to read the 4.9.3 runtime code and update the templates accordingly. It's been very slow going receiving new updates to the Antlr4 runtime.

So, I think you're on your way.

@KvanTTT KvanTTT added plsql target:go Grammars for Go target, https://github.com/antlr/antlr4/blob/master/doc/go-target.md labels Dec 29, 2021
@kaby76
Contributor

kaby76 commented Dec 30, 2021

By the way, the Go target isn't quite implemented correctly. func setVersion12() should be func SetVersion12() because Go does not export funcs or fields named with the first character in lowercase. My mistake.
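
The export rule can be checked mechanically with the standard library's go/token package. This is only a small illustration; isExportedName is a helper name made up here.

```go
package main

import (
	"fmt"
	"go/token"
)

// isExportedName reports whether Go would export an identifier,
// which depends on whether it starts with an upper-case letter.
func isExportedName(name string) bool {
	return token.IsExported(name)
}

func main() {
	for _, name := range []string{"setVersion12", "SetVersion12", "reset", "String"} {
		fmt.Printf("%-14s exported=%v\n", name, isExportedName(name))
	}
}
```

Here setVersion12 and reset report false, while SetVersion12 and String report true, so only the latter two can be called from another package such as the one holding Test.go.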

Further, the Go target declares nested structs for the parser and lexer. This is quite unfortunate because one cannot do the equivalent of a constructor, as is done in the other targets. As a result, there isn't a good way to set _isVersion12 to true except in the main program Test.go after the NewPlSqlParser() call. The templates for Go should be changed so that the base class is a pointer to a struct, so a constructor can be called as with the other targets. I just don't think the Go target has been tested that much, and the grammars-v4 pile contains a lot of diverse grammars.

To get my code to work, the "set" funcs in plsql_base_parser.go should be renamed to start with an uppercase letter. Further, in Test.go, the line "parser.SetVersion12(true)" should be added after the parser is created.
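
A sketch of that workaround with stand-in types. PlSqlParserBase, IsVersion12, and the bodies below are assumptions for illustration; the real generated parser and base file will differ. It shows a base struct embedded by value, a generated-style constructor that knows nothing about the base's fields, and the exported setter called afterwards.

```go
package main

import "fmt"

// PlSqlParserBase is a hypothetical stand-in for the hand-written base.
type PlSqlParserBase struct {
	isVersion12 bool
}

// SetVersion12 starts with an upper-case letter, so it is exported
// and callable from another package.
func (b *PlSqlParserBase) SetVersion12(v bool) { b.isVersion12 = v }

// IsVersion12 reports the flag that grammar predicates would consult.
func (b *PlSqlParserBase) IsVersion12() bool { return b.isVersion12 }

// PlSqlParser models the generated parser, which embeds the base by value.
type PlSqlParser struct {
	PlSqlParserBase
}

// NewPlSqlParser models the generated constructor: it does not touch
// the base's fields, so they start at their zero values.
func NewPlSqlParser() *PlSqlParser { return &PlSqlParser{} }

func main() {
	p := NewPlSqlParser()
	fmt.Println(p.IsVersion12()) // false

	// The flag has to be set by the caller after construction.
	p.SetVersion12(true)
	fmt.Println(p.IsVersion12()) // true
}
```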

I've made two PRs to the Antlr Go runtime to fix the unexported "reset()" func in a lexer, and the unexported "String()" func in token.go. I will probably suggest a PR to fix the "nested struct" problem.

@kaby76
Contributor

kaby76 commented Jan 2, 2022

I will be upgrading trgen in the next day or so. Afterward, I can then check in the code for grammar sql/plsql/ and csharp/, which was a real headache. I had to change the code generation for trgen quite a bit for Go.

The problem is that with the Go target, we are stuck between a rock and a hard place. The Go language has many apparent restrictions (I am not a Go expert; there may be workarounds that I don't know about), and the Antlr tool doesn't generate things in a way that makes it all work easily.

  • Antlr defines globals for parserATN, literalNames, symbolicNames, ruleNames in a parser. If you try to have two parsers within one directory, as is done with csharp/, Go complains that there's a duplicate definition.
  • If you try to put them in one directory but use different package names on the Antlr tool command (-package CSharp and -package CSharpPreprocessor), Go complains that you have two packages in one directory.
  • If you try to define Go source files with a package of one name in a directory with a different name, Go complains.
  • Therefore, generated parser files must be placed in separate directories, and the generated and base class files must have a package name equal to the directory name. I will recommend changing the Antlr Go templates so that the names do not collide (append the grammar name), and we just offer getters on the parser object to fetch the data using a consistent name. Fixing this will IMMENSELY simplify things. This can be added on top of the three other issues I raised for the Go target.
  • To make my life easier, I assumed that the Go/ directory contains the grammar and base class files under the name of the grammar, e.g., for sql/plsql, the base classes are in sql/plsql/Go/PlSql/parser_base.go. I now call the Antlr tool with the -o, -lib, and -package (all of them) to generate the right files exactly in the right directory with the right package name.
  • Go does not allow import { "../CSharp" } to grab a shared lexer. The shared files either have to be placed in another package somewhere else, or copies of all the files have to be made. For csharp/, I make copies of files. A terrible solution, but I do not know what to do with this mess. I do not understand Go. I wish it were more like an OO language such as C#, or like "assembly language plus", such as C.
  • It took me two days to stumble on the right syntax to do typecasting in Go because there are no generics in Go. Typecasting is needed for grammar predicates and actions.
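
For the last bullet, the usual Go idiom is a checked type assertion. The sketch below uses stand-in types: Recognizer, PlSqlParser, otherRecognizer, and evalPredicate are hypothetical names, not the runtime's. It shows how predicate code handed an interface value can recover the concrete parser.

```go
package main

import "fmt"

// Recognizer is a hypothetical stand-in for the interface type that
// predicate and action code receives.
type Recognizer interface {
	Name() string
}

// PlSqlParser is a stand-in for the concrete parser type.
type PlSqlParser struct{ version12 bool }

func (p *PlSqlParser) Name() string      { return "PlSqlParser" }
func (p *PlSqlParser) IsVersion12() bool { return p.version12 }

// otherRecognizer exists only to show the failure path.
type otherRecognizer struct{}

func (otherRecognizer) Name() string { return "other" }

// evalPredicate recovers the concrete parser from the interface value
// with a checked type assertion, then calls a parser-specific method.
func evalPredicate(r Recognizer) (bool, error) {
	p, ok := r.(*PlSqlParser)
	if !ok {
		return false, fmt.Errorf("unexpected recognizer type %T", r)
	}
	return p.IsVersion12(), nil
}

func main() {
	fmt.Println(evalPredicate(&PlSqlParser{version12: true}))
	fmt.Println(evalPredicate(otherRecognizer{}))
}
```

The two-value form (p, ok) avoids the panic that the single-value form r.(*PlSqlParser) would raise on a mismatched type.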

Once you get past all this, the compiled Go target parser is reasonably quick.
