-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes parser for the top level tokens #369
Conversation
Codecov Report
@@ Coverage Diff @@
## master #369 +/- ##
============================================
- Coverage 81.76% 81.75% -0.01%
- Complexity 1237 1243 +6
============================================
Files 162 162
Lines 9592 9648 +56
Branches 1560 1566 +6
============================================
+ Hits 7843 7888 +45
- Misses 1272 1281 +9
- Partials 477 479 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach lgtm overall. Could add more top-level token checks to EXEC
and other DML/DDL ops. Also may need more tests.
@@ -2636,6 +2636,32 @@ class SqlParserTest : SqlParserTestBase() { | |||
""" | |||
) | |||
|
|||
@Test | |||
fun fromWhereSetDml() = assertExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any difference between this test and fromSetSingleDml
in SqlParserTest.kt? They seem to be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh yes. I missed that. Thanks. I will remove it.
/** | ||
* Checks if the [ParseNode] requires to be only at top level. DML keyword tokens, for example, requires to be top level. | ||
*/ | ||
private fun ParseNode.isTopLevelType() = type in listOf(UPDATE, DELETE, INSERT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should REMOVE
also be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
/** | ||
* Checks if the [ParseNode] requires to be only at top level. DML keyword tokens, for example, requires to be top level. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could DDL
keywords also be added?
Also, there was a previous top-level token implementation that checked for stored procedure calls (EXEC
). I think your approach is better. Could the EXEC
keyword also be checked here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added EXEC
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I may have put too much in the first comment, but will DDL
keywords also be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I have updated the list to include DDL keywords as well.
private fun ParseNode.isTopLevelType() = type in listOf(UPDATE, DELETE, INSERT) | ||
|
||
/** | ||
* Validates tree to make sure that the top level tokens are not found below the top level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "below" could be changed to "outside"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is a tree representation, "below" makes sense IMO.
// Note that for DML operations, top level parse node is of type 'FROM'. Hence the check level > 1 | ||
if (level > 1 && node.isTopLevelType()) { | ||
node.remaining.err("Type ${node.type} only expected at the top level", PARSE_UNEXPECTED_TERM) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add more top level tokens to check, may need to case on whether the node is a DML op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logic remains the same for the top level tokens even if it is DML or not.
@@ -1246,6 +1246,16 @@ class ParserErrorsTest : TestBase() { | |||
Property.TOKEN_TYPE to TokenType.EOF, | |||
Property.TOKEN_VALUE to ion.newSymbol("EOF"))) | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add more tests. At least one for each top level token we're checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Added tests for other top level tokens.
Property.TOKEN_TYPE to TokenType.EOF, | ||
Property.TOKEN_VALUE to ion.newSymbol("EOF"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the unexpected parse term be SET
here and not EOF
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Thanks. I have made changes in the next revision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor nits for the comments/kdoc I missed from the first review. I probably overloaded this comment from the first review, but will DDL ops also be included?
Property.TOKEN_VALUE to ion.newSymbol("exec"))) | ||
Property.COLUMN_NUMBER to 21L, | ||
Property.TOKEN_TYPE to TokenType.IDENTIFIER, | ||
Property.TOKEN_VALUE to ion.newSymbol("undrop"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the unexpected term be EXEC
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be. But the associated token with the EXEC
ParseNode is the name of the stored procedure. Hence, the issue. To get around with this, I think we can pass the name of the stored procedure in the arguments along with the other arglist
or some such. But that's worthy of a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the EXEC
parsing should be changed (but won't be changed in this PR), can we add a comment linking to an issue to track this?
* | ||
* Currently only checks for `EXEC`. DDL and DML along with corresponding error codes to be added in the future | ||
* (https://github.com/partiql/partiql-lang-kotlin/issues/354). | ||
* Checks if the [ParseNode] requires to be only at top level. DML keyword tokens, for example, requires to be top level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: at least for me the kdoc wording is confusing. From the implementation, it checks the ParseNode's parse type is one of the top level types right? The first sentence could be something like:
/**
* Returns true if and only if this [ParseType] is one of the top level types...
*/
For the second sentence, it might be helpful to also mention that DML can have FROM
as the top-level node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update it. Thanks.
* Validates tree to make sure that the top level tokens are not found below the top level | ||
*/ | ||
private fun validateTopLevelNodes(node: ParseNode, level: Int) { | ||
// Note that for DML operations, top level parse node is of type 'FROM'. Hence the check level > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: the top level parse node could be of type FROM
(not necessary for all DML ops)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is for all DML ops if I am not wrong. I have updated the logic to take this into consideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the partiql.ion grammar for DML ops, the FROM
source is optional.
|
node.token?.err("Type ${node.type} only expected at the top level", PARSE_UNEXPECTED_TERM) | ||
?: throw ParserException("Type ${node.type} only expected at the top level", PARSE_UNEXPECTED_TERM, PropertyValueMap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the word "Type" shouldn't be included in the error message. Maybe it could be replaced with "Keyword"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the parseType
is different from the keyword or token, fortunately or unfortunately. I guess we can remove the error message - type only expected at top level
altogether and just stay with the unexpected term found
error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the error msg.
} | ||
private fun validateTopLevelNodes(node: ParseNode, level: Int) { | ||
// Note that for DML operations, top level parse node is of type 'FROM'. Hence the check level > 1 | ||
if (node.type.isTopLevelType && ((node.type.isDml && level > 1) || (!node.type.isDml && level > 0))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This can be a little hard to understand with so many logic symbols. Can this be simplified or broken into multiple if-statements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will make the code redundant. I also thought the conditional statement is complicated initially but on the second glance it is not that complicated.
if (node.type.isTopLevelType && level > 0) {
if (node.type.isDml) {
if (level > 1) {
// throw error
}
} else {
// throw error
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as mentioned above.
/** If the property [isTopLevelType] is true, then the parse node of this ParseType will only be valid at top level in the query. | ||
* For example, EXEC, DDL and DML keywords can only be used at the top level in the query. | ||
*/ | ||
internal enum class ParseType(val isJoin: Boolean = false, val isTopLevelType: Boolean = false, val isDml: Boolean = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the isDml
parameter defined? SET
is a DML op, yet ParseType.SET
isDml
is false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a legit observation. However, the SET
parseType
is not used anywhere. We can actually get rid of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the property isDml
to true
for SET
.
@@ -1246,6 +1246,96 @@ class ParserErrorsTest : TestBase() { | |||
Property.TOKEN_TYPE to TokenType.EOF, | |||
Property.TOKEN_VALUE to ion.newSymbol("EOF"))) | |||
|
|||
@Test | |||
fun updateWithNestedSet() = checkInputThrowingParserException( | |||
"UPDATE test SET x = SET test.y = 6", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when I'm testing this query in the REPL, it gives the following:
PartiQL> UPDATE test SET x = SET test.y = 6;
org.partiql.lang.syntax.ParserException: Type UPDATE only expected at the top level
Parser Error: at line 1, column 21: unexpected term found, KEYWORD : set
Shouldn't the 2nd line say SET
and not UPDATE
? I haven't had a chance to test w/ the other top-level tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The associated parse type is UPDATE
.
I will change this to - "Keyword ${node.token} only expected at the top level"
.
It makes more sense I guess and would be less confusing.
* Validates tree to make sure that the top level tokens are not found below the top level | ||
*/ | ||
private fun validateTopLevelNodes(node: ParseNode, level: Int) { | ||
// Note that for DML operations, top level parse node is of type 'FROM'. Hence the check level > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the partiql.ion grammar for DML ops, the FROM
source is optional.
Came across interesting edge case where the DML tokens are at top level. Updated the PR to handle these edge cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have a few minor nits and questions. The rest lgtm
@@ -2244,25 +2253,39 @@ class SqlParser(private val ion: IonSystem) : Parser { | |||
return ParseNode(ARG_LIST, null, items, rem) | |||
} | |||
|
|||
private fun ParseNode.throwTopLevelParserError(): Nothing = | |||
token?.err("Keyword $token only expected at the top level in the query", PARSE_UNEXPECTED_TERM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, in the REPL, this outputs the string representation of token
:
PartiQL> UPDATE test SET x = SET test.y = 6;
org.partiql.lang.syntax.ParserException: Keyword Token(type=KEYWORD, value=set, position=line 1, column 21) only expected at the top level in the query
Parser Error: at line 1, column 21: unexpected term found, KEYWORD : set
Should ${token.text}
be used here instead of $token
to output just the keyword? The following is what's output using ${token.text}
:
PartiQL> UPDATE test SET x = SET test.y = 6;
org.partiql.lang.syntax.ParserException: Keyword set only expected at the top level in the query
Parser Error: at line 1, column 21: unexpected term found, KEYWORD : set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Thanks. Changed it.
Property.TOKEN_VALUE to ion.newSymbol("exec"))) | ||
Property.COLUMN_NUMBER to 21L, | ||
Property.TOKEN_TYPE to TokenType.IDENTIFIER, | ||
Property.TOKEN_VALUE to ion.newSymbol("undrop"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the EXEC
parsing should be changed (but won't be changed in this PR), can we add a comment linking to an issue to track this?
* an error if so. | ||
* | ||
* Currently only checks for `EXEC`. DDL and DML along with corresponding error codes to be added in the future | ||
* (https://github.com/partiql/partiql-lang-kotlin/issues/354). | ||
* Validates tree to make sure that the top level tokens are not found below the top level | ||
*/ | ||
private fun List<Token>.checkUnexpectedTopLevelToken() { | ||
for ((index, token) in this.withIndex()) { | ||
if (token.keywordText == "exec" && index != 0) { | ||
token.err("EXEC call found at unexpected location", PARSE_EXEC_AT_UNEXPECTED_LOCATION) | ||
private fun validateTopLevelNodes(node: ParseNode, level: Int, topLevelTokens: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could add more documentation for what level
and topLevelTokens
represent. Also, is Int
the best representation for what it's used for? Perhaps Boolean
is better along with a more descriptive name such as topLevelTokenSeen
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't make a significant difference but you are right. This can be done with Boolean
.
@@ -2272,6 +2295,9 @@ class SqlParser(private val ion: IonSystem) : Parser { | |||
else -> rem.err("Unexpected token after expression", PARSE_UNEXPECTED_TOKEN) | |||
} | |||
} | |||
|
|||
validateTopLevelNodes(node, 0, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could add names to call args
validateTopLevelNodes(node = node, level = 0, topLevelTokens = 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
} | ||
} | ||
node.children.map { validateTopLevelNodes(it, level + 1, topTokens) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could add names to call args:
node.children.map { validateTopLevelNodes(node = it, level = level + 1, topLevelTokens = topTokens) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
@Test | ||
fun nestedRemove() = checkInputThrowingParserException( | ||
"REMOVE REMOVE y", | ||
ErrorCode.PARSE_INVALID_PATH_COMPONENT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this error is not PARSE_UNEXPECTED_TERM
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is handled actively by the parser while parsing the REMOVE
keyword.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple very minor nits for the kdoc of validateTopLevelNodes
. Rest lgtm
* Validates tree to make sure that the top level tokens are not found below the top level | ||
* Validates tree to make sure that the top level tokens are not found below the top level. | ||
* Top level tokens are the tokens or keywords which are valid to be used only at the top level in the query. | ||
* i.e. these tokens cannot be used with a mix of other commands. Hence if more than one top level tokens are found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "i.e. these tokens cannot be used with a mix of other commands top level tokens"? "commands" seems too generalized.
* i.e. these tokens cannot be used with a mix of other commands. Hence if more than one top level tokens are found | ||
* in the query then it is invalid. | ||
* [level] is the current traversal level in the parse tree. | ||
* If [topLevelTokenSeen] is true, it means it has been encountered at least once before while traversing the parse tree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: description of topLevelTokenSeen
could be more more specific. Consider "...it means a top level token has been previously encountered while traversing the parse tree"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the variable name itself tells it all.
Issue #363
Fixes parser by traversing and validating the parse tree to ensure that the top level tokens are not present at the level(s) below.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.