[SPARK-50449][SQL] Fix SQL Scripting grammar allowing empty bodies for loops, IF and CASE #48989

Closed

Contributor

@dusantism-db dusantism-db commented Nov 27, 2024

What changes were proposed in this pull request?

Before this PR, the SQL Scripting grammar allowed loops, IF and CASE statements to have empty bodies. Example:

WHILE 1 = 1 DO END WHILE;

If they have an empty body, an internal error is thrown during execution. This PR changes the grammar so that loops, IF and CASE must have at least one statement in their bodies.

Note that this does not completely fix the internal error issue. It is still possible to have something like

WHILE 1 = 1 DO 
  BEGIN
  END;
END WHILE;

where the same error is still thrown, even though this construct is grammatically valid.
That issue will be fixed in a separate PR, as it requires non-trivial changes to the interpreter logic.
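
To make the distinction above concrete, here is a minimal sketch of the rule in plain Scala. It is a toy model, not Spark's ANTLR grammar or parser: `EmptyBodyCheck`, `Stmt`, `Block`, `While` and `validate` are hypothetical names introduced only for illustration. It assumes that a loop body must contain at least one statement while a BEGIN ... END block may stay empty.

```scala
// Toy model only: this is NOT Spark's grammar. All names are hypothetical.
object EmptyBodyCheck {
  sealed trait Node
  case class Stmt(text: String) extends Node               // a leaf statement, e.g. SELECT 1
  case class Block(body: Seq[Node]) extends Node            // BEGIN ... END
  case class While(cond: String, body: Seq[Node]) extends Node

  // A loop body needs at least one statement; a BEGIN ... END block may be empty.
  def validate(node: Node): Either[String, Unit] = node match {
    case Stmt(_)                        => Right(())
    case Block(body)                    => validateAll(body)
    case While(_, body) if body.isEmpty => Left("syntax error: empty WHILE body")
    case While(_, body)                 => validateAll(body)
  }

  // Stops at the first error because Either.flatMap short-circuits on Left.
  private def validateAll(nodes: Seq[Node]): Either[String, Unit] =
    nodes.foldLeft[Either[String, Unit]](Right(())) { (acc, n) =>
      acc.flatMap(_ => validate(n))
    }
}
```

Under this model, `While("1 = 1", Seq.empty)` is rejected, whereas `While("1 = 1", Seq(Block(Seq.empty)))` passes validation, which matches the remaining case that this PR leaves to the interpreter.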

Why are the changes needed?

The existing grammar was wrong.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests that make sure parsing loops, IF and CASE with empty bodies throws an error.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Nov 27, 2024
condition = "PARSE_SYNTAX_ERROR",
parameters = Map("error" -> "'LOOP'", "hint" -> ""))
}

test("loop with if else block") {
Contributor

Does it also make sense to add a test for FOR with an empty body, in this PR or your other one, depending on which you merge first?

Contributor Author

Yes, I will add it to the one which gets merged last, or if they both get merged then I will create a separate PR.

Contributor Author

Added tests for FOR

@@ -623,6 +652,21 @@ class SqlScriptingParserSuite extends SparkFunSuite with SQLHelper {
assert(whileStmt.label.contains("lbl"))
}

test("while with empty body") {
Contributor

@miland-db miland-db Nov 27, 2024

Can we also add a test like the one in the PR description:

WHILE 1=1 DO
  BEGIN
  END;
END WHILE;

Contributor Author

I will add that one in the follow-up PR, as that case is still not fixed.

Member

@MaxGekk MaxGekk left a comment

@dusantism-db Could you open a JIRA ticket and prepend its ID to the PR's title, please?

@dusantism-db dusantism-db changed the title from "[SQL] Fix SQL Scripting grammar allowing empty bodies for loops, IF and CASE" to "[SPARK-50449][SQL] Fix SQL Scripting grammar allowing empty bodies for loops, IF and CASE" Nov 28, 2024
@dusantism-db
Contributor Author

@MaxGekk done

@dusantism-db dusantism-db requested a review from MaxGekk December 3, 2024 10:44
Member

@MaxGekk MaxGekk left a comment

Why are the changes needed?
The existing grammar was wrong.

Why is it wrong? Can you cite a few words from the SQL standard, specs, or other references?

@MaxGekk
Member

MaxGekk commented Dec 4, 2024

@srielau PTAL

- visitCompoundBodyImpl(ctx.compoundBody(), None, allowVarDeclare = true, labelCtx)
+ Option(ctx.compoundBody())
+   .map(visitCompoundBodyImpl(_, None, allowVarDeclare = true, labelCtx))
+   .getOrElse(CompoundBody(Seq.empty, None))
Contributor

I thought we should throw an exception in the getOrElse branch. How does this fix work?

Contributor Author

If ctx.compoundBody() is null, that means we have

BEGIN
END;

This is allowed, so in this case we simply return a CompoundBody with an empty statement list.
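
The null handling described here relies on standard Scala behavior: `Option(x)` evaluates to `None` when `x` is `null`, so `getOrElse` supplies the empty-body case. A minimal standalone sketch follows. `BodyCtx`, `visit` and `visitBeginEnd` are hypothetical stand-ins, and this `CompoundBody` is a simplified local class rather than Spark's.

```scala
// Hypothetical stand-ins for the parser context and the plan node; not Spark's classes.
object NullableBody {
  final case class BodyCtx(statements: Seq[String])
  final case class CompoundBody(statements: Seq[String], label: Option[String])

  // Stand-in for the real visitor: copies the statements into a CompoundBody.
  def visit(ctx: BodyCtx): CompoundBody = CompoundBody(ctx.statements, None)

  // A null context models "BEGIN END" with nothing in between.
  // Option(null) is None, so the getOrElse branch builds an empty body.
  def visitBeginEnd(bodyOrNull: BodyCtx): CompoundBody =
    Option(bodyOrNull).map(visit).getOrElse(CompoundBody(Seq.empty, None))
}
```

Calling `NullableBody.visitBeginEnd(null)` returns a body with no statements instead of raising a `NullPointerException`, which is why no exception is needed in that branch as long as an empty BEGIN END is considered valid.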

Contributor

@srielau srielau left a comment

LGTM.
We want to allow empty BEGIN END

@dusantism-db dusantism-db force-pushed the scripting-empty-bodies-fix branch from 7495145 to 1cafcb5 Compare December 5, 2024 18:24
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in c149942 Dec 6, 2024
cloud-fan pushed a commit that referenced this pull request Dec 10, 2024
This PR depends on #48989

### What changes were proposed in this pull request?
There is a bug in SQL scripting which causes empty compound statements to throw an error if their body consists solely of empty BEGIN END blocks. Examples:

```
WHILE 1 = 1 DO
  BEGIN
  END;
END WHILE;
```

```
BEGIN
  BEGIN
    BEGIN
    END;
  END;
END;
```

This PR fixes this by introducing a NO-OP statement for SQL scripting, which empty BEGIN END blocks will return.

### Why are the changes needed?
Currently, compound bodies report that they have a next element even if their body consists only of empty blocks. This is because the check only looks for the existence of statements in the body, not for at least one statement that is something other than an empty block.
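
The difference between the two checks can be sketched with a toy execution tree. This is only an illustration under assumptions: `NoOpSketch`, `Leaf`, `NoOp`, `Body`, `naiveHasNext`, `hasExecutable` and `normalize` are hypothetical names, and the real interpreter classes in Spark may be structured quite differently.

```scala
// Toy model of the problem; none of these names come from Spark.
object NoOpSketch {
  sealed trait Exec
  case class Leaf(sql: String) extends Exec          // an executable statement
  case object NoOp extends Exec                      // hypothetical marker for an empty BEGIN ... END
  case class Body(children: Seq[Exec]) extends Exec  // a compound body

  // The problematic check: only asks whether the body has any child at all.
  def naiveHasNext(b: Body): Boolean = b.children.nonEmpty

  // A stricter check: is there at least one executable leaf anywhere below?
  def hasExecutable(e: Exec): Boolean = e match {
    case Leaf(_)        => true
    case NoOp           => false
    case Body(children) => children.exists(hasExecutable)
  }

  // One possible use of a no-op: an empty block becomes NoOp instead of an empty Body.
  def normalize(e: Exec): Exec = e match {
    case Body(children) if children.isEmpty => NoOp
    case Body(children)                     => Body(children.map(normalize))
    case other                              => other
  }
}
```

For the nested example `Body(Seq(Body(Seq(Body(Seq.empty)))))`, `naiveHasNext` answers `true` although nothing can run, while `hasExecutable` answers `false`.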

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests were added to existing suites.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49064 from dusantism-db/scripting-noop-statement.

Authored-by: Dušan Tišma <dusan.tisma@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>