Allow declaring partition columns in PARTITION BY clause, backwards compatible #9599
Conversation
@@ -679,7 +679,25 @@ impl<'a> DFParser<'a> {
Keyword::PARTITIONED => {
    self.parser.expect_keyword(Keyword::BY)?;
    ensure_not_set(&builder.table_partition_cols, "PARTITIONED BY")?;
    builder.table_partition_cols = Some(self.parse_partitions()?);
    let peeked = self.parser.peek_nth_token(2);
    if peeked == Token::Comma || peeked == Token::RParen {
Works fine but feels hacky. I'm considering replacing this if with a more robust function that tries to apply a parsing rule and falls back (undoing consumed tokens) when the rule fails.
Yeah, it is not immediately clear why this logic works. I think at a minimum we should add a comment explaining the reasoning of this condition.
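For context, an illustrative sketch (table, column, and path names are hypothetical) of the two token streams this condition distinguishes, assuming the parser is positioned just before the opening parenthesis when peek_nth_token(2) is called:

```sql
-- Original syntax: bare partition column names.
-- Lookahead after "PARTITIONED BY" is '(', 'p1', then ',' (or ')' for a
-- single column), so the token at index 2 is a comma or closing parenthesis.
CREATE EXTERNAL TABLE t (c1 INT, p1 INT, p2 INT)
STORED AS CSV
PARTITIONED BY (p1, p2)
LOCATION '/tmp/data/';

-- New Hive-style syntax: full column definitions.
-- Lookahead is '(', 'p1', 'INT', so the token at index 2 is a data type
-- rather than ',' or ')', and the column-definition parser is used instead.
CREATE EXTERNAL TABLE t (c1 INT)
STORED AS CSV
PARTITIONED BY (p1 INT, p2 INT)
LOCATION '/tmp/data/';
```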
I'm not super familiar with the sqlparser crate, but I don't think it allows rewinding tokens; we would have to implement a parsing rule that only uses peeks, which sounds really unnecessary. I'll add a comment for now and maybe we can find a better way later.
I think you can use the poorly named consume_tokens API:
https://docs.rs/sqlparser/latest/sqlparser/parser/struct.Parser.html#method.consume_tokens
That looks promising. I'll open a follow-up PR if it works out. Thanks for pointing it out.
Edit: I misread the docs; we need a version of consume_tokens that returns true without actually consuming the tokens, because we need to parse those tokens later. It would also be perfect if it could match a pattern, to catch mixed syntax in the clause.
Thank you @MohamedAbdeen21! This looks good. I haven't had a chance to dig into this deeply yet, but I plan to sometime over the next few days when I get some time. One thing we will definitely want prior to merging is test cases exploring possible edge cases, e.g. validating that it isn't possible to mix the two syntaxes in any way that leads to undesired behavior.
Thank you @MohamedAbdeen21 this is looking very close to ready (and certainly ready for wider review/discussion). I left a few comments. Would you mind marking this "ready for review" to increase visibility for other reviewers?
@@ -175,7 +175,7 @@ pub(crate) type LexOrdering = Vec<OrderByExpr>;
/// [ WITH HEADER ROW ]
/// [ DELIMITER <char> ]
/// [ COMPRESSION TYPE <GZIP | BZIP2 | XZ | ZSTD> ]
/// [ PARTITIONED BY (<column list>) ]
/// [ PARTITIONED BY (<column_definition list> | <column list>) ]
👍
Thank you @MohamedAbdeen21 -- I am sorry for the delay in reviewing this PR. It looks really nice to me.
Thank you @devinjdangelo for the review.
Is there any chance you can merge up from main so we can make sure there are no conflicts? If not I can handle it too
Thanks again @MohamedAbdeen21
… compatible (apache#9599)

* Draft allow both syntaxes
* suppress unused code error
* prevent constraints in partition clauses
* fix clippy
* More tests
* comment and prevent constraints on partition columns
* trailing whitespaces
* End-to-End test of new Hive syntax

---------

Co-authored-by: Mohamed Abdeen <mohamed.abdeen@paytabs.com>
Which issue does this PR close?
Closes #9465.
Rationale for this change
Allow HiveQL syntax when creating external tables, with partition columns defined only inside the PARTITION BY clause, while maintaining the original syntax (backwards compatible).

What changes are included in this PR?
The SQL parser now parses the new Hive-style form (partition columns fully defined inside PARTITIONED BY) as if it had been written in the original syntax, where partition columns are declared in the main column list and only referenced by name in PARTITIONED BY; see the sketch below.
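As an illustrative sketch (table name, column names, types, and location are hypothetical, not taken from the PR), a statement such as:

```sql
CREATE EXTERNAL TABLE t (c1 INT)
STORED AS CSV
PARTITIONED BY (p1 INT, p2 VARCHAR)
LOCATION '/tmp/data/';
```

is parsed as if it had been written as:

```sql
CREATE EXTERNAL TABLE t (c1 INT, p1 INT, p2 VARCHAR)
STORED AS CSV
PARTITIONED BY (p1, p2)
LOCATION '/tmp/data/';
```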
This means that queries re-defining columns in PARTITIONED BY (a column declared in the main column list and then given a full definition again in the clause) are rejected with a Schema error, as in the sketch below.
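For example (hypothetical names), the following statement defines p1 both in the main column list and again with a type in PARTITIONED BY; after the rewrite the column would appear twice in the schema, so the statement fails:

```sql
-- Rejected: p1 is defined twice once the PARTITIONED BY definitions
-- are folded into the main column list.
CREATE EXTERNAL TABLE t (c1 INT, p1 INT)
STORED AS CSV
PARTITIONED BY (p1 INT)
LOCATION '/tmp/data/';
```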
Are these changes tested?
Yes
Are there any user-facing changes?
Users are now able to use HiveQL syntax when creating partitioned external tables.