Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow declaring partition columns in PARTITION BY clause, backwards compatible #9599

Merged
merged 11 commits into from
Mar 30, 2024
48 changes: 42 additions & 6 deletions datafusion/sql/src/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,7 @@ pub(crate) type LexOrdering = Vec<OrderByExpr>;
/// [ WITH HEADER ROW ]
/// [ DELIMITER <char> ]
/// [ COMPRESSION TYPE <GZIP | BZIP2 | XZ | ZSTD> ]
/// [ PARTITIONED BY (<column list>) ]
/// [ PARTITIONED BY (<column_definition list> | <column list>) ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/// [ WITH ORDER (<ordered column list>)
/// [ OPTIONS (<key_value_list>) ]
/// LOCATION <literal>
Expand Down Expand Up @@ -609,7 +609,7 @@ impl<'a> DFParser<'a> {
self.parser
.parse_keywords(&[Keyword::IF, Keyword::NOT, Keyword::EXISTS]);
let table_name = self.parser.parse_object_name(true)?;
let (columns, constraints) = self.parse_columns()?;
let (mut columns, constraints) = self.parse_columns()?;

#[derive(Default)]
struct Builder {
Expand Down Expand Up @@ -679,7 +679,25 @@ impl<'a> DFParser<'a> {
Keyword::PARTITIONED => {
self.parser.expect_keyword(Keyword::BY)?;
ensure_not_set(&builder.table_partition_cols, "PARTITIONED BY")?;
builder.table_partition_cols = Some(self.parse_partitions()?);
let peeked = self.parser.peek_nth_token(2);
if peeked == Token::Comma || peeked == Token::RParen {
Copy link
Contributor Author

@MohamedAbdeen21 MohamedAbdeen21 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works fine but feels hacky. Considering replacing this if with a more robust function that tries to apply a parsing rule and falls back (undo consumed tokens) when rule fails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is not immediately clear why this logic works. I think at a minimum we should add a comment explaining the reasoning of this condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with sqlparser crate, but I don't think it allows rewinding tokens, we will have to implement a parsing rule that only uses peeks, which sounds really unnecessary. Will add a comment for now and maybe can find a better way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@MohamedAbdeen21 MohamedAbdeen21 Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks promising. I'll open a follow up PR if it works out. Thanks for pointing it out.

Edit: misread the docs, we need a version of consume_tokens that returns true without actually consuming the tokens, because we need to parse the tokens later.

Also, would be perfect if it can match a pattern to catch mixing syntax in the clause.

// list of column names
builder.table_partition_cols = Some(self.parse_partitions()?)
} else {
// list of column defs
let (cols, cons) = self.parse_columns()?;
builder.table_partition_cols = Some(
cols.iter().map(|col| col.name.to_string()).collect(),
);

columns.extend(cols);

if !cons.is_empty() {
return Err(ParserError::ParserError(
"Should this be allowed?".to_string(),
));
}
}
}
Keyword::OPTIONS => {
ensure_not_set(&builder.options, "OPTIONS")?;
Expand Down Expand Up @@ -833,7 +851,7 @@ mod tests {
}

/// Parses sql and asserts that the expected error message was found
fn expect_parse_error(sql: &str, expected_error: &str) {
fn _expect_parse_error(sql: &str, expected_error: &str) {
match DFParser::parse_sql(sql) {
Ok(statements) => {
panic!(
Expand Down Expand Up @@ -1092,10 +1110,28 @@ mod tests {
});
expect_parse_ok(sql, expected)?;

// Error cases: partition column does not support type
// positive case: column definiton allowed in 'partition by' clause
let sql =
"CREATE EXTERNAL TABLE t(c1 int) STORED AS CSV PARTITIONED BY (p1 int) LOCATION 'foo.csv'";
expect_parse_error(sql, "sql parser error: Expected ',' or ')' after partition definition, found: int");
let expected = Statement::CreateExternalTable(CreateExternalTable {
name: "t".into(),
columns: vec![
make_column_def("c1", DataType::Int(None)),
make_column_def("p1", DataType::Int(None)),
],
file_type: "CSV".to_string(),
has_header: false,
delimiter: ',',
location: "foo.csv".into(),
table_partition_cols: vec!["p1".to_string()],
order_exprs: vec![],
if_not_exists: false,
file_compression_type: UNCOMPRESSED,
unbounded: false,
options: HashMap::new(),
constraints: vec![],
});
expect_parse_ok(sql, expected)?;

// positive case: additional options (one entry) can be specified
let sql =
Expand Down
Loading