-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Aggregate::try new
with validation checks
#3286
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It follows the boy scout rule.
/// Count the number of distinct exprs in a list of group by expressions. If the | ||
/// first element is a `GroupingSet` expression then it must be the only expr. | ||
pub fn grouping_set_expr_count(group_expr: &[Expr]) -> Result<usize> { | ||
if let Some(Expr::GroupingSet(grouping_set)) = group_expr.first() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird that this function takes either a GroupingSet expression, or an arbitrary expression. I dealt with this a lot in the subquery decorrelation - it almost feels like dynamic typing. I think the root cause might be that many of the enum values have fields inside them, rather than a struct - so it's impossible to de-reference them and pass them between functions.
How would you feel about blanket converting all enum values like InSubquery
, Exists
, etc to point to structs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also found this frustrating to work with. Maybe we could introduce another enum?
pub enum Grouping {
Expr(Vec<Expr>),
Set(GroupingSet)
}
and then
pub struct Aggregate {
pub input: Arc<LogicalPlan>,
pub group_expr: Grouping,
pub aggr_expr: Vec<Expr>,
pub schema: DFSchemaRef,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about blanket converting all enum values like
InSubquery
,Exists
, etc to point to structs?
Sounds good to me. This would be consistent with how we do things in the LogicalPlan
.
} | ||
Ok(grouping_set.distinct_expr().len()) | ||
} else { | ||
Ok(group_expr.len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do other validation? Should I be able to group by *
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that we could add extra validation here over time. I would need to research what is allowable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the sqlparser won't allow group by *
❯ select count(*) from foo group by *; 🤔 Invalid statement: sql parser error: Expected an expression:, found: *
Codecov Report
@@ Coverage Diff @@
## master #3286 +/- ##
==========================================
- Coverage 85.92% 85.92% -0.01%
==========================================
Files 294 294
Lines 53469 53489 +20
==========================================
+ Hits 45945 45959 +14
- Misses 7524 7530 +6
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thanks @andygrove
return Err(DataFusionError::Plan( | ||
"Aggregate schema has wrong number of fields".to_string(), | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Err(DataFusionError::Plan( | |
"Aggregate schema has wrong number of fields".to_string(), | |
)); | |
return Err(DataFusionError::Plan( | |
format!("Aggregate schema has wrong number of fields. Expected {} got {}", | |
schema.fields().len(), group_expr_count + aggr_expr.len() | |
) | |
)); |
} | ||
Ok(grouping_set.distinct_expr().len()) | ||
} else { | ||
Ok(group_expr.len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the sqlparser won't allow group by *
❯ select count(*) from foo group by *; 🤔 Invalid statement: sql parser error: Expected an expression:, found: *
Benchmark runs are scheduled for baseline = 516ad0d and contender = 3d37016. 3d37016 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
* Add Aggregate::try_new with validation checks * fix calculation of number of grouping expressions * use suggested error message
Which issue does this PR close?
Related to #3285
Rationale for this change
I found a bug in
ProjectionPushDown
in another branch I am working on. Adding validation checks when we create aggregate plans exposes this and seems like a good idea in general.What changes are included in this PR?
Add
Aggregate::try new
and update existing code to call that rather than creating the struct directly.Are there any user-facing changes?
No