-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parquet::build_row_gropup_predicate hits stackoverflowed #419
Comments
Some more information: The test in question is in https://github.com/influxdata/influxdb_iox/pull/1550 When we run it the output is:
When I look in the debugger, here is the stack trace (it isn't obviously some deeply nested problem):
|
Looks to me like perhaps there is some large intermediate temporary on the stack ?
|
Recursion reaches its limits here, and the nesting is only 3 levels deep |
I wonder if there is some massive structure being placed on the stack somewhere |
Same question here as in #910 Does the error go away when running the tests in release mode? |
Thanks @Igosuki |
We are hitting this in IOx now more regularly so I plan to work on a real fix. Assigning to myself |
I poked a bit at the stack frames in
|
Some observations:
Thus my theory is that because there are so many locals created in I have prototyped fixing the stack error by splitting the |
|
This is nuts, good found @alamb haha |
Here is a simple reproducer showing the problem: diff --git a/datafusion/src/physical_plan/expressions/binary.rs b/datafusion/src/physical_plan/expressions/binary.rs
index e77b25c40..560fb514a 100644
--- a/datafusion/src/physical_plan/expressions/binary.rs
+++ b/datafusion/src/physical_plan/expressions/binary.rs
@@ -1381,4 +1381,36 @@ mod tests {
))
}
}
+
+ #[test]
+ fn relatively_deeply_nested() {
+ // Reproducer for https://github.com/apache/arrow-datafusion/issues/419
+
+ // where even relatively shallow binary expressions overflowed
+ // the stack in debug builds
+
+ let input: Vec<_> = vec![1, 2, 3, 4, 5].into_iter().map(Some).collect();
+ let a : Int32Array = input.iter().collect();
+
+ let batch = RecordBatch::try_from_iter(vec![("a", Arc::new(a) as _)]).unwrap();
+ let schema = batch.schema();
+
+ // build a left deep tree ((((a + a) + a) + a ....
+ let tree_depth: i32 = 10;
+ let expr = (0..tree_depth)
+ .into_iter()
+ .map(|_| col("a", schema.as_ref()).unwrap())
+ .reduce(|l, r| binary_simple(l, Operator::Plus, r))
+ .unwrap();
+
+ println!("Evaluating expr {:?}", expr);
+ let result = expr
+ .evaluate(&batch)
+ .expect("evaluation")
+ .into_array(batch.num_rows());
+
+ let expected: Int32Array = input.into_iter().map(|i| i.map(|i| i * tree_depth)).collect();
+ assert_eq!(result.as_ref(), &expected);
+ }
+
} |
Haha - great find @alamb |
Describe the bug
While testing IOX, I hit stack overflowed in build_row_group_predicate with this SQL:
'"SELECT * from restaurant where system > 5.0 and 'tewsbury' != town and system < 7.0 and town = 'reading'"'
Here is the println! of RowGroupPredicateBuilder
I think after this fix #409 merged, the redundant expressions will go away and reduce the size of this predicate but this is an example that we cannot yet handle many expressions in a filter
The text was updated successfully, but these errors were encountered: