Description
Describe the bug
While testing Ballista builds against the latest `main`, I noticed tests failing with:
Error: Internal("Could not create `ExprBoundaries`: in `try_from_column` `col_index` \n has gone out of bounds with a value of 3, the schema has 3 columns.")
This did not happen with DataFusion 45, and there is no problem if the Ballista remote context is replaced with a plain DataFusion context.
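The error message can look contradictory at first glance, but column indices are zero-based: a schema with 3 columns has valid indices 0, 1 and 2, so an index of 3 points one past the last column. The std-only sketch below illustrates that bounds check. `check_column_index` is a hypothetical helper that only mirrors the wording of the reported error; it is not DataFusion's actual `ExprBoundaries::try_from_column` implementation.

```rust
/// Hypothetical helper (not DataFusion code) mirroring the bounds check
/// behind the reported error. Column indices are zero-based, so a schema
/// with `num_columns` columns accepts `0..num_columns` and rejects
/// `num_columns` itself.
fn check_column_index(col_index: usize, num_columns: usize) -> Result<(), String> {
    if col_index < num_columns {
        Ok(())
    } else {
        Err(format!(
            "`col_index` has gone out of bounds with a value of {}, the schema has {} columns.",
            col_index, num_columns
        ))
    }
}

fn main() {
    // Indices 0, 1 and 2 are all valid for a 3-column schema.
    for i in 0..3 {
        assert!(check_column_index(i, 3).is_ok());
    }
    // Index 3 is one past the end, matching the values in the reported error.
    match check_column_index(3, 3) {
        Ok(()) => println!("unexpectedly in bounds"),
        Err(msg) => println!("{}", msg),
    }
}
```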
To Reproduce
The main difference between DataFusion and Ballista execution appears to be serde of the logical and physical plans. After first looking in the wrong place (the logical plan), I managed to reproduce the issue with a physical plan round trip alone:
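```rust
use datafusion::error::Result;
use datafusion::prelude::*;
use datafusion_proto::physical_plan::{AsExecutionPlan, DefaultPhysicalExtensionCodec};
use datafusion_proto::protobuf::PhysicalPlanNode;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_parquet("test", "alltypes_plain.parquet", Default::default())
        .await?;

    // Physical plan for a query that combines a projection with a filter.
    let plan = ctx
        .sql("select string_col, timestamp_col from test where id > 4")
        .await?
        .create_physical_plan()
        .await?;

    // Serialize the physical plan to protobuf.
    let node: PhysicalPlanNode =
        PhysicalPlanNode::try_from_physical_plan(plan, &DefaultPhysicalExtensionCodec {})?;

    // Deserializing the plan fails here.
    let plan = node.try_into_physical_plan(
        &ctx,
        &ctx.runtime_env(),
        &DefaultPhysicalExtensionCodec {},
    )?;
    let _ = plan.execute(0, ctx.task_ctx()).unwrap();
    Ok(())
}
```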
where the Parquet file can be found at https://github.com/apache/datafusion-ballista/blob/46a67459e61467a2e86c23f0c1c2920dd49c877f/ballista/client/testdata/alltypes_plain.parquet
DataFusion commit used for testing: a104661
(For what it's worth, the issue is also present 15 to 16 commits earlier.)
Note that these queries execute without any problems:
- `select string_col, timestamp_col from test`
- `select * from test where id > 4`

The failing query itself also executes without problems when the plan is not serialized and deserialized.
Additional info:
- CSV sources do not have this issue.
Expected behavior
The physical plan round trip should succeed.