bug: Physical plan round trip fails in some cases after datasource refactor #14679

@milenkovicm

Describe the bug

While testing Ballista builds with the latest DataFusion main, I noticed tests failing with:

Error: Internal("Could not create `ExprBoundaries`: in `try_from_column` `col_index` \n                has gone out of bounds with a value of 3, the schema has 3 columns.")
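
To make the message concrete: a schema with 3 columns only has valid column indices 0, 1 and 2, so an index of 3 is one past the end. The sketch below is a hypothetical stand-in for that kind of bounds check (the function `check_col_index` is invented for illustration and is not DataFusion's implementation). It only shows why the reported value is rejected, not where the bad index comes from.

```rust
/// Hypothetical bounds check on a column index (illustration only,
/// not DataFusion's code): valid indices are `0..num_columns`.
fn check_col_index(col_index: usize, num_columns: usize) -> Result<usize, String> {
    if col_index < num_columns {
        Ok(col_index)
    } else {
        Err(format!(
            "col_index has gone out of bounds with a value of {col_index}, \
             the schema has {num_columns} columns."
        ))
    }
}

fn main() {
    // Probe indices 0 through 3 against a 3-column schema.
    for idx in 0..4 {
        match check_col_index(idx, 3) {
            Ok(i) => println!("index {i}: ok"),
            Err(e) => println!("index {idx}: {e}"),
        }
    }
}
```

An index that is valid for a wider schema but is then checked against a narrower one would trip this kind of check, though the issue itself does not establish that as the cause.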

This was not the case with DataFusion 45, nor is there a problem if the remote context is replaced with a DataFusion context.

To Reproduce

Apparently the difference between DataFusion and Ballista execution is the serde of logical and physical plans. After first looking in the wrong place (the logical plan), I managed to reproduce it with:

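        use datafusion::prelude::SessionContext;
        use datafusion_proto::physical_plan::{AsExecutionPlan, DefaultPhysicalExtensionCodec};
        use datafusion_proto::protobuf::PhysicalPlanNode;

        // Register the Parquet test file as table `test`.
        let ctx = SessionContext::new();
        ctx.register_parquet(
            "test",
            "alltypes_plain.parquet",
            Default::default(),
        )
        .await?;

        // Build the physical plan for a query with both a projection and a filter.
        let plan = ctx
            .sql("select string_col, timestamp_col from test where id > 4")
            .await?
            .create_physical_plan()
            .await?;

        // Serialize the physical plan to its protobuf representation.
        let node: PhysicalPlanNode = PhysicalPlanNode::try_from_physical_plan(
            plan,
            &DefaultPhysicalExtensionCodec {},
        )?;
        // fails here
        let plan = node.try_into_physical_plan(
            &ctx,
            &ctx.runtime_env(),
            &DefaultPhysicalExtensionCodec {},
        )?;

        // Execute partition 0 of the round-tripped plan.
        let _ = plan.execute(0, ctx.task_ctx()).unwrap();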

The Parquet file can be found at https://github.com/apache/datafusion-ballista/blob/46a67459e61467a2e86c23f0c1c2920dd49c877f/ballista/client/testdata/alltypes_plain.parquet

DataFusion commit used for testing: a104661

(For what it's worth, this issue was already present 15 - 16 commits earlier.)

Note that the following queries execute without any problems:

  • select string_col, timestamp_col from test
  • select * from test where id > 4

The failing query also executes without problems when plan serde is skipped.

additional info:

  • CSV does not have this issue

Expected behavior

The physical plan round trip should succeed.

Additional context

#14631

Labels: bug, regression
